Archiving Guidelines

Below is a summary of the procedures and requirements that apply to archiving projects. Please read these guidelines prior to engaging NCEI with an archive request. Note that other requirements specific to associated programs or data themes may apply.

Plan Ahead

Several factors influence the project schedule: the size and complexity of the data, the level of support requested, existing connections with the data provider and providing system, compliance with project requirements, and other project dependencies. Potential data providers should contact NCEI early in the planning phase to ensure adequate time for archive planning and preparation. End-to-end management of the data for archiving should be planned at the start of research and considered throughout the project lifecycle. Including archiving in the project management plan is a critical step for ensuring adequate preparation for long-term archive support. Coordinating with NCEI during data development also provides opportunities for feedback to improve the data and metadata for archiving.

Selection of the appropriate data theme should be based on the type of data to be archived. See the NCEI website for more information. If you are unsure which data theme applies, NCEI will work with you to determine the appropriate theme.

Process

In general, there are two gates to pass through in sequence to get data into the NCEI archive: 1) the archive appraisal and approval step, and 2) the finalization of the data submission agreement. The data provider assists with these activities by providing the necessary information and by reviewing the archive-generated documents. Submitting an archive request to NCEI initiates the archive appraisal and the involvement of NCEI. (ATRAC provides an archive request form for registered projects under Edit Projects.) NCEI uses the information provided in the request and in follow-on conversations to assess the archive value, feasibility and costs. NCEI's decision on supporting the data is based on an informed recommendation and is documented in a formal approval or disapproval communication from the archive.

For approved archive projects in ATRAC, the data provider and NCEI negotiate the details of the data model and the transfer logistics in a data submission agreement. The submission agreement is a charter for both the provider and NCEI on how the project will proceed during the operational transfer and through the end of archive support. A finalized submission agreement acts as the second and final gate to the archive. Even with a finalized submission agreement, providers are expected to maintain communications with NCEI through the data submission and, if possible, through the life of the data archive.

Documentation and Metadata

The NCEI archive requires sufficient information to read, understand and characterize the data in accordance with documentation standards. This documentation is crucial for the ability to use and understand the data independent of external assistance. Data documentation, at a minimum, should allow users to:

  • read the encoded data format
  • understand the data lineage
  • characterize the data quality
  • uniquely identify the data
  • trust the data integrity

Metadata on data quality and lineage must be included with the data files. Information on the data lineage back to the source observational data is required for users who want to understand how the data were produced and what inputs were used.

Providers must also supply information sufficient for producing ISO standard metadata for the data collection(s). NCEI archive representatives can provide guidance for this effort, and the metadata form in ATRAC can be used to collect and initiate this type of metadata. Additional requirements that vary by project may include documentation on the algorithms, production source code (if provided), and encoding format for any non-standard file formats.

File Format Standards

Open file formats maintained by a standards organization are strongly recommended as opposed to proprietary or product-specific formats. Self-describing file formats, like netCDF and HDF, include valuable metadata that support data compatibility and long-term information preservation. Properties of netCDF files should follow the Climate and Forecast (CF) Conventions and the Attribute Conventions for Dataset Discovery. PDF or PDF for archiving (PDF/A) is the preferred file format for archived documents.

A common file naming convention is normally required for archived files. File name fields for file identification should include:

  • data type identifier
  • data version identifier
  • unique date/time stamp
  • appropriate file format extension
  • other applicable fields such as data source

All file name fields except for the file extension should be delimited by underscores '_'. The order of the fields in the file name should begin with the most static and end with the most dynamic. For example, a file name may begin with the less changing data type identifier field and end with the more changing time stamp field.
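
The naming rules above can be illustrated with a minimal Python sketch. The function name, the field values, the time stamp layout, and the exact field order (data type, optional source, version, time stamp) are hypothetical choices made for illustration only; the actual convention for a project is agreed with NCEI.

```python
from datetime import datetime, timezone


def build_archive_filename(data_type, version, timestamp, extension, source=None):
    """Join file name fields with underscores, most static field first."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Assumed time stamp layout (UTC, compact ISO 8601 style).
    stamp = timestamp.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Assumed order: data type, optional source, version, then time stamp.
    if source:
        fields = [data_type, source, version, stamp]
    else:
        fields = [data_type, version, stamp]
    for field in fields:
        # An underscore inside a field would be confused with the delimiter.
        if not field or "_" in field:
            raise ValueError(f"invalid file name field: {field!r}")
    return "_".join(fields) + "." + extension.lstrip(".")


name = build_archive_filename(
    "sst-daily",
    "v02r01",
    datetime(2020, 1, 15, 12, 0, 0, tzinfo=timezone.utc),
    "nc",
    source="buoy",
)
print(name)  # sst-daily_buoy_v02r01_20200115T120000Z.nc
```

Rejecting underscores inside a field keeps the file name unambiguous to parse, since the underscore is reserved as the delimiter.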

Data files may be aggregated and compressed in archive files for storage depending on the number of files, data volume, and other factors. Sets of files should be organized by common characteristics, including data type, format and temporal coverage. Tar files used to store data cannot contain subdirectories or other tar files. A README describing the tar file contents should also be included inside tar files with multiple file types, though it is not needed for tar files with homogeneous content. An inventory or sample of the expected files can help NCEI archive representatives assess the most appropriate file names and data organization. Data packaging is usually discussed during the negotiation of the submission agreement.
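
A provider may wish to check a tar file against the packaging rules above before submission. The following is a minimal sketch using Python's standard tarfile module. The function name, the list of tar suffixes, and the use of file extensions to decide whether the content is homogeneous are assumptions made for illustration, not an NCEI tool or requirement.

```python
import tarfile
from pathlib import PurePosixPath

# Assumed set of suffixes that identify a nested tar file.
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def check_tar_layout(tar_path):
    """Return a list of layout problems found in a tar file (empty if none)."""
    problems = []
    extensions = set()
    has_readme = False
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            if name in ("", "."):
                continue  # entry for the archive root itself
            if member.isdir() or "/" in name:
                problems.append(f"subdirectory entry: {member.name}")
                continue
            lower = name.lower()
            if lower.endswith(TAR_SUFFIXES):
                problems.append(f"nested tar file: {member.name}")
            if lower.startswith("readme"):
                has_readme = True
            else:
                # Assumption: the file extension stands in for the file type.
                extensions.add(PurePosixPath(lower).suffix)
    if len(extensions) > 1 and not has_readme:
        problems.append("multiple file types but no README")
    return problems
```

Calling check_tar_layout on a tar file path returns an empty list when no problems are detected; each string in a non-empty list names one issue to resolve before packaging is finalized.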

Data Transfer

An FTP pull or SFTP push is preferred for most data transfers to NCEI. Other transfer protocols may be possible depending on the interface. Data providers may be required to produce and deliver an MD5 checksum value (32 hexadecimal digits) for each submitted file in a submission manifest so that NCEI can verify the integrity of the data received by the archive. The format of the submitted checksums is discussed through the submission agreement.
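
As an illustration of the checksum step, the sketch below computes an MD5 value for each file in a directory with Python's standard hashlib module and writes a simple manifest. The function names and the manifest layout (checksum, two spaces, file name, one file per line) are assumptions for illustration; the actual manifest format is set in the submission agreement.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as 32 hexadecimal digits."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_path):
    """Write one 'checksum  filename' line per file found in data_dir."""
    lines = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file():
            lines.append(f"{md5_hex(path)}  {path.name}")
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines
```

The manifest should be written outside the data directory so that it is not itself listed on a later run.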

More Information