Data Management Planning
The archive process begins with data management planning, a preliminary stage for data providers to research user accessibility requirements, identify suitable data submission methods, and provide documentation. Scientists should review the NOAA Data Management Planning Resources before planning a data collection expedition to familiarize themselves with the best resources available. NCEI and its partners and affiliates offer researchers support and assistance throughout the data management lifecycle.
You can contact firstname.lastname@example.org at any point in the data management lifecycle. Please share specific archival requirements at the start of the project. An NCEI Subject Matter Expert can help you understand archive requirements and data management best practices, including metadata creation and data formatting.
Submitting Your Data
If you are already working with an NCEI Subject Matter Expert, discuss data submission methods with them before beginning the submission process.
NCEI has two primary web-based archive request options based on data volume and delivery frequency:
- Use Send2NCEI to submit non-repeating or single-delivery data smaller than 20 GB. See S2N Data Submission Guidelines for more details.
- Use ATRAC (Advanced Tracking and Resource Tool for Archive Collections) to submit repeating or multiple-delivery data, or data that exceeds 20 GB. See ATRAC Archiving Guidelines for more details.
For more information on the specifications of these two options or questions about which system to use, contact email@example.com.
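The volume and frequency rule above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, the decimal gigabyte (10^9 bytes) is an assumption, and sending a submission of exactly 20 GB to ATRAC is a guess at a boundary the guidance does not spell out. It is not a substitute for confirming the pathway with NCEI.

```python
# Assumption: 1 GB = 10**9 bytes; the guidance does not state which convention applies.
GB = 10**9
SIZE_LIMIT_BYTES = 20 * GB


def suggest_submission_tool(total_bytes: int, repeating: bool) -> str:
    """Suggest a submission option from data volume and delivery frequency.

    Hypothetical helper: repeating deliveries, or data at or above the
    20 GB limit, go to ATRAC; everything else goes to Send2NCEI.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if repeating or total_bytes >= SIZE_LIMIT_BYTES:
        return "ATRAC"
    return "Send2NCEI"


if __name__ == "__main__":
    # A one-time 5 GB delivery versus a monthly 1 GB delivery.
    print(suggest_submission_tool(5 * GB, repeating=False))
    print(suggest_submission_tool(1 * GB, repeating=True))
```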
Preferred File Formats
NCEI prefers open file formats maintained by a standards organization rather than proprietary or product-specific options. Self-describing formats like netCDF and HDF include valuable metadata that support data compatibility and long-term information preservation. PDF or PDF for archiving (PDF/A) are preferred for archived documents. Above all, choose a format that accommodates characteristics of the data being encoded. Additionally, consider the format’s long-term (decades into the future) compatibility with data access software and infrastructure. For example, tabular data in comma-separated UTF-8 or ASCII files are much more likely to be accessible to future operating systems than proprietary spreadsheet formats (e.g., Lotus 1-2-3).
NCEI recommends using the National Archives and Records Administration (NARA) transfer formats for digital data. Refer to the Library of Congress Sustainability of Digital Formats page when considering formats outside of the NARA list. All submissions are subject to review, and may need to undergo changes to format, filename, packaging, and other criteria to meet system or preservation requirements before being approved.
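As an illustration of the tabular-data recommendation, the following minimal sketch writes rows to a comma-separated file with explicit UTF-8 encoding, using only the Python standard library. The function name, file name, and column names are hypothetical and are not part of any NCEI specification.

```python
import csv


def write_utf8_csv(path, header, rows):
    """Write a header and data rows to a comma-separated file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical observation records, for illustration only.
    write_utf8_csv(
        "station_observations.csv",
        ["station", "time_utc", "sea_surface_temp_c"],
        [["Tromsø", "2021-06-01T00:00:00Z", 7.2]],
    )
```

Stating the encoding explicitly avoids depending on the platform default, which can differ between operating systems.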
File Naming Conventions
The names of files submitted to NCEI must:
- Be human readable
- Exclude special characters (such as punctuation and symbols)
- Use appropriate file format extensions
Additional file naming and packaging rules may apply depending on the submission pathway, or for data that must adhere to specific program guidelines. NCEI personnel are available to review and provide feedback on files as needed.
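Two of the naming rules above lend themselves to an automated pre-check before submission. The sketch below is one possible interpretation, not an NCEI tool. It assumes that only letters, digits, underscores, and hyphens are acceptable in the base name and that the extension follows the first period. The function name is hypothetical, and human readability still needs a manual look.

```python
import re

# Assumption: these character sets are an illustrative reading of
# "exclude special characters", not an official NCEI rule.
_BASE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")
_EXTENSION = re.compile(r"^[A-Za-z0-9]+(\.[A-Za-z0-9]+)*$")


def check_filename(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    # Split at the first period so compound extensions such as "tar.gz" stay together.
    base, _, extension = name.partition(".")
    if not _BASE_NAME.match(base):
        problems.append("base name is empty or contains special characters")
    if not extension:
        problems.append("missing file format extension")
    elif not _EXTENSION.match(extension):
        problems.append("extension contains special characters")
    return problems


if __name__ == "__main__":
    for candidate in ["cruise_2021_ctd_profiles.csv", "my data (final).csv", "README"]:
        print(candidate, check_filename(candidate))
```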
An archive requires sufficient information to read, understand and characterize data holdings in accordance with documentation standards. This documentation contextualizes data, allowing current and future users to:
- Read the encoded format including descriptions of the data variables/content
- Understand data lineage, such as how it was created
- Characterize data quality to assess its usability
- Identify and distinguish data
- Trust the data integrity
Common supporting documents include user guides, format specifications, algorithm theoretical basis documents, and read-me files. Other forms of documentation may be appropriate for archiving to support data stewardship. NCEI may request additional documentation in order to meet these requirements.
The submitter is responsible for ensuring that documents comply with Federal Section 508 accessibility requirements. The NOAA Central Library Section 508 compliance page has many resources, such as how to create 508 compliant documents and tools for checking accessibility.
NCEI requires standardized metadata for every collection or dataset to support discoverability, understandability and interoperability. This documentation includes a standard description, instructions for accessing and citing the data, and other helpful or pertinent details. Collection metadata creation is supported through the various data submission methods.
Data Discovery and Access
NCEI has a number of search and visualization tools that allow users to find your data using custom queries that pull from the entire archive, as well as from individual products and datasets. Following the archive process early in your data gathering and collection efforts can improve your data's discoverability in these systems.
Data Identification and Attribution
NCEI may mint Digital Object Identifiers (DOIs) for data held in its archive in accordance with the NOAA Data Citation Procedural Directive. Once minted, a DOI provides a unique and persistent identifier that allows users to accurately cite and locate data obtained from NCEI. Data without an assigned DOI can be cited, but may not have a persistent URI or have citation metadata published in the DataCite discovery service. See FAQs for more information.
Data stewarded by NCEI must include metadata about provenance, authenticity, the technical environment necessary to use the data object, what preservation actions have been taken, and what intellectual property rights apply to the data object. To provide long-term preservation of digital and analog data holdings, NCEI follows guidance from many best-practice communities to ensure that data remain accurate and authentic.
As mandated by Federal regulations, data maintained by NCEI are scheduled in accordance with the appropriate NOAA Records Schedules. These schedules identify records series, each of which specifies how long its records (data and metadata) must be retained and any other conditions for retention and preservation. Each records series must be approved by the National Archives and Records Administration (NARA).
To support data preservation, NCEI routinely performs media migration, data integrity checks (e.g., comparing cryptographic hash values prior to and following file movement between storage media), daily and weekly media content backups, and other practices to ensure access to data objects. NCEI follows guidance from NARA for backup processes, storage media, etc.
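The integrity check mentioned above, comparing cryptographic hash values before and after a file is moved, can be sketched as follows. This is a generic illustration and not NCEI's actual procedure. SHA-256 is an assumed choice of algorithm, and the function names are hypothetical.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_verification(source, destination):
    """Copy a file and confirm the copy's hash matches the original's.

    Returns the shared digest, or raises OSError if the hashes differ.
    """
    before = sha256_of(source)
    shutil.copy2(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"hash mismatch after copying {source} to {destination}")
    return after
```

Data providers can apply the same idea on their side by recording a digest for each file before submission, so that the received copy can later be compared against it.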
How does NCEI decide if my data will be archived?
NCEI generally follows the guidance outlined in the NOAA Procedure for Scientific Records Appraisal and Archive Approval (2008). NCEI also follows a policy for receiving and using non-NOAA data issued by our parent organization, the NOAA National Environmental Satellite, Data, and Information Service (NESDIS). This policy establishes criteria to authenticate incoming non-NOAA data, ensure consistency and sustainability, and maintain security. Much of the appraisal process is based on the information you provide to NCEI before submitting your data. NCEI considers many factors, including (but not limited to):
- What type of data are you asking NCEI to archive? For broad categories of types of data acquired by NCEI, please refer to the NCEI Archive Collecting Policy.
- Are you archiving original observation data and metadata, a "data product" developed from other data sources, or both? If a "data product", are the original observation data archived elsewhere and if so, where?
- Is this a one-time submission of a relatively small data product (less than about 20 GB total) or would your product require a frequently updated or repeating submission of new/revised data? This helps NCEI advise about the most appropriate data submission process for your data.
- How large (in MB/GB/TB/PB) is the data product and/or data that you are asking NCEI to archive?
- Do you use a consistent file naming convention for the files in your data? Please describe your file naming convention.
- What format is used to represent your distribution analysis product? NCEI prefers data in formats that do not require specific proprietary software for re-use.
- Do you have descriptive metadata for these data and/or product(s)? NCEI uses the ISO 19115 family of metadata standards, with NASA Global Change Master Directory (GCMD) keywords, to describe NCEI archival data holdings.
- Would you want or require additional services to discover, access, visualize, or distribute your analysis product or would the standard suite of NCEI discovery and access tools (e.g., https, ftp, THREDDS, Live Access Server) be sufficient?
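To answer the question about data volume, a short script can total the size of the files in a directory tree and report it in a readable unit. This is a minimal sketch with hypothetical function names. It assumes decimal units (1 GB = 10^9 bytes) and counts only regular files found by walking the tree.

```python
import os


def total_size_bytes(root):
    """Sum the sizes, in bytes, of all files under a directory tree."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total


def human_size(num_bytes):
    """Format a byte count using decimal units up to petabytes."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit available.
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


if __name__ == "__main__":
    # Report the size of the current directory as an example.
    print(human_size(total_size_bytes(".")))
```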
How should I submit my existing relational database (RDBMS) to NCEI for archiving?
Relational database management systems (RDBMS) present many unique challenges for long-term preservation and access. You are encouraged to contact firstname.lastname@example.org prior to submitting data in an RDBMS to determine the best solution for your RDBMS data.
What are the acceptable file formats for data archived at NCEI?
There are many "acceptable" file formats. There is no 'one size fits all' format, so it is best to use a standards-based, community-recognized, non-software-specific file format for encoding your data. See the NCEI list of preferred formats, which is based on guidance from the National Archives and Records Administration and the Library of Congress.
An 'ideal' format is:
- Not software specific (i.e., not usable only by a single, specific program or application),
- Based on well-documented, widely recognized national or international standards format specifications, and
- Suitable for the type of data that are encoded in the format.
Can I get a Digital Object Identifier (DOI) for data at NCEI?
Please notify NCEI while preparing your data for archiving that you would like to obtain a DOI. NCEI follows the guidance in the NOAA Data Citation Procedural Directive and will work with you to assign a DOI, as necessary.
Does NCEI accept "DNA data"?
Not at this time, due to archive storage resource constraints. NCEI is reviewing options to better support these data. NCEI may accept summarizations of genetic information, but please contact email@example.com prior to submitting genetics data.
Can NCEI embargo access to data I submit until my paper is published?
Maybe. NCEI follows the guidance in the NOAA Data Access Procedural Directive for providing access to data provided by federal staff. NCEI follows the guidance in the NOAA Data and Publication Sharing Directive for NOAA Grants, Cooperative Agreements, and Contracts for providing access to data provided by non-federal staff.
Does NCEI archive software that I developed for use with my data?
NCEI does not currently archive software, scripts, or other model/product-generating technologies that were not written by NCEI staff.
Frequently Used Formats
| Generic Data Types | Preferred Format(s) | Accepted Format(s) | Comment(s) |
| | Comma Separated Values (.csv), Tab Separated Values (.tsv), netCDF (.nc), OpenOffice Calc (.ods) | | ASCII or UTF/UTF8/UNICODE encoding; see additional guidance for using NetCDF for many kinds of observation data (i.e., profiles, time series, trajectories, etc.). |
| Geospatial Information Systems (GIS) | GeoTIFF, OpenGIS GML, HDF5, NetCDF | ESRI shapefiles (.shp, .dbf, .shx, .cpg, .prj, .sbx), ESRI GridFloat Output | |
| | Portable Document Format/Archival (.pdf), OpenOffice Document (.odt), ASCII (.txt) | Portable Document Format (.pdf), Microsoft Word (.doc, .docx) | Text documents must meet NOAA accessibility guidelines for compliance with Section 508. |
| | ISO 19139 (.xml) Geospatial Metadata | FGDC Content Standard for Digital Geospatial Metadata (CSDGM), NASA Directory Interchange Format (DIF) | The preferred representation of metadata in these standard formats is eXtensible Markup Language (XML). |
To be determined
MPEG-2, MPEG-4, MPEG-2000
SVS, JPEG, BIL, HDF5, PNG
GRIdded Binary or General Regularly-distributed Information in Binary (GRIB), Binary Universal Form for the Representation of meteorological data (BUFR)
Polygon File Format (.ply) for 3-D data representations