NCEI receives data from many different sources, including individual researchers, other government organizations, and research institutions from all over the world. All the data submitted to the NCEI archive are subject to one of the following quality control procedures:
Data from NOAA scientists and contractors are subject to the NOAA Procedural Directives that provide guidelines for Data Management Planning (archiving responsibilities), Data Documentation (metadata requirements), Data Access (discoverability), and Data Citation (credit and citations).
NOAA Funded Projects
Data providers who received NOAA funding through a grant, cooperative agreement, or contract should review project-specific guidelines for data archive responsibilities and related peer-reviewed publications. Those guidelines can be found in the Data and Publication Sharing Directive for NOAA Grants, Cooperative Agreements, and Contracts.
Non-NOAA Funded Data
Submitting Your Data
NCEI follows a robust, well-defined archive process to preserve data for future use. We accept data, metadata, and products that add value to the core NOAA data collection and contribute to the long-term preservation of the global environmental data record.
Data Submission Format Guidance
NCEI prefers open file formats maintained by a standards organization rather than proprietary or product-specific options. Self-describing formats like netCDF and HDF include valuable metadata that support data compatibility and long-term information preservation. PDF or PDF for archiving (PDF/A) is preferred for archived documents. Above all, choose a format that accommodates the characteristics of the data being encoded. Additionally, consider the format's long-term (decades into the future) compatibility with data access software and infrastructure. For example, tabular data in comma-separated UTF-8 or ASCII files are much more likely to be accessible to future operating systems than proprietary spreadsheet formats (e.g., Lotus 1-2-3).
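As an illustration of the tabular-data recommendation above, the following minimal Python sketch writes records as comma-separated UTF-8 text using only the standard library. The column names and sample values are hypothetical and are not an NCEI-defined layout.

```python
import csv
import io


def write_utf8_csv(rows, header, stream):
    """Write a header row and data rows as comma-separated text."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical sample records, for illustration only.
header = ["station_id", "time_utc", "sea_water_temperature_degC"]
rows = [
    ["ST001", "2020-01-01T00:00:00Z", 12.5],
    ["ST002", "2020-01-01T00:00:00Z", 13.1],
]

# Write to an in-memory text buffer, then encode explicitly as UTF-8.
buffer = io.StringIO()
write_utf8_csv(rows, header, buffer)
csv_text = buffer.getvalue()
encoded = csv_text.encode("utf-8")
```

To write directly to disk instead, the same function can be given a file opened with `open(path, "w", encoding="utf-8", newline="")`, which states the encoding explicitly rather than relying on the operating system default.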
NCEI recommends using the National Archives and Records Administration (NARA) transfer formats for digital data. Refer to the Library of Congress Sustainability of Digital Formats page when considering formats outside of the NARA list. All submissions are subject to review, and may need to undergo changes to format, filename, packaging, and other criteria to meet system or preservation requirements before being approved.
Frequently Used Formats
|Generic Data Types||Preferred Format(s)||Comment(s)|
|Tabular Data||ASCII or UTF/UTF8/UNICODE encoding; see additional netCDF guidance for many kinds of observation data (e.g., profiles, time series, trajectories)|
|Geospatial Information Systems (GIS)|
|Text||Text documents must meet NOAA accessibility guidelines for compliance with Section 508|
|Metadata||ISO 191xxx (series of international Geographic Metadata) (.xml)||The preferred representation of metadata in these standard formats is eXtensible Markup Language (XML)|
|Moving Images||Please contact firstname.lastname@example.org before submitting video data. Be sure to assess potential copyright and privacy issues before archiving.|
File Naming Conventions
The names of files submitted to NCEI must:
- Be human readable
- Exclude special characters (such as punctuation and symbols)
- Use appropriate file format extensions
Additional file naming and packaging rules may apply for data based on the submission pathway, or data that must adhere to specific program guidelines. NCEI personnel are available to review and provide feedback on files as needed.
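The machine-checkable parts of the naming rules above can be sketched as a small pre-submission check. This is an illustrative sketch only: the set of allowed characters (letters, digits, underscores, hyphens) and the extension list are assumptions, not NCEI's actual validation rules, and whether a name is human readable still needs a person to judge.

```python
import re

# Assumption: letters, digits, underscores, and hyphens are treated as safe.
SAFE_STEM = re.compile(r"^[A-Za-z0-9_-]+$")

# Assumption: an illustrative extension list, not an official NCEI list.
KNOWN_EXTENSIONS = {".csv", ".nc", ".txt", ".pdf", ".xml"}


def check_filename(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the stem and there is no extension.
        stem, ext = name, ""
    if not ext:
        problems.append("missing file format extension")
    elif "." + ext.lower() not in KNOWN_EXTENSIONS:
        problems.append("unrecognized extension: ." + ext)
    if not SAFE_STEM.match(stem):
        problems.append("name contains special characters or is empty")
    return problems


# Hypothetical file names, for illustration only.
for candidate in ["sst_2020-01.csv", "my data (final)!.csv", "readme"]:
    print(candidate, "->", check_filename(candidate) or "ok")
```

A check like this only catches obvious problems before submission; program-specific naming and packaging rules would need their own checks.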
An archive requires sufficient information to read, understand and characterize data holdings in accordance with documentation standards. This documentation contextualizes data, allowing current and future users to:
- Read the encoded format including descriptions of the data variables/content
- Understand data lineage, such as how it was created
- Characterize data quality to assess its usability
- Identify and distinguish data
- Trust the data integrity
Common supporting documents include user guides, format specifications, algorithm theoretical basis documents, and read-me files. Other forms of documentation may be appropriate for archiving to support data stewardship. NCEI may request additional documentation in order to meet these requirements.
The submitter is responsible for ensuring that documents comply with Federal Section 508 accessibility requirements. The NOAA Central Library Section 508 compliance page has many resources, such as how to create 508 compliant documents and tools for checking accessibility.
NCEI requires standardized metadata for every collection or dataset to support discoverability, understandability and interoperability. This documentation includes a standard description, instructions for accessing and citing the data, and other helpful or pertinent details. Collection metadata creation is supported through the various data submission methods.
Data Discovery and Access
NCEI has a number of search and visualization tools that allow users to find data using custom queries that pull from the entire archive, as well as individual products and datasets. Following the archive process early in your data gathering and collection efforts can optimize the discoverability of your data with these systems.
Data Identification and Attribution Using DOIs
NCEI may mint Digital Object Identifiers (DOIs) for data held in its archive in accordance with the NOAA Data Citation Procedural Directive. Once minted, a DOI provides a unique and persistent identifier that allows users to accurately cite and locate data obtained from NCEI. Data without an assigned DOI can be cited, but may not have a persistent URI or have citation metadata published in the DataCite discovery service. See FAQs for more information.
Data stewarded by NCEI must include metadata about provenance, authenticity, the technical environment necessary to use the data object, what preservation actions have been taken, and what intellectual property rights apply to the data object. To provide long-term preservation for digital and analog data holdings, NCEI follows guidance from many best-practice communities to ensure that data remain accurate and authentic.
As mandated by Federal regulations, data maintained by NCEI are scheduled in accordance with the appropriate NOAA Records Schedules. These schedules identify records series, each of which specifies how long its records (data and metadata) must be retained and any other conditions for retention and preservation. Each records series must be approved by the National Archives and Records Administration (NARA).
To support data preservation, NCEI routinely performs media migration, data integrity checks (e.g., comparing cryptographic hash values prior to and following file movement between storage media), daily and weekly media content backups, and other practices to ensure access to data objects. NCEI follows guidance from NARA for backup processes, storage media, and related practices.
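The hash comparison mentioned above can be illustrated with a short Python sketch. It is not NCEI's actual tooling: the choice of SHA-256, the function names, and the temporary files are assumptions made for the example.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file and report whether the hash is unchanged after the move."""
    before = sha256_of_file(source)
    shutil.copyfile(source, destination)
    after = sha256_of_file(destination)
    return before == after


# Demonstration with a throwaway file standing in for an archived data object.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.dat")
    migrated = os.path.join(workdir, "migrated.dat")
    with open(original, "wb") as handle:
        handle.write(b"example observation data\n")
    print("integrity preserved:", copy_and_verify(original, migrated))
```

Reading in chunks keeps memory use flat for large files. A mismatch between the two digests would indicate that the copy differs from the source and should not be trusted.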
- Why should I archive my data?
- The goal of data preservation is to ensure that your data are independently understandable for future use and re-use. To assist you, NCEI follows a well-defined archive process based on the NOAA Procedural Directives that contextualizes data and documentation to ensure that they remain valuable and functional for as long as possible. Because all data acquired, observed, processed, or recovered by NOAA are subject to these guidelines, it is important to incorporate them early in a project's lifecycle to ensure data compliance and long-term viability.
- How does NCEI decide if my data will be archived?
NCEI generally follows the guidance outlined in the NOAA Procedure for Scientific Records Appraisal and Archive Approval (2008). NCEI also follows a policy for receiving and using non-NOAA data issued by our parent organization, the NOAA National Environmental Satellite, Data, and Information Service (NESDIS). This policy establishes criteria to authenticate incoming non-NOAA data, ensure consistency and sustainability, and maintain security. Much of the appraisal process is based on the information you provide to NCEI before submitting data. NCEI considers many factors, including (but not limited to) the following:
- What type of data are you asking NCEI to archive? For broad categories of types of data acquired by NCEI, please refer to the NCEI Archive Collecting Policy.
- Are you archiving original observation data and metadata, a "data product" developed from other data sources, or both? If a "data product", are the original observation data archived elsewhere and if so, where?
- Is this a one-time submission of a relatively small data product (less than about 20 GB total), or would your product require a frequently updated or repeating submission of new/revised data? This helps NCEI advise about the most appropriate data submission process for your data.
- How large (in MB/GB/TB/PB) is the data product and/or data that you are asking NCEI to archive?
- Do you use a consistent file naming convention for the files in your data? Please describe your file naming convention.
- What format is used to represent your distribution analysis product? NCEI prefers data in formats that do not require specific proprietary software for re-use.
- Do you have descriptive metadata for these data and/or product(s)? NCEI uses the ISO 19115 family of metadata standards, with NASA Global Change Master Directory (GCMD) keywords, to describe NCEI archival data holdings.
- Would you want or require additional services to discover, access, visualize, or distribute your analysis product or would the standard suite of NCEI discovery and access tools (e.g., https, ftp, THREDDS, Live Access Server) be sufficient?
- How should I submit my existing relational database (RDBMS) to NCEI for archiving?
- Relational database management systems present many unique challenges for long-term preservation and access. You are encouraged to contact email@example.com prior to submitting data in an RDBMS to determine the best solution for your data.
- What are the acceptable file formats for data archived at NCEI?
There are many "acceptable" file formats. There is no 'one size fits all' format, so it is best to use a standards-based, community-recognized, non-software-specific file format for encoding your data. The NCEI list of preferred formats is based on guidance from the National Archives and Records Administration and the Library of Congress.
An 'ideal' format is:
- Not software specific (i.e., not usable only by a single, specific program or application),
- Based on well-documented national or international format specifications,
- Suitable for the type of data that are encoded in the format.
- Can I get a Digital Object Identifier (DOI) for data at NCEI?
- Please notify NCEI while preparing your data for archiving that you would like to obtain a DOI. Your data must be archived with NCEI in order to be eligible for a NOAA Data DOI assignment. NCEI follows the guidance in the NOAA Data Citation Procedural Directive and will work with you to assign a DOI, as necessary.
- Does NCEI accept "DNA data"?
- Not at this time, due to archive storage resource constraints. NCEI is reviewing options to better support these data. NCEI may accept summarizations of genetic information, but please contact firstname.lastname@example.org prior to submitting genetics data.
- Can NCEI embargo access to data I submit until my paper is published?
- Maybe. NCEI follows the guidance in the NOAA Data Access Procedural Directive for providing access to data provided by federal staff. NCEI follows the guidance in the NOAA Data and Publication Sharing Directive for NOAA Grants, Cooperative Agreements, and Contracts for providing access to data provided by non-federal staff.
- Does NCEI archive software that I developed for use with my data?
- NCEI does not currently archive software, scripts, or other model/product-generating technologies that were not written by NCEI staff.
- What is “Science Ready Data”?
- NCEI refers to 'science ready data' as data that have been processed from engineering units to units used in scientific analyses. For example, voltages measuring the electrical conductivity of seawater would be engineering units, but salinity calculated from the voltages would be the 'science ready data'. Satellite data are often referred to as 'Level 0', 'Level 1', etc. 'Level 0' data are the initial values measured by the satellite-based instrument and sent to Earth, where specific algorithms are applied to transform those 'engineering units' into 'science ready data'.