Change prompted by consolidation ensures reliable reporting
A clear, consistent methodology for measuring data volumes is critical to managing a digital archive and provides a reliable basis for long-term capacity forecasting and storage planning. To ensure appropriate planning for and reporting of its archiving systems, NCEI recently changed the way it calculates archive data volumes. Implemented on October 1, 2019, the new calculation method provides volume metrics that are more objective, traceable, and repeatable across NCEI.
Consolidation of Centers Led to the New Reporting Method
Prior to 2015, NOAA’s geophysical, climatic, coastal, and oceanic data archives were maintained by separate data centers located across the United States. As separate entities using different archiving systems, each center was responsible for reporting its data archive volume. As such, each location developed its own reporting method.
In 2015, the data centers were consolidated into one entity—NCEI. One of NCEI’s chief operating principles is consistency in “data stewardship tools and practices across all science disciplines,” so the location-based reporting methods were reviewed.
For example, one location might report holdings in terms of uncompressed data using binary (base-2) numbers, whereas another might favor decimal (base-10) numbers and report the volume of compressed data. Recognizing these differences, NCEI resolved to apply a consistent methodology across the centers.
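To illustrate why the choice of number base matters, the short sketch below expresses the same byte count in base-10 petabytes and in base-2 pebibytes. The byte count and the function names are hypothetical and chosen only for illustration; they are not NCEI figures or NCEI tooling.

```python
# Illustrative only: the byte count below is a made-up figure, not an NCEI value.
BYTES_PER_PB = 10**15   # 1 petabyte (PB), decimal / base-10
BYTES_PER_PIB = 2**50   # 1 pebibyte (PiB), binary / base-2


def to_decimal_pb(n_bytes: int) -> float:
    """Express a byte count in base-10 petabytes."""
    return n_bytes / BYTES_PER_PB


def to_binary_pib(n_bytes: int) -> float:
    """Express a byte count in base-2 pebibytes."""
    return n_bytes / BYTES_PER_PIB


holdings_bytes = 30 * 10**15  # hypothetical holdings

print(f"{to_decimal_pb(holdings_bytes):.2f} PB  (base-10)")
print(f"{to_binary_pib(holdings_bytes):.2f} PiB (base-2)")
```

The same holdings yield a noticeably smaller number in base-2 units than in base-10 units, so two centers reporting in different bases would not be directly comparable even if they held identical data.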
After researching internal reporting methods and consulting with other scientific archives with large data holdings, such as NASA and USGS, NCEI selected a system-based data volume method, one that calls for:
A base-10 numbering system;
Reporting of as-archived volumes based on inventories of archival systems;
No use of “standard values” or adjustments to estimate uncompressed data volume;
Reporting of only the archived volume, not data stored elsewhere;
Exclusion of backup data from the count; and
Inclusion of both the primary and secure copy of data, since both are actively stewarded.
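The rules above can be sketched as a simple filter-and-sum over an inventory of archival systems. This is a minimal illustration, not NCEI's actual implementation: the record layout, field names, role labels, and sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable

BYTES_PER_PB = 10**15  # base-10 petabyte, per the first rule

# Hypothetical role labels; primary and secure copies count, backups do not.
COUNTED_ROLES = {"primary", "secure"}


@dataclass
class InventoryRecord:
    """One entry in a hypothetical archival-system inventory."""
    system: str        # name of the storage system (assumed field)
    role: str          # "primary", "secure", or "backup" (assumed labels)
    is_archive: bool   # False for data stored outside the archive
    stored_bytes: int  # as-archived size on the system, with no adjustment
                       # to estimate uncompressed volume


def reported_volume_pb(records: Iterable[InventoryRecord]) -> float:
    """Sum as-archived bytes for counted copies and report in base-10 PB."""
    total_bytes = sum(
        r.stored_bytes
        for r in records
        if r.is_archive and r.role in COUNTED_ROLES
    )
    return total_bytes / BYTES_PER_PB


# Made-up sample inventory, for illustration only.
inventory = [
    InventoryRecord("archive-a", "primary", True, 12 * 10**15),
    InventoryRecord("archive-b", "secure", True, 12 * 10**15),
    InventoryRecord("archive-c", "backup", True, 12 * 10**15),   # excluded: backup
    InventoryRecord("staging", "primary", False, 3 * 10**15),    # excluded: not archived
]

print(f"Reported archive volume: {reported_volume_pb(inventory):.1f} PB")
```

Because the total comes straight from inventoried sizes with fixed inclusion rules, rerunning the calculation on the same inventory gives the same result, which is the sense in which such a metric is traceable and repeatable.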
The new method was successfully implemented on October 1, 2019. Initial volume reports proved to be objective, traceable, and repeatable. There was an expected one-time reduction in the total reported volume, from 38 petabytes to 32.6 petabytes, due to volume compression and the removal of extra data from the reports. NCEI is achieving its goal of consistency in data stewardship tools and practices across all science disciplines.