Data Management and Documentation Plan


This document describes data stream documentation requirements and standard formatting and naming protocols for both data users and the infrastructure staff who produce the data.


Baseline Change Request (BCR):
A process used by the ARM Infrastructure to provide configuration control and to formally request and document changes within the ARM Infrastructure.
Data Object Description (DOD):
The basic information, definitions, and metadata required to process "raw" measurement data into netCDF files. The DOD becomes the header of the ARM netCDF files.
Data Stream:
A time-sequenced series of like data files.
Metadata:
Often described as "information or data about the data." Typically refers to information about primary data, which is usually numerical, or information describing aspects of the primary data. Such information could include instrument site information, environmental conditions under which the data were acquired, and any other data needed to understand the primary data.
Near-Real Time:
In ARM text, "near-real time" means "with a few hours' delay."
Quality Assured Data:
Typically the final form of data to be submitted to the ARM data system. This includes data stream description documentation, fully calibrated data in commonly used geophysical units, quality-flagged data files, and all ancillary data (metadata) needed by a future user to make full sense of the data stream.
Quality Measurement Experiment (QME):
The regular intercomparison of two or more data sets intended to understand the individual data streams either as functions of the performance of an instrument or the accuracy of a model prediction.
Value-Added Product (VAP):
A new data stream generated by applying an algorithm or other transform to existing data.

Data Documentation Requirements

For all new data streams, measurements, VAPs, QMEs, and data reprocessing, several steps are required before approval as an addition to the ARM baseline:

  • A programmatic statement supporting the priority of adding or revising the ARM data product.
  • An assessment of applicability across ARM, either site specific or ARM-wide.
  • An assessment of computing infrastructure impact (e.g., large files or a large number of files).
  • A new or revised DOD.
  • A full description of the instrument (or system), VAP technique, QME technique, or other method that creates a new data product or changes an existing one (i.e., file content or name).
  • A valid ARM file name (description in the following section).
  • A data stream description.
  • A clear definition of new or revised "scientifically relevant" measurement names for inclusion in the relevant Archive Manager databases.
  • Information required to complete the expectations database.
  • An algorithm to allow individual data points to be assessed for quality and appropriately flagged or identified, and the "data color" algorithm.
  • A quick-look algorithm.
  • A Baseline Change Request (BCR).

Data Formatting and Naming Protocols

File Type/Format

NetCDF is the preferred data format because it supports efficient data storage and robust documentation of the data structure. ASCII and HDF formats are used for some "External Data Products." When using ASCII, a description of the file structure and its proposed documentation should be reviewed and approved by the External Data Center (XDC) and/or Archive data managers. HDF is the standard for most satellite data.

File Naming Conventions

Processed Data

An example netCDF data file name is depicted below:

The sgp5mwravgB4.c1.20040706.020415.cdf file contains 5-minute averaged microwave radiometer data from the Southern Great Plains Vici site from July 6, 2004. The data level is "c1" indicating the data was derived or calculated via Value-Added Processing (see Data Levels).

ARM netCDF files shall be named according to the following naming convention:



sss is the site identifier (e.g., sgp, twp, nsa)
nn is the data integration period (e.g., 1, 5, 15, 30, 1440)
inst is the instrument abbreviation (e.g., mwr, wsi, mpl)
qqq is an optional qualifier that distinguishes these data from other data sets produced by the same instrument
Fn is the facility designation (e.g., C1, E13, B4)
dl is the data level (e.g., a0, a1, b1, c1)

The length constraints are:
sss: 3 characters
Fn: 2 or 3 characters
dl: 2 characters

(sss)(nn)(inst)(qqq)(Fn).(dl): MUST be 33 characters or less.
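The naming rules in this section can be checked mechanically. The following is a minimal sketch, not an official ARM validator. It assumes the layout seen in the example file name (prefix, then an 8-digit date, a 6-digit time, and an extension, separated by periods), assumes the integration period may be absent, and applies the 33-character prefix limit together with the 61-character limit on file names sent to the Archive. The instrument abbreviation and optional qualifier are returned as one field because they cannot be separated without a list of known instruments. All function and field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

MAX_PREFIX_LENGTH = 33  # limit on (sss)(nn)(inst)(qqq)(Fn).(dl)
MAX_TOTAL_LENGTH = 61   # limit on the whole file name sent to the Archive

# Assumed pattern, inferred from the example file name; not an official ARM regex.
_ARM_NAME = re.compile(
    r"^(?P<sss>[a-z]{3})"              # site identifier, 3 characters
    r"(?P<nn>\d*)"                     # data integration period (assumed optional)
    r"(?P<inst_qqq>[a-z][a-z0-9]*?)"   # instrument abbreviation plus optional qualifier
    r"(?P<Fn>[A-Z]\d{1,2})"            # facility designation, 2 or 3 characters
    r"\.(?P<dl>[a-z0-9]{2})"           # data level, 2 characters
    r"\.(?P<date>\d{8})"               # assumed yyyymmdd
    r"\.(?P<time>\d{6})"               # assumed hhmmss
    r"\.(?P<ext>[A-Za-z0-9]+)$"        # file extension, e.g. cdf
)


class ArmFileName(NamedTuple):
    site: str
    period: str
    instrument: str
    facility: str
    level: str
    date: str
    time: str
    extension: str


def parse_arm_name(name: str) -> Optional[ArmFileName]:
    """Return the parsed fields, or None if the name breaks a rule."""
    if len(name) > MAX_TOTAL_LENGTH:
        return None
    match = _ARM_NAME.match(name)
    if match is None:
        return None
    # The prefix runs from the start of the name through the data level.
    prefix = name[: match.end("dl")]
    if len(prefix) > MAX_PREFIX_LENGTH:
        return None
    return ArmFileName(*match.groups())


parsed = parse_arm_name("sgp5mwravgB4.c1.20040706.020415.cdf")
print(parsed)
```

Returning None rather than raising keeps the sketch easy to use as a filter over a directory listing; a production check would likely report which rule failed.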

"The TOTAL length of a filename sent to the Archive MUST be 61 characters or less."

Raw Data

Raw data files shall be named according to a convention that includes the following elements:

  • the site identifier (e.g., sgp, twp, nsa)
  • the base instrument abbreviation (e.g., mwr, wsi, mpl), as with the processed data above
  • the facility designation (e.g., C1, E13, B4)
  • the original raw data file name produced on the instrument

An example raw data file name is:


This file is from the North Slope of Alaska Barrow site. It contains raw microwave radiometer data for November 9, 2002, for the hour beginning 140000. Most raw instrument data are collected hourly resulting in 24 raw data files per day. These files are bundled into daily tar files before archival.

Tar bundles shall be named according to a convention that includes the following elements:

  • the site identifier (e.g., sgp, twp, nsa)
  • the base instrument abbreviation (e.g., mwr, wsi, mpl)
  • the facility designation (e.g., C1, E13, B4)
  • the extension from the original raw data file name, usually the format of the file or an instrument serial number

The example raw file shown above will be archived in a tar bundle named


Guidelines for Original Raw File Naming

When possible, the original file name produced on the instrument or instrument data system should contain adequate information to determine the origin of the file including:

  • unique site/facility indicator
  • hhmmss, hhmm, or sequence number if more than one raw file per day
  • minimal indication of instrument type.

Under the constraints of the 8.3 file name format (an eight-character name plus a three-character extension), it is probably not possible to include all this information. In these instances, it is important to include adequate header information inside the file to permit the user to determine the source/origin of the data and to provide a reference date (including year) and time.

Data names are case sensitive. xxxxxx.DAT and xxxxxx.dat may be interpreted as two different names by ingests and bundling routines. Instruments should be consistent in the way the original file names are assigned, including case.
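A quick way to catch inconsistent case before files reach ingest or bundling routines is to look for names that differ only by case. The sketch below is illustrative only; the function name is hypothetical and the check is not part of any ARM tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names that are identical except for letter case."""
    groups = defaultdict(set)
    for name in names:
        # casefold() gives a case-insensitive key for comparison.
        groups[name.casefold()].add(name)
    # Keep only the keys that map to more than one distinct spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


collisions = find_case_collisions(["xxxxxx.DAT", "xxxxxx.dat", "other.dat"])
print(collisions)
```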

Other Data Formats

Processed ARM data may be stored in a format other than netCDF. The basic naming convention for processed files will not change, but the final extension will change accordingly:

ASCII data format
HDF data format (limited to satellite data)
PNG data format (standard ARM image format)
MPG data format (standard ARM movie format)

Other data formats (e.g., GIF, JPG) may also exist, but are not recommended for future development.

Data Levels

Data levels are based on the "level of processing," with the lowest level of data designated as raw or "00" data. Each subsequent data level has minimum requirements, and a data level is not increased until all requirements of that level, as well as the requirements of every level below it, have been met.

00: raw data - primary raw data stream collected directly from instrument
raw data - redundant data stream or sneakernet data
a0: converted to netCDF
a1: calibration factors applied and converted to geophysical units
a2 to a9: further processing on a1 level data that does not merit b1 classification
b1: QC checks applied to measurements
b2 to b9: further processing on b1 level data that does not merit c1 classification
c0: intermediate value-added data product; this data level is always used as input to a higher level VAP
c1: derived or calculated value-added data product (VAP) using one or more measured or modeled data (a0 to c1) as input
c2 to c9: further processing applied to a c1 level data stream
s1: summary file consisting of a subset of the parent .c1 file with simplified QC and known "bad" values set to missing
s2: summary file consisting of further-processed s1 data


  1. Not every data level need be produced for each instrument data set. For example, if conversion to netCDF, calibration, and conversion to engineering units are applied in a single processing step, no "a0" data product would be produced.
  2. Data level .cN is restricted to data derived or calculated through value-added processing.
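Two-character data level codes lend themselves to simple ordering and classification. The sketch below is an illustration under assumptions: it recognizes only "00" as raw, treats the letter families in the order they are listed in this section (a, b, c, s) as the processing order, and uses hypothetical function names.

```python
from typing import Tuple

# Assumed ordering of level families, taken from the order of the list above.
_FAMILY_ORDER = {"a": 1, "b": 2, "c": 3, "s": 4}
_DIGITS = "0123456789"


def level_rank(dl: str) -> Tuple[int, int]:
    """Return a sortable (family, step) key for a data level code."""
    if dl == "00":
        return (0, 0)  # raw data is the lowest level
    if len(dl) == 2 and dl[0] in _FAMILY_ORDER and dl[1] in _DIGITS:
        return (_FAMILY_ORDER[dl[0]], int(dl[1]))
    raise ValueError(f"unrecognized data level: {dl!r}")


def is_vap_level(dl: str) -> bool:
    """True for .cN levels, which are restricted to value-added processing."""
    return level_rank(dl)[0] == _FAMILY_ORDER["c"]


levels = sorted(["c1", "00", "b1", "a1", "a0"], key=level_rank)
print(levels)
print(is_vap_level("c1"), is_vap_level("b1"))
```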

Graphic Data Formats

For formatted documents and graphics-rich documents, PDF is the standard file type. For photographs, drawings, sketches, and data plots, PNG is the standard file type. For movies, MPG is the standard file type.

File Duration

To control the number of small files and to facilitate the use of ARM data, the suggested file period is 24 hours. Very large data sets may be routinely split into two or more netCDF files per day to increase usability. Infrequently, daily data files may be split into two files when the global header information changes as a result of a maintenance action (e.g., instrument serial number or calibration change).
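The file period rule can be sketched as a grouping step: one file per day, with an extra split when header information changes. The example below is a simplified illustration, not an ARM ingest routine. It assumes each record carries a timestamp and an instrument serial number, and it uses the serial number as the only header attribute that triggers a split; the record layout and function name are hypothetical.

```python
from datetime import datetime
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical record layout: (sample time, instrument serial number).
Record = Tuple[datetime, str]


def split_into_files(records: Iterable[Record]) -> List[List[Record]]:
    """Group records into daily files, splitting when the serial number changes."""
    ordered = sorted(records, key=lambda record: record[0])
    # groupby starts a new group whenever the (day, serial number) key changes.
    return [
        list(group)
        for _, group in groupby(ordered, key=lambda record: (record[0].date(), record[1]))
    ]


records = [
    (datetime(2004, 7, 6, 1, 0), "A"),
    (datetime(2004, 7, 6, 2, 0), "A"),
    (datetime(2004, 7, 6, 13, 0), "B"),  # serial number changed after maintenance
    (datetime(2004, 7, 7, 0, 30), "B"),
]
for file_records in split_into_files(records):
    print(len(file_records), file_records[0][0].date(), file_records[0][1])
```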

Measurement Metadata and Standard Measurement Names

A scientifically relevant "measurement description" is a structured description of a data stream; the description addresses why the data stream exists. Data streams also contain other information that is important for understanding or interpreting the data stream but is not considered significant for naming purposes. Examples include global information, such as location; calibration procedure information; and QC checks and flags. If relevant, other instrument details can be included:

  • Orientation: downwelling, upwelling, or dependent on installation.
  • Key information to characterize the measurement (e.g., diffuse or direct).
  • Characterization of the spectra: the number of spectra and the wavelength range covered, in nm.
  • Relative position or location (e.g., height).
  • Time interval information (e.g., averaging time and measurement intervals).
  • The instrument used for the measurement (occasionally important especially if it comes from a data stream containing results from several instruments).
  • An indication that the data stream is a best-estimate or calculated-value data stream. Unless indicated otherwise, it is implicit that the measurement is observed.
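The optional details listed above map naturally onto a small record type. The following sketch shows one possible shape, with every class, field, and example value chosen for illustration only; it is not an ARM schema. The default origin reflects the rule that a measurement is taken to be observed unless stated otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Origin(Enum):
    OBSERVED = "observed"
    BEST_ESTIMATE = "best estimate"
    CALCULATED = "calculated"


@dataclass(frozen=True)
class MeasurementDescription:
    """Hypothetical container for a scientifically relevant measurement description."""
    name: str
    orientation: Optional[str] = None        # e.g. "downwelling" or "upwelling"
    characterization: Optional[str] = None   # e.g. "diffuse" or "direct"
    spectra_count: Optional[int] = None
    wavelength_range_nm: Optional[Tuple[float, float]] = None
    relative_position: Optional[str] = None  # e.g. a height
    averaging_time: Optional[str] = None
    instrument: Optional[str] = None
    origin: Origin = Origin.OBSERVED         # observed unless indicated otherwise


# Example values are made up for illustration.
description = MeasurementDescription(
    name="shortwave irradiance",
    orientation="downwelling",
    characterization="diffuse",
)
print(description.origin.value)
```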