CDF-2 file format specification (PnetCDF C Interface Guide)

Appendix F CDF-2 file format specification

The netCDF 64-bit offset format differs from the classic format only in

the VERSION byte, ‘\x02’ instead of ‘\x01’, and
the OFFSET entity, a 64-bit instead of a 32-bit offset from the beginning of the file.

This small format change permits much larger files, but there are still some practical size restrictions. Each fixed-size variable and the data for one record’s worth of each record variable are still limited in size to a little less that 4 GiB. The rationale for this limitation is to permit aggregate access to all the data in a netCDF variable (or a record’s worth of data) on 32-bit platforms.

The specification of CDF classic file format (referred as CDF-1) is shown in black font.
The new features added in CDF-2 specification are colored in red. CDF-2 backward supports all specification of CDF-1.
(Note that "\x" used in this specification indicates the next two characters represent hexadecimal digits. Two hexadecimal digits, each of them 4 bits, make a byte.)

netcdf_file = header data header = magic numrecs dim_list gatt_list var_list magic = 'C' 'D' 'F' VERSION VERSION = \x01 | //classic format (CDF-1) \x02 //64-bit offset format (CDF-2) numrecs = NON_NEG | STREAMING //length of record dimension dim_list = ABSENT | NC_DIMENSION nelems [dim ...] gatt_list = att_list //global attributes att_list = ABSENT | NC_ATTRIBUTE nelems [attr ...] var_list = ABSENT | NC_VARIABLE nelems [var ...] ABSENT = ZERO ZERO //list is not present (CDF-1 and CDF-2) ZERO = \x00 \x00 \x00 \x00 //32-bit zero NC_DIMENSION = \x00 \x00 \x00 \x0A //tag for list of dimensions NC_VARIABLE = \x00 \x00 \x00 \x0B //tag for list of variables NC_ATTRIBUTE = \x00 \x00 \x00 \x0C //tag for list of attributes nelems = NON_NEG //number of elements in following sequence dim = name dim_length name = nelems namestring //Names a dimension, variable, or attribute. //Names should match the regular expression //([a-zA-Z0-9_]|{MUTF8})([^\x00-\x1F/\x7F-\xFF]|{MUTF8})* //For other constraints, see "Note on names" below. namestring = ID1 [IDN ...] padding ID1 = alphanumeric | '_' IDN = alphanumeric | special1 | special2 alphanumeric = lowercase | uppercase | numeric | MUTF8 lowercase = 'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'| 'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z' uppercase = 'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'| 'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z' numeric = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9' special1 = '_'|'.'|'|''+'|'-' //special1 chars have traditionally been //permitted in CDF-1. special2 = ' ' | '!' | '"' | '#' | '$' | '%' | '&' | '\'' | //special2 chars are recently permitted in '(' | ')' | '*' | ',' | ':' | ';' | '<' | '=' | //names (and require escaping in CDL). '>' | '?' | '[' | '\\' | ']' | '^' | '`' | '{' | //Note: '/' is not permitted. '|' | '}' | '~' MUTF8 = <multibyte UTF-8 encoded> //NFC-normalized Unicode character dim_length = NON_NEG //If zero, this is the record dimension. //There can be at most one record dimension. attr = name nc_type nelems [values ...] nc_type = NC_BYTE | NC_CHAR | NC_SHORT | NC_INT | NC_FLOAT | NC_DOUBLE var = name nelems [dimid ...] vatt_list nc_type vsize begin //nelems is the dimensionality (rank) of the variable //0 for scalar, 1 for vector, 2 for matrix, ... dimid = NON_NEG //Dimension ID (index into dim_list) for variable //shape. We say this is a "record variable" if and only //if the first dimension is the record dimension. vatt_list = att_list //Variable-specific attributes vsize = NON_NEG //Variable size. If not a record variable, the amount //of space in bytes allocated to the variable's data. //For record variables, it is the amount of space per //record. See "Note on vsize", below. begin = OFFSET //Variable start location. The offset in bytes (seek //index) in the file of the beginning of data for this. //variable. data = non_recs recs non_recs = [vardata ...] //The data for all non-record variables, stored //contiguously for each variable, in the same order the //variables occur in the header. vardata = [values ...] //All data for a non-record variable, as a block of //values of the same type as the variable, in row-major //order (last dimension varying fastest). recs = [record ...] //The data for all record variables are stored //interleaved at the end of the file. record = [varslab ...] //Each record consists of the n-th slab from each //record variable, for example x[n,...], y[n,...], //z[n,...] where the first index is the record number, //which is the unlimited dimension index. varslab = [values ...] //One record of data for a variable, a block of values //all of the same type as the variable in row-major //order (last index varying fastest). values = bytes | chars | shorts | ints | floats | doubles string = nelems [chars] bytes = [BYTE ...] padding chars = [CHAR ...] padding shorts = [SHORT ...] padding ints = [INT ...] floats = [FLOAT ...] doubles = [DOUBLE ...] padding = <0, 1, 2, or 3 bytes to next 4-byte boundary> //Header padding uses null (\x00) bytes. In data, padding //uses variable's fill value. See "Note on padding" below //for a special case. NON_NEG = <non-negative INT> STREAMING = \xFF \xFF \xFF \xFF //Indicates indeterminate record count, allows streaming //data OFFSET = <non-negative INT> | //for classic format (CDF-1) <non-negative INT64> //for CDF-2 format BYTE = <8-bit byte> //See "Note on byte data" below. CHAR = <8-bit byte> //See "Note on char data" below. SHORT = <16-bit signed integer, Bigendian, two's complement> INT = <32-bit signed integer, Bigendian, two's complement> INT64 = <64-bit signed integer, Bigendian, two's complement> FLOAT = <32-bit IEEE single-precision float, Bigendian> DOUBLE = <64-bit IEEE double-precision float, Bigendian> //following type tags are 32-bit integers NC_BYTE = \x00 \x00 \x00 \x01 //8-bit signed integers NC_CHAR = \x00 \x00 \x00 \x02 //text characters NC_SHORT = \x00 \x00 \x00 \x03 //16-bit signed integers NC_INT = \x00 \x00 \x00 \x04 //32-bit signed integers NC_FLOAT = \x00 \x00 \x00 \x05 //IEEE single precision floats NC_DOUBLE = \x00 \x00 \x00 \x06 //IEEE double precision floats //Default fill values for each type, may be overridden by //variable attribute named `_FillValue'. See //"Note on fill values" below. FILL_CHAR = \x00 //((char)0) null byte FILL_BYTE = \x81 //(signed char) -127 FILL_SHORT = \x80 \x01 //(short) -32767 FILL_INT = \x80 \x00 \x00 \x01 //(int) -2147483647 FILL_FLOAT = \x7C \xF0 \x00 \x00 //(float) 9.9692099683868690e+36 FILL_DOUBLE = \x47 \x9E \x00 \x00 \x00 \x00 \x00 \x00 //(double)9.9692099683868690e+36

Note on vsize:This number is the product of the dimension lengths (omitting the record dimension) and the number of bytes per value (determined from the type), increased to the next multiple of 4, for each variable. If a record variable, this is the amount of space per record (except that, for backward compatibility, it always includes padding to the next multiple of 4 bytes, even in the exceptional case noted below under “Note on padding”). The netCDF “record size” is calculated as the sum of the vsize’s of all the record variables.

The vsize field is actually redundant, because its value may be computed from other information in the header. The 32-bit vsize field is not large enough to contain the size of variables that require more than 2^32 - 4 bytes, so 2^32 - 1 is used in the vsize field for such variables.

Note on names:Earlier versions of the netCDF C-library reference implementation enforced a more restricted set of characters in creating new names, but permitted reading names containing arbitrary bytes. This specification extends the permitted characters in names to include multi-byte UTF-8 encoded Unicode and additional printing characters from the US-ASCII alphabet. The first character of a name must be alphanumeric, a multi-byte UTF-8 character, or ’_’ (reserved for special names with meaning to implementations, such as the “_FillValue” attribute). Subsequent characters may also include printing special characters, except for ’/’ which is not allowed in names. Names that have trailing space characters are also not permitted.

Implementations of the netCDF classic and 64-bit offset format must ensure that names are normalized according to Unicode NFC normalization rules during encoding as UTF-8 for storing in the file header. This is necessary to ensure that gratuitous differences in the representation of Unicode names do not cause anomalies in comparing files and querying data objects by name.

Note on streaming data:The largest possible record count, 2^32 - 1, is reserved to indicate an indeterminate number of records. This means that the number of records in the file must be determined by other means, such as reading them or computing the current number of records from the file length and other information in the header. It also means that the numrecs field in the header will not be updated as records are added to the file. [This feature is not yet implemented].

Note on padding: In the special case when there is only one record variable and it is of type character, byte, or short, no padding is used between record slabs, so records after the first record do not necessarily start on four-byte boundaries. However, as noted above under “Note on vsize”, the vsize field is computed to include padding to the next multiple of 4 bytes. In this case, readers should ignore vsize and assume no padding. Writers should store vsize as if padding were included.

Note on byte data:It is possible to interpret byte data as either signed (-128 to 127) or unsigned (0 to 255). When reading byte data through an interface that converts it into another numeric type, the default interpretation is signed. There are various attribute conventions for specifying whether bytes represent signed or unsigned data, but no standard convention has been established. The variable attribute “_Unsigned” is reserved for this purpose in future implementations.

Note on char data:Although the characters used in netCDF names must be encoded as UTF-8, character data may use other encodings. The variable attribute “_Encoding” is reserved for this purpose in future implementations.

Note on fill values:Because data variables may be created before their values are written, and because values need not be written sequentially in a netCDF file, default “fill values” are defined for each type, for initializing data values before they are explicitly written. This makes it possible to detect reading values that were never written. The variable attribute “_FillValue”, if present, overrides the default fill value for a variable. If _FillValue is defined then it should be scalar and of the same type as the variable.

Fill values are not required, however, because netCDF libraries have traditionally supported a “no fill” mode when writing, omitting the initialization of variable values with fill values. This makes the creation of large files faster, but also eliminates the possibility of detecting the inadvertent reading of values that haven’t been written.

Notes on Computing File Offsets

The offset (position within the file) of a specified data value in a classic format or 64-bit offset data file is completely determined by the variable start location (the offset in the begin field), the external type of the variable (the nc_type field), and the dimension indices (one for each of the variable’s dimensions) of the value desired.

The external size in bytes of one data value for each possible netCDF type, denoted extsize below, is:

`NC_BYTE`	1
`NC_CHAR`	1
`NC_SHORT`	2
`NC_INT`	4
`NC_FLOAT`	4
`NC_DOUBLE`	8

The record size, denoted by recsize below, is the sum of the vsize fields of record variables (variables that use the unlimited dimension), using the actual value determined by dimension sizes and variable type in case the vsize field is too small for the variable size.

To compute the offset of a value relative to the beginning of a variable, it is helpful to precompute a “product vector” from the dimension lengths. Form the products of the dimension lengths for the variable from right to left, skipping the leftmost (record) dimension for record variables, and storing the results as the product vector for each variable.

For example:

Non-record variable:

        dimension lengths:      [  5  3  2 7]
        product vector:         [210 42 14 7]

Record variable:

        dimension lengths:      [0  2  9 4]
        product vector:         [0 72 36 4]

At this point, the leftmost product, when rounded up to the next multiple of 4, is the variable size, vsize, in the grammar above. For example, in the non-record variable above, the value of the vsize field is 212 (210 rounded up to a multiple of 4). For the record variable, the value of vsize is just 72, since this is already a multiple of 4.

Let coord be the array of coordinates (dimension indices, zero-based) of the desired data value. Then the offset of the value from the beginning of the file is just the file offset of the first data value of the desired variable (its begin field) added to the inner product of the coord and product vectors times the size, in bytes, of each datum for the variable. Finally, if the variable is a record variable, the product of the record number, ’coord[0]’, and the record size, recsize, is added to yield the final offset value.

A special case: Where there is exactly one record variable, we drop the requirement that each record be four-byte aligned, so in this case there is no record padding.

Examples

By using the grammar above, we can derive the smallest valid netCDF file, having no dimensions, no variables, no attributes, and hence, no data. A CDL representation of the empty netCDF file is

netcdf empty { }

This empty netCDF file has 32 bytes. It begins with the four-byte “magic number” that identifies it as a netCDF version 1 file: ‘C’, ‘D’, ‘F’, ‘\x01’. Following are seven 32-bit integer zeros representing the number of records, an empty list of dimensions, an empty list of global attributes, and an empty list of variables.

Below is an (edited) dump of the file produced using the Unix command

od -xcs empty.nc

Each 16-byte portion of the file is displayed with 4 lines. The first line displays the bytes in hexadecimal. The second line displays the bytes as characters. The third line displays each group of two bytes interpreted as a signed 16-bit integer. The fourth line (added by human) presents the interpretation of the bytes in terms of netCDF components and values.

   4344    4601    0000    0000    0000    0000    0000    0000
  C   D   F 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
  17220   17921   00000   00000   00000   00000   00000   00000
[magic number ] [  0 records  ] [  0 dimensions   (ABSENT)    ]

   0000    0000    0000    0000    0000    0000    0000    0000
 \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
  00000   00000   00000   00000   00000   00000   00000   00000
[  0 global atts  (ABSENT)    ] [  0 variables    (ABSENT)    ]

As a less trivial example, consider the CDL

netcdf tiny {
dimensions:
        dim = 5;
variables:
        short vx(dim);
data:
        vx = 3, 1, 4, 1, 5 ;
}

which corresponds to a 92-byte netCDF file. The following is an edited dump of this file:

   4344    4601    0000    0000    0000    000a    0000    0001
  C   D   F 001  \0  \0  \0  \0  \0  \0  \0  \n  \0  \0  \0 001
  17220   17921   00000   00000   00000   00010   00000   00001
[magic number ] [  0 records  ] [NC_DIMENSION ] [ 1 dimension ]

   0000    0003    6469    6d00    0000    0005    0000    0000
 \0  \0  \0 003   d   i   m  \0  \0  \0  \0 005  \0  \0  \0  \0
  00000   00003   25705   27904   00000   00005   00000   00000
[  3 char name = "dim"        ] [ size = 5    ] [ 0 global atts

   0000    0000    0000    000b    0000    0001    0000    0002
 \0  \0  \0  \0  \0  \0  \0 013  \0  \0  \0 001  \0  \0  \0 002
  00000   00000   00000   00011   00000   00001   00000   00002
 (ABSENT)     ] [NC_VARIABLE  ] [ 1 variable  ] [ 2 char name =

   7678    0000    0000    0001    0000    0000    0000    0000
  v   x  \0  \0  \0  \0  \0 001  \0  \0  \0  \0  \0  \0  \0  \0
  30328   00000   00000   00001   00000   00000   00000   00000
 "vx"         ] [1 dimension  ] [ with ID 0   ] [ 0 attributes

   0000    0000    0000    0003    0000    000c    0000    0050
 \0  \0  \0  \0  \0  \0  \0 003  \0  \0  \0  \f  \0  \0  \0   P
  00000   00000   00000   00003   00000   00012   00000   00080
 (ABSENT)     ] [type NC_SHORT] [size 12 bytes] [offset:    80]

   0003    0001    0004    0001    0005    8001
 \0 003  \0 001  \0 004  \0 001  \0 005 200 001
  00003   00001   00004   00001   00005  -32767
[    3] [    1] [    4] [    1] [    5] [fill ]

netcdf_file	=	header data
header	=	magic numrecs dim_list gatt_list var_list
magic	=	'C' 'D' 'F' VERSION
VERSION	=	\x01 \|	//classic format (CDF-1)
		\x02	//64-bit offset format (CDF-2)
numrecs	=	NON_NEG \| STREAMING	//length of record dimension
dim_list	=	ABSENT \| NC_DIMENSION nelems [dim ...]
gatt_list	=	att_list	//global attributes
att_list	=	ABSENT \| NC_ATTRIBUTE nelems [attr ...]
var_list	=	ABSENT \| NC_VARIABLE nelems [var ...]
ABSENT	=	ZERO ZERO	//list is not present (CDF-1 and CDF-2)
ZERO	=	\x00 \x00 \x00 \x00	//32-bit zero
NC_DIMENSION	=	\x00 \x00 \x00 \x0A	//tag for list of dimensions
NC_VARIABLE	=	\x00 \x00 \x00 \x0B	//tag for list of variables
NC_ATTRIBUTE	=	\x00 \x00 \x00 \x0C	//tag for list of attributes
nelems	=	NON_NEG	//number of elements in following sequence
dim	=	name dim_length
name	=	nelems namestring
			//Names a dimension, variable, or attribute.
			//Names should match the regular expression
			//([a-zA-Z0-9_]\|{MUTF8})([^\x00-\x1F/\x7F-\xFF]\|{MUTF8})*
			//For other constraints, see "Note on names" below.
namestring	=	ID1 [IDN ...] padding
ID1	=	alphanumeric \| '_'
IDN	=	alphanumeric \| special1 \| special2
alphanumeric	=	lowercase \| uppercase \| numeric \| MUTF8
lowercase	=	'a'\|'b'\|'c'\|'d'\|'e'\|'f'\|'g'\|'h'\|'i'\|'j'\|'k'\|'l'\|'m'\|
		'n'\|'o'\|'p'\|'q'\|'r'\|'s'\|'t'\|'u'\|'v'\|'w'\|'x'\|'y'\|'z'
uppercase	=	'A'\|'B'\|'C'\|'D'\|'E'\|'F'\|'G'\|'H'\|'I'\|'J'\|'K'\|'L'\|'M'\|
		'N'\|'O'\|'P'\|'Q'\|'R'\|'S'\|'T'\|'U'\|'V'\|'W'\|'X'\|'Y'\|'Z'
numeric	=	'0'\|'1'\|'2'\|'3'\|'4'\|'5'\|'6'\|'7'\|'8'\|'9'
special1	=	'_'\|'.'\|'\|''+'\|'-'	//special1 chars have traditionally been
			//permitted in CDF-1.
special2	=	' ' \| '!' \| '"' \| '#' \| '$' \| '%' \| '&' \| '\'' \|	//special2 chars are recently permitted in
		'(' \| ')' \| '*' \| ',' \| ':' \| ';' \| '<' \| '=' \|	//names (and require escaping in CDL).
		'>' \| '?' \| '[' \| '\\' \| ']' \| '^' \| '`' \| '{' \|	//Note: '/' is not permitted.
		'\|' \| '}' \| '~'
MUTF8	=	<multibyte UTF-8 encoded>	//NFC-normalized Unicode character
dim_length	=	NON_NEG	//If zero, this is the record dimension.
			//There can be at most one record dimension.
attr	=	name nc_type nelems [values ...]
nc_type	=	NC_BYTE \|
		NC_CHAR \|
		NC_SHORT \|
		NC_INT \|
		NC_FLOAT \|
		NC_DOUBLE
var	=	name nelems [dimid ...] vatt_list nc_type vsize begin
			//nelems is the dimensionality (rank) of the variable
			//0 for scalar, 1 for vector, 2 for matrix, ...
dimid	=	NON_NEG	//Dimension ID (index into dim_list) for variable
			//shape. We say this is a "record variable" if and only
			//if the first dimension is the record dimension.
vatt_list	=	att_list	//Variable-specific attributes
vsize	=	NON_NEG	//Variable size. If not a record variable, the amount
			//of space in bytes allocated to the variable's data.
			//For record variables, it is the amount of space per
			//record. See "Note on vsize", below.
begin	=	OFFSET	//Variable start location. The offset in bytes (seek
			//index) in the file of the beginning of data for this.
			//variable.
data	=	non_recs recs
non_recs	=	[vardata ...]	//The data for all non-record variables, stored
			//contiguously for each variable, in the same order the
			//variables occur in the header.
vardata	=	[values ...]	//All data for a non-record variable, as a block of
			//values of the same type as the variable, in row-major
			//order (last dimension varying fastest).
recs	=	[record ...]	//The data for all record variables are stored
			//interleaved at the end of the file.
record	=	[varslab ...]	//Each record consists of the n-th slab from each
			//record variable, for example x[n,...], y[n,...],
			//z[n,...] where the first index is the record number,
			//which is the unlimited dimension index.
varslab	=	[values ...]	//One record of data for a variable, a block of values
			//all of the same type as the variable in row-major
			//order (last index varying fastest).
values	=	bytes \| chars \| shorts \| ints \| floats \| doubles
string	=	nelems [chars]
bytes	=	[BYTE ...] padding
chars	=	[CHAR ...] padding
shorts	=	[SHORT ...] padding
ints	=	[INT ...]
floats	=	[FLOAT ...]
doubles	=	[DOUBLE ...]
padding	=	<0, 1, 2, or 3 bytes to next 4-byte boundary>
			//Header padding uses null (\x00) bytes. In data, padding
			//uses variable's fill value. See "Note on padding" below
			//for a special case.
NON_NEG	=	<non-negative INT>
STREAMING	=	\xFF \xFF \xFF \xFF	//Indicates indeterminate record count, allows streaming
			//data
OFFSET	=	<non-negative INT> \|	//for classic format (CDF-1)
		<non-negative INT64>	//for CDF-2 format
BYTE	=	<8-bit byte>	//See "Note on byte data" below.
CHAR	=	<8-bit byte>	//See "Note on char data" below.
SHORT	=	<16-bit signed integer, Bigendian, two's complement>
INT	=	<32-bit signed integer, Bigendian, two's complement>
INT64	=	<64-bit signed integer, Bigendian, two's complement>
FLOAT	=	<32-bit IEEE single-precision float, Bigendian>
DOUBLE	=	<64-bit IEEE double-precision float, Bigendian>
			//following type tags are 32-bit integers
NC_BYTE	=	\x00 \x00 \x00 \x01	//8-bit signed integers
NC_CHAR	=	\x00 \x00 \x00 \x02	//text characters
NC_SHORT	=	\x00 \x00 \x00 \x03	//16-bit signed integers
NC_INT	=	\x00 \x00 \x00 \x04	//32-bit signed integers
NC_FLOAT	=	\x00 \x00 \x00 \x05	//IEEE single precision floats
NC_DOUBLE	=	\x00 \x00 \x00 \x06	//IEEE double precision floats
			//Default fill values for each type, may be overridden by
			//variable attribute named `_FillValue'. See
			//"Note on fill values" below.
FILL_CHAR	=	\x00	//((char)0) null byte
FILL_BYTE	=	\x81	//(signed char) -127
FILL_SHORT	=	\x80 \x01	//(short) -32767
FILL_INT	=	\x80 \x00 \x00 \x01	//(int) -2147483647
FILL_FLOAT	=	\x7C \xF0 \x00 \x00	//(float) 9.9692099683868690e+36
FILL_DOUBLE	=	\x47 \x9E \x00 \x00 \x00 \x00 \x00 \x00	//(double)9.9692099683868690e+36