The name "CDF-1" refers to the identification string, or "magic", that occupies the first 4 bytes of a netCDF file. The string can be "CDF1", "CDF2", or "CDF5", where the last character stands for the version byte (\x01, \x02, or \x05).
CDF-1 and CDF-2 are also referred to by the ESDS Community Standard as the NetCDF Classic and 64-bit Offset File Formats, respectively. See [ESDS-RFC-011v2.0].
The difference between CDF-1 and CDF-2 is only in the VERSION byte (\x01 vs. \x02) and the OFFSET entity, a 64-bit instead of a 32-bit offset from the beginning of the file. See CDF-2 file format specification for the detailed specifications of both CDF-1 and CDF-2.
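As a rough illustration of the paragraph above, the following sketch classifies a file by its 4-byte magic and reports the width of the OFFSET entity. The function names `cdf_format_version` and `cdf_offset_width` are hypothetical and not part of any netCDF library; only the CDF-1 and CDF-2 offset widths stated above are encoded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1, 2, or 5 when buf starts with the
 * CDF-1, CDF-2, or CDF-5 magic, and 0 otherwise. */
int cdf_format_version(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4 || buf[0] != 'C' || buf[1] != 'D' || buf[2] != 'F')
        return 0;
    switch (buf[3]) {           /* the VERSION byte */
    case 1:
    case 2:
    case 5:
        return buf[3];
    default:
        return 0;
    }
}

/* Hypothetical helper: width in bytes of the OFFSET entity ("begin").
 * Only CDF-1 (32-bit) and CDF-2 (64-bit) are covered here; any other
 * version yields 0. */
int cdf_offset_width(int version)
{
    if (version == 1)
        return 4;
    if (version == 2)
        return 8;
    return 0;
}
```

A reader would typically call `cdf_format_version` on the first four bytes of the file before deciding how to decode each variable's `begin` field.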
Below is an older version of the CDF file format specification, used by the NetCDF library through version 3.5.1. It is no longer referred to as CDF-1. See [this URL] for the original specification (copied below).
netcdf_file  := header  data
header       := magic  numrecs  dim_array  gatt_array  var_array
magic        := 'C'  'D'  'F'  VERSION_BYTE
VERSION_BYTE := '\001'        // the file format version number
numrecs      := NON_NEG
dim_array    := ABSENT | NC_DIMENSION  nelems  [dim ...]
gatt_array   := att_array     // global attributes
att_array    := ABSENT | NC_ATTRIBUTE  nelems  [attr ...]
var_array    := ABSENT | NC_VARIABLE   nelems  [var ...]
ABSENT       := ZERO  ZERO    // Means array not present (equivalent to
                              // nelems == 0).
nelems       := NON_NEG       // number of elements in following sequence
dim          := name  dim_size
name         := string
dim_size     := NON_NEG       // If zero, this is the record dimension.
                              // There can be at most one record dimension.
attr         := name  nc_type  nelems  [values]
nc_type      := NC_BYTE | NC_CHAR | NC_SHORT | NC_LONG | NC_FLOAT | NC_DOUBLE
var          := name  nelems  [dimid ...]  vatt_array  nc_type  vsize  begin
                              // nelems is the rank (dimensionality) of the
                              // variable; 0 for scalar, 1 for vector, 2 for
                              // matrix, ...
vatt_array   := att_array     // variable-specific attributes
dimid        := NON_NEG       // Dimension ID (index into dim_array) for
                              // variable shape.  We say this is a "record
                              // variable" if and only if the first
                              // dimension is the record dimension.
vsize        := NON_NEG       // Variable size.  If not a record variable,
                              // the amount of space, in bytes, allocated to
                              // that variable's data.  This number is the
                              // product of the dimension sizes times the
                              // size of the type, padded to a four byte
                              // boundary.  If a record variable, it is the
                              // amount of space per record.  The netCDF
                              // "record size" is calculated as the sum of
                              // the vsize's of the record variables.
begin        := NON_NEG       // Variable start location.  The offset in
                              // bytes (seek index) in the file of the
                              // beginning of data for this variable.
data         := non_recs  recs
non_recs     := [values ...]  // Data for first non-record var, second
                              // non-record var, ...
recs         := [rec ...]     // First record, second record, ...
rec          := [values ...]  // Data for first record variable for record
                              // n, second record variable for record n, ...
                              // See the note below for a special case.
values       := [bytes] | [chars] | [shorts] | [ints] | [floats] | [doubles]
string       := nelems  [chars]
bytes        := [BYTE ...]  padding
chars        := [CHAR ...]  padding
shorts       := [SHORT ...]  padding
ints         := [INT ...]
floats       := [FLOAT ...]
doubles      := [DOUBLE ...]
padding      := <0, 1, 2, or 3 bytes to next 4-byte boundary>
                              // In header, padding is with 0 bytes.  In
                              // data, padding is with variable's fill-value.
NON_NEG      := <INT with non-negative value>
ZERO         := <INT with zero value>
BYTE         := <8-bit byte>
CHAR         := <8-bit ASCII/ISO encoded character>
SHORT        := <16-bit signed integer, Bigendian, two's complement>
INT          := <32-bit signed integer, Bigendian, two's complement>
FLOAT        := <32-bit IEEE single-precision float, Bigendian>
DOUBLE       := <64-bit IEEE double-precision float, Bigendian>
                              // tags are 32-bit INTs
NC_BYTE      := 1             // data is array of 8 bit signed integer
NC_CHAR      := 2             // data is array of characters, i.e., text
NC_SHORT     := 3             // data is array of 16 bit signed integer
NC_LONG      := 4             // data is array of 32 bit signed integer
NC_FLOAT     := 5             // data is array of IEEE single precision float
NC_DOUBLE    := 6             // data is array of IEEE double precision float
NC_DIMENSION := 10
NC_VARIABLE  := 11
NC_ATTRIBUTE := 12
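Two low-level rules in the grammar come up in every header read: INT values are 32-bit big-endian two's complement, and `string`, `bytes`, `chars`, and `shorts` are padded to the next 4-byte boundary. The sketch below shows one way to handle both. The helper names (`get_int_be`, `pad_len`, `string_encoded_size`) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one INT (32-bit, big-endian, two's complement) from 4 bytes. */
int32_t get_int_be(const unsigned char *p)
{
    assert(p != NULL);
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    /* Convert to signed without relying on implementation-defined casts. */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(~u) - 1;
}

/* Number of padding bytes (0, 1, 2, or 3) that follow n payload bytes. */
size_t pad_len(size_t n)
{
    return (4 - n % 4) % 4;
}

/* Encoded size of a `string`: the nelems INT, the characters, the padding. */
size_t string_encoded_size(size_t nchars)
{
    return 4 + nchars + pad_len(nchars);
}
```

For instance, a 3-character name takes 8 bytes on disk under these rules: 4 for the length, 3 for the characters, and 1 byte of zero padding.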
To calculate the offset (position within the file) of a specified data value, let external_sizeof be the external size in bytes of one data value of the appropriate type for the specified variable, nc_type:
NC_BYTE     1
NC_CHAR     1
NC_SHORT    2
NC_INT      4    (NC_LONG in the grammar above)
NC_FLOAT    4
NC_DOUBLE   8
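The table above maps directly onto a lookup function. This is a minimal sketch: the enum uses the tag values from the grammar, and returning 0 for an unknown tag is a choice made here, not something the specification prescribes.

```c
/* Type tags, with the values given in the grammar (NC_INT is tag 4,
 * which the grammar calls NC_LONG). */
enum {
    NC_BYTE = 1, NC_CHAR = 2, NC_SHORT = 3,
    NC_INT = 4, NC_FLOAT = 5, NC_DOUBLE = 6
};

/* External (on-disk) size in bytes of one value of the given type.
 * Returns 0 for an unrecognized tag (an assumption of this sketch). */
int external_sizeof(int nc_type)
{
    switch (nc_type) {
    case NC_BYTE:   return 1;
    case NC_CHAR:   return 1;
    case NC_SHORT:  return 2;
    case NC_INT:    return 4;
    case NC_FLOAT:  return 4;
    case NC_DOUBLE: return 8;
    default:        return 0;
    }
}
```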
On open() (or endef()), scan through the array of variables, denoted var_array above, and sum the vsize fields of "record" variables to compute recsize.
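The scan just described might look like the following. The `struct var_info` layout is a hypothetical in-memory form of a parsed `var` entry, not a structure taken from the netCDF library, and the single-record-variable special case noted later in this section is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory summary of one parsed `var` entry. */
struct var_info {
    int    is_record;  /* nonzero if the first dimension is the record dimension */
    size_t vsize;      /* the vsize field read from the header */
};

/* Sum the vsize fields of the record variables to obtain recsize. */
size_t compute_recsize(const struct var_info *vars, size_t nvars)
{
    size_t recsize = 0;
    assert(vars != NULL || nvars == 0);
    for (size_t i = 0; i < nvars; i++) {
        if (vars[i].is_record)
            recsize += vars[i].vsize;
    }
    return recsize;
}
```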
Form the products of the dimension sizes for the variable from right to
left, skipping the leftmost (record) dimension for record variables, and
storing the results in a product array for each variable. For example:
Non-record variable:
        dimension lengths:      [  5  3  2 7]
        product vector:         [210 42 14 7]
Record variable:
        dimension lengths:      [0  2  9 4]
        product vector:         [0 72 36 4]
At this point, the leftmost product (for a record variable, the first
product to the right of the record dimension), multiplied by
external_sizeof and rounded up to the next multiple of 4, is the
variable size, vsize, in the grammar above. For example, assuming a
one-byte type, the value of the vsize field for the non-record variable
above is 212 (210 rounded up to a multiple of 4). For the record
variable, the value of vsize is just 72, since this is already a
multiple of 4.
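The product vector and vsize can be computed as sketched below. The function names and signatures are assumptions for illustration. The sketch follows the grammar's definition of vsize (element count times type size, padded to four bytes), and it relies on the record dimension having length 0, so that the leftmost product of a record variable comes out as 0, as in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Fill product[0..rank-1] with right-to-left running products of dims[].
 * A record dimension (length 0 at index 0) yields product[0] == 0. */
void compute_products(const size_t *dims, size_t rank, size_t *product)
{
    size_t p = 1;
    assert(rank == 0 || (dims != NULL && product != NULL));
    for (size_t i = rank; i-- > 0; ) {
        p *= dims[i];
        product[i] = p;
    }
}

/* vsize: bytes per variable (non-record) or per record (record variable),
 * rounded up to a multiple of 4. elem_size is external_sizeof. */
size_t compute_vsize(const size_t *product, size_t rank,
                     int is_record, size_t elem_size)
{
    size_t count;
    if (is_record)
        count = (rank > 1) ? product[1] : 1;  /* skip the record dimension */
    else
        count = (rank > 0) ? product[0] : 1;  /* a scalar holds one value */
    size_t bytes = count * elem_size;
    return (bytes + 3) / 4 * 4;
}
```

With the example dimension lengths and a one-byte type, this gives 212 for the non-record variable and 72 for the record variable.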
Let coord be the array of coordinates (dimension indices, zero-based)
of the desired data value. Within the variable, each coordinate is
weighted by the product of the dimension lengths to its right: coord[i]
is multiplied by product[i+1], and the rightmost coordinate by 1. The
offset of the value from the beginning of the file is then the file
offset of the first data value of the desired variable (its begin
field) plus this weighted sum times the size, in bytes, of each datum
for the variable. For a record variable, the record coordinate coord[0]
is left out of the sum; instead, the product of the record number,
coord[0], and the record size, recsize, is added to yield the final
offset value.
In pseudo-C code, here’s the calculation of offset:
for (innerProduct = 0, i = IS_RECVAR(var) ? 1 : 0; i < var.rank; i++)
    innerProduct += coord[i] * (i + 1 < var.rank ? product[i + 1] : 1);
offset = var.begin;
offset += external_sizeof * innerProduct;
if (IS_RECVAR(var))
    offset += coord[0] * recsize;
So, to get the data value (in external representation):
lseek(fd, offset, SEEK_SET);
read(fd, buf, external_sizeof);
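For readers who want a compilable form of the offset calculation, here is a small sketch. It assumes row-major layout, so each coordinate is weighted by the product of the dimension lengths to its right (the last coordinate has weight 1), and the record coordinate contributes only through recsize. The `struct var_desc` type and `value_offset` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory description of one variable. */
struct var_desc {
    size_t        rank;       /* number of dimensions */
    const size_t *product;    /* right-to-left product vector, length rank */
    size_t        begin;      /* begin field from the header */
    size_t        elem_size;  /* external_sizeof for the variable's type */
    int           is_record;  /* nonzero for a record variable */
};

/* File offset of the value at coord[0..rank-1]. */
size_t value_offset(const struct var_desc *v, const size_t *coord,
                    size_t recsize)
{
    size_t inner = 0;
    assert(v != NULL && (v->rank == 0 || coord != NULL));
    /* For record variables, coord[0] is handled separately below. */
    for (size_t i = v->is_record ? 1 : 0; i < v->rank; i++) {
        size_t stride = (i + 1 < v->rank) ? v->product[i + 1] : 1;
        inner += stride * coord[i];
    }
    size_t offset = v->begin + v->elem_size * inner;
    if (v->is_record)
        offset += coord[0] * recsize;
    return offset;
}
```

The returned value can be passed to lseek() as shown above. A 64-bit offset type would be needed for files larger than what size_t can address on the platform.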
A special case: Where there is exactly one record variable, we drop the requirement that each record be four-byte aligned, so in this case there is no record padding.
By using the grammar above, we can derive the smallest valid netCDF file, having no dimensions, no variables, no attributes, and hence, no data. A CDL representation of the empty netCDF file is
netcdf empty { }
This empty netCDF file has 32 bytes. It begins with the four-byte “magic number” that identifies it as a netCDF version 1 file: ‘C’, ‘D’, ‘F’, ‘\x01’. Following are seven 32-bit integer zeros representing the number of records, an empty list of dimensions, an empty list of global attributes, and an empty list of variables.
Below is an (edited) dump of the file produced using the Unix command
od -xcs empty.nc
Each 16-byte portion of the file is displayed with 4 lines. The first line displays the bytes in hexadecimal. The second line displays the bytes as characters. The third line displays each group of two bytes interpreted as a signed 16-bit integer. The fourth line (added by hand) presents the interpretation of the bytes in terms of netCDF components and values.
  4344    4601    0000    0000    0000    0000    0000    0000
   C   D   F 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
 17220   17921   00000   00000   00000   00000   00000   00000
[magic number ] [  0 records  ] [  0 dimensions   (ABSENT)    ]

  0000    0000    0000    0000    0000    0000    0000    0000
  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
 00000   00000   00000   00000   00000   00000   00000   00000
[  0 global atts  (ABSENT)    ] [  0 variables    (ABSENT)    ]
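The dump above is simple enough to regenerate in a few lines. The following sketch fills a caller-supplied buffer with the 32 bytes of the empty file; `build_empty_netcdf` is a hypothetical name, and writing the buffer to disk is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the 32-byte empty netCDF file into buf.
 * Returns the number of bytes written, or 0 if cap is too small. */
size_t build_empty_netcdf(unsigned char *buf, size_t cap)
{
    assert(buf != NULL);
    if (cap < 32)
        return 0;
    /* numrecs plus three ABSENT arrays: seven 32-bit zeros after the magic */
    memset(buf, 0, 32);
    buf[0] = 'C';
    buf[1] = 'D';
    buf[2] = 'F';
    buf[3] = 1;       /* VERSION_BYTE */
    return 32;
}
```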
As a less trivial example, consider the CDL
netcdf tiny {
dimensions:
        dim = 5;
variables:
        short   vx(dim);
data:
        vx = 3, 1, 4, 1, 5 ;
}
which corresponds to a 92-byte netCDF file. The following is an edited dump of this file:
  4344    4601    0000    0000    0000    000a    0000    0001
   C   D   F 001  \0  \0  \0  \0  \0  \0  \0  \n  \0  \0  \0 001
 17220   17921   00000   00000   00000   00010   00000   00001
[magic number ] [  0 records  ] [NC_DIMENSION ] [ 1 dimension ]

  0000    0003    6469    6d00    0000    0005    0000    0000
  \0  \0  \0 003   d   i   m  \0  \0  \0  \0 005  \0  \0  \0  \0
 00000   00003   25705   27904   00000   00005   00000   00000
[  3 char name = "dim"        ] [ size = 5    ] [ 0 global atts

  0000    0000    0000    000b    0000    0001    0000    0002
  \0  \0  \0  \0  \0  \0  \0 013  \0  \0  \0 001  \0  \0  \0 002
 00000   00000   00000   00011   00000   00001   00000   00002
 (ABSENT)     ] [NC_VARIABLE  ] [ 1 variable  ] [ 2 char name =

  7678    0000    0000    0001    0000    0000    0000    0000
   v   x  \0  \0  \0  \0  \0 001  \0  \0  \0  \0  \0  \0  \0  \0
 30328   00000   00000   00001   00000   00000   00000   00000
 "vx"         ] [1 dimension  ] [ with ID 0   ] [ 0 attributes

  0000    0000    0000    0003    0000    000c    0000    0050
  \0  \0  \0  \0  \0  \0  \0 003  \0  \0  \0  \f  \0  \0  \0   P
 00000   00000   00000   00003   00000   00012   00000   00080
 (ABSENT)     ] [type NC_SHORT] [size 12 bytes] [offset:    80]

  0003    0001    0004    0001    0005    8001
  \0 003  \0 001  \0 004  \0 001  \0 005 200 001
 00003   00001   00004   00001   00005  -32767
[    3] [    1] [    4] [    1] [    5] [fill ]
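To see how the grammar produces these 92 bytes, the sketch below assembles the same file in memory, field by field. It is an illustration only: `put_int`, `put_name`, and `build_tiny` are hypothetical helpers, and the values 12 (vsize) and 80 (begin) are taken from the dump rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one big-endian 32-bit INT at pos; return the new position. */
static size_t put_int(unsigned char *p, size_t pos, uint32_t v)
{
    p[pos]     = (unsigned char)(v >> 24);
    p[pos + 1] = (unsigned char)((v >> 16) & 0xff);
    p[pos + 2] = (unsigned char)((v >> 8) & 0xff);
    p[pos + 3] = (unsigned char)(v & 0xff);
    return pos + 4;
}

/* Append a `string`: length, characters, zero padding to a 4-byte boundary. */
static size_t put_name(unsigned char *p, size_t pos, const char *name)
{
    size_t n = strlen(name);
    pos = put_int(p, pos, (uint32_t)n);
    memcpy(p + pos, name, n);
    pos += n;
    while (pos % 4 != 0)
        p[pos++] = 0;
    return pos;
}

/* Build the "tiny" file in buf; returns its length (92), or 0 if cap < 92. */
size_t build_tiny(unsigned char *buf, size_t cap)
{
    static const uint16_t data[5] = {3, 1, 4, 1, 5};
    size_t pos = 4;
    assert(buf != NULL);
    if (cap < 92)
        return 0;
    memcpy(buf, "CDF\001", 4);            /* magic */
    pos = put_int(buf, pos, 0);           /* numrecs */
    pos = put_int(buf, pos, 10);          /* NC_DIMENSION */
    pos = put_int(buf, pos, 1);           /* one dimension */
    pos = put_name(buf, pos, "dim");
    pos = put_int(buf, pos, 5);           /* dim_size */
    pos = put_int(buf, pos, 0);           /* gatt_array: ABSENT */
    pos = put_int(buf, pos, 0);
    pos = put_int(buf, pos, 11);          /* NC_VARIABLE */
    pos = put_int(buf, pos, 1);           /* one variable */
    pos = put_name(buf, pos, "vx");
    pos = put_int(buf, pos, 1);           /* rank */
    pos = put_int(buf, pos, 0);           /* dimid 0 */
    pos = put_int(buf, pos, 0);           /* vatt_array: ABSENT */
    pos = put_int(buf, pos, 0);
    pos = put_int(buf, pos, 3);           /* NC_SHORT */
    pos = put_int(buf, pos, 12);          /* vsize, as in the dump */
    pos = put_int(buf, pos, 80);          /* begin, as in the dump */
    for (size_t i = 0; i < 5; i++) {      /* five big-endian shorts */
        buf[pos++] = (unsigned char)(data[i] >> 8);
        buf[pos++] = (unsigned char)(data[i] & 0xff);
    }
    buf[pos++] = 0x80;                    /* padding with the fill value */
    buf[pos++] = 0x01;                    /* bytes shown in the dump */
    return pos;
}
```

The header ends at byte 80, which matches the begin field, and the 10 data bytes plus 2 padding bytes bring the total to 92.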