Sponsor: PnetCDF development was sponsored by the Scientific Data Management Center (SDM) under the DOE program of Scientific Discovery through Advanced Computing (SciDAC). It was also supported in part by the National Science Foundation under the SDCI HPC program award number OCI-0724599 and the HECURA program award number CCF-0938000. PnetCDF is currently supported in part by DOE Award Number DE-SC0007456.
Project Team Members:
- Wei-keng Liao
- Alok Choudhary
- Seung Woo Son
- Kui Gao (formerly postdoc, now Dassault Systèmes Simulia Corp.)
- Jianwei Li (since graduated, now Bloomberg L.P.)
Argonne National Laboratory
- Parallel netCDF features
- Interoperability with netCDF-4
- Download source codes
- User documents
- I/O benchmarking programs
- User community
About netCDF: NetCDF (Network Common Data Form) defines two sets of standards to support the creation, access, and sharing of scientific data.
- File format -- A self-describing, machine-independent file format is defined for representing scientific data. It is designed to store multi-dimensional array-oriented data together with its attributes (such as annotations). The data layout in files follows the canonical order of the arrays.
- Application Programming Interface (API) -- A set of Fortran, C, and C++ functions for accessing array-oriented data stored in netCDF files. The APIs are used to define dimensions, variables, attributes of variables, and perform data read/write to netCDF files.
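As a rough illustration of the canonical array order mentioned in the file format item above, the sketch below computes the position of one element within a fixed-size multi-dimensional array stored in row-major order (last dimension varying fastest). The helper name row_major_index is hypothetical and is not part of netCDF or PnetCDF. The sketch ignores the file header and record variables.

```c
#include <assert.h>

/* Hypothetical helper, not a netCDF or PnetCDF function.
 * Returns the row-major position of element index[] inside an
 * array whose shape is dims[]; the last dimension varies fastest. */
long long row_major_index(int ndims, const long long dims[],
                          const long long index[])
{
    assert(ndims > 0);
    long long pos = 0;
    for (int i = 0; i < ndims; i++) {
        assert(index[i] >= 0 && index[i] < dims[i]);
        pos = pos * dims[i] + index[i];
    }
    return pos;
}
```

Multiplying the returned position by the element size gives the byte offset of the element relative to the start of that array's data.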
Starting from version 4, Unidata's netCDF supports parallel I/O either through PnetCDF or HDF5. Through PnetCDF, netCDF-4 can access files in CDF formats in parallel. Similarly, through HDF5, netCDF-4 can access files in HDF5 format.
Parallel netCDF features: PnetCDF contains a set of new APIs for accessing netCDF files in parallel. Figure 1 compares the data access from multiple processes between using the sequential netCDF and PnetCDF. The new APIs incorporate the parallel semantics of the Message Passing Interface (MPI) and provide backward compatibility with the netCDF file formats: CDF (or CDF-1), CDF-2, and CDF-5. Other features of PnetCDF are:
- Minimize the changes to the netCDF API syntax --
To ease code migration from sequential netCDF
to PnetCDF, PnetCDF APIs mimic the syntax of the netCDF APIs with only a few changes that add parallel I/O concepts.
These changes are highlighted as follows.
- All parallel APIs are named after the originals with a prefix
of "ncmpi_" for C/C++, "nfmpi_" for Fortran 77, and "nf90mpi_" for Fortran 90. For example,
int ncmpi_put_vara_float(int ncid,                 /* dataset ID */
                         int varid,                /* variable ID */
                         const MPI_Offset start[], /* [ndims] */
                         const MPI_Offset count[], /* [ndims] */
                         float *buf)               /* user buffer */
- An MPI communicator and an MPI_Info object are added to
the argument list of the open/create APIs.
The communicator defines the set of processes accessing
the netCDF file. The info object allows users to specify I/O
hints for PnetCDF and MPI-IO to further improve performance
(e.g. file alignment for file header size and variable's starting offset,
and the MPI-IO hints.)
An example is
int ncmpi_open(MPI_Comm comm,    /* the group of MPI processes sharing the file */
               const char *path,
               int omode,
               MPI_Info info,    /* PnetCDF and MPI-IO hints */
               int *ncidp)
- PnetCDF allows two I/O modes, collective and independent,
which correspond to MPI collective and independent I/O operations.
Similar to the MPI naming convention, all collective APIs
carry an extra suffix "_all". Calls made in independent I/O mode
must be wrapped between calls to ncmpi_begin_indep_data() and
ncmpi_end_indep_data(). For example, the API ncmpi_put_vara_float()
above is an independent API and the corresponding collective API is
int ncmpi_put_vara_float_all(int ncid,                 /* dataset ID */
                             int varid,                /* variable ID */
                             const MPI_Offset start[], /* [ndims] */
                             const MPI_Offset count[], /* [ndims] */
                             float *buf)               /* user buffer */
- PnetCDF changes the integer data types of all the API
arguments that are defined as size_t in netCDF to MPI_Offset.
For example, the arguments start and count in the above
APIs are of type MPI_Offset, whereas they are size_t in netCDF. Another
example is the API defining a dimension, given below. The
arguments of type ptrdiff_t are also changed to
MPI_Offset, such as stride and imap in the vars and varm APIs.
int ncmpi_def_dim(int ncid,             /* dataset ID */
                  const char *name,     /* dimension name string */
                  const MPI_Offset len, /* length of dimension */
                  int *dimidp)          /* returned dimension ID */
- Support large files -- PnetCDF supports CDF-2 file format. With CDF-2 format, even on 32-bit platforms one can create netCDF files of size greater than 2GB. PnetCDF uses MPI_Offset, a 64-bit integer, for all file offset references.
- Support large variables -- PnetCDF supports CDF-5 file format. With CDF-5 format, large sized array variables with more than 4 billion elements can be created in a netCDF file. This is achieved by using MPI_Offset, a 64-bit integer, for all array index references.
- Support CDF-5 data types -- New data types introduced in CDF-5 are supported: NC_UBYTE, NC_USHORT, NC_UINT, NC_INT64, and NC_UINT64.
- Provide high performance -- PnetCDF is built on top of MPI-IO, so it inherits MPI-IO's portability across platforms and its parallel I/O performance optimizations.
- Request aggregation -- Nonblocking APIs (ncmpi_iput_xxx/ncmpi_iget_xxx) are designed to aggregate smaller I/O requests into large ones for better performance. A common practice is to post multiple nonblocking calls and use ncmpi_wait_all() to complete the I/O transaction. User's I/O buffer should not be modified between the post and wait. The APIs can be used to aggregate requests to the same variables and/or across different variables. See examples/README for example programs.
- Buffer management -- Buffered write APIs (ncmpi_bput_xxx) are another set of nonblocking APIs that allow application programs to reuse the user buffers as soon as the APIs return. User programs must first call ncmpi_buffer_attach() to specify the amount of internal buffer space that PnetCDF can use to aggregate the write requests. Example programs can be found in examples/tutorial/pnetcdf-write-buffered.c and examples/tutorial/pnetcdf-write-bufferedf.f90.
- PnetCDF I/O hints -- The PnetCDF I/O hints nc_header_align_size and nc_var_align_size allow users to set a customized file header size and the starting file offsets of non-record variables. File layout alignment is known to have a significant performance impact on parallel file systems. The hint nc_header_align_size can be used to reserve a sufficiently large space for the file header, in case more metadata is to be added to an existing netCDF file. A common practice is to set the hint nc_var_align_size to the file system striping size. An example program can be found in hints.c.
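The arithmetic behind offset alignment can be sketched as follows: a starting offset is rounded up to the next multiple of the alignment size, for example the file system striping size. The helper name align_up is hypothetical and `long long` stands in for MPI_Offset. How PnetCDF applies the hints internally is not described here, so this is an illustration only.

```c
#include <assert.h>

/* Hypothetical helper, not a PnetCDF function.
 * Rounds offset up to the next multiple of align.
 * long long stands in for MPI_Offset. */
long long align_up(long long offset, long long align)
{
    assert(offset >= 0 && align > 0);
    return ((offset + align - 1) / align) * align;
}
```

With a 1 MiB striping size, a variable that would otherwise start at byte 1000 would begin at byte 1048576 under this rule, so its data starts on a stripe boundary.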
- APIs that allow non-contiguous I/O buffers --
In addition to supporting the existing netCDF functionality, a new
set of APIs (called the flexible APIs) uses MPI derived
datatypes to describe complex memory layouts for user I/O buffers.
In contrast, traditional netCDF APIs allow only contiguous data buffers.
Example programs can be found in
An example API is given below.
int ncmpi_put_vara_all(int ncid,
                       int varid,
                       const MPI_Offset start[], /* [ndims] */
                       const MPI_Offset count[], /* [ndims] */
                       void *buf,                /* user I/O buffer */
                       MPI_Offset bufcount,      /* number of buftype elements in buf */
                       MPI_Datatype buftype)     /* MPI derived data type */
- APIs that make requests across multiple variables --
Conventional netCDF APIs access a single variable per call. The PnetCDF request aggregation APIs described above bypass this limitation in a nonblocking fashion.
If blocking APIs are preferred, i.e. users wish to make a single call to complete the requests, PnetCDF also provides a set of APIs for this purpose (mput/mget for multiple requests to multiple variables).
These APIs make multiple subarray requests (cross- or same-variable) in a single function call.
One use case is a request that consists of a series of arbitrary array indices and lengths
(see the example program mput.c).
An example API is
int ncmpi_mput_vara_all(int ncid,                  /* dataset ID */
                        int nvars,                 /* number of variables */
                        int varids[],              /* [nvars] list of variable IDs */
                        MPI_Offset* const starts[], /* [nvars][ndims] list of start offsets */
                        MPI_Offset* const counts[], /* [nvars][ndims] list of access counts */
                        void* bufs[],              /* [nvars] list of buffer pointers */
                        MPI_Offset bufcounts[],    /* [nvars] list of buffer counts */
                        MPI_Datatype buftypes[]);  /* [nvars] MPI derived datatypes describing bufs */
- APIs that make multiple requests of arbitrary starts and counts --
Conventional netCDF APIs (i.e. var, var1, vara, vars, and varm) allow one request to a variable per API call.
A new set of APIs, named varn, is introduced to allow multiple requests at arbitrary locations of a single variable
(see the example program put_varn_float.c).
An example API is
int ncmpi_put_varn_float_all(int ncid,                   /* dataset ID */
                             int varid,                  /* variable ID */
                             int num,                    /* number of requests */
                             MPI_Offset* const starts[], /* [num][ndims] list of start offsets */
                             MPI_Offset* const counts[], /* [num][ndims] list of access counts */
                             float* bufs);               /* [num] list of buffer pointers */
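To make the [num][ndims] layout of the counts argument concrete, the sketch below sums the number of array elements covered by a list of varn-style requests. The helper name varn_total_elements is hypothetical and `long long` stands in for MPI_Offset. It only does the bookkeeping a caller might need when sizing buffers and does not call PnetCDF.

```c
#include <assert.h>

/* Hypothetical helper, not a PnetCDF function.
 * counts[i] points to the ndims access lengths of request i.
 * Returns the total number of elements covered by all num requests.
 * long long stands in for MPI_Offset. */
long long varn_total_elements(int num, int ndims, long long *const counts[])
{
    assert(num >= 0 && ndims > 0);
    long long total = 0;
    for (int i = 0; i < num; i++) {
        long long n = 1;               /* elements in request i */
        for (int d = 0; d < ndims; d++)
            n *= counts[i][d];
        total += n;
    }
    return total;
}
```

For three requests on a 2D variable with counts {1,3}, {2,2}, and {1,5}, the requests cover 3 + 4 + 5 = 12 elements in total.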
Interoperability with netCDF-4
- Starting from version 220.127.116.11, netCDF-4 programs can perform parallel I/O on the classic CDF-1 and CDF-2 files through PnetCDF. This is done by passing the file create mode NC_PNETCDF to nc_create_par(), for instance,
nc_create_par(filename, NC_PNETCDF, MPI_COMM_WORLD, info, &ncid);
- nc4_pnc_put.c is an example program that calls nc_put_vara_int() to write subarrays of a 2D integer array in parallel. In this example, the data is partitioned among processes in a block fashion along both the X and Y dimensions.
- nc4_pnc_get.c is a read example program, the counterpart of nc4_pnc_put.c.
- Fortran versions of the above examples can be found in nc4_pnc_put_vara.f and nc4_pnc_get_vara.f.
- coll_perf_nc4.c is an I/O performance benchmark program that reports the aggregate bandwidth of writing 20 three-dimensional arrays of integer type in parallel. The data partitioning pattern used is block-block-block along the three dimensions. This program also reports the performance of the HDF5+MPI-IO method.
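The block partitioning along X and Y described for the put example can be sketched as a small helper that derives each process's start and count arrays, which would then be passed to a call such as nc_put_vara_int(). The helper name block_2d, the row-major mapping of ranks onto the process grid, and the requirement that the array dimensions divide evenly are all assumptions made for this illustration. The example programs themselves may partition differently.

```c
#include <assert.h>

/* Hypothetical helper, not part of netCDF or PnetCDF.
 * Splits a global leny x lenx array among a py x px process grid.
 * Ranks are mapped to the grid in row-major order (an assumption).
 * Dimension 0 is Y and dimension 1 is X.
 * long long stands in for size_t or MPI_Offset. */
void block_2d(int rank, int py, int px, long long leny, long long lenx,
              long long start[2], long long count[2])
{
    assert(py > 0 && px > 0);
    assert(rank >= 0 && rank < py * px);
    assert(leny % py == 0 && lenx % px == 0);  /* even split assumed */

    count[0] = leny / py;                /* rows per process */
    count[1] = lenx / px;                /* columns per process */
    start[0] = (rank / px) * count[0];   /* first row owned by rank */
    start[1] = (rank % px) * count[1];   /* first column owned by rank */
}
```

With an 8 x 16 array on a 2 x 4 process grid, rank 5 sits at grid position (1, 1) and owns the 4 x 4 block starting at row 4, column 4.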
Download Source Codes
- The latest stable version 1.5.0 was released on July 8, 2014.
- SVN repository for the source code under development. Users are welcome to try the new features under testing. The SVN URL is https://svn.mcs.anl.gov/repos/parallel-netcdf/trunk
- PnetCDF C Interface Guide. This user guide is developed based on the netCDF C interface Guide.
- PnetCDF Q&A contains a few tips for achieving better I/O performance.
- A tutorial with six use cases of popular parallel I/O strategies: 1) I/O from the master process; 2) one file per process; 3) parallel I/O on a shared file; 4) using non-contiguous I/O buffer; 5) using non-blocking I/O; and 6) using buffered APIs.
Under Development: PnetCDF is constantly updated with new features and performance improvement methods to meet high-performance computing demands. The following lists some tasks currently under development.
- Subfiling -- Subfiling is a scheme that divides a large multi-dimensional global array into smaller subarrays, each saved in a separate netCDF file, named a subfile. The subfiling scheme can decrease the number of processes sharing a file and thus reduce file access contention, an overhead the file system pays to maintain data consistency. Subfiling is available in PnetCDF release 1.4.1.
I/O Performance Benchmarking Programs
- PnetCDF in BTIO -- BTIO is the I/O part of NASA's NAS Parallel Benchmarks (NPB) suite.
- btio-pnetcdf.tar.gz (SHA1 checksum: 0bacbee5bb29ac3d58e3f492002e5d05e9b5b5ae)
- PnetCDF in S3D-IO -- S3D is a continuum scale first principles direct numerical simulation code developed at Sandia National Laboratories. S3D-IO is its I/O kernel.
- s3d-io-pnetcdf.tar.gz (SHA1 checksum: db59c996c18663ef627d142dbe8371a8fc25ce4e)
- PnetCDF in GCRM-IO -- The Global Cloud Resolving Model (GCRM), developed at Colorado State University, is a climate application framework designed to simulate the circulations associated with large convective clouds. The I/O module in GCRM uses the Geodesic I/O library (GIO) developed at Pacific Northwest National Laboratory. Note that the GCRM I/O kernel benchmark program is included in the GIO source release. The tar ball downloadable from this site contains a C version of the benchmark converted from GIO's Fortran version.
- gcrm-io-pnetcdf-1.0.0.tar.gz (SHA1 checksum: 32bd510faf4fcef3edeb564d3885edac21f8122d)
- PnetCDF in FLASH-IO -- FLASH is a block-structured adaptive mesh hydrodynamics code developed mainly for the study of nuclear flashes on neutron stars and white dwarfs. The PnetCDF method is developed based on the FLASH I/O benchmark suite and is included in the PnetCDF release starting from v1.4.0.
- Seung Woo Son, Saba Sehrish, Wei-keng Liao, Ron Oldfield, and Alok Choudhary. Dynamic File Striping and Data Layout Transformation on Parallel System with Fluctuating I/O Workload. In the Workshop on Interfaces and Architectures for Scientific Data Storage, September 2013.
- Rob Latham, Chris Daley, Wei-keng Liao, Kui Gao, Rob Ross, Anshu Dubey, and Alok Choudhary. A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code. Computational Science and Discovery, 5, March 2012.
- Kui Gao, Chen Jin, Alok Choudhary, and Wei-keng Liao. Supporting Computational Data Model Representation with High-performance I/O in Parallel netCDF. In the IEEE International Conference on High Performance Computing, December 2011.
- Kui Gao, Wei-keng Liao, Arifa Nisar, Alok Choudhary, Robert Ross, and Robert Latham. Using Subfiling to Improve Programming Flexibility and Performance of Parallel Shared-file I/O. In the Proceedings of the International Conference on Parallel Processing, Vienna, Austria, September 2009.
- Kui Gao, Wei-keng Liao, Alok Choudhary, Robert Ross, and Robert Latham. Combining I/O Operations for Multiple Array Variables in Parallel netCDF. In the Proceedings of the Workshop on Interfaces and Architectures for Scientific Data Storage, held in conjunction with the IEEE Cluster Conference, New Orleans, Louisiana, September 2009.
- Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, Rob Latham, Andrew Siegel, Brad Gallagher, and Michael Zingale. Parallel netCDF: A Scientific High-Performance I/O Interface. In the Proceedings of Supercomputing Conference, November, 2003.
User Discuss Mailing List
- Participate in the discussion and receive announcements by subscribing to the mailing list.
- Mailing list archive.
- Unidata's netCDF
- Message Passing Interface Standard
- Parallel netCDF project web page maintained at Argonne National Laboratory, including software download, users documents, etc.
- Extreme Linux: Parallel netCDF - an article from Linux Magazine
- Philippe Wautelet and Pierre Kestener. Parallel IO Performance and Scalability Study on the PRACE CURIE Supercomputer, white paper at Partnership For Advanced Computing in Europe (PRACE), September, 2012. This report compares the performance of PnetCDF, HDF5, and MPI-IO on the CURIE supercomputer with the Lustre parallel file system using IOR and RAMSES.
- Data Services for the Global Cloud Resolving Model (GCRM)
using Parallel netCDF, Fortran 90, C++
Organization: Pacific Northwest National Laboratory
People: Karen Schuchardt
Reference: B. Palmer, A. Koontz, K. Schuchardt, R. Heikes, and D. Randall. Efficient Data IO for a Parallel Global Cloud Resolving Model. Environ. Model. Softw. 26, 12 (December 2011), 1725-1735. DOI: 10.1016/j.envsoft.2011.08.007.
- NCAR Community Atmosphere Model (CAM)
using Parallel netCDF, ZioLib
Platforms: IBM SP3, SP4, SP5, BlueGene/L, Cray X1E
File systems: GPFS, PVFS2, NFS
Organization: Department of Atmospheric Sciences, National Taiwan University
People: Yu-heng Tseng (yhtseng at as.ntu.edu.tw)
Reference: Tseng, Y. H. and Ding, C.H.Q. Efficient Parallel I/O in Community Atmosphere Model (CAM), International Journal of High Performance Computing Applications, 22, 206-218, 2008.
- Astrophysical Thermonuclear Flashes (FLASH)
using Parallel netCDF, HDF5, C
Platforms: IBM SP, Linux Clusters
Organization: ASCI Flash Center, University of Chicago
People: Brad Gallagher, Katie Antypas
Reference: R. Latham, C. Daley, W. Liao, K. Gao, R. Ross, A. Dubey and A. Choudhary. A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code. Computational Science and Discovery, vol. 5, 2012. DOI: 10.1088/1749-4699/5/1/015001
- ASPECT, parallel VTK
using Parallel netCDF, C
Platforms: Linux Clusters, Cray X
People: Nagiza Samatova
- Atmospheric Chemical Transport Model (ACTM)
using Parallel netCDF, FORTRAN
Organization: Center for Applied Scientific Computing, LLNL
People: John R. Tannahill
- PRogram for Integrated Earth System Modeling (PRISM) Support Initiative
supports netCDF and Parallel netCDF within the IO library as part of the OASIS4 coupler
using netCDF, Parallel netCDF (FORTRAN APIs)
Platforms: NEC SX, Linux Cluster, SGI and others
Organization: C&C Research Laboratories, NEC Europe Ltd.
Contacts for pnetcdf in PRISM: Reiner Vogelsang and Rene Redler
Contacts for pnetcdf users on NEC SX: Joachim Worringen and Rene Redler
- Weather Research and Forecast
(WRF) modeling system software
using Parallel netCDF, FORTRAN
Organization: National Center for Atmospheric Research (NCAR)
People: John Michalakes
Hurricane Force Supercomputing: Petascale Simulations of Sandy
- Stony Brook Parallel Ocean Model (sbPOM)
using Parallel netCDF, FORTRAN
sbPOM is a parallel, free-surface, sigma-coordinate, primitive equations ocean modeling code based on the Princeton Ocean Model (POM).
People: Antoni Jordi (toni at imedea.uib-csic.es) and Dong-Ping Wang (dpwang at notes.cc.sunysb.edu)
- WRF-ROMS (Regional Ocean Model System) I/O Module
using Parallel netCDF, FORTRAN
Organization: Scientific Data Technologies Group, NCSA
People: MuQun Yang, John Blondin
- Portable, Extensible Toolkit for Scientific Computation
- The Earth System Modeling Framework (ESMF)
platform: IBM Blue Gene / L
Organization: Scientific Computing Division
National Center for Atmospheric Research
Boulder, CO 80305
People: Nancy Collins, James P. Edwards
- Community Earth System Model
The program uses PIO library for parallel I/O operations, which includes PnetCDF method. CESM software prerequisites
- Parallel Ice Sheet Model (PISM)
The Parallel Ice Sheet Model is an open source, parallel, high-resolution ice sheet model.
We are grateful to the following people, who provided valuable
comments and discussions that improved our implementation.
Yu-Heng Tseng (LBNL), Reiner Vogelsang (Silicon Graphics, Germany), Jon Rhoades (Information Systems & Technology ENSCO, Inc.), Kilburn Building (University Of Manchester), Foucar, James G (Sandia National Lab.), Drake, Richard R (Sandia National Lab.), Eileen Corelli (Senior Scientist, ENSCO Inc.), Roger Ting, Hao Yu, Raimondo Giammanco, John R. Tannahill (Lawrence Livermore National Lab.), Tyce Mclarty (Lawrence Livermore National Lab.), Peter Schmitt, Mike Dvorak (LCRC team, MCS ANL)