Sponsor:
PnetCDF development was sponsored by the Scientific Data Management Center (SDM) under the DOE program of Scientific Discovery through Advanced Computing (SciDAC). It was also supported in part by National Science Foundation under the SDCI HPC program award numbers OCI-0724599 and HECURA program award numbers CCF-0938000.Project Team Members:
Northwestern University
- Alok Choudhary
- Wei-keng Liao
- Seung Woo Son
- Kui Gao (formally postdoc, now Dassault Systèmes Simulia Corp.
- Jianwei Li (since graduated, now Bloomberg L.P.)
Argonne National Laboratory
- Rob Latham
- Rob Ross
- Rajeev Thakur
- William Gropp(now UIUC)
Parallel NetCDF
About netCDF:
NetCDF (Network Common Data Form) defines two sets of standards to support the creation, access, and sharing of scientific data.- File format -- A self-describable, machine-independent file format is defined for representing scientific data. It is designed to store multi-dimensional array-oriented data together with its attributes (such as annotations.) The data layout in files follows the canonical order of the arrays.
- Application Programming Interface (API) -- A set of Fortran, C, and C++ functions for accessing array-oriented data stored in netCDF files. The APIs are used to define dimensions, variables, attributes of variables, and perform data read/write to netCDF files.
Starting from version 4, Unidata's netCDF supports parallel I/O either through PnetCDF or HDF5. Through PnetCDF, netCDF-4 can access files in CDF formats in parallel. Similarly, through HDF5, netCDF-4 can access files in HDF5 format.
About Parallel netCDF:
Parallel netCDF (PnetCDF) is a collaborative work between Northwestern University and Argonne National Laboratory. This work is to design and develop a set of new APIs for accessing netCDF files in parallel. Figure 1 compares the data access from multiple processes between using the sequential netCDF and PnetCDF. The new APIs incorporate the parallel semantics following the Message Passing Interfaces (MPI) and provide backward compatability with the netCDF file formats: CDF (or CDF-1), CDF-2, and CDF-5. Other features of PnetCDF are:- Minimize the changes to the netCDF API syntax --
In order for easy code migration from sequential netCDF
to PnetCDF, PnetCDF APIs mimic the syntax of the netCDF APIs with only a few changes to add parallel I/O concept.
These changes are highlighted as follows.
- All parallel APIs are named after originals with prefix
of "ncmpi" for C/C++ and "nfmpi" for Fortran.
int ncmpi_put_vars_float(int ncid, /* dataset ID */ int varid, /* variable ID */ const MPI_Offset start[], /* [ndims] */ const MPI_Offset count[], /* [ndims] */ const MPI_Offset stride[], /* [ndims] */ const float *fp) /* user buffer */ - An MPI communicator and an MPI_Info object are added to
the argument list of the open/create APIs.
The communicator defines the set of processes accessing
the netCDF file. The info object allows users to specify I/O
hints for PnetCDF and MPI-IO to further improve performance
(e.g. file alignment for file header size and variable's starting offset,
and the MPI-IO hints.)
An example is
int ncmpi_open(MPI_Comm comm, /* the group of MPI processes sharing the file */ const char *path, int omode, MPI_Info info, /* PnetCDF and MPI-IO hints */ int *ncidp) - PnetCDF allows two I/O modes, collective and independent,
whcih correspond to MPI collective and independent I/O operations.
Similar to MPI naming convention, all collective APIs
carry an extra suffix "_all". The independent I/O mode
is wrapped by the calls of ncmpi_begin_indep_data() and
ncmpi_end_indep_data(). For example, the API ncmpi_put_vars_float()
above is an independent API and the corresponding collective API is
given below.
int ncmpi_put_vars_float_all(int ncid, /* dataset ID */ int varid, /* variable ID */ const MPI_Offset start[], /* [ndims] */ const MPI_Offset count[], /* [ndims] */ const MPI_Offset stride[], /* [ndims] */ const float *fp) /* user buffer */
- All parallel APIs are named after originals with prefix
of "ncmpi" for C/C++ and "nfmpi" for Fortran.
- Support large files -- PnetCDF supports CDF-2 file format. With CDF-2 format, even on 32-bit platforms one can create netCDF files of size greater than 2GB. PnetCDF uses MPI_Offset, a 64-bit integer, for all file offset references.
- Support large variables -- PnetCDF supports CDF-5 file format. With CDF-5 format, large sized array variables with more than 4 billion elements can be created in a netCDF file. This is achieved by using MPI_Offset, a 64-bit integer, for all array index references.
- Support CDF-5 data types -- New data types introduced in CDF-5 are supported: NC_UBYTE, NC_USHORT, NC_UINT, NC_INT64, and NC_UINT64.
- Provide high performance -- PnetCDF is built on top of MPI-IO, which guarentees the portability across various platforms and high performance.
- Request aggregation -- Nonblocking APIs (ncmpi_iput_xxx/ncmpi_iget_xxx) are designed to aggregate smaller I/O requests into large ones for better performance. A common practice is to post multiple nonblocking calls and use ncmpi_wait_all() to complete the I/O transaction. User's I/O buffer should not be modified between the post and wait. The APIs can be used to aggregate requests to the same variables and/or across different variables. See examples/README for example programs.
- Buffer management -- Buffered write APIs (ncmpi_bput_xxx) is another set of nonblocking APIs that allows application programs to reuse the user buffers when the APIs return. User programs must first call ncmpi_buffer_attach() to specify an amount of internal buffer that cen be used by PnetCDF to aggregate the write requests. Example programs can be found in examples/tutorial/pnetcdf-write-buffered.c and examples/tutorial/pnetcdf-write-bufferedf.F.
- PnetCDF I/O hints -- PnetCDF I/O hints, nc_header_align_size and nc_var_align_size, allows users to set a customized file header size and starting file offsets of non-record variables. File layout alignment has been known to cause significant performance impact on parallel file system. The hint nc_header_align_size can be used to reserve a sufficiently large space for file header, in case more metadata is to be added to an existing netCDF file. The common practice for setting the hint nc_var_align_size is the file system striping size. An example program can be found in examples/hints.c.
- APIs that allow non-contiguous I/O buffers --
In addition to the support of existing netCDF functionality, another new
set of APIs (called flexible API) are created to make use of MPI derived
datatypes to allow describing complex memory layout for user I/O buffers.
On the contrary, traditional netCDF APIs allow only continguous data buffers.
Example programs can be found in
examples/flex.c and
examples/flex_f.F.
An example API is given below.
int ncmpi_put_vars_all(int ncid, int varid, const MPI_Offset start[], /* [ndims] */ const MPI_Offset count[], /* [ndims] */ const MPI_Offset stride[], /* [ndims] */ const void *buf, MPI_Datatype bufcount, /* number of elements in datatype */ MPI_Datatype datatype) /* MPI derived data type */ - APIs that make multiple subarray requests --
Existing netCDF APIs allow one request to a variable per API call.
Using PnetCDF request aggregation APIs can bypass this limitation in a nonblocking fashion, as described above.
If blocking APIs is preferred, i.e. users wish to make a single call to complete the requests, PnetCDF also provides a set of APIs for this feature (mput/mget for multiple requests to multiple variables.)
The APIs allow to make multiple subarray requests (cross- or same-variable) in a single function call.
One use case is to make a request that consists of a series of arbitrary array indices and lengths.
(See the example program examples/mput.c).
An example API is
int ncmpi_mput_vara_all(int ncid, /* dataset ID */ int nvars, /* number of variables */ int varids[], /* [nvars] list of variable IDs */ MPI_Offset* const starts[], /* [nvars][ndims] list of start offsets */ MPI_Offset* const counts[], /* [nvars][ndims] list of access counts */ void* const bufs[], /* [nvars] list of buffer pointers */ MPI_Offset bufcounts[], /* [nvars] list of buffer counts */ MPI_Datatype datatypes[]); /* [nvars] MPI derived datatypes describing bufs */
Under Development
- Subfiling -- We propose a subfiling scheme that divides a large multi-dimensional global array into smaller subarrays, each saved in a separate netCDF file, named subfile. The subfiling scheme can decrease the number of processes sharing a file, so it can reduce the file access contention, an overhead the file system pays to maintain data consistency.
Software Download
- The latest stable version parallel-netcdf-1.3.1.tar.gz released on September 24, 2012.
- SVN repository for the source codes under development. Users are welcomed to try the new features under testing. The SVN URL is https://svn.mcs.anl.gov/repos/parallel-netcdf/trunk
User Discuss Mailing List
- Participate the discussion and receive announcement by subscribing the mailing list.
- Mailing list archive.
Tutorial
- A tutorial with six use cases of popular parallel I/O strategies: 1) I/O from the master process; 2) one file per process; 3) parallel I/O on a shared file; 4) using non-contiguous I/O buffer; 5) using non-blocking I/O; and 6) using bufferred APIs.
Interoperability with netCDF-4
- Starting from version 4.2.1.1, netCDF-4 program can performance parallel I/O on the classical CDF-1 and CDF-2 files through PnetCDF. This is done by passing file create mode NC_PNETCDF to nc_create_par(), for instance,
nc_create_par(filename, NC_PNETCDF, MPI_COMM_WORLD, info, &ncid); - nc_pnc_put.c is an example program that makes a call to nc_put_vara_int() to write subarrays to a 2D integer array in parallel. In this example, the data partition pattern among processes is in a block fashion along both X and Y dimensions.
Documents
- PnetCDF C Interface Guide (still in progress ... This user guide is developed based on the NetCDF C interface Guide.)
Related Links
- Unidata's netCDF
- Message Passing Interface Standard
- Parallel netCDF project web page maintained at Argonne National Laboratory, including software download, users documents, etc.
- Extreme Linux: Parallel netCDF - an article from Linux Magazine
- Philippe Wauteleta and Pierre Kestener. Parallel IO Performance and Scalability Study on the PRACE CURIE Supercomputer, white paper at Partnership For Advanced Computing in Europe (PRACE), Sepetember, 2012. This report compares the performance of PnetCDF, HDF5, and MPI-IO on CURIE supercomputer with Lustre parallel file system using IOR and RAMSES.
Publications:
- Rob Latham, Chris Daley, Wei-keng Liao, Kui Gao, Rob Ross, Anshu Dubey, and Alok Choudhary. A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code. Computer and Scientific Discovery, 5, March 2012.
- Kui Gao, Chen Jin, Alok Choudhary, and Wei-keng Liao. Supporting Computational Data Model Representation with High-performance I/O in Parallel netCDF. In the EEE International Conference on High Performance Computing, December 2011.
- Kui Gao, Wei-keng Liao, Arifa Nisar, Alok Choudhary, Robert Ross, and Robert Latham. Using Subfiling to Improve Programming Flexibility and Performance of Parallel Shared-file I/O. In the Proceedings of the International Conference on Parallel Processing, Vienna, Austria, September 2009.
- Kui Gao, Wei-keng Liao, Alok Choudhary, Robert Ross, and Robert Latham. Combining I/O Operations for Multiple Array Variables in Parallel NetCDF. In the Proceedings of the Workshop on Interfaces and Architectures for Scientific Data Storage, held in conjunction with the IEEE Cluster Conference, New Orleans, Louisiana, September 2009.
- Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, Rob Latham, Andrew Siegel, Brad Gallagher, and Michael Zingale. Parallel netCDF: A Scientific High-Performance I/O Interface. In the Proceedings of Supercomputing Conference, November, 2003.
Users:
- Data Services for the Global Cloud Resolving Model (GCRM)
using parallel netCDF, Fortran 90, C++
Organization: Pacific Northwest National Laboratory
People: Karen Schuchardt
Reference: B. Palmer, A. Koontz, K. Schuchardt, R. Heikes, and D. Randall. Efficient Data IO for a Parallel Global Cloud Resolving Model Environ. Model. Softw. 26, 12 (December 2011), 1725-1735. DOI=10.1016/j.envsoft.2011.08.007. - NCAR Community Atmosphere Model (CAM)
using parallel netCDF, ZioLib
Platforms: IBM SP3, SP4, SP5, BlueGene/L, Cray X1E
File systems: GPFS, PVFS2, NFS
Organization: Department of Atmospheric Sciences, National Taiwan University
People: Yu-heng Tseng (yhtseng at as.ntu.edu.tw)
Reference: Tseng, Y. H. and Ding, C.H.Q. Efficient Parallel I/O in Community Atmosphere Model (CAM), International Journal of High Performance Computing Applications, 22, 206-218, 2008. - Astrophysical Thermonuclear Flashes (FLASH)
using parallel netCDF, HDF5, C
Platforms: IBM SP, Linux Clusters
Organization: ASCI Flash Center, University of Chicago
People: Brad Gallagher, Katie Antypas
Reference: R. Latham, C. Daley, W. Liao, K. Gao, R. Ross, A. Dubey and A. Choudhary. A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code. In the Computational Science and Discovery, vol. 5, 2012. DOI. 10.1088/1749-4699/5/1/015001 - ASPECT, parallel VTK
using parallel netCDF, C
Platforms: Linux Clusters, Cray X
Organization: ORNL
People: Nagiza Samatova - Atmospheric Chemical Transport Model (ACTM)
using parallel netCDF, FORTRAN
Organization: Center for Applied Scientific Computing, LLNL
People: John R. Tannahill - PRogram for Integrated Earth System Modeling (PRISM) Support Initiative
supports netCDF and parallel netCDF within the IO library as part of the OASIS4 coupler
using netCDF, parallel netCDF (FORTRAN APIs)
Platforms: NEC SX, Linux Cluster, SGI and others
Organization: C&C Research Laboratories, NEC Europe Ltd.
Contacts for pnetcdf in PRISM: Reiner Vogelsang and Rene Redler
Contacts for pnetcdf users on NEC SX: Joachim Worringen and Rene Redler - Weather Research and Forecast
(WRF) modeling system software
using parallel netCDF, FORTRAN
Organization: National Center for Atmospheric Research (NCAR)
People: John Michalakes - WRF-ROMS (Regional Ocean Model System) I/O Module
using parallel netCDF, FORTRAN
Organization: Scientific Data Technologies Group, NCSA
People: MuQun Yang, John Blondin - Portable, Extensible Toolkit for Scientific Computation
(PETSc)
Organization: ANL - The Earth System Modeling Framework (ESMF)
platform: IBM Blue Gene / L
Organization: Scientific Computing Division
National Center for Atmospheric Research
Boulder, CO 80305
People: Nancy Collins, James P. Edwards - Community Earth System Model
(CESM)
Organization: UCAR
The program uses PIO library for parallel I/O operations, which includes PnetCDF method.
Acknowledgements:
We are grateful to the following people who provide valuable
comments/discussions to improve our implementation.
Yu-Heng Tseng (LBNL)
Reiner Vogelsang (Silicon Graphics, Germany),
Jon Rhoades (Information Systems & Technology ENSCO, Inc.),
Kilburn Building (University Of Manchester),
Foucar, James G (Sandia National Lab.),
Drake, Richard R (Sandia National Lab.),
Eileen Corelli (Senior Scientist, ENSCO Inc.),
Roger Ting,
Hao Yu,
Raimondo Giammanco,
John R. Tannahill (Lawrence Livermore Nattional. Lab.),
Tyce Mclarty (Lawrence Livermore Nattional. Lab.),
Peter Schmitt, Mike Dvorak (LCRC team, MCS ANL)



