Sponsor:
PnetCDF development was sponsored by the Scientific Data Management Center (SDM) under the DOE program of Scientific Discovery through Advanced Computing (SciDAC). It was also supported in part by the National Science Foundation under the SDCI HPC program award number OCI-0724599 and the HECURA program award number CCF-0938000.
Project Team Members:
Northwestern University
- Wei-keng Liao
- Alok Choudhary
- Seung Woo Son (formerly a postdoc, now an Assistant Professor at UMass Lowell)
- Kui Gao (formerly a postdoc, now Dassault Systèmes Simulia Corp.)
- Jianwei Li (since graduated, now Bloomberg L.P.)
Argonne National Laboratory
- Rob Latham
- Rob Ross
- Rajeev Thakur
- William Gropp (now UIUC)
Tutorial on Subfiling in PnetCDF
Overview
Subfiling is a mechanism that internally partitions a NetCDF file into multiple files (subfiles) while the NetCDF data still appears as a single file to users. To use the subfiling feature, users must indicate their intention through an MPI hint. Once this hint is specified, all variables defined in the program are partitioned and stored in subfiles. The figure below illustrates a high-level view of the subfiling mechanism.
Building and running with subfiling
Subfiling is disabled by default. To enable it, PnetCDF must be configured explicitly with "--enable-subfiling". Once it is enabled, users convey their intention to use subfiling through MPI hints:

    ...
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "nc_num_subfiles", "2");
    ncmpi_create(MPI_COMM_WORLD, filename, NC_CLOBBER|NC_64BIT_DATA, info, &ncid);
    ...
The example above creates two subfiles in addition to the original file that the user specified. During writes, all subfile-related information is stored in the master file, so that reads from subfiles are handled transparently. In other words, no code changes are needed for reading.
Note that programs built with subfiling should be run in parallel. This applies to both write and read cases. Note also that the number of subfiles cannot exceed the number of MPI ranks, because subfiling partitions the original communicator according to the number of subfiles. If the number of subfiles is greater than the number of available MPI ranks, the program creates a normal file without partitioning.
File layouts with and without subfiling
This section describes the file layout with and without subfiling to help explain the mechanism behind the subfiling module in PnetCDF. Note that, regardless of whether subfiling is enabled, all generated files are in the NetCDF file format. If subfiling is not used, all variables are stored in the original file specified by the user. For example, the file t1.4.1.0.nc, generated by running test/subfile/test-subfile.c on 2 ranks (mpiexec -n 2 ./test_subfile -f t1 -s 2 -l 4), looks like this:
    netcdf t1.4.1.0 {
    // file format: CDF-5 (big variables)
    dimensions:
            dim0_0 = 8 ;
            dim0_1 = 4 ;
            dim0_2 = 4 ;
    variables:
            int var0_0(dim0_0, dim0_1, dim0_2) ;
    data:

     var0_0 = ...
If the number of subfiles is set to N, the same program generates N+1 files: one original (master) file and N subfiles. For example, the above test case generates one master file (t1.4.1.0.nc) and two subfiles (t1.4.1.0.nc.subfile_0 and t1.4.1.0.nc.subfile_1). Because the file is partitioned, the master file no longer contains any data; instead it holds the information associated with the subfiles, stored as attributes. The num_subfiles attributes record the number of subfiles associated with the original file, while the range attributes specify the data range stored in each subfile.
Using the example above, the subfiling-related metadata of t1.4.1.0.nc is stored as both global and variable-specific attributes:
    netcdf t1.4.1.0 {
    // file format: CDF-5 (big variables)
    dimensions:
            dim0_0 = 8 ;
            dim0_1 = 4 ;
            dim0_2 = 4 ;
    variables:
            int var0_0 ;
                    var0_0:_PnetCDF_SubFiling.par_dim_index = 0 ;
                    var0_0:_PnetCDF_SubFiling.ndims_org = 3 ;
                    var0_0:_PnetCDF_SubFiling.num_subfiles = 2 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_0).subfile.0 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_1).subfile.0 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_2).subfile.0 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_0).subfile.1 = 4, 7 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_1).subfile.1 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_2).subfile.1 = 0, 3 ;

    // global attributes:
                    :num_subfiles = 2 ;
    data:

     var0_0 = 0 ;
    }
Note that, because the file has been subfiled, the master file no longer stores the data, as indicated by var0_0 = 0.
Each subfile contains its dataset as follows:
    netcdf t1.4.1.0.nc.subfile_0 {
    // file format: CDF-5 (big variables)
    dimensions:
            dim0_0.var0_0 = 4 ;
            dim0_1.var0_0 = 4 ;
            dim0_2.var0_0 = 4 ;
    variables:
            int var0_0(dim0_0.var0_0, dim0_1.var0_0, dim0_2.var0_0) ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_0).subfile.0 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_1).subfile.0 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_2).subfile.0 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.subfile_index = 0 ;
    data:

     var0_0 = ...

    netcdf t1.4.1.0.nc.subfile_1 {
    // file format: CDF-5 (big variables)
    dimensions:
            dim0_0.var0_0 = 4 ;
            dim0_1.var0_0 = 4 ;
            dim0_2.var0_0 = 4 ;
    variables:
            int var0_0(dim0_0.var0_0, dim0_1.var0_0, dim0_2.var0_0) ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_0).subfile.1 = 4, 7 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_1).subfile.1 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.range(dim0_2).subfile.1 = 0, 3 ;
                    var0_0:_PnetCDF_SubFiling.subfile_index = 1 ;
    data:

     var0_0 = ...
Current limitations
- the number of subfiles must be less than or equal to the number of MPI processes.
Future work
- user-level APIs to control variable-specific subfiling.
- remove the run-time restriction that the number of subfiles be <= the number of MPI processes.
References
- Seung Woo Son, Saba Sehrish, Wei-keng Liao, Ron Oldfield, and Alok Choudhary. Dynamic File Striping and Data Layout Transformation on Parallel System with Fluctuating I/O Workload. In IASDS 2013 (held in conjunction with IEEE Cluster'13).
- Kui Gao, Wei-keng Liao, Arifa Nisar, Alok Choudhary, Robert Ross, and Robert Latham. Using Subfiling to Improve Programming Flexibility and Performance of Parallel Shared-file I/O. In the Proceedings of the International Conference on Parallel Processing, Vienna, Austria, September 2009.