Add multi-level parallelism, async writes, and other HPC features from PIO library #555
A question: is e.g. PIOc_read_nc_decomp intended to be called by the user, or will it be handled internally by the library?
The read/write decomp functions read/write a special netCDF file that contains the decomposition information for PIO. So you can create a decomposition, and save it to file, and use it over and over. These functions will be wrapped in functions to make them look more netCDF-like.
Would this work?
What I would like to do is come in and brief you and Ward on PIO and how it works. There are some details, but it is fitting into the netCDF API very well. In terms of decompositions, it would not be unusual to have several in use at one time. The user initializes them in an init_decomp() call. The read/write decomp functions need never be called; they are just convenience functions to allow easy recording and communication of what decomposition was used. (This doesn't matter for the contents of the data, just how the data are distributed over a particular hardware configuration.) PIO introduced two new objects to the netCDF data model: the IO system and the decomposition.
The iosystemid is pretty easy to hide, as long as there is just one (and that is typical). The decomposition IDs are more plentiful and must be directly controlled by the user. A variable may have a different read and write decomposition (to manage halo effects, for example).
Well, as you know, you can create an NC_PIO_INFO structure per open/created file. When you say…
Unfortunately, the decomps cannot be automatically determined. They are how PIO is able to be tuned to the specific hardware for optimum performance. I will get my base implementation together, and if we can come up with any way to better hide the decomps, I am open to it.
Sorry, I was not clear. I am not asking to automatically determine the decomps, but rather to associate a default decomp with each variable for reads and writes.
It certainly would be possible to associate a default decomp for each var read and write. But note that the arguments to vara change anyway. Note that distributed arrays only work with a "record" (which is a looser definition of record than you are used to). So there is no need for the user to specify start/count. An entire record is read/written at one time. The processor only gets/puts the local portion of the global array, based on the decomposition.
I have taken down my PR with this feature because I now see a better way to implement this, one that will not require taking PIO code into the netCDF library codebase. Instead, it can be treated like pnetcdf, as an optional interface that can be turned on at configure, used with NC_PIO in the mode flag, but does not involve PIO code in netCDF. Just some wrappers, as with pnetcdf. This requires some work on PIO first. When I get that done I will circle around again and put up a PIO PR.
I think this functionality may best be brought to users via a user-dispatch library, or via HPC NetCDF-C, or through some other mechanism. Interested users should contact me directly.
OK, this capability has been added to PIO, and it works great! ;-)
Introduction
This ticket describes a plan to include the functions of the PIO library in the netCDF library, so that they may be (optionally) available to all netCDF HPC (High Performance Computing) users. The HPC community is a core component of the Unidata community, and better support of modern supercomputers will allow netCDF to better serve these users.
The Parallel I/O (PIO) library (https://github.com/NCAR/ParallelIO) was developed at NCAR. It provides advanced HPC I/O functionality for netCDF codes, with a netCDF-like API. The PIO library allows HPC users to make the most of their computational hardware, without waiting on I/O more than is absolutely essential.
The goal of this effort is to make the PIO functionality available to netCDF users, through the NetCDF API.
Main Features
Computational Components and I/O Component
In async mode, it is possible for the user to define several computational components, and one (shared) I/O component.
Each component can have an arbitrary number of processors.
In computational components, netCDF calls actually send data to the I/O component, which buffers data and handles disk I/O.
For example, a machine of 500K processors can assign 1K processors to I/O, 250K processors to an atmospheric model, 100K processors to an ocean model, 10K processors to a space weather model, etc., until all 500K processors are used up. Each of the models can do netCDF I/O as usual, but all actual I/O will be channeled through the I/O component. Data are buffered at all stages to improve performance.
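As an illustration only (not part of the proposed API), the sketch below shows one way an MPI application might carve MPI_COMM_WORLD into an I/O group and two computational groups; the group sizes and split logic are arbitrary assumptions made for this example.

```c
/* Illustration only (not part of the proposed API): splitting
 * MPI_COMM_WORLD into one I/O group and two computational groups.
 * The sizes of the groups are arbitrary. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, color;
    MPI_Comm comp_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* First two ranks form the I/O component; the rest are divided
     * between two computational components (e.g. atmosphere, ocean). */
    if (rank < 2)
        color = 0;                        /* I/O component */
    else if (rank < 2 + (nprocs - 2) / 2)
        color = 1;                        /* computational component 1 */
    else
        color = 2;                        /* computational component 2 */

    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &comp_comm);

    /* Each component would pass its communicator to the PIO/netCDF
     * initialization; netCDF calls on the computational ranks are then
     * serviced by the I/O ranks. */

    MPI_Comm_free(&comp_comm);
    MPI_Finalize();
    return 0;
}
```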
Configuration and Build
The PIO functionality is only included if netCDF is built with --enable-pio. It makes no sense to use PIO on systems without multiple cores, but this is not enforced by the build, and tests will build and pass even on a single core.
The PIO functionality is made available through the netCDF internal dispatch system, which allows the netCDF API to be serviced by various sub-libraries (netCDF-4, HDF4, pnetcdf, etc.). The PIO functionality is accessed in the same way.
The code for the integrated PIO functionality is in subdirectory libpio.
Testing of PIO
The pio_test directory contains tests for PIO. These tests will only be run if --enable-parallel-tests is specified.
Using PIO
In order to use PIO, the user must include the NC_PIO flag in the mode of nc_create/nc_open.
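A minimal sketch of what this might look like, assuming the proposed NC_PIO mode flag together with the standard nc_create()/nc_close() calls; the error handling shown is illustrative only.

```c
/* Minimal sketch: create a file whose I/O is handled by PIO by adding
 * the proposed NC_PIO flag to the create mode. */
#include <netcdf.h>

int
create_pio_file(const char *path, int *ncidp)
{
    int ret;

    if ((ret = nc_create(path, NC_CLOBBER | NC_PIO, ncidp)))
        return ret;

    /* ...define dimensions and variables, write data as usual... */

    return nc_close(*ncidp);
}
```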
Additions to the NetCDF Data Model
PIO introduces two new objects to the netCDF data model: the IO system, which ties a set of computational processors to their I/O processors, and the decomposition, which describes how a variable's data are distributed across processors.
New Functions for the API
Several new functions need to be added to the netCDF API to support PIO functionality. However, it is not necessary to add them to the netCDF dispatch table, since other netCDF sub-libraries will never need to implement these functions.
Initialization
There are two new PIO initialization functions, one for async mode and one for non-async mode (the names of these functions will be changed to match the netCDF API). A finalize function must be called at the end of all processing.
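As a rough sketch only: nc_pio_init() and nc_pio_finalize() below are hypothetical placeholder names (the final netCDF-style names are not settled in this ticket), and the parameters are an assumption modeled on PIO's existing intracomm initialization (component communicator, number of I/O tasks, their stride and base rank, and an output IO system ID).

```c
/* Hypothetical sketch: nc_pio_init()/nc_pio_finalize() are placeholder
 * names; the parameter list is assumed, not final. */
#include <mpi.h>
#include <netcdf.h>

int
run_with_pio(void)
{
    int iosysid, ret;

    /* Initialize an IO system: 4 I/O tasks starting at rank 0, stride 1. */
    if ((ret = nc_pio_init(MPI_COMM_WORLD, 4, 1, 0, &iosysid)))
        return ret;

    /* ...create/open files with NC_PIO, define decompositions, do I/O... */

    /* A finalize call is required at the end of all processing. */
    return nc_pio_finalize(iosysid);
}
```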
Decompositions
Decompositions define how a netCDF variable is distributed across processors on the supercomputer.
For convenience, there are also two functions to write/read a netCDF file that records the decomposition information. Decomposition files are helpful for debugging, but are not generally used in production processing.
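A hypothetical sketch of defining a simple 1D block decomposition and recording it to a file: nc_init_decomp() and nc_write_decomp() are placeholder names suggested by the init_decomp()/decomp-file discussion above, and the argument list (IO system ID, element type, global dimensions, and the local-to-global index map) is an assumption.

```c
/* Hypothetical sketch: nc_init_decomp() and nc_write_decomp() are
 * placeholder names; the argument list is an assumption. Each task maps
 * its local elements onto a 1D global array of length GLOBAL_LEN. */
#include <stdlib.h>
#include <netcdf.h>

#define GLOBAL_LEN 1024

int
define_decomp(int iosysid, int my_rank, int ntasks, int *ioidp)
{
    int gdims[1] = {GLOBAL_LEN};
    size_t elems_per_pe = GLOBAL_LEN / ntasks;
    size_t *compmap;                /* global index of each local element */
    int ret;

    if (!(compmap = malloc(elems_per_pe * sizeof(size_t))))
        return NC_ENOMEM;
    for (size_t i = 0; i < elems_per_pe; i++)
        compmap[i] = my_rank * elems_per_pe + i;

    /* Placeholder call: register the decomposition and get its ID. */
    ret = nc_init_decomp(iosysid, NC_INT, 1, gdims, elems_per_pe,
                         compmap, ioidp);
    free(compmap);
    if (ret)
        return ret;

    /* Optional convenience: record the decomposition in a netCDF file
     * for debugging or later reuse. */
    return nc_write_decomp("decomp.nc", iosysid, *ioidp);
}
```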
Record Reads/Writes on Distributed Arrays
There are read/write functions to read/write a record of a distributed array. (A record is defined as one element of the first dimension, and all elements of subsequent dimensions. In a classic netCDF file with an unlimited dimension, this is a record. But with PIO, the first dimension does not have to be unlimited.)
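A hypothetical sketch of writing one record of a distributed array: nc_write_darray() is a placeholder name, and its signature is an assumption. Note that no start/count vector is passed, because the decomposition (ioid) already determines which part of the global record this processor holds.

```c
/* Hypothetical sketch: nc_write_darray() is a placeholder name with an
 * assumed signature. Each task supplies only its local portion of the
 * record; the decomposition maps it into the global array. */
#include <netcdf.h>

int
write_one_record(int ncid, int varid, int ioid, size_t recnum,
                 size_t elems_per_pe, const int *local_data)
{
    /* Write record 'recnum' of the variable; no start/count needed. */
    return nc_write_darray(ncid, varid, ioid, recnum, elems_per_pe,
                           local_data);
}
```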
Schedule
I anticipate the first version of this integration, supporting netCDF classic files, will be ready before the end of 2017. If so, it can be included in the 4.5.1 release of netCDF.
Work Details
This work can be followed in detail in the GitHub repo: https://github.com/edhartnett/netcdf-c