Should IO formats be limited to netcdf and pnetcdf? #2292

amametjanov · 2018-04-18T23:26:38Z

Support for the netcdf4p file format varies across systems: see issues #1970 and #2048. Rather than loading netcdf + hdf5-parallel modules, which can then lead to reads/writes in netcdf4p format, the proposed combination is netcdf (without hdf5) and pnetcdf. This could simplify software environment issues.

PeterCaldwell · 2018-04-19T01:00:20Z

I think "without hdf5" means "don't support netcdf4", right? I find it odd that E3SM would only support versions of netcdf which are a decade or more old. Aren't we supposed to be cutting-edge?

rljacob · 2018-04-19T04:10:04Z

It's still netcdf4. It's just not netcdf4 with parallelism enabled via hdf5. That's never been shown to work well.

mfdeakin-sandia · 2018-04-19T17:01:18Z

I'm also confused; my understanding was that netcdf4 is actually an hdf5 format (looking at the header bytes of a netcdf4 file confirmed this, IIRC, and also some of the features of netcdf4 are from hdf5). So it would still require a version of hdf5 to read and write, though not necessarily one with parallel support.
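For anyone who wants to repeat the header-byte check mentioned above, here is a minimal sketch. It assumes the signature sits at offset 0 (HDF5 also allows it at later offsets when a user block is present, which this sketch ignores), and the function name `detect_netcdf_kind` is made up for illustration. Note that an HDF5 signature only shows the file is an HDF5 container; it does not prove the file was written by netCDF-4.

```python
# Leading bytes of each on-disk format.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files begin with the HDF5 signature

# Classic-family files begin with b"CDF" followed by a one-byte version.
CLASSIC_VERSIONS = {
    1: "netcdf3 classic (CDF-1)",
    2: "netcdf3 64-bit offset (CDF-2)",
    5: "netcdf3 64-bit data (CDF-5)",
}


def detect_netcdf_kind(path):
    """Classify a file by its first bytes (hypothetical helper, offset 0 only)."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == HDF5_SIGNATURE:
        return "netcdf4 (HDF5)"
    if len(header) >= 4 and header[:3] == b"CDF":
        # Indexing bytes yields an int, so header[3] is the version number.
        return CLASSIC_VERSIONS.get(header[3], "unknown")
    return "unknown"
```

Calling `detect_netcdf_kind("some_input.nc")` on a file shows whether a reader would need HDF5 at all, which is the distinction being discussed here.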

amametjanov · 2018-04-19T22:48:08Z

I am not sure of the benefits provided by the hierarchical data storage format of HDF5. Parallelism provided by netcdf4p appears to be slower than that of pnetcdf. I am not sure if we can take advantage of the compression provided by netcdf4c. Interoperability and getting the model to run reasonably well out of the box with a minimal set of libraries appear to outweigh the benefits of a cutting-edge format.

mfdeakin-sandia · 2018-04-19T23:06:22Z

Sorry, pressed the wrong button

sarats · 2018-04-20T15:48:48Z

As I understand it, HDF5 is the file storage layer underneath NetCDF-4. NetCDF-4 presumes certain conventions for the metadata stored in the underlying HDF5 file.

https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html

The HDF5 Files produced by netCDF-4 are perfectly respectable HDF5 files, and can be read by any HDF5 application.
...
Additionally, netCDF stores some extra information for dimensions without dimension scale information. (That is, a dimension without an associated coordinate variable). So HDF5 users should not write data to a netCDF-4 file which extends any unlimited dimension.

sarats · 2018-04-20T16:08:35Z

To my knowledge, which one among pnetCDF and netCDF-4/HDF5-parallel yields better parallel performance is an open question depending on the architecture/tuning options (collective vs. independent mode, file system etc.).

@jayeshkrishna I presume pnetCDF has been better in our empirical runs so far. Have you ever encountered a scenario where HDF5-parallel was better? That I/O benchmark we were talking about can clear the picture ;). Including Adios, the picture gets further skewed.
Update: I see this topic was exhaustively discussed on the linked threads and robustness issues were noted with netcdf4p.

PeterCaldwell · 2018-04-20T16:09:29Z

@amametjanov - your argument that we want to minimize dependencies, keep things simple, and run as fast as possible resonates with me. I'm still uncomfortable with this decision because:

1. I think people outside E3SM will laugh at us for not supporting libraries less than a decade old. Your argument seems analogous to me to insisting on writing all our code in F77 "because it is simpler and faster". Is that really where we want to go?
2. Not being able to read/write netcdf4 is somewhere between mildly and majorly problematic for domain scientists, who will be surprised and annoyed (like I was) that the inputdata files we write don't work. It would also be nice to be able to use netcdf4 features (like compression) for storing output.

I'm curious what @czender 's take on this is.

mt5555 · 2018-04-21T23:15:50Z

Could we support reading netcdf4 files if we expand the PIO interface to allow us to specify which library to use based on filename? IIRC, the problem is that we want to use the pnetcdf library for writing and most read operations, but pnetcdf can't read netcdf4/hdf5 files.
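A rough sketch of what such per-file selection could look like. This is not the PIO interface: `choose_io_library` and the labels `"pnetcdf"` and `"netcdf"` are hypothetical, and the sketch inspects the file's leading bytes rather than its name, on the assumption that the format is what decides which library can read it.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of a netcdf4/hdf5 file


def choose_io_library(path, mode):
    """Pick a backend label for one file (hypothetical, not a PIO call).

    Writes always go through pnetcdf; reads fall back to the netcdf
    library only when the file is HDF5-based, which pnetcdf cannot read.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if mode == "w":
        return "pnetcdf"
    with open(path, "rb") as f:
        if f.read(8) == HDF5_SIGNATURE:
            return "netcdf"
    return "pnetcdf"
```

The point of the design is that only the exceptional case (an existing netcdf4/hdf5 input file) leaves the pnetcdf path; everything else stays on the library that has tested faster.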

jayeshkrishna · 2018-04-23T14:59:07Z

Sorry for being late to this discussion. We can continue supporting netcdf4 and netcdf4p, but maybe we can limit the number of machines we support these features on?
The issue with netcdf4p (parallel I/O using netcdf) is that its stability depends on the version compatibility of the netcdf and hdf5 libraries (there are no documents AFAIK that enumerate these compatibility scenarios). We have observed that with some version combinations of the netcdf and hdf5 libraries some PIO tests hang (while others succeed). So supporting netcdf4p on all machines will take up resources (installing, debugging, etc.).
I have not yet encountered a scenario where netcdf4p performs better than pnetcdf. All of our tests indicate that pnetcdf outperforms netcdf4p+hdf5 for E3SM I/O. However, as @mt5555 noted, pnetcdf currently cannot directly read NetCDF4 files.
Another feature, introduced in NetCDF 4.1+, is the ability to use PnetCDF instead of HDF5 for parallel I/O (but this won't help with reading data already in NetCDF4 format, written out using NetCDF+HDF5). Also, we haven't tested this setup extensively enough to know all the possible issues with it.

@jedwards4b


rljacob · 2018-07-24T17:11:08Z

Decision on this issue is here: https://acme-climate.atlassian.net/wiki/spaces/EIDMG/pages/769130507/Picking+a+netcdf+type+for+all+input+files

wkliao · 2018-07-30T17:50:05Z

Too bad the URL is not open to public.
As a PnetCDF developer, I am interested in learning the decision :(

rljacob · 2018-07-30T18:12:03Z

That is better called the "internal discussion" page. When we reach a decision, we'll update this issue.

rljacob · 2019-07-17T15:00:45Z

We decided that all input files will be in "netcdf3 classic" format.


amametjanov added the question label Apr 18, 2018

mfdeakin-sandia closed this as completed Apr 19, 2018

mfdeakin-sandia reopened this Apr 19, 2018

rljacob closed this as completed Jul 17, 2019

rljacob mentioned this issue Jul 19, 2019

Netcdf file types at NERSC #1970

Closed

AaronDonahue pushed a commit that referenced this issue May 9, 2023

Merge pull request #2292 from E3SM-Project/ambrad/eamxx/fix-chrysalis…

48296c6

…-srun-location EAMxx: Fix cmake line for srun on Chrysalis.
