Should IO formats be limited to netcdf and pnetcdf? #2292

Closed
amametjanov opened this issue Apr 18, 2018 · 14 comments

@amametjanov
Member

Support for the netcdf4p file format varies across systems: see issues #1970 and #2048. Rather than loading netcdf + hdf5-parallel modules, which can then lead to reads/writes in netcdf4p format, the proposed combination is netcdf (without hdf5) and pnetcdf. This could simplify software environment issues.

@PeterCaldwell
Contributor

I think "without hdf5" means "don't support netcdf4", right? I find it odd that E3SM would only support versions of netcdf which are a decade or more old. Aren't we supposed to be cutting-edge?

@rljacob
Member

rljacob commented Apr 19, 2018

It's still netcdf4. It's just not netcdf4 with parallelism enabled via hdf5. That has never been shown to work well.

@mfdeakin-sandia
Contributor

I'm also confused; my understanding was that netcdf4 is actually an hdf5 format (looking at the header bytes of a netcdf4 file confirmed this, IIRC, and also some of the features of netcdf4 are from hdf5). So it would still require a version of hdf5 to read and write, though not necessarily one with parallel support.
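The header-byte check mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names `sniff_format` and `sniff_file` are made up for this example, and the sketch assumes the HDF5 signature sits at offset 0, which is the usual case for netCDF-4 files but not guaranteed for arbitrary HDF5 files that carry a user block.

```python
# Magic numbers from the netCDF classic and HDF5 file format specifications.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CLASSIC_VERSIONS = {
    1: "netcdf3 classic (CDF-1)",
    2: "netcdf3 64-bit offset (CDF-2)",
    5: "netcdf3 64-bit data (CDF-5)",
}


def sniff_format(header: bytes) -> str:
    """Classify a file from its first 8 bytes (hypothetical helper)."""
    # Assumption: the HDF5 signature is at offset 0 (no HDF5 user block).
    if header.startswith(HDF5_SIGNATURE):
        return "hdf5 (netcdf4 or other HDF5 file)"
    # Classic-family files start with b"CDF" followed by a version byte.
    if len(header) >= 4 and header[:3] == b"CDF":
        return CLASSIC_VERSIONS.get(header[3], "unknown")
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Calling `sniff_file` on a netcdf4 file would report the HDF5 signature, which matches the observation that netcdf4 is an HDF5-based format.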

@amametjanov
Member Author

I am not sure of the benefits provided by the hierarchical data storage format of HDF5. Parallelism provided by netcdf4p appears to be slower than that of pnetcdf. I am not sure if we can take advantage of the compression provided by netcdf4c. Interoperability and getting the model to run reasonably well out of the box with a minimum set of libraries appear to outweigh the benefits of a cutting-edge format.

@mfdeakin-sandia
Contributor

Sorry, pressed the wrong button

@sarats
Member

sarats commented Apr 20, 2018

As I understand it, HDF5 is the file storage layer underneath NetCDF-4. NetCDF-4 presumes certain conventions for the metadata stored in the underlying HDF5 file.

https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html

> The HDF5 Files produced by netCDF-4 are perfectly respectable HDF5 files, and can be read by any HDF5 application.
> ...
> Additionally, netCDF stores some extra information for dimensions without dimension scale information. (That is, a dimension without an associated coordinate variable). So HDF5 users should not write data to a netCDF-4 file which extends any unlimited dimension.

@sarats
Member

sarats commented Apr 20, 2018

To my knowledge, which one among pnetCDF and netCDF-4/HDF5-parallel yields better parallel performance is an open question depending on the architecture/tuning options (collective vs. independent mode, file system etc.).

@jayeshkrishna I presume pnetCDF has been better in our empirical runs so far. Have you ever encountered a scenario where HDF5-parallel was better? That I/O benchmark we were talking about can clear the picture ;). Including Adios, the picture gets further skewed.
Update: I see this topic was exhaustively discussed on the linked threads and robustness issues were noted with netcdf4p.

@PeterCaldwell
Contributor

@amametjanov - your argument that we want to minimize dependencies, keep things simple, and run as fast as possible resonates with me. I'm still uncomfortable with this decision because

  1. I think people outside E3SM will laugh at us for not supporting libraries less than a decade old. Your argument seems analogous to me to insisting on writing all our code in F77 "because it is simpler and faster". Is that really where we want to go?
  2. Not being able to read/write netcdf4 is somewhere between mildly and majorly problematic for domain scientists, who will be surprised and annoyed (like I was) that the inputdata files we write don't work. It would also be nice to be able to use netcdf4 features (like compression) for storing output.

I'm curious what @czender 's take on this is.

@mt5555
Contributor

mt5555 commented Apr 21, 2018

Could we support reading netcdf4 files if we expand the PIO interface to allow us to specify which library to use based on filename? IIRC, the problem is that we want to use the pnetcdf library for writing and most read operations, but pnetcdf can't read netcdf4/hdf5 files.
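One way to picture per-file library selection is the Python sketch below. It is not the PIO interface: `choose_iotype` is a hypothetical helper, the returned strings simply echo the library names used in this thread, and it keys off the file's header bytes rather than the filename suggested above, since the header identifies the on-disk format directly.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Version bytes of the classic-family formats that PnetCDF can read.
CLASSIC_VERSION_BYTES = (b"\x01", b"\x02", b"\x05")


def choose_iotype(header: bytes) -> str:
    """Pick a library label for reading a file, given its first 8 bytes.

    Hypothetical helper: the labels are illustrative, not PIO constants.
    """
    if header[:3] == b"CDF" and header[3:4] in CLASSIC_VERSION_BYTES:
        # Classic, 64-bit offset, and CDF-5 files: use pnetcdf.
        return "pnetcdf"
    if header.startswith(HDF5_SIGNATURE):
        # netcdf4/hdf5 files: fall back to the netcdf library,
        # assuming it was built with HDF5 support.
        return "netcdf"
    raise ValueError("not a recognized netCDF file header")
```

Under this scheme, writes and most reads would still go through pnetcdf, and only files that pnetcdf cannot parse would be routed to the netcdf library.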

@jayeshkrishna
Contributor

jayeshkrishna commented Apr 23, 2018

Sorry for being late to this discussion. We can continue supporting netcdf4 and netcdf4p, but maybe we can limit the number of machines we support these features on?
The issue with netcdf4p (parallel I/O using netcdf) is that the stability depends on the version compatibility of netcdf and hdf5 libraries (there are no documents AFAIK that enumerate these compatibility scenarios). We have observed that with some version combinations of netcdf and hdf5 libraries some PIO tests hang (while others succeed). So supporting netcdf4p on all machines will take up resources (installing/debugging etc).
I have not yet encountered a scenario where netcdf4p performs better than pnetcdf. All of our tests indicate that pnetcdf outperforms netcdf4p+hdf5 for E3SM I/O. However, as @mt5555 noted, pnetcdf currently cannot directly read NetCDF4 files.
Another feature, introduced in NetCDF 4.1+, is the ability to use PnetCDF instead of HDF5 for parallel I/O (but this won't help with reading data already in NetCDF4 format, written out using NetCDF+HDF5). Also, we haven't tested this setup extensively enough to know all the possible issues with it.

oksanaguba pushed a commit that referenced this issue May 3, 2018
Leverage env command options to assist parsing
Newline was not a strong-enough split character for parsing
output of 'env' since newlines can occur in environment variables,
especially functions. Incorrect handling was causing all machines using
the 'soft' environment manager not to work.

This change causes a null character to be placed between env variables,
which should be a much more reliable way to split/parse this output.

Test suite: ./scripts_regression_tests.py K_TestCimeCase.test_env_loading
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes [CIME Github issue #]

User interface changes?: N

Update gh-pages html (Y/N)?: N

Code review: @jedwards4b
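
The null-separated parsing described in the commit message above can be sketched as follows. This is not the CIME implementation: `parse_env_null` is a hypothetical helper, and it assumes the input is the raw output of GNU `env -0` (also spelled `env --null`), which ends each `NAME=value` entry with a NUL byte instead of a newline.

```python
def parse_env_null(output: bytes) -> dict:
    """Parse NUL-separated NAME=value entries into a dict.

    Hypothetical helper; values may safely contain newlines,
    for example exported shell functions.
    """
    env = {}
    for entry in output.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, sep, value = entry.partition(b"=")
        if not sep:
            continue  # ignore malformed entries that lack "="
        env[name.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env
```

With GNU coreutils available, the input could come from `subprocess.run(["env", "-0"], capture_output=True).stdout`. Splitting on newlines instead would break any variable whose value spans several lines.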
@rljacob
Member

rljacob commented Jul 24, 2018

@wkliao
Member

wkliao commented Jul 30, 2018

Too bad the URL is not open to the public.
As a PnetCDF developer, I am interested in learning the decision :(

@rljacob
Member

rljacob commented Jul 30, 2018

That is better called the "internal discussion" page. When we reach a decision, we'll update this issue.

@rljacob
Member

rljacob commented Jul 17, 2019

We decided that all input files will be in "netcdf3 classic" format.

@rljacob rljacob closed this as completed Jul 17, 2019
AaronDonahue pushed a commit that referenced this issue May 9, 2023
…-srun-location

EAMxx: Fix cmake line for srun on Chrysalis.