Should IO formats be limited to netcdf and pnetcdf? #2292
Comments
I think "without hdf5" means "don't support netcdf4", right? I find it odd that E3SM would only support versions of netcdf which are a decade or more old. Aren't we supposed to be cutting-edge? |
It's still netcdf4. It's just not netcdf4 with parallelism enabled via hdf5, which has never been shown to work well. |
I'm also confused; my understanding was that netcdf4 is actually an hdf5 format (looking at the header bytes of a netcdf4 file confirmed this, IIRC, and also some of the features of netcdf4 are from hdf5). So it would still require a version of hdf5 to read and write, though not necessarily one with parallel support. |
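The header-byte check mentioned above can be sketched in a few lines. This is a minimal illustration, not something from E3SM or PIO: the function name `sniff_format` and the label strings are made up for this example, and it only looks for the signature at offset 0 (an HDF5 file with a user block can carry its signature at a later offset, which this sketch does not handle).

```python
# Leading bytes ("magic numbers") of the on-disk formats discussed in this thread.
# The label strings are illustrative, not names used by any library.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "netCDF-4 (HDF5 container)"),
    (b"CDF\x01", "netCDF-3 classic (CDF-1)"),
    (b"CDF\x02", "netCDF-3 64-bit offset (CDF-2)"),
    (b"CDF\x05", "netCDF-3 64-bit data (CDF-5)"),
]


def sniff_format(path):
    """Guess the file format from its first 8 bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"
```

Calling `sniff_format("some_input.nc")` on a netcdf4 file should report the HDF5 container, which is why a (possibly serial) hdf5 library is still needed to read it.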
I am not sure what benefits we get from HDF5's hierarchical data storage format. Parallel I/O through netcdf4p appears to be slower than through pnetcdf, and I am not sure we can take advantage of the compression that netcdf4c provides. Interoperability, and getting the model to run reasonably well out of the box with a minimal set of libraries, appear to outweigh the benefits of a cutting-edge format. |
Sorry, pressed the wrong button |
As I understand it, HDF5 is the file storage layer underneath NetCDF-4. NetCDF-4 presumes certain conventions for the metadata stored in the underlying HDF5 file. https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html |
To my knowledge, whether pnetCDF or netCDF-4/HDF5-parallel yields better parallel performance is an open question that depends on the architecture and tuning options (collective vs. independent mode, file system, etc.). @jayeshkrishna I presume pnetCDF has been better in our empirical runs so far. Have you ever encountered a scenario where HDF5-parallel was better? The I/O benchmark we were talking about could clear up the picture ;). Adding ADIOS to the comparison complicates the picture further. |
@amametjanov - your argument that we want to minimize dependencies, keep things simple, and run as fast as possible resonates with me. I'm still uncomfortable with this decision because
I'm curious what @czender's take on this is. |
Could we support reading netcdf4 files if we expand the PIO interface to allow us to specify which library to use based on filename? IIRC, the problem is that we want to use the pnetcdf library for writing and most read operations, but pnetcdf can't read netcdf4/hdf5 files. |
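To make the per-file idea concrete, here is a rough sketch of the selection logic only. It is not the PIO interface: `choose_read_backend` and the returned strings `"pnetcdf"` and `"netcdf"` are hypothetical labels, and it assumes the choice is made from the file's leading bytes rather than from its name, since the extension alone does not distinguish the formats.

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"          # netcdf4 files are HDF5 containers
CDF_VERSIONS = (b"\x01", b"\x02", b"\x05")  # classic, 64-bit offset, 64-bit data


def choose_read_backend(path):
    """Pick a library label for reading `path` (hypothetical helper, not PIO)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_MAGIC):
        # pnetcdf cannot read netcdf4/hdf5 files, so fall back to netcdf.
        return "netcdf"
    if head[:3] == b"CDF" and head[3:4] in CDF_VERSIONS:
        # CDF-1/2/5 files can go through pnetcdf, the preferred path here.
        return "pnetcdf"
    raise ValueError("unrecognized file signature: %r" % head)
```

Writes would still go through pnetcdf unconditionally under this scheme; only reads need the fallback.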
Sorry for being late to this discussion. We can continue supporting netcdf4 and netcdf4p, but maybe we can limit the number of machines we support these features on? |
Decision on this issue is here: https://acme-climate.atlassian.net/wiki/spaces/EIDMG/pages/769130507/Picking+a+netcdf+type+for+all+input+files |
Too bad the URL is not open to the public. |
That is better called the "internal discussion" page. When we reach a decision, we'll update this issue. |
We decided that all input files will be in "netcdf3 classic" format. |
Support for the netcdf4p file format varies across systems: see issues #1970 and #2048. Rather than loading the netcdf + hdf5-parallel modules, which can then lead to reads/writes in netcdf4p format, the proposed combination is netcdf (without hdf5) and pnetcdf. This could simplify software environment issues.