hdf5 1.14.3 errors when compiled with FPE trapping flag #1452

Closed
rem1776 opened this issue Jan 31, 2024 · 5 comments

@rem1776
Contributor

rem1776 commented Jan 31, 2024

This first came up in the coupler's null model build (see below) but applies to FMS in general.

With the latest hdf5 (1.14.3), floating point exceptions (FPEs) occur during netcdf open calls if you compile with a floating point exception trapping flag (i.e. -ftrapuv, which is in our debug flags for mkmf). This is due to changes in hdf5 and is specific to version 1.14.3. There is a fix in hdf5's dev branch, and the next release is scheduled for the end of March.

**old issue**
While updating the CI container, I found that the null model test was failing with a runtime error when compiled with the latest hdf5 version (1.14.3) and netcdf version (4.9.2) with gcc 13.

I reproduced the error on the AMD dev box with both gcc 13 and the latest oneapi (2024.0), so it doesn't seem compiler-specific.

It happens when trying to open grid_spec.nc as part of fms_init, using open_file from fms2_io.

NOTE: MPP_DOMAINS_SET_STACK_SIZE: stack size set to    32768.
NOTE: MPP_DOMAINS_SET_STACK_SIZE: stack size set to   200000.
forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source             
libpthread-2.28.s  000015112881CCF0  Unknown               Unknown  Unknown
libhdf5.so.310.3.  000015112795138C  H5T__init_native_     Unknown  Unknown
libhdf5.so.310.3.  000015112787E8F6  H5T_init              Unknown  Unknown
libhdf5.so.310.3.  000015112796F4C9  H5VL_init_phase2      Unknown  Unknown
libhdf5.so.310.3.  000015112766246B  H5_init_library       Unknown  Unknown
libhdf5.so.310.3.  00001511276FFF4C  H5Eset_auto2          Unknown  Unknown
libnetcdf.so.19.2  000015112B15622C  nc4_hdf5_initiali     Unknown  Unknown
libnetcdf.so.19.2  000015112B15EE8E  NC_HDF5_initializ     Unknown  Unknown
libnetcdf.so.19.2  000015112B0CBC36  nc_initialize         Unknown  Unknown
libnetcdf.so.19.2  000015112B0CEDD6  NC_open               Unknown  Unknown
libnetcdf.so.19.2  000015112B0CEF58  nc__open              Unknown  Unknown
libnetcdff.so.7.2  000015112B4C85C1  nf__open_             Unknown  Unknown
libnetcdff.so.7.2  000015112B524CAA  netcdf_mp_nf90_op     Unknown  Unknown
coupler_full_test  00000000008525D7  netcdf_io_mod_mp_         647  netcdf_io.F90
coupler_full_test  0000000000937B08  netcdf_io_mod_mp_        2033  netcdf_io.F90
coupler_full_test  0000000002091D9A  grid2_mod_mp_open         193  grid2.F90
coupler_full_test  0000000002090852  grid2_mod_mp_grid         153  grid2.F90
coupler_full_test  00000000007097CC  fms_mod_mp_fms_in         457  fms.F90
coupler_full_test  000000000041169A  MAIN__                    586  coupler_main.F90
coupler_full_test  0000000000410D5D  Unknown               Unknown  Unknown
libc-2.28.so       000015112847FD85  __libc_start_main     Unknown  Unknown
coupler_full_test  0000000000410C7E  Unknown               Unknown  Unknown

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3133016 RUNNING AT lscamd50-d.gfdl.noaa.gov
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
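For reading traces like the one above, here is a minimal sketch (not part of the original report) that pulls the frames out of a forrtl-style traceback and reports the first frame with a named routine. The function names `parse_frames` and `first_named_frame` are hypothetical, and the column layout is assumed from the lines shown in this issue only.

```python
import re

# One frame per line: image, 16-digit hex PC, routine, line, source.
# This layout is an assumption based on the trace in this issue.
FRAME = re.compile(
    r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)

# A few lines copied from the traceback in this issue.
SAMPLE = """\
forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source
libpthread-2.28.s  000015112881CCF0  Unknown               Unknown  Unknown
libhdf5.so.310.3.  000015112795138C  H5T__init_native_     Unknown  Unknown
libnetcdf.so.19.2  000015112B0CEDD6  NC_open               Unknown  Unknown
coupler_full_test  00000000008525D7  netcdf_io_mod_mp_         647  netcdf_io.F90
"""


def parse_frames(trace):
    """Return a list of frame dicts; non-frame lines are skipped."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match:
            image, _pc, routine, lineno, source = match.groups()
            frames.append(
                {"image": image, "routine": routine,
                 "line": lineno, "source": source}
            )
    return frames


def first_named_frame(trace):
    """Return the innermost frame whose routine is not 'Unknown', or None."""
    for frame in parse_frames(trace):
        if frame["routine"] != "Unknown":
            return frame
    return None


if __name__ == "__main__":
    frame = first_named_frame(SAMPLE)
    print(frame["image"], frame["routine"])
```

On the sample lines, the first named frame is in libhdf5 (the H5T initialization routine), which matches the later finding that the exception is raised during hdf5 library initialization rather than in FMS code.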
@bensonr
Contributor

bensonr commented Jan 31, 2024

@rem1776 - do we have a unit test we can use to target this particular file directly to see if there is some attribute or data element in the file that it is complaining about?

@rem1776
Contributor Author

rem1776 commented Jan 31, 2024

@bensonr I know the test test_fms/mosaic2/test_grid2.F90 tests this module specifically. It creates a grid_spec.nc as well as the other input files needed and reads them in as part of the test. There are a few other tests that do the same in fms2_io, data_override and the coupler. They all read in and check specific files though, so I'm not sure how useful that is.

So far I tried a very simple test, using a program with just fms_init and fms_end calls, and ran it with the coupler null test's input files. It was able to read them in without issue.

@rem1776
Contributor Author

rem1776 commented Feb 1, 2024

Looks like the problem here is the hdf5 update. After talking to Marshall, he referred me to this issue: HDFGroup/hdf5#3831

It's specific to hdf5 1.14.3 and should be fixed in the next release, scheduled for the end of March.

I'm going to transfer this and edit the original message, since this is more of an FMS problem that just happened to come up here.

rem1776 transferred this issue from NOAA-GFDL/FMScoupler Feb 1, 2024
rem1776 changed the title from "newest netcdf/hdf5 causing failure when reading in grid_spec.nc" to "hdf5 1.14.3 errors when compiled with FPE trapping flag" Feb 1, 2024
@bensonr
Contributor

bensonr commented Feb 1, 2024

Once the next version of hdf5 is released, we'll update and test.

@rem1776
Contributor Author

rem1776 commented Oct 9, 2024

Closed since #1455 added a check in configure for this. The bug is also fixed in newer versions of hdf5 (tested with 1.14.4-3).
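
The check added in #1455 is not shown in this thread. As a rough illustration of the idea only, here is a sketch that flags the affected release given an hdf5 version string. The helper names are hypothetical, and it assumes the caller already has the version string from somewhere (for example, the hdf5 build configuration).

```python
import re

# Per this thread, only hdf5 1.14.3 shows the FPE at library init.
AFFECTED_VERSION = (1, 14, 3)


def parse_hdf5_version(text):
    """Parse 'major.minor.release' from a version string.

    Any suffix such as the '-3' in '1.14.4-3' is ignored.
    """
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized hdf5 version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    """True if the version string names the affected 1.14.3 release."""
    return parse_hdf5_version(text) == AFFECTED_VERSION


if __name__ == "__main__":
    for version in ("1.14.2", "1.14.3", "1.14.4-3"):
        status = "affected" if is_affected(version) else "ok"
        print(version, status)
```

A build script could use something like this to warn when FPE trapping flags are combined with the affected version; how #1455 actually does it may differ.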

rem1776 closed this as completed Oct 9, 2024