DETECT_F causes an invalid FP operation #3831

Closed
amorison opened this issue Nov 6, 2023 · 4 comments · Fixed by #3837

amorison commented Nov 6, 2023

Describe the bug
Calling h5open_f (Fortran) leads to an invalid floating point exception. This is problematic when catching FPEs, as the application crashes when initializing HDF5.

With the following prgm.f90 program:

program prgm
    use hdf5
    implicit none
    integer :: ierr
    call h5open_f(ierr)
    if (ierr /= 0) then
        error stop "error open"
    endif
    call h5close_f(ierr)
    if (ierr /= 0) then
        error stop "error close"
    endif
    print "(a)", "all good"
end program

We have the following issue:

$ h5pfc -showme
gfortran -I/usr/include -L/usr/lib -lhdf5hl_fortran -lhdf5_hl -lhdf5_fortran -lhdf5 -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -lsz -lz -ldl -lm -Wl,-rpath -Wl,/usr/lib

$ h5pfc prgm.f90 -ffpe-trap=invalid

$ gdb a.out
# ...
(gdb) run
# ...
Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7cd5072 in H5T__init_native_float_types () at /usr/src/debug/hdf5-openmpi/hdf5-1.14.3/src/H5Tinit_float.c:531
531         DETECT_F(long double, LDOUBLE, det);

Expected behavior
No FPE should be raised when initializing HDF5.

Platform (please complete the following information)

  • HDF5 version: 1.14.3
  • OS and version: Arch Linux, up to date at the time of writing
  • Compiler and version: gfortran 13.2.1
  • Build system (e.g. CMake, Autotools) and version: NA
  • Any configure options you specified: NA
  • MPI library and version (parallel HDF5): NA

Additional context
We do not encounter the issue with hdf5 1.12 on other machines.


derobins commented Nov 6, 2023

It's probably because you are building with -ffpe-trap=invalid. That's incompatible with our floating-point type introspection. We moved some compile-time checks to run-time checks (to make cross-compiling easier) and some of the floating-point checks can raise FPEs that are normally ignored. We'll engineer around this for 1.14.4.

derobins self-assigned this Nov 6, 2023
derobins added the labels Merge - To 1.14, Priority - 1. High, Component - C Library, Component - Fortran, and Type - Improvement Nov 6, 2023
derobins added this to the 1.14.4 milestone Nov 6, 2023

amorison commented Nov 6, 2023

Thanks for looking into this.

> It's probably because you are building with -ffpe-trap=invalid.

Yes, the program above is definitely crashing because I'm explicitly trapping FPEs. That's on purpose: I am working on an application that does scientific computing, and using those options is very useful to track FP issues when debugging. HDF5 crashing with an FPE on initialization renders these options useless for my purposes.


derobins commented Nov 7, 2023


I have a fix in #3837 that disables FE_INVALID exceptions while initializing the floating-point datatypes (restoring the original flags on the way out). We generate NaNs in that code as we munge bits and check to see what's happening. We're probing type behavior and not doing math per se, so I think that's a better solution than making many changes to protect against NaN.

derobins added a commit that referenced this issue Nov 7, 2023
The H5T floating-point datatype initialization code can raise exceptions when handling signaling NaNs. This change disables FE_INVALID exceptions during initialization.

Also removes the -ieee=full change for NAG Fortran as that shouldn't be necessary anymore.

Fixes #3831
jhendersonHDF pushed a commit to jhendersonHDF/hdf5 that referenced this issue Dec 7, 2023
lrknox pushed a commit that referenced this issue Dec 7, 2023
* Preserve MPI-I/O file hints when fapl is closed (#3755)

* Fix for issue #3025: Save the MPI info in the file struct so H5Fget_access_plist() can retrieve it from there.

* Add compression tests for subfiling (#3769)

* Fix typo in comment (#3775)

* Fixed a file handle leak in the core VFD (#3779)

When opening a file with the core VFD and a file image, if the file
already exists, the file check would leak the POSIX file handle.

Fixes GitHub issue #635

* Fix a format string warning in the C++ examples (#3776)

* Cancel running GitHub workflows on push to same PR (#3772)

* Cancel running GitHub workflows on push to same PR

* Remove github.sha from workflow concurrency groups

* Print some messages in parallel tests on MPI rank 0 only (#3785)

Avoids overly verbose output from all processes emitting progress, etc. info.

* Avoid attempted use of NULL pointer in parallel compression code (#3786)

The parallel compression test code tests for the case where all MPI ranks have no selection in a dataset when writing to it. Add an early exit to the code to avoid attempting to use a NULL pointer due to there being no work to do.

* Don't install h5tools_test_utils test program on system (#3793)

* Add Doxygen to H5FDsplitter.h (#3794)

* H5FD_CURR_SPLITTER_VFD_CONFIG_VERSION
* H5FD_SPLITTER_PATH_MAX
* H5FD_SPLITTER_MAGIC
* H5FD_splitter_vfd_config_t
* H5Pset_fapl_splitter()
* H5Pget_fapl_splitter()

* Update Doxygen initializers & identifiers in VFDs (#3795)

* Add Doxygen for all H5FD_<VFD> initializers
* Add Doxygen for all H5FD_<VFD>_VALUE values
* Mark H5FD_<vfd>_init() calls private in Doxygen

* Fix memory corruption in 'MPI I/O FAPL preserve' test (#3806)

* Fix usage of h5_clean_files in t_pflush2.c (#3807)

* Fix parallel driver check in h5_fixname_real (#3808)

* Fix a couple usages of MPI_Info_get (#3809)

* Remove H5system.c warning on Windows oneAPI. (#3812)

* Add processing of NVHPC flags in linux-gnulibc1 file (#3804)

* Disable testing as tests are failing the same as in CMake

* Use the current toolchain for examples as default (#3810)

* Fix misc. warnings from GCC when compiling with -fsanitize=undefined (#3787)

* Set NVHPC maximum optimization level to -O1 for now (#3800)

* Set NVHPC maximum optimization level to -O1 for now

Compiling HDF5 with NVHPC 23.5 - 23.9 results in test failures in
4 different test files that need to be resolved. Since those tests
pass with an optimization level of -O1 (and -O0) and it is currently
unclear whether the test failures are due to issues in HDF5 or issues
in the 'nvc' compiler, set the maximum optimization level for NVHPC
to -O1 until the test failures are resolved.

* Disable nvhpc Java testing in CMake and amend known issues

* Re-enable testing of Autotools nvhpc

* Update some doxygen links to local refs (#3814)

* Rework MPI Info FAPL preserve PR to use VFD 'ctl' operations (#3782)

* Removed the use of C wrappers from H5P APIs. (#3824)

* fix seg fault on frontier/cray

* fix seg fault on frontier/cray

* fix seg fault on frontier/cray

* removed the use of h5pclose_c

* removed the use of h5pclose_c

* Fortran Wrappers H5VLnative_addr_to_token_f and H5VLnative_token_to_address_f (#3801)

* Added H5VLnative_addr_to_token_f and H5VLnative_token_to_address_f

* Added H5VLnative_addr_to_token_f and H5VLnative_token_to_address_f tests

* Create test for H5Pget_dxpl_mpio (#3825)

* Create test and add to testphdf5

* Renamed h5fuse.sh to h5fuse (#3834)

* provide an alternative to mapfile for older bash

* Disable FP exceptions in H5T init code (#3837)

The H5T floating-point datatype initialization code can raise exceptions when handling signaling NaNs. This change disables FE_INVALID exceptions during initialization.

Also removes the -ieee=full change for NAG Fortran as that shouldn't be necessary anymore.

Fixes #3831

* Add intel oneapi windows build to CI CMake (#3836)

* Remove printf format warning on Windows oneAPI. (#3838)

* Correct ENV variables (#3841)

* Remove Autotools sed hack (#3848)

configure.ac contains a sed line that cleans up incorrect library
flags which was added to paper over some bugs in earlier versions
of the Autotools. These issues are not a problem with the current
versions of the Autotools.

The sed line causes problems on MacOS, so it has been removed.

Fixes #3843

* Make filter unregister callbacks safe for VOL connectors (#3629)

* Make filter callbacks use top-level API functions

When using VOL connectors, H5I_iterate may not provide
valid object pointers to its callback. This change keeps
existing functionality in H5Zunregister() without using
potentially unsafe pointers.

* Filter callbacks use internal API

* Skip MPI work on non-native VOL

* Add extra space in comments for consistency (#3852)

* Add extra space in comments for consistency

* uncomment tfloatsattrs test

* Update Actions badges to link to relevant workflow (#3850)

* Add CMake long double cross-compile defaults (#3683)

HDF5 performs a couple of checks at build time to see if long double
values can be converted correctly (IBM's Power architecture uses a
special format for long doubles). These checks were performed using
TRY_RUN, which is a problem when cross-compiling.

These checks now use default values appropriate for most non-Power
systems when cross-compiling. The cache values can be pre-set if
necessary, which will preempt both the TRY_RUN and the default.

Affected values:
    H5_LDOUBLE_TO_LONG_SPECIAL      (default no)
    H5_LONG_TO_LDOUBLE_SPECIAL      (default no)
    H5_LDOUBLE_TO_LLONG_ACCURATE    (default yes)
    H5_LLONG_TO_LDOUBLE_CORRECT     (default yes)
    H5_DISABLE_SOME_LDOUBLE_CONV    (default no)

Fixes GitHub #3585

* Updates for building and testing VOL connectors

* Fix issue with HDF5_VOL_ALLOW_EXTERNAL CMake variable

* Initialize parallel testing with MPI_THREAD_MULTIPLE when testing API

* Add CMake variable to allow specifying a VOL connector's package name

* Remove call to MPI_Init in serial API tests

While previously necessary, it now interferes with VOL connectors that
may need to be initialized with MPI_THREAD_MULTIPLE

* Fixes for CI and presets (#3853)

* Change dest for doxygen (#3856)

* Implement selection vector I/O with collective chunk filling (#3826)

* Changes for ECP-344: Implement selection vector I/O with collective chunk filling.
Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization
when computing max address.

* Fixes based on PR review comments:
For H5Dchunk.c: fix H5MM_xfree()
For H5FDmpio.c:
1) Revert the fix to H5FD__mpio_write_vector()
2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()

* Put back the logic of dividing up the work among all the mpi ranks similar to the
original H5D__chunk_collective_fill() routine.

* Add a test to verify the fix for the illegal reference problem in H5FD__mpio_write_vector().

* Do not publish compression headers or docs (#3865)

* Fix typo: look -> loop (#3866)

* Moved the README to markdown and expanded its overview of the files, file generation, and other Fortran wrapper development practices as mentioned in the HDF5 architectural document. I added a new figure and included the SVG file and the original xfig file it was generated from. (#3862)

* Add HDF5_DISABLE_TESTS_REGEX option to skip tests (#3859)

* Fix typo in error message for `MPI_Type_dup`. (#3867)

* Complete the `if command line option` sentence. (#3868)

* Fix h5dump segmentation fault when --vfd-value option is used (#3873)

* Updated URL in funding.yml (#3882)

Using new shortened URL, might look better.

* Remove unused variable from unmerged changes

* Update src/H5Tinit_float.c

mjreno commented Mar 5, 2024

Perhaps not new information but seeing this on macOS:

ProductName: macOS
ProductVersion: 13.6
Homebrew 4.2.10
gcc: 13.2.0
netcdf-fortran: 4.6.1
netcdf: 4.9.2
hdf5: 1.14.3

Louwrensth added a commit to Louwrensth/easybuild-easyconfigs that referenced this issue Jul 25, 2024
Bump dependency to HDF5-1.14.4.3
Because HDF5-1.14.3 is a broken release: HDFGroup/hdf5#3831
Louwrensth added a commit to Louwrensth/easybuild-easyconfigs that referenced this issue Aug 14, 2024
Bump dependency to HDF5-1.14.4.3
Because HDF5-1.14.3 is a broken release: HDFGroup/hdf5#3831
Louwrensth added a commit to Louwrensth/easybuild-easyconfigs that referenced this issue Aug 28, 2024
HDF5 v1.14.3 was broken, this patch is the hotfix that
was published soon after the release. See:

HDFGroup/hdf5#3831