t_mpi test fails with OpenMPI 4.1.5, but passes with MPICH 4.1.2 #3504

Closed
wreckdump opened this issue Sep 4, 2023 · 2 comments · Fixed by #3657
Labels: Component - Testing, Priority - 2. Medium, Type - Bug / Bugfix

wreckdump commented Sep 4, 2023

Describe the bug
The t_mpi test fails with OpenMPI 4.1.5 but passes with MPICH 4.1.2 (both compiled with AMD AOCC 4.1.0). The problem also affects netCDF's parallel functions: for example, a call to netCDF's nc_open_par() hangs with no activity.

Expected behavior
I expect the parallel HDF5 tests and functions to work properly.

Platform (please complete the following information)

  • HDF5 version : 1.14.2
  • OS and version : Arch Linux 6.4.12-arch1-1
  • Compiler and version : AMD AOCC 4.1.0
  • Build system (e.g. CMake, Autotools) and version : GNU Make 4.4.1, Autoconf 2.71
  • Any configure options you specified : --enable-parallel during configuration stage
  • MPI library and version (parallel HDF5) : OpenMPI 4.1.5 (problematic), MPICH 4.1.2 (functional)
wreckdump changed the title from "t_mpi test fail with OpenMPI 4.1.5, but works with MPICH 4.1.2" to "t_mpi test fails with OpenMPI 4.1.5, but passes with MPICH 4.1.2" on Sep 4, 2023
glennsong09 added the Priority - 2. Medium, Component - Testing, and Type - Bug / Bugfix labels on Sep 5, 2023
derobins added this to the 1.14.3 milestone on Oct 9, 2023
hyoklee (Member) commented Oct 10, 2023

Hi, @wreckdump !

I can't reproduce the error with our develop branch on a GitHub Actions ubuntu-latest runner:

https://github.com/hyoklee/actions/actions/runs/6472049023/job/17571862095

Would you please check whether the same error still occurs with the develop branch?

For reference, you can try something similar to my GitHub Action workflow:

https://github.com/hyoklee/actions/blob/6d207a453790cde9163787ed6aa593d631319575/.github/workflows/aocc4.yml

derobins (Member) commented

@hyoklee says it passes tests for him

brtnfld pushed a commit to brtnfld/hdf5 that referenced this issue Oct 16, 2023
jhendersonHDF pushed a commit to jhendersonHDF/hdf5 that referenced this issue Oct 18, 2023
derobins added a commit that referenced this issue Oct 18, 2023
* Address nagfor exceptions stoppage. (#3658)

* added cmake ieee flag for nagfor

* generalized determining the nag compiler

* fixing some misc. NAG warnings

* Simplify. (#3659)


* Address @jhendersonHDF review

* Add expedited testing support to t_filters_parallel (#3665)

* Remove clang warnings (#3656)

* Fixes test failure for gfortran -O2 and -O3, -fdefault-real-16 (#3662)

* added cmake ieee flag for nagfor

* fixes gfortran -O2 and -O3, -fdefault-real-16

* fixed sync

* updated release notes

* Fix link error on clang17/gfortran13/macOS-13 (#3666) (#3671)

* Correct fortran CMake generator expressions (#3670)

* Add AOCC GitHub Action (#3504) (#3657)

* Fix uninitialized subfiling test variable (#3675)

Picked up by gcc 10 on skybridge. Probably spurious, but no harm in
initializing it to a "bad" value.

* Add support for AOCC & Flang w/ the Autotools (#3674)

* Adds a config/clang-fflags options file to support Flang
* Corrects missing "-Wl," from linker options in the libtool wrappers
  when using Flang, the MPI Fortran compiler wrappers, and building
  the shared library. This would often result in unrecognized options
  like -soname.
* Enable -nomp w/ Flang to avoid linking to the OpenMP library.

CMake can build the parallel, shared library w/ Fortran using AOCC
and Flang, so no changes were needed for that build system.

Fixes GitHub issues #3439, #1588, #366, #280

* Fix a strncpy call to use dest size not src (#3677)

A strncpy call in a path construction call used the size of the src
buffer instead of the dest buffer as the limit n.

This was switched to use the dest size and properly terminate the
string if truncation occurs.
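
As a rough illustration of the pattern described above (not the actual HDF5 change), the sketch below limits the copy by the destination size and terminates the string explicitly. The helper name `copy_path` and its parameters are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from HDF5: copy src into dest without
 * writing more than dest_size bytes, always leaving dest NUL-terminated. */
static void
copy_path(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);

    /* Limit the copy by the destination size, not the source buffer size */
    strncpy(dest, src, dest_size - 1);

    /* strncpy does not add a terminator when src is truncated, so add one */
    dest[dest_size - 1] = '\0';
}
```

Calling `copy_path(buf, sizeof(buf), src)` with a source longer than `buf` leaves a truncated but valid C string in `buf`.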

* Remove CANBE_UNUSED() from subfiling VFD (#3678)

This macro was an attempt to quiet warnings about release mode unused
variables that only appear in asserts. It resolves to a void cast, which
doesn't quiet warnings when an assignment has already taken place.

* Suppress MPI_Waitall warnings w/ MPICH (#3680)

MPICH defines MPI_STATUSES_IGNORE (a pointer) to 1, which raises warnings
w/ gcc. This is a known issue that the MPICH devs are not going to fix.

See here:
    pmodels/mpich#5687

This fix suppresses those issues w/ gcc

* Fix a possible NULL pointer dereference in tests (#3676)


The dtypes test could dereference a NULL pointer if a strdup call
failed.
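
A minimal sketch of the kind of guard this implies, assuming a POSIX `strdup`. The function `first_char_of_copy` is hypothetical and is not the dtypes test code; it only shows checking the result before using it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate name, read from the copy, and free it.
 * Returns 0 on success and -1 if strdup could not allocate memory. */
static int
first_char_of_copy(const char *name, char *out)
{
    char *copy;

    assert(name != NULL && out != NULL);

    copy = strdup(name);
    if (copy == NULL)
        return -1; /* bail out instead of dereferencing a NULL pointer */

    *out = copy[0];
    free(copy);
    return 0;
}
```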

* Fix printf warnings in t_mpi (#3679)

* Fix printf warnings in t_mpi

The type of MPI_Offset varies with implementation. In MPICH, it's long,
which raises warnings when we attempt to use long long format
specifiers. Casting to long long fixes the warnings.
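
The cast can be sketched without MPI as follows. `example_offset_t` is a stand-in for `MPI_Offset` (assumed here to be `long`, as the message says it is under MPICH), and `format_offset` is a hypothetical helper, not code from t_mpi.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for MPI_Offset; its real type depends on the MPI implementation */
typedef long example_offset_t;

/* Hypothetical helper: format an offset with a long long format specifier */
static int
format_offset(char *buf, size_t buf_size, example_offset_t off)
{
    assert(buf != NULL && buf_size > 0);

    /* The cast makes the argument match %lld whatever the underlying type is */
    return snprintf(buf, buf_size, "offset=%lld", (long long)off);
}
```

Without the cast, passing a `long` to `%lld` is what triggers the format warning.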

* Fix invalid memory access in S3 comms (#3681)

In the ros3 VFD, passing an empty string parameter to an internal
API call could result in accessing the -1th element of a string.
This would cause failures on big-endian systems like s390x.

This parameter is now checked before writing to the string.
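
One way such an access can arise is indexing `s[strlen(s) - 1]` on an empty string. The sketch below is an assumption about the general pattern, not the ros3 VFD code, and `strip_trailing_slash` is a hypothetical helper that checks the length before touching the string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove one trailing '/' from s, if present.
 * Returns -1 for an empty string and 0 otherwise. */
static int
strip_trailing_slash(char *s)
{
    size_t len;

    assert(s != NULL);

    len = strlen(s);
    if (len == 0)
        return -1; /* s[len - 1] would be outside the string */

    if (s[len - 1] == '/')
        s[len - 1] = '\0';

    return 0;
}
```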

Fixes GitHub #1168

* Add Doxygen for H5Pset_fapl_sec2() (#3685)

* switch to using time function instead of date function (#3690)

* Initialize API context MPI types to MPI_BYTE (#3688)

* Add test info output to t_filters_parallel (#3696)

* Suppress format string warnings in subfiling test (#3699)

* Fix unused variable in tselect.c (#3701)

* Fix unused variable warning in H5F_sfile_assert_num (#3700)

* Restore floating-point suffixes in tests (#3698)

A prior commit removed too many F suffixes. This restores the suffixes
for float variables.

* Sync with changes from develop

---------

Co-authored-by: Scot Breitenfeld <brtnfld@hdfgroup.org>
Co-authored-by: H. Joe Lee <hyoklee@hdfgroup.org>
Co-authored-by: Allen Byrne <50328838+byrnHDF@users.noreply.github.com>
Co-authored-by: Dana Robinson <43805+derobins@users.noreply.github.com>