hdf5 1.12.1 atomicity test fails with 4.1.5rc2 #11240

Open
opoplawski opened this issue Dec 22, 2022 · 7 comments

Comments

@opoplawski
Contributor

Background information

I'm testing updating openmpi in Fedora Rawhide here: https://copr.fedorainfracloud.org/coprs/orion/openmpi-4.1.5/build/5155430/

With only the change from openmpi 4.1.4 to 4.1.5rc2, one test fails, and only on ppc64le.

What version of Open MPI are you using?

4.1.5rc2

Please describe the system on which you are running

  • Operating system/version: Fedora Rawhide
  • Computer hardware: ppc64le
  • Network type: loopback only

Details of the problem

hdf5 test is failing:

Test log for testphdf5 
============================
MPI-process 4. hostname=6c5db7d9dcac46ecb112b8f01f9f8d92

For help use: /builddir/build/BUILD/hdf5-1.12.1/openmpi/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.12 release 1
MPI-process 5. hostname=6c5db7d9dcac46ecb112b8f01f9f8d92

For help use: /builddir/build/BUILD/hdf5-1.12.1/openmpi/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.12 release 1
MPI-process 1. hostname=6c5db7d9dcac46ecb112b8f01f9f8d92

For help use: /builddir/build/BUILD/hdf5-1.12.1/openmpi/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.12 release 1
MPI-process 2.MPI-process 3. hostname=6c5db7d9dcac46ecb112b8f01f9f8d92

For help use: /builddir/build/BUILD/hdf5-1.12.1/openmpi/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.12 release 1
 hostname=6c5db7d9dcac46ecb112b8f01f9f8d92

For help use: /builddir/build/BUILD/hdf5-1.12.1/openmpi/testpar/.libs/testphdf5 -help
===================================
PHDF5 TESTS START
===================================
Linked with hdf5 version 1.12 release 1
MPI-process 0. hostname=6c5db7d9dcac46ecb112b8f01f9f8d92

For help use: /builddir/build/BUILD/hdf5-1.12.1/openmpi/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.12 release 1
Test filenames are:
    ParaTest.h5
Testing  -- fapl_mpio duplicate (mpiodup) 
Test filenames are:
    ParaTest.h5
Testing  -- fapl_mpio duplicate (mpiodup) 
Test filenames are:
    ParaTest.h5
Testing  -- fapl_mpio duplicate (mpiodup) 
*** Hint ***
You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
different directory or to add file type prefix. e.g.,
   HDF5_PARAPREFIX=pfs:/PFS/user/me
   export HDF5_PARAPREFIX
*** End of Hint ***
Test filenames are:
    ParaTest.h5
Testing  -- fapl_mpio duplicate (mpiodup) 
Test filenames are:
    ParaTest.h5
Testing  -- fapl_mpio duplicate (mpiodup) 
Test filenames are:
    ParaTest.h5
Testing  -- fapl_mpio duplicate (mpiodup) 
Testing  -- dataset using split communicators (split) 
Testing  -- dataset using split communicators (split) 
...
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Atomicity Test Failed Process 2: read_buf[800] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[801] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[802] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[803] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[804] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[805] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[806] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[807] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[808] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[809] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[810] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[811] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[812] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[813] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[814] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[815] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[816] is 5, should be 0
....

Mentioned to hdf5 here: HDFGroup/hdf5#2196
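
For anyone triaging a long build log like the one above, here is a minimal sketch for condensing the repeated failure lines into a per-rank summary. It assumes only the message format visible in the log (`Atomicity Test Failed Process N: read_buf[I] is X, should be Y`); the function name `summarize_failures` and the sample text are illustrative, not part of hdf5 or Open MPI.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Atomicity Test Failed Process 2: read_buf[800] is 5, should be 0
FAILURE_RE = re.compile(
    r"Atomicity Test Failed Process (\d+): "
    r"read_buf\[(\d+)\] is (-?\d+), should be (-?\d+)"
)


def summarize_failures(log_text):
    """Group mismatching buffer indices by MPI rank (hypothetical helper)."""
    by_rank = defaultdict(list)
    for match in FAILURE_RE.finditer(log_text):
        rank, index, got, expected = map(int, match.groups())
        by_rank[rank].append((index, got, expected))

    summary = {}
    for rank, entries in by_rank.items():
        indices = [index for index, _, _ in entries]
        summary[rank] = {
            "count": len(entries),
            "first_index": min(indices),
            "last_index": max(indices),
            "values_seen": sorted({got for _, got, _ in entries}),
        }
    return summary


# Three lines copied from the log above, used as sample input.
sample = """\
Atomicity Test Failed Process 2: read_buf[800] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[801] is 5, should be 0
Atomicity Test Failed Process 2: read_buf[802] is 5, should be 0
"""

print(summarize_failures(sample))
```

Running it over the full log shows at a glance which ranks saw mismatches and where in the buffer they start.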

@edgargabriel
Member

@opoplawski thank you for the report. Unfortunately, OMPIO in Open MPI 4.1.x does not support the atomicity operations. They will be supported starting with the upcoming 5.0.x series.
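
To illustrate what "atomicity" means for a reader here, a minimal sketch of an all-or-nothing check follows. It assumes (this is not taken from the hdf5 test source) that a reader running concurrently with a writer should see a buffer made up entirely of old values or entirely of new values, which is consistent with the `read_buf[800] is 5, should be 0` messages in the report. The helper name `find_torn_read` and the 800/200 split in the example are hypothetical.

```python
def find_torn_read(read_buf):
    """Return the first index whose value differs from read_buf[0].

    Returns None when every element matches the first one, i.e. the read
    looks all-or-nothing. Hypothetical helper, not the hdf5 test code.
    """
    if not read_buf:
        return None
    first = read_buf[0]
    for index, value in enumerate(read_buf):
        if value != first:
            return index
    return None


# Illustrative buffers: an untouched read, a fully updated read, and a
# "torn" read that mixes old (0) and new (5) values.
all_old = [0] * 1000
all_new = [5] * 1000
torn = [0] * 800 + [5] * 200

for name, buf in (("all_old", all_old), ("all_new", all_new), ("torn", torn)):
    print(name, find_torn_read(buf))
```

A non-None result corresponds to the kind of partially updated read that the failure messages describe.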

@opoplawski
Contributor Author

So what's the way forward here? Something seems to have changed between 4.1.4 and 4.1.5rc2. Do we need to disable something in hdf5 to not attempt to use atomicity operations?

@edgargabriel
Member

I can have a look in the next few days, but I am not aware of any changes in the ompio code between 4.1.4 and 4.1.5 that would (or should) affect this functionality. As a side note, I can only test on x86_64; I do not have access to ppc.

@edgargabriel
Member

Based on my tests, the atomicity tests fail for 4.1.4 with hdf5 1.12.2 as well, so it is not a regression from 4.1.4 to 4.1.5rc2. I think what has changed is the set of tests executed by hdf5. With hdf5 1.12.0, all tests seem to pass, while with 1.12.2 we have these failures. I suspect that the atomicity tests are new in later hdf5 versions (or were not executed by default with 1.12.0).

On the positive side, I can confirm that the tests pass with ompi main (which is for ompio identical to the code in the 5.0 release series).

@opoplawski
Contributor Author

So, this is the most recent CI build of hdf5-1.12.1 in Fedora: https://kojipkgs.fedoraproject.org/work/tasks/9723/97629723/build.log
It runs these same atomicity tests:

Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 

and succeeds. But it fails as described when I build with openmpi 4.1.5rc4: https://download.copr.fedorainfracloud.org/results/orion/openmpi-4.1.5/fedora-rawhide-ppc64le/05542849-hdf5/builder-live.log.gz

@edgargabriel
Member

@opoplawski thank you. Yes, as I mentioned above, the ompio version in the 4.1.x series of Open MPI does not support the atomicity operations (yet); they are only available starting with the 5.0 release.

@opoplawski
Contributor Author

Well, something related to it seems to be changing between 4.1.4 and 4.1.5rc4. But if it's not supposed to be supported I guess I'll just ignore it - but it seems like hdf5 is trying to make use of it somehow.
