Closed
Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Using spack 0.22.1
Please describe the system on which you are running
- Operating system/version: Ubuntu 20.04
- Computer hardware: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Details of the problem
I am using parallel HDF5 to write a 2D distributed array. If I pass a cartesian communicator to HDF5, the dataset in the resulting file is sometimes corrupted when running with 3 processes. Attached (hdf5_reproducer.tar.gz) is a small reproducer in C (< 100 LOC), together with an HDF5 file produced by a run of the reproducer and the output of the ompi_info command.
Without understanding the underlying logic, I also noticed several situations in which the data never seems to get corrupted:
- requiring MPI_THREAD_MULTIPLE during MPI initialization,
- passing a non-cartesian communicator,
- using another MPI implementation such as MPICH.
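For readers without the tarball, the pattern in question can be sketched as follows. This is not the attached reproducer, just a minimal illustration of handing a cartesian communicator to parallel HDF5 through the file access property list; the file name, grid dimensions, and omission of the actual dataset write are assumptions for brevity.

```c
/* Illustrative sketch (not the attached reproducer): passing a
 * cartesian communicator to parallel HDF5. Requires MPI and an
 * HDF5 build with parallel support. "data.h5" and the 1 x nprocs
 * grid layout are placeholder choices. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Build a 2D cartesian communicator over all ranks. */
    int dims[2] = {1, nprocs};
    int periods[2] = {0, 0};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    /* Hand the cartesian communicator to HDF5 via the file access
     * property list -- the step where the corruption is observed. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, cart, MPI_INFO_NULL);
    hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... collective dataset creation and hyperslab write of the
     * 2D distributed array would go here ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

Replacing the MPI_Cart_create communicator with MPI_COMM_WORLD in the H5Pset_fapl_mpio call corresponds to the "non-cartesian communicator" case above, which does not show the corruption.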
Thank you,
Thomas