Description
Problem
For MPI_NEIGHBOR_ALLTOALL|V|W, the behavior for 1 or 2 processes in a Cartesian dimension is well defined for users and implementors, but neither users nor implementors can easily read it out of the standard. As a result, both MPICH and Open MPI return wrong results in the receive buffer if a Cartesian dimension is periodic (cyclic) and the number of processes in that dimension is 1 or 2. Additional explanations or examples are missing in the MPI standard.
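To make the expected outcome concrete, here is a minimal Python sketch, not MPI code, that simulates my reading of the defined semantics of MPI_NEIGHBOR_ALLTOALL for a single Cartesian dimension (the function names `cart_neighbors` and `neighbor_alltoall` and the one-dimensional model are illustrative assumptions). For a periodic dimension of size 2, both neighbors of a rank are the same process, yet the two receive blocks must still carry two different messages; for size 1, a process is its own neighbor in both directions.

```python
def cart_neighbors(rank, n, periodic):
    """Neighbor sequence of one Cartesian dimension, in MPI order:
    first the neighbor in negative direction (displacement -1),
    then the neighbor in positive direction (displacement +1).
    None stands in for MPI_PROC_NULL at a non-periodic boundary."""
    left = (rank - 1) % n if periodic or rank > 0 else None
    right = (rank + 1) % n if periodic or rank < n - 1 else None
    return left, right

def neighbor_alltoall(sendbufs, periodic):
    """Simulate MPI_NEIGHBOR_ALLTOALL on a 1-D Cartesian topology.
    sendbufs[r][j] is the block rank r sends to its j-th neighbor;
    recvbufs[r][j] is the block rank r receives from its j-th
    neighbor (left as None for MPI_PROC_NULL neighbors)."""
    n = len(sendbufs)
    recvbufs = [[None, None] for _ in range(n)]
    for r in range(n):
        for j, nb in enumerate(cart_neighbors(r, n, periodic)):
            if nb is not None:
                # The j-th neighbor addressed rank r with the block it
                # sends in the opposite direction, i.e. block 1 - j.
                recvbufs[r][j] = sendbufs[nb][1 - j]
    return recvbufs

# Size 2, periodic: both neighbors of each rank are the same process,
# but the two receive blocks still get two distinct messages.
print(neighbor_alltoall([["a0", "a1"], ["b0", "b1"]], periodic=True))
# -> [['b1', 'b0'], ['a1', 'a0']]

# Size 1, periodic: a process is its own left and right neighbor.
print(neighbor_alltoall([["s0", "s1"]], periodic=True))
# -> [['s1', 's0']]
```

The two printed results above are exactly the receive buffers that a correct MPI library must produce in the size-2 and size-1 periodic cases, and they are what the faulty implementations get wrong.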
Here is a set of 7 slides showing the problem:
153_neighbor_mpi-3_bug_2010-10-03.pdf
Proposal
The following slides show a draft of how we may fix this problem:
153_neighbor_mpi-3_bug_2019-10-08.pdf
Changes to the Text
Here is the latest LaTeX PDF file: mpi-report-issue153-neighbor-errata-2020-01-30-annotated-corr-2020-02-03.pdf
Small update from the reading in Portland:
mpi-report-issue153-neighbor-errata-2020-02-20-annotated.pdf
The pull request is PR153: https://github.com/mpi-forum/mpi-standard/pull/153
The LaTeX source is at https://github.com/RolfRabenseifner/mpi-standard
Impact on Implementations
Both Open MPI and MPICH, and all their derivatives, must correct their results for 1 or 2 processes in a dimension of a Cartesian topology. I tested only the periodic case, but I expect that they have the same bug for non-periodic Cartesian virtual topologies as well.
TBD: check other independent implementations.
Impact on Users
Users will better understand the definition in the MPI standard, and therefore also which behavior of an MPI library is correct and which is not.
If a user has implemented a workaround for the wrong behavior of MPI_NEIGHBOR_ALLTOALL, then for the moment he or she should substitute MPI_NEIGHBOR_ALLTOALL with a correct usage of MPI_SENDRECV or nonblocking calls, as shown in the slides above, at least for the case that one or more process dimensions are less than or equal to 2.
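As an illustration of that substitution, here is a minimal Python sketch (again a simulation, not MPI code; the function names are my own) of the one-dimensional exchange expressed as two MPI_SENDRECV-style steps, one per direction. Splitting the collective into one point-to-point exchange per direction keeps the two messages between the same pair of processes apart, which is exactly what matters when a periodic dimension has only 1 or 2 processes.

```python
def cart_neighbors(rank, n, periodic):
    """Neighbors in one Cartesian dimension: negative direction first,
    then positive; None stands in for MPI_PROC_NULL."""
    left = (rank - 1) % n if periodic or rank > 0 else None
    right = (rank + 1) % n if periodic or rank < n - 1 else None
    return left, right

def sendrecv_workaround(sendbufs, periodic):
    """Simulate the workaround: one MPI_SENDRECV-style exchange per
    direction instead of a single MPI_NEIGHBOR_ALLTOALL call."""
    n = len(sendbufs)
    recvbufs = [[None, None] for _ in range(n)]
    # First MPI_SENDRECV: each rank sends toward the negative
    # direction; at the destination the message arrives from the
    # positive direction, i.e. as receive block 1.
    for r in range(n):
        left, _ = cart_neighbors(r, n, periodic)
        if left is not None:
            recvbufs[left][1] = sendbufs[r][0]
    # Second MPI_SENDRECV: each rank sends toward the positive
    # direction; at the destination the message arrives from the
    # negative direction, i.e. as receive block 0.
    for r in range(n):
        _, right = cart_neighbors(r, n, periodic)
        if right is not None:
            recvbufs[right][0] = sendbufs[r][1]
    return recvbufs

# Periodic dimension of size 2: the receive buffers each rank
# should see, matching the well-defined collective result.
print(sendrecv_workaround([["a0", "a1"], ["b0", "b1"]], periodic=True))
# -> [['b1', 'b0'], ['a1', 'a0']]
```

Because the two directions are handled by two separate calls, the two messages exchanged between the same pair of processes can never be confused, so the sizes 1 and 2 need no special treatment.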
References
The PR is PR153: https://github.com/mpi-forum/mpi-standard/pull/153
The problem was reported to me by a user: Simone Chiocchetti, University of Trento, ITALY
For the continuation, see Issue #320 as well as PR304 (https://github.com/mpi-forum/mpi-standard/pull/304) and PR312 (https://github.com/mpi-forum/mpi-standard/pull/312).