
Bug in MPI_NEIGHBOR_ALLTOALL--1 or 2 processes in the cyclic Cartesian case #153

@tonyskjellum

Description

Problem

For MPI_NEIGHBOR_ALLTOALL|V|W, the behavior for 1 or 2 processes per dimension in the Cartesian case is well defined, but neither users nor implementors can easily derive it from the standard text. As a result, both MPICH and Open MPI return wrong results in the receive buffer if a Cartesian dimension is cyclic (periodic) and the number of processes in that dimension is 1 or 2. Additional explanations or examples are missing from the MPI standard.

Here is a set of 7 slides showing the problem:
153_neighbor_mpi-3_bug_2010-10-03.pdf

Proposal

The following slides show a draft of how we might fix this problem:
153_neighbor_mpi-3_bug_2019-10-08.pdf

Changes to the Text

Here is the latest LaTeX PDF file: mpi-report-issue153-neighbor-errata-2020-01-30-annotated-corr-2020-02-03.pdf

Small update from the reading in Portland:
mpi-report-issue153-neighbor-errata-2020-02-20-annotated.pdf

The pull request is PR153: https://github.com/mpi-forum/mpi-standard/pull/153
The LaTeX source is at https://github.com/RolfRabenseifner/mpi-standard

Impact on Implementations

Both Open MPI and MPICH, and all their derivatives, must correct their results for 1 or 2 processes in a direction of a Cartesian topology. I tested only the periodic case, but I expect they have the same bug for non-periodic Cartesian virtual topologies.

TBD: check other independent implementations.

Impact on Users

Users will better understand the definition in the MPI standard and therefore which behavior of an MPI library is correct and which is not.
If a user has implemented a workaround for the wrong behavior of MPI_NEIGHBOR_ALLTOALL,
then for the moment they should substitute MPI_NEIGHBOR_ALLTOALL with a correct use of MPI_SENDRECV or nonblocking calls, as shown in the slides above, at least when one or more process dimensions have 2 or fewer processes.
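The MPI_SENDRECV substitution mentioned above can be sketched as follows for the 1-D case. This is a minimal sketch, not the slides' exact code: the buffer sizes, datatypes, and tag values are illustrative assumptions. The essential idea is to use two MPI_Sendrecv calls with distinct tags, so the two messages are matched to the correct direction even when the left and right neighbor are the same process (1 or 2 processes in a periodic dimension).

```c
/* Workaround sketch: replace a 1-D MPI_Neighbor_alltoall with two
   MPI_Sendrecv calls using distinct tags. Assumes one int per neighbor
   and dimension 0; values and tags are illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm cart;
    int dims[1] = {0}, periods[1] = {1};  /* periodic (cyclic) dimension */
    int size, rank, left, right;
    int sendbuf[2], recvbuf[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    dims[0] = size;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_shift(cart, 0, 1, &left, &right);

    sendbuf[0] = 10 * rank;      /* message for the -1 (left) neighbor  */
    sendbuf[1] = 10 * rank + 1;  /* message for the +1 (right) neighbor */

    /* Direction +1: my sendbuf[1] goes right; recvbuf[0] comes from left. */
    MPI_Sendrecv(&sendbuf[1], 1, MPI_INT, right, 1,
                 &recvbuf[0], 1, MPI_INT, left,  1,
                 cart, MPI_STATUS_IGNORE);
    /* Direction -1: my sendbuf[0] goes left; recvbuf[1] comes from right. */
    MPI_Sendrecv(&sendbuf[0], 1, MPI_INT, left,  2,
                 &recvbuf[1], 1, MPI_INT, right, 2,
                 cart, MPI_STATUS_IGNORE);

    printf("rank %d: recv from left=%d, from right=%d\n",
           rank, recvbuf[0], recvbuf[1]);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

The distinct tags are the crucial design choice: with 1 or 2 processes in the dimension, both MPI_Sendrecv calls address the same peer, and without distinct tags the two messages could be matched to the wrong direction.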

References

PR is PR153: https://github.com/mpi-forum/mpi-standard/pull/153
The problem was reported to me by a user: Simone Chiocchetti, University of Trento, Italy.

For the continuation, see issue #320, PR304 https://github.com/mpi-forum/mpi-standard/pull/304, and PR312 https://github.com/mpi-forum/mpi-standard/pull/312

Metadata

Labels

errata: Errata items for the previous MPI Standard
had reading: Completed the formal proposal reading
passed final vote: Passed the final formal vote
scheduled reading: Reading is scheduled for the next meeting
wg-collectives: Collectives Working Group
