Description
I'm evaluating record-and-replay (R&R) MPI wrappers for an HPC application. The simulation uses MPI_IN_PLACE as a buffer argument in calls such as MPI_Allgather and MPI_Allreduce, which leads to various kinds of errors when running under ReMPI.
Please consider the following minimal sample, which simply sums the rank numbers across all ranks:
```fortran
PROGRAM sample_allreduce
  USE mpi
  IMPLICIT NONE

  INTEGER :: ierr
  INTEGER :: rank, rank_in_place
  INTEGER :: rank_sum

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  rank_in_place = rank
  PRINT *, 'Rank: ', rank

  ! Sum the ranks twice: once with separate send/receive buffers,
  ! once in place, passing the MPI_IN_PLACE sentinel as sendbuf.
  CALL MPI_Allreduce(rank, rank_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)
  CALL MPI_Allreduce(MPI_IN_PLACE, rank_in_place, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)
  PRINT *, 'Sum: ', rank_sum, ' - ', rank_in_place

  CALL MPI_Finalize(ierr)
END PROGRAM sample_allreduce
```

Launching this sample without ReMPI (Open MPI v2.0.2, Linux 4.11 x86-64, GNU Fortran 6.3.0) via mpiexec -np 3, I receive the following, correct result:
```
Rank: 0
Rank: 1
Rank: 2
Sum: 3 - 3
Sum: 3 - 3
Sum: 3 - 3
```
Now I try the same with REMPI_MODE=0 (record mode) and ReMPI rev. 21ae26b; this results in wrong values (0 instead of 3) for the MPI_IN_PLACE call:
```
Rank: 1
Rank: 2
Rank: 0
Sum: 3 - 0
Sum: 3 - 0
Sum: 3 - 0
```
In addition, there is some corruption when ReMPI is used with the full HPC Fortran application, resulting in the following error after a call to MPI_Allgather with MPI_IN_PLACE as the sendbuf argument:
```
*** An error occurred in MPI_Allgather
*** reported by process [287899649,3]
*** on communicator MPI COMMUNICATOR 11 SPLIT FROM 5
*** MPI_ERR_TYPE: invalid datatype
```
This is what made me investigate further. For now, I cannot post any of the application's code; I hope the sample above gives enough reproducibility for this issue.
According to this statement: nerscadmin/IPM#6 (comment), in the case of MPI_IN_PLACE (and MPI_BOTTOM) there is no implementation-agnostic way for a wrapper to relay a Fortran mpi_allgather_ (or similarly mangled) symbol call to the C-side PMPI_Allgather, as ReMPI does: the Fortran MPI_IN_PLACE sentinel lives at a different address than the C one, so the C PMPI layer does not recognize it. Passing the arguments through unchanged to the Fortran pmpi_allgather_ symbol instead is one way to handle this.
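For illustration, here is a minimal sketch of that pass-through approach for MPI_Allreduce, written in C since that is where such wrapper symbols are usually defined. This is not ReMPI's actual code: the symbol names mpi_allreduce_/pmpi_allreduce_ assume gfortran's default name mangling (lowercase plus one trailing underscore), a real wrapper would also have to cover the other mangling variants, and the bookkeeping comment stands in for whatever ReMPI records.

```c
#include <mpi.h>

/* Fortran PMPI entry point of the MPI library. The name mangling and
 * the argument layout (buffers as char*, handles as MPI_Fint*) are
 * assumptions matching common Fortran calling conventions. */
extern void pmpi_allreduce_(char *sendbuf, char *recvbuf,
                            MPI_Fint *count, MPI_Fint *datatype,
                            MPI_Fint *op, MPI_Fint *comm,
                            MPI_Fint *ierr);

/* Wrapper for the Fortran mpi_allreduce_ symbol. Instead of converting
 * the handles with MPI_Type_f2c()/MPI_Comm_f2c() and calling the C
 * PMPI_Allreduce() -- which cannot recognize the Fortran MPI_IN_PLACE
 * sentinel -- all arguments are passed through unchanged to the Fortran
 * PMPI symbol, so the MPI library itself resolves MPI_IN_PLACE and
 * MPI_BOTTOM. */
void mpi_allreduce_(char *sendbuf, char *recvbuf, MPI_Fint *count,
                    MPI_Fint *datatype, MPI_Fint *op, MPI_Fint *comm,
                    MPI_Fint *ierr)
{
    /* ... record/replay bookkeeping would go here ... */
    pmpi_allreduce_(sendbuf, recvbuf, count, datatype, op, comm, ierr);
}
```

The key point is that MPI_IN_PLACE in Fortran is the address of an implementation-internal variable, distinct from the C MPI_IN_PLACE pointer, so only the library's own Fortran entry points can detect it portably.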