
OpenMPI: MPI_IN_PLACE malfunctioning #8

@MarcelHB

I'm evaluating record-and-replay (R&R) MPI wrappers for an HPC application. The simulation uses MPI_IN_PLACE as a buffer argument in calls such as MPI_Allgather and MPI_Allreduce, which leads to various kinds of errors when running under ReMPI.

Please consider the following minimal sample, which is simply supposed to sum all ranks on every process:

PROGRAM sample_allreduce
  USE mpi
  IMPLICIT NONE

  INTEGER :: ierr
  INTEGER :: rank, rank_in_place
  INTEGER :: rank_sum

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  rank_in_place = rank

  PRINT *, 'Rank: ', rank
  ! Regular reduction: separate send (rank) and receive (rank_sum) buffers.
  CALL MPI_Allreduce(rank, rank_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)
  ! In-place reduction: rank_in_place serves as both send and receive buffer.
  CALL MPI_Allreduce(MPI_IN_PLACE, rank_in_place, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)
  PRINT *, 'Sum: ', rank_sum, ' - ', rank_in_place

  CALL MPI_Finalize(ierr)
END PROGRAM sample_allreduce

When I launch this sample with OpenMPI v2.0.2 (Linux v4.11 x86-64, GNU Fortran 6.3.0) via mpiexec -np 3, I receive the following, correct result:

 Rank:            0
 Rank:            1
 Rank:            2
 Sum:            3  -            3
 Sum:            3  -            3
 Sum:            3  -            3

Now, trying the same with REMPI_MODE=0 and rev. 21ae26b of ReMPI yields wrong values (0 instead of 3) for the MPI_IN_PLACE call:

 Rank:            1
 Rank:            2
 Rank:            0
 Sum:            3  -            0
 Sum:            3  -            0
 Sum:            3  -            0

In addition, there is some corruption when ReMPI is used with the HPC Fortran application, resulting in this error after a call to MPI_Allgather with MPI_IN_PLACE as the sendbuf argument:

*** An error occurred in MPI_Allgather
*** reported by process [287899649,3]
*** on communicator MPI COMMUNICATOR 11 SPLIT FROM 5
*** MPI_ERR_TYPE: invalid datatype

This is what made me investigate further. For now, I cannot post any of the application's code, and I hope the sample above is enough to reproduce the issue.

According to this statement (nerscadmin/IPM#6 (comment)), in the case of MPI_IN_PLACE (and MPI_BOTTOM) there is no implementation-agnostic way for a wrapper that intercepts the Fortran symbol mpi_allgather_ (or any similar routine) to relay the call to the C-side PMPI_Allgather, as ReMPI does. Passing the arguments straight through to the Fortran pmpi_allgather_ instead is one way to handle this.
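
For illustration, here is a minimal C sketch of the two approaches, not ReMPI's actual code: it assumes the common lowercase-with-trailing-underscore Fortran name mangling (a real tool would also cover the other manglings), and record_event() is a hypothetical placeholder for the tool's record/replay bookkeeping.

#include <mpi.h>

/* Fortran PMPI entry point: every argument is passed by reference and the
 * error code is returned through the trailing pointer. */
extern void pmpi_allreduce_(void *sendbuf, void *recvbuf, MPI_Fint *count,
                            MPI_Fint *datatype, MPI_Fint *op, MPI_Fint *comm,
                            MPI_Fint *ierr);

/* Hypothetical stand-in for the tool's bookkeeping. */
static void record_event(const char *name) { (void)name; }

#if 0
/* Problematic pattern: convert the Fortran handles and call the C-side
 * PMPI_Allreduce.  The Fortran MPI_IN_PLACE sentinel has a different
 * address than the C MPI_IN_PLACE, and there is no portable test that
 * recognizes it here, so it reaches the C library as if it were an
 * ordinary send buffer. */
void mpi_allreduce_(void *sendbuf, void *recvbuf, MPI_Fint *count,
                    MPI_Fint *datatype, MPI_Fint *op, MPI_Fint *comm,
                    MPI_Fint *ierr)
{
    MPI_Datatype c_type = MPI_Type_f2c(*datatype);
    MPI_Op       c_op   = MPI_Op_f2c(*op);
    MPI_Comm     c_comm = MPI_Comm_f2c(*comm);

    record_event("MPI_Allreduce");
    *ierr = (MPI_Fint) PMPI_Allreduce(sendbuf, recvbuf, (int) *count,
                                      c_type, c_op, c_comm);
}
#endif

/* Implementation-agnostic alternative: stay in the Fortran domain and
 * forward the arguments untouched to pmpi_allreduce_, so MPI_IN_PLACE
 * (and MPI_BOTTOM) reach the MPI library exactly as the application
 * passed them. */
void mpi_allreduce_(void *sendbuf, void *recvbuf, MPI_Fint *count,
                    MPI_Fint *datatype, MPI_Fint *op, MPI_Fint *comm,
                    MPI_Fint *ierr)
{
    record_event("MPI_Allreduce");
    pmpi_allreduce_(sendbuf, recvbuf, count, datatype, op, comm, ierr);
}

The same pattern would apply to mpi_allgather_ and the other collectives that accept MPI_IN_PLACE.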
