A system call failed during shared memory initialization ... #7393

Open
@manomars

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.0.1 and v4.0.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed from the source tarball (both with Intel Parallel Studio 2020.0.088 as well as with GCC-7.4.0).

Please describe the system on which you are running

  • Operating system/version: Ubuntu 18.04.3 LTS
  • Computer hardware: 2 x Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
  • Network type:

Details of the problem

When I split the comm_world communicator into two groups (comm_shmem) and try to allocate shmem segments on the latter by means of MPI_win_allocate, I get the following error message:

--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  guppy01
  System call: unlink(2) /dev/shm/osc_rdma.guppy01.fd690001.4
  Error:       No such file or directory (errno 2)
--------------------------------------------------------------------------

I used the following program:

    PROGRAM test

    USE mpi_f08
    USE, INTRINSIC :: iso_c_binding, ONLY: c_ptr

    IMPLICIT NONE

    TYPE(MPI_comm)  :: comm_world, comm_shmem
    TYPE(MPI_group) :: group_world,group_shmem

    TYPE(MPI_win)   :: win
    TYPE(c_ptr)     :: baseptr

    INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize

    INTEGER, ALLOCATABLE :: group(:)

    INTEGER :: nrank,irank,nrank_shmem,irank_shmem,nshmem
    INTEGER :: i,n,sizeoftype
    INTEGER :: ierror

    CALL MPI_init( ierror )

    comm_world = MPI_comm_world

    CALL MPI_comm_rank( comm_world, irank, ierror )
    CALL MPI_comm_size( comm_world, nrank, ierror )

    WRITE(*,'(a,i4,2x,a,i4)') 'nrank:',nrank,'irank:',irank

    ALLOCATE(group(0:nrank-1))

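    ! Number of consecutive world ranks per comm_shmem group
    nshmem=4

    ! Collect the world ranks that fall in the same block of nshmem ranks as this rank
    n=0
    DO i=0,nrank-1
       IF (i/nshmem == irank/nshmem) THEN
          group(n)=i
          n=n+1
       ENDIF
    ENDDO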

    CALL MPI_comm_group( comm_world, group_world, ierror )
    CALL MPI_group_incl( group_world, n, group, group_shmem, ierror )
    CALL MPI_comm_create( comm_world, group_shmem, comm_shmem, ierror )

    DEALLOCATE(group)

    CALL MPI_comm_rank( comm_shmem, irank_shmem, ierror )
    CALL MPI_comm_size( comm_shmem, nrank_shmem, ierror )

    WRITE(*,'(a,i4,2x,a,i4)') 'irank:',irank,'irank_shmem:',irank_shmem

    ! Window of 10 default integers per rank; the displacement unit is one integer
    CALL MPI_sizeof( i, sizeoftype, ierror )
    winsize=10*sizeoftype

    CALL MPI_win_allocate( winsize, sizeoftype, MPI_INFO_NULL, comm_shmem, baseptr, win, ierror )

    CALL MPI_win_free( win, ierror )

    CALL MPI_finalize( ierror )

    END PROGRAM

and ran it with 8 ranks:

mpirun -mca shmem mmap -np 8 test

Switching to "posix" (mpirun -mca shmem posix ...) gets rid of this error but has problems of its own, for which I'll submit a separate issue.
