UVM buffers failing in cuIpcGetMemHandle?


## Background information

I'm running a self-compiled Open MPI 4.0.1 over Omni-Path with IFS 10.8, as distributed by Intel.

Each board is an HPE XA with:

- 4 x Nvidia Volta V100 GPUs
- 4 OPA 100Gb ports on two dual-port PCIe HFI cards

The good news is that MPI appears to work between nodes when the communication buffers are sent from explicit device memory.

However, when I run four MPI ranks per node and ensure that communication between ranks uses unified virtual memory (UVM) allocated with cudaMallocManaged(), I get a failure:

```
r6i6n7.218497 Benchmark_dwf: CUDA failure: cuIpcGetMemHandle() (at /nfs/site/home/phcvs2/gitrepo/ifs-all/Ofed_Delta/rpmbuild/BUILD/libpsm2-11.2.23/ptl_am/am_reqrep_shmem.c:1977)returned 1 
r6i6n7.218497 Error returned from CUDA function.
```

When I patch the code to use explicit host memory, it succeeds.
However, I want to keep these buffers in UVM and have loops with either host or device execution policy fill them, as that is how the code was designed to operate.
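
The patch itself is not reproduced here. As a rough sketch of the kind of staging it implies, a managed buffer can be copied into ordinary host memory before it is handed to MPI. The helper name `exchange_via_host` and the pairwise exchange pattern are illustrative assumptions, not taken from the actual code, and GPU selection per rank is left out.

```cuda
// Sketch only: stage managed (UVM) buffers through plain host memory so that
// MPI is never handed a cudaMallocManaged() pointer. The helper name and the
// pairwise exchange pattern are illustrative assumptions.
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <mpi.h>

static bool exchange_via_host(const double* managed_send, double* managed_recv,
                              int count, int peer, MPI_Comm comm) {
    std::vector<double> send_stage(count), recv_stage(count);
    const std::size_t bytes = static_cast<std::size_t>(count) * sizeof(double);

    // cudaMemcpyDefault lets the runtime infer the direction from the pointers.
    if (cudaMemcpy(send_stage.data(), managed_send, bytes, cudaMemcpyDefault) != cudaSuccess)
        return false;

    // Only host pointers reach MPI here.
    MPI_Sendrecv(send_stage.data(), count, MPI_DOUBLE, peer, 0,
                 recv_stage.data(), count, MPI_DOUBLE, peer, 0,
                 comm, MPI_STATUS_IGNORE);

    return cudaMemcpy(managed_recv, recv_stage.data(), bytes, cudaMemcpyDefault) == cudaSuccess;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Pair ranks 0<->1, 2<->3, ...; an unpaired rank uses MPI_PROC_NULL.
    int peer = rank ^ 1;
    if (peer >= size) peer = MPI_PROC_NULL;

    const int count = 1024;
    double* send = nullptr;
    double* recv = nullptr;
    if (cudaMallocManaged((void**)&send, count * sizeof(double)) != cudaSuccess ||
        cudaMallocManaged((void**)&recv, count * sizeof(double)) != cudaSuccess) {
        std::fprintf(stderr, "rank %d: managed allocation failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // Fill the managed buffers from a host loop; a device kernel could do the same.
    for (int i = 0; i < count; ++i) { send[i] = rank; recv[i] = -1.0; }

    const bool ok = exchange_via_host(send, recv, count, peer, MPI_COMM_WORLD);
    std::printf("rank %d: exchange %s, recv[0] = %g\n", rank, ok ? "ok" : "failed", recv[0]);

    cudaFree(send);
    cudaFree(recv);
    MPI_Finalize();
    return ok ? 0 : 1;
}
```

This costs two extra copies per exchange, which is exactly the overhead that keeping the buffers in UVM is meant to avoid.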

### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v4.0.1 

### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

./configure CC=gcc CXX=g++ --prefix=/home/dp008/dp008/paboyle/Modules/openmpi/install/ --with-psm2-libdir=/lib64/ --with-cuda=/tessfs1/sw/cuda/9.2/ --enable-orterun-prefix-by-default

Compiled with gcc 7.3.0.

### Please describe the system on which you are running

* Operating system/version: 

Red Hat CentOS 7.4

* Computer hardware: 

HPE XA780i:

- Dual Skylake 4116, 12+12 cores
- Two OPA dual-port HFIs
- Four V100 SXM2
- 96GB RAM

* Network type: 

Two OPA dual-port HFIs.

-----------------------------

## Details of the problem

The cuIpcGetMemHandle() failure shown above occurs when I run four MPI ranks per node with the inter-rank communication buffers allocated by cudaMallocManaged(). Patching the code to use explicit host memory makes it succeed, but the code is designed to fill these buffers from loops with either host or device execution policy, so I want to keep them in UVM.

Running the unmodified code with one rank per node works, so the UVM is _working as a source for network traffic_, _but not as a source for intra-node traffic between GPUs_.

Is there something I need to configure differently? I admit this is a complex environment, so I could be missing something!
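
To help separate a configuration problem from a CUDA limitation, the failing call can be exercised outside MPI and PSM2. The sketch below is a stand-in under stated assumptions, not code from libpsm2: it uses the runtime-API counterpart `cudaIpcGetMemHandle()` rather than the driver-API `cuIpcGetMemHandle()` named in the log, and the helper name `ipc_export_status` is hypothetical. The CUDA programming guide states that the IPC API is not supported for `cudaMallocManaged` allocations, so the managed case would be expected to fail while the `cudaMalloc` case succeeds. In the driver API, error code 1 is `CUDA_ERROR_INVALID_VALUE`.

```cuda
// Sketch only: ask CUDA for an IPC handle on a cudaMalloc() buffer and on a
// cudaMallocManaged() buffer, and print what each attempt returns.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: try to export an IPC handle for ptr and report the status.
static cudaError_t ipc_export_status(void* ptr) {
    cudaIpcMemHandle_t handle;
    cudaError_t err = cudaIpcGetMemHandle(&handle, ptr);
    cudaGetLastError();  // reset the runtime's last-error state for later calls
    return err;
}

int main() {
    const std::size_t bytes = 1 << 20;
    void* device_buf = nullptr;
    void* managed_buf = nullptr;

    if (cudaMalloc(&device_buf, bytes) != cudaSuccess ||
        cudaMallocManaged(&managed_buf, bytes) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    const cudaError_t dev_err = ipc_export_status(device_buf);
    const cudaError_t man_err = ipc_export_status(managed_buf);

    std::printf("cudaMalloc buffer:        %s\n", cudaGetErrorString(dev_err));
    std::printf("cudaMallocManaged buffer: %s\n", cudaGetErrorString(man_err));

    cudaFree(managed_buf);
    cudaFree(device_buf);
    return 0;
}
```

If the managed case fails in this standalone program as well, the intra-node failure is unlikely to be a matter of Open MPI or PSM2 configuration alone.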

UVM buffers failing in cuIpcGetMemHandle ? #6799
