Vader in a Docker Container #4948
Comments
What are those error messages from -- are they from your program? I.e., what exactly are those error messages indicating? FWIW: I do not believe we have any tests -- in configure or otherwise -- to check for non-functional CMA. If we find CMA support, we assume it's working. Vader should work just fine if CMA support is not present -- it will fall back to regular copy-in/copy-out shared memory.
Are the processes within different docker containers? If so, then it's likely CMA is failing because the containers may be in different namespaces. The workaround is to disable CMA on the mpirun command line: --mca btl_vader_single_copy_mechanism=none
Thank you for the details!
No, they originate from Open MPI. I guess from here: https://github.com/open-mpi/ompi/blob/v3.0.0/opal/mca/btl/vader/btl_vader_get.c#L74-L78
No, it's the same container on an Nvidia DGX-1 (low detail datasheet, detailed guide), which has two Xeon packages, in case that is relevant. We will try to debug it hands-on again with Nvidia engineers next week. I was just wondering if the error (see the code lines above) already tells you something that could give me pointers on how to debug CMA (or whether you have runtime CMA tests in place). I see that you define
@ax3l the issue here is that the CMA read fails inside the container. The root cause could be that docker prevents this, and some sysadmin config might be required. From the man page, you might want to manually check that those conditions are met as well. I would suggest you first try to run your app on the host, and then in the container. Disabling CMA might help you; note that falling back to btl/sm might also help you here.
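A quick way to check the ptrace preconditions mentioned above is sketched below. These commands are my own illustration, not from the thread, and assume a Linux host with the Yama LSM and the capsh utility from libcap available.

```shell
# 0 means classic ptrace rules; 1 or higher progressively restricts
# attaching, which can break process_vm_readv/process_vm_writev
# between MPI ranks on the same node.
cat /proc/sys/kernel/yama/ptrace_scope

# Inside the container, check whether the SYS_PTRACE capability is granted:
capsh --print | grep -i ptrace
```

If Yama is restrictive or the capability is missing, CMA calls between ranks will fail with EPERM even though configure detected CMA support at build time.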
@ggouaillardet btl/sm is not needed. The way we are recommending to deal with docker, if the ptrace permissions can't be fixed, is to set OMPI_MCA_btl_vader_single_copy_mechanism=none. That will disable CMA.
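For reference, this workaround can be applied in two equivalent ways (a sketch; the rank count and ./my_app are placeholders, not from the thread):

```shell
# Disable vader's CMA single-copy path; vader falls back to
# copy-in/copy-out shared memory, which works without ptrace permission.

# Option 1: environment variable, inherited by mpirun and the ranks:
export OMPI_MCA_btl_vader_single_copy_mechanism=none
mpirun -n 2 ./my_app

# Option 2: pass the MCA parameter on the mpirun command line instead:
mpirun --mca btl_vader_single_copy_mechanism=none -n 2 ./my_app
```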
I recently upgraded the Open MPI library in our project's Travis CI setup from 2.1.1 to 3.0.2 (and tried 3.1.0), and we were observing the errors above as soon as any communication was performed. Setting OMPI_MCA_btl_vader_single_copy_mechanism=none before launching the jobs, as @hjelmn suggested, seems to have fixed the problem in 3.0.2 for us.
Note that if you want better performance, you want CMA to work. It will only work if all the local MPI processes are in the same namespace. An alternative (I haven't tested this) would be to use xpmem: http://gitlab.com/hjelmn/xpmem (there is a version on github but it will no longer be maintained).
Note that one can allow ptrace permissions by
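The exact command was lost from the comment above. One common way to grant ptrace permission to a container (an assumption on my part, not necessarily what the commenter meant; the image and application names are placeholders) is Docker's --cap-add flag:

```shell
# Grant the SYS_PTRACE capability so process_vm_readv/process_vm_writev
# are permitted between processes inside the same container, which lets
# vader keep using its faster CMA single-copy path.
docker run --cap-add=SYS_PTRACE my-mpi-image mpirun -n 2 ./my_app
```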
Got exactly the same issue with OpenMPI 3.1.3 in Docker. This fixes the problem!
Same problem on OpenMPI 4.0.0 in Docker; disabling vader single copy mechanism as suggested above fixes it. |
not directly relevant for our current 16.04 image but still good to have. See open-mpi/ompi#4948
FWIW, the root cause of this is likely that the container denies the ptrace permission that CMA's process_vm_readv/process_vm_writev calls require.
… Docker container. We add a line in the Dockerfile to set an Open MPI environment variable after everything has been installed. For further details, see [the issue in OpenMPI Github page](open-mpi/ompi#4948)
Saw the problem again today with:
The work-around, as before, is still: export OMPI_MCA_btl_vader_single_copy_mechanism=none
Update Docker container and run instructions to avoid MPI / Docker security conflicts This approach is expedient for this example, but probably not the best approach for production deployments. See separate discussion on this issue open-mpi/ompi#4948.
A few notes on this ticket:
I think this ticket can be closed given the workaround for the v4.x series. The change in PR #10694 should make it so that the workaround is no longer required, as noted above.
…e call OMPI_MCA_btl_vader_single_copy_mechanism is meant to suppress an error message from an incompatibility between btl/vader and docker, see open-mpi/ompi#4948. PARSEC_MCA_runtime_bind_threads is meant to disable thread binding in PaRSEC, potentially speeding up test runs. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
I'm running into this issue on https://modal.com, and it is causing
What version of OMPI are you using? It's hard to help if you don't provide at least some basic info.
The command I'm running is:
where the hostfile is:
@hppritcha @ggouaillardet Do you folks have any thoughts here? I don't know anything about v4.1, I'm afraid.
This looks like a GPU-within-docker issue. @vedantroy, please open a new issue and give a full description of your problem.
vader does not use
I second @ggouaillardet on opening a separate issue if @bosilca's suggestion doesn't work.
This leads to errors such as [runner-...] Read -1, expected <some number>, errno = 1 in docker, so we disable it. Some more discussion can be found here: open-mpi/ompi#4948
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
OpenMPI 3.0.0 (and 2.1.2 for comparison)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From source (via spack) inside a docker container.
Please describe the system on which you are running
Details of the problem
Starting MPI with more than one rank will result in errors of the form
as soon as communication is performed (send, receive, reduce, etc.). Simple programs that only contain MPI startup (Init, Get_Rank, Finalize) and shutdown run without issues.
The only way to work around this issue for me was to downgrade to OpenMPI 2.X, which still supports "sm" as a BTL, and to deactivate vader, e.g. with
export OMPI_MCA_btl="^vader"
Is it possible the detection/test of a working CMA is not fully functional? This issue is likely caused by either non-existent or not fully forwarded CMA kernel support inside the docker container. Do you have any recommendations on how to use vader as the in-node BTL in such an environment?