[5.0.0rc10/main] CUDA-aware MPI is broken when using the ob1 PML #11399
Comments
I had the same problem on a very similar configuration (no slingshot). See here
I rebuilt from main (commit ff1f1b7) against UCX 1.12.1 and it works with
I don't immediately have a minimal reproducer. Unfortunately, it looks like
Ok, I have a minimal-ish reproducer using the
From this directory, you can simply
Then run:
I get:
I can also reproduce this issue when built with OFI rather than UCX (and UCX explicitly disabled):
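For reference, a sketch of how UCX is typically excluded, either at build time or at run time (install paths and the benchmark binary are placeholders):

```shell
# Build Open MPI without UCX support at all:
./configure --without-ucx --with-cuda=/usr/local/cuda

# Or, if UCX is built in, exclude the ucx pml at run time so ob1 is used:
mpirun -np 2 --mca pml ^ucx ./osu_bw D D
```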
Is something going wrong inside the progress engine for messages that use CUDA device buffers?
How are you forcing the use of the ucx pml in your environment? Do you set an environment variable or similar? I see many output lines from ob1 and the sm btl, which should not really be involved in code that uses GPU buffers. I usually do something like
for GPU code
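As a sketch, a typical command line that forces the ucx pml looks like the following (rank count and benchmark binary are placeholders):

```shell
mpirun -np 2 --mca pml ucx --mca osc ucx ./osu_bw D D
```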
I am not forcing it to use UCX at all. I assume it will use it automatically if needed. It crashes in the same way when I do not build with UCX support. I've run addr2line to get the line numbers where it crashes, and for some reason it is using the
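For reference, a sketch of the kind of addr2line invocation used to map backtrace offsets to source lines, assuming the library was built with debug symbols (library path and offset are placeholders):

```shell
# -e selects the library from the backtrace, -f also prints the function name
addr2line -f -e /path/to/libmpi.so 0x<offset-from-backtrace>
```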
The cluster I'm running on uses SLURM, so I have to use srun to launch things properly. Can I force using the UCX pml via an environment variable?
I would highly recommend that you force the use of ucx. I use ucx 1.13.x with GPUs literally on a daily basis, but you have to tell Open MPI that it should use ucx, either with the command line that I showed or by setting the corresponding environment variables.
If you see ob1 or sm btl error messages, you know it did not use the correct components; they should not be used with GPU code.
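Since the job is launched with srun, the same selection can be made through Open MPI's MCA environment variables (any MCA parameter can be set as OMPI_MCA_<name>); a sketch, with the benchmark and rank count as placeholders:

```shell
export OMPI_MCA_pml=ucx
export OMPI_MCA_osc=ucx
srun -n 2 ./osu_bw D D
```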
According to the Open MPI FAQ, "the Open MPI library will automatically detect that the pointer being passed in is a CUDA device memory pointer and do the right thing" and "CUDA-aware support is available in the sm, smcuda, tcp, and openib BTLs" (https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-support). Is this no longer the case? If so, I hope the documentation will be fixed to reflect this.
I think for 5.0 you need to look at this documentation: I'll let somebody else answer your last question in detail, since I personally only use Open MPI with UCX for GPU code. The openib component is definitely gone in the 5.0 release. The one additional remark I have is to make sure that your UCX library has also been compiled with support for the GPUs that you want to use.
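One way to check whether the installed UCX was built with CUDA support is to inspect its build configuration and transports, for example:

```shell
# Shows the configure options UCX was built with (look for --with-cuda):
ucx_info -v

# Lists available transports; a CUDA-enabled build includes cuda_copy/cuda_ipc:
ucx_info -d | grep -i cuda
```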
@edgargabriel Thanks for your suggestions. I tried forcing the
However, I still get a segmentation fault if I force the
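For completeness, forcing the ob1 pml (the path that crashes here) is usually done the same way, e.g.:

```shell
mpirun -np 2 --mca pml ob1 ./osu_bw D D
```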
No, the ob1 pml is supposed to work with CUDA buffers. I've tested the path myself with both ob1 and mtl ofi outside of ucx, and it should work. There should be copying mechanisms in place to handle this.
Are you able to reproduce this with the reproducer I linked? I realize it's a heavy lift to build, but I haven't been able to reproduce this with the OSU benchmarks. Presumably there is some kind of race condition or issue with handling in-flight messages in the progress engine.
No, I have not made an attempt to reproduce yet and probably won't have the time to do so for a few weeks. Were you able to reproduce it using the pml/cm & mtl/ofi path?
Also, do you have a core dump at hand?
No, I haven't tried building with libfabric. I can try this, though. Aside from building with OFI, what env vars or flags do I need to set for both of these?
No core dump, only the backtrace that I copied above. It tries to dereference a device pointer in
@wckzhang I've re-built with libfabric and set
With (I've also set
I've re-built libfabric 1.17.0 from source using
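A sketch of how the pml/cm plus mtl/ofi path is typically selected, with the libfabric provider pinned through libfabric's own FI_PROVIDER variable (the provider name here is an assumption and depends on the fabric):

```shell
export FI_PROVIDER=verbs          # or tcp, shm, ... depending on the system
mpirun -np 2 --mca pml cm --mca mtl ofi ./osu_bw D D
```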
@wckzhang I've uploaded the core dumps for both ob1 and cm+ofi crashes here: https://cloudstor.aarnet.edu.au/plus/s/Qsc6m9liCJXpi9P |
I will post a patch soon that modifies the btls
fixed with #11564
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
5.0.0rc10 and main (commit dad058e)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From tarball and from git clone
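A sketch of the kind of configure line used for such a build; the install prefixes for CUDA and UCX are placeholders:

```shell
./configure --prefix=$HOME/opt/ompi-5.0.0rc10 \
            --with-cuda=/usr/local/cuda \
            --with-ucx=$HOME/opt/ucx-1.14
make -j install
```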
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
Please describe the system on which you are running
Details of the problem
I have built against UCX 1.14rc2 with CUDA support, which works correctly with Open MPI 4.1.4.
However, running the osu_bw benchmark with device buffers (osu_bw D D) with either 5.0.0rc10 or main immediately causes a segmentation fault with the following output:
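For reference, a typical way such a run is launched (binary path and rank count are placeholders):

```shell
mpirun -np 2 ./osu_bw D D

# or, on the Slurm cluster mentioned above:
srun -n 2 ./osu_bw D D
```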