CUDA-aware Ireduce and Iallreduce operations for GPU tensors segfault #9845
Comments
@open-mpi/cuda Can someone look at this issue? Thanks.
@Akshay-Venkatesh @bureddy mentioned offline that
Does this same problem occur with the equivalent test program written in C?
Well, I can't speak to an exact equivalent in C, but the segfault does occur for the (presumably) analogous program in C++. The OSU micro-benchmark suite has tests for Ireduce/Iallreduce.
@Akshay-Venkatesh Can you help?
@jorab @jsquyres as @leofang mentioned, running osu_iallreduce or any non-blocking MPI collective operation that involves a reduction is not supported over CUDA buffers.
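Given that limitation, one user-level workaround (not proposed in this thread; just a sketch assuming mpi4py ≥ 3.1, PyTorch with CUDA, and a CUDA-aware Open MPI build) is to stage the non-blocking reduction through host memory so the NBC code path only ever sees host buffers:

```python
# Hypothetical workaround sketch: stage device data through host memory.
from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD

gpu_data = torch.ones(16, dtype=torch.float32, device='cuda')

# Copy device -> host, run the non-blocking reduction on host (NumPy) buffers,
# then copy the result back to the device afterwards.
host_send = gpu_data.cpu().numpy()
host_recv = host_send.copy()

req = comm.Iallreduce(host_send, host_recv, op=MPI.SUM)
# ... other work can overlap here ...
req.Wait()   # host_send / host_recv must stay alive until the Wait completes

gpu_result = torch.from_numpy(host_recv).to('cuda')
```

This trades away the zero-copy benefit of CUDA-aware MPI, but it avoids the unsupported GPU code path entirely.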
That said, a segfault is not acceptable... Couldn't we return an error code to indicate "not supported" instead?
Agreed. @Akshay-Venkatesh Can we do better?
Will discuss internally and get back early this week. Hope that works.
@jsquyres When are the next 4.x/5.x releases planned? I don't think targeting 4.1.4 or 5.0.0 is realistic, but we may have resources beyond that point. If we need better handling for the current problem (i.e., reporting "not supported" instead of segfaulting), we would need to add CUDA detection in the NBC components that get picked to run the collective. That also seems like non-trivial work and would have to be aimed at the post-4.1.3/5.0.0 time frame.
v4.1.3 is quite possibly going to be late next week. There's no date for v4.1.4 yet. I don't recall the exact timeline for v5.0.0, but it's (currently) within the next few months. You might want to have some discussions with other Open MPI community members before sprinkling more CUDA code throughout the Open MPI code base (e.g., are you going to need to edit all the NBC collectives? What about the blocking collectives?). There may be some architectural issues at stake here; better to get buy-in before you invest a lot of time/effort.
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.2, v4.1.1, v4.1.0, and v4.0.7 tested
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
tarball
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
n/a
Please describe the system on which you are running
Details of the problem
When calling either Ireduce or Iallreduce on PyTorch GPU tensors, a segfault occurs. I haven't exhaustively tested all of the ops, but I don't see problems with Reduce, Allreduce, Isend/Irecv, or Ibcast when tested the same way. I haven't tested CuPy arrays, but it might be worthwhile (Numba GPU arrays are also affected). This behavior was discovered by leofang in mpi4py/mpi4py#164 (comment) while testing mpi4py.
Here is a minimal script that can be used to demonstrate this behavior. The errors are only present when running on GPU:
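The original script is not reproduced in this text; the following is a hypothetical sketch of the kind of reproducer described (assuming mpi4py ≥ 3.1 with GPU-buffer support, PyTorch with CUDA, and a CUDA-aware Open MPI build), to be launched with two or more ranks:

```python
# reproducer.py -- hypothetical sketch of the reported behavior, not the
# author's original script. Run e.g.: mpirun -np 2 python reproducer.py
from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD

sendbuf = torch.ones(16, dtype=torch.float32, device='cuda')
recvbuf = torch.empty_like(sendbuf)
torch.cuda.synchronize()          # make sure the device buffers are ready

# The blocking reduction completes normally on the affected versions:
comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)

# The non-blocking reduction is reported to segfault when the buffers live on
# the GPU (it works when sendbuf/recvbuf are host arrays):
req = comm.Iallreduce(sendbuf, recvbuf, op=MPI.SUM)
req.Wait()

if comm.Get_rank() == 0:
    print(recvbuf)
```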
Software/Hardware Versions:
You can reproduce my environment setup with the following commands:
Here is the error message from running Ireduce:
I appreciate any guidance!