General way to test whether CUDA-aware MPI is available? #886
I think it would be useful to have a (more) general way to test whether CUDA-aware MPI is available (e.g. making has_cuda() work for more than just Open MPI). This would allow us to distinguish between a configuration issue and other errors when complex applications fail on a cluster. I'm curious whether this is impossible or simply a lot of work (for example, each MPI installation requiring some bespoke method). Perhaps a practical solution would involve some empirical test, i.e. running a simple piece of code that should work in most circumstances if CUDA-aware MPI is available, and reporting whether it errors or not.

Comments
Currently there is https://juliaparallel.org/MPI.jl/stable/usage/#CUDA-aware-MPI-support and, IIRC, only Open MPI allows for checking CUDA support at runtime.
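A minimal sketch of that runtime check, assuming an MPI build (such as Open MPI) whose CUDA support MPI.jl can actually query:

```julia
# Minimal sketch: ask the MPI library whether it was built with CUDA support.
# MPI.has_cuda() is the documented MPI.jl call; implementations that do not
# expose a query interface may report false even when GPU-aware transport works.
using MPI

MPI.Init()
if MPI.has_cuda()
    println("The MPI library reports CUDA support.")
else
    println("No CUDA support reported (possibly a false negative on some MPI builds).")
end
```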
Yes, that is helpful! I guess I am wondering whether it makes sense to wrap this code: https://gist.github.com/luraess/0063e90cb08eb2208b7fe204bbd90ed2 in a function. We could then use it to provide more information, like "the all-to-all test fails, so you probably don't have CUDA-aware MPI".
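A minimal sketch of such a wrapper (not the gist verbatim; the name cuda_mpi_functional is made up here, and the gist itself also exercises point-to-point and all-to-all exchanges):

```julia
# Sketch of an empirical check: hand CuArray buffers directly to an MPI
# collective and report whether the exchange succeeds.
# Caveat: a non-CUDA-aware MPI given a device pointer often crashes (segfaults)
# rather than throwing a catchable Julia error, so a clean `false` is not guaranteed.
using MPI, CUDA

function cuda_mpi_functional(comm = MPI.COMM_WORLD; n = 1024)
    send = CUDA.fill(1.0, n)          # device buffer passed straight to MPI
    recv = CUDA.zeros(Float64, n)
    try
        MPI.Allreduce!(send, recv, +, comm)
        return all(Array(recv) .== MPI.Comm_size(comm))
    catch err
        @warn "Passing CuArray buffers through MPI failed" exception = err
        return false
    end
end

MPI.Init()
println("CUDA-aware MPI appears functional: ", cuda_mpi_functional())
```

Run under mpiexec (e.g. `mpiexec -n 2 julia --project check.jl`) on a node with GPUs so the actual transport is exercised.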
That could be a helper tool. However, it may not fully cover the case where, e.g., you have a non-functioning GPU-aware implementation or installation.
Hmm, yes, if it is non-comprehensive then perhaps it belongs downstream instead of here; we can prototype it downstream. Just so I make sure I understand: are you saying that the test can pass even when the MPI implementation is non-functioning?
The other way around: tests may fail even though CUDA-aware MPI is actually supported. But as a downstream check, it can indeed be useful.
At the very least, this seems like a good path for us; we simply run tiny tests of all the MPI operations we use. I recently ran into an issue where something didn't work because of incorrect linking when libraries were installed on a cluster (eg by the vendor that installed MPI). It took me almost a week to figure out what was wrong! So I'm searching for ways to speed this up for other systems and also for users; I suspect that MPI usage is going to increase quite a bit in the near future.
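A rough sketch of what "tiny tests of each operation" with a per-operation report might look like (the operation list and the report_cuda_mpi name are illustrative, not an existing MPI.jl API; the same crash caveat as above applies):

```julia
# Illustrative only: run a few GPU-buffer micro-tests and report which fail,
# so an error message can say e.g. "the all-to-all test fails".
using MPI, CUDA

function report_cuda_mpi(comm = MPI.COMM_WORLD; n = 8)
    checks = [
        "Bcast"     => () -> MPI.Bcast!(CUDA.zeros(Float64, n), 0, comm),
        "Allreduce" => () -> MPI.Allreduce!(CUDA.ones(Float64, n), CUDA.zeros(Float64, n), +, comm),
    ]
    for (name, test) in checks
        ok = try
            test(); true
        catch
            false
        end
        MPI.Comm_rank(comm) == 0 && println(rpad(name, 12), ok ? "passed" : "failed")
    end
end

MPI.Init()
report_cuda_mpi()
```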
I'll close this but feel free to re-open if you think that users would benefit from helper functions implemented directly in MPI.jl or the CUDA extension. |