UCX: Hang combining exclusive/shared window lock #6549
Comments
@devreal what are the Open MPI configure command and git revision?
I used Open MPI version 4.0.1 downloaded from the website and configured using:
It's the Intel compiler, version 19.0.1. I could try switching to the GNU compiler, but I'm not sure that would make a difference. Let me know if I should give it a shot.
@jladd-mlnx Any progress?
@janjust please take it.
@devreal I am having a hard time reproducing this issue. Which UCX and Open MPI commits did you use?
@janjust That was done using UCX 1.5.0 and Open MPI 4.0.1, both built from release branches. I will give the 4.0.x branch a try with the latest UCX release and report back.
@janjust I cannot reproduce this on latest
@devreal, thanks. I believe a lot of your issues are related to software atomics.
Running Open MPI 4.0.1 in combination with Open UCX 1.5, I am seeing my application hang when one process attempts to release an exclusive lock while the target attempts to acquire a shared lock. The code below can be used to reproduce the issue (tested on our IB cluster):
Build with:
Run with:
Interestingly, leaving out the barrier between acquiring and releasing the lock lets the example run successfully. Also, things run fine when using Open IB instead of UCX.