I encountered the following problem when forcing UCX with `--mca osc ucx` on a shared-memory system (mainly for testing; I know it's not optimal, but it shouldn't hang):
Process 1 will hang in `MPI_Win_lock_all` because `MPI_Barrier` will not progress UCX (presumably because UCX is not used for collectives in shared memory?). The UCX osc's progress callback is called during the barrier, but since process 0 has no active workers, the callback does not progress the operations that `MPI_Win_lock_all` on process 1 depends on.
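To make the progress problem concrete, here is a toy, single-process C model of it. Everything in it is hypothetical (`toy_worker_t`, `toy_osc_progress`, `toy_simulate` and the rest are not Open MPI or UCX names). It only encodes what the report states: the osc progress callback runs inside the barrier but drives only workers registered as active, and the lock request from process 1 needs process 0's worker to be progressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names come from Open MPI or UCX. */
#define MAX_WORKERS 4

typedef struct {
    int pending_requests;               /* peer requests waiting on this worker */
} toy_worker_t;

typedef struct {
    toy_worker_t worker;                /* the process's (UCX-like) worker */
    toy_worker_t *active[MAX_WORKERS];  /* workers the osc progress callback drives */
    size_t num_active;
} toy_process_t;

/* Drain every pending request on one worker. */
static void toy_worker_progress(toy_worker_t *w) {
    assert(w != NULL);
    w->pending_requests = 0;
}

/* Progress callback, as invoked from inside a blocking call such as a barrier:
   it only touches workers that were registered as active. */
static void toy_osc_progress(toy_process_t *p) {
    for (size_t i = 0; i < p->num_active; ++i) {
        toy_worker_progress(p->active[i]);
    }
}

/* Rank 1 issues an operation that needs the target's worker to make progress. */
static void toy_lock_all_request(toy_process_t *target) {
    target->worker.pending_requests++;
}

/* Returns true if rank 1's request completes within max_iters progress calls
   made by rank 0 (e.g. while rank 0 sits in the barrier). */
static bool toy_simulate(bool register_worker, int max_iters) {
    toy_process_t rank0 = { .num_active = 0 };
    if (register_worker) {
        rank0.active[rank0.num_active++] = &rank0.worker;
    }
    toy_lock_all_request(&rank0);
    for (int i = 0; i < max_iters; ++i) {
        toy_osc_progress(&rank0);
        if (rank0.worker.pending_requests == 0) {
            return true;
        }
    }
    return false;  /* never progressed: models the hang */
}
```

With no registered worker, `toy_simulate(false, 1000)` returns `false` no matter how many progress calls are made, mirroring the hang; registering the worker (`toy_simulate(true, 1000)`) lets the request complete on the first call.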
I have a possible fix that I will PR soon.