I tried to run a rather simple benchmark (`osu_latency_mt` ported to use Argobots) to test the Argobots integration on an IB cluster using UCX. It appears that the UCX PML does not use the sync objects in blocking operations but instead hammers the UCX library, with occasional calls to `opal_progress`. Of course, there will never be a switch to another argobot in that case, causing deadlocks if there are more argobots than execution streams.
I guess one way to work around this is to call `ABT_thread_yield` in `opal_progress`, so I replaced the call to `sched_yield` with `ABT_thread_yield`. Curiously, running with `--mca mpi_yield_when_idle true` does not actually cause `opal_progress_yield_when_idle` to be set to `true`, so that code path isn't taken (I am not familiar with the MCA parameter code and couldn't spot anything wrong at first glance). I can open a separate issue for this if people think this is fishy.
If I remove the check for `opal_progress_yield_when_idle` in `opal_progress`, things start to work. The latency then becomes sensitive to `pml_ucx_progress_iterations`: the smaller the value, the better, since the UCX PML calls into `opal_progress` more often.
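For concreteness, here is a minimal standalone sketch (plain Argobots, not Open MPI code; `waiter`/`setter` are made-up names) of why the spinning matters: with two ULTs sharing one execution stream, the waiter only ever observes the flag if it yields inside its loop, which is what the blocking path would get from an `ABT_thread_yield` in `opal_progress`.

```c
/* Minimal sketch: two ULTs on one execution stream. Without the
 * ABT_thread_yield() in the waiter's spin loop, the waiter monopolizes
 * the execution stream, the setter never runs, and the program hangs --
 * the same shape of deadlock as the UCX PML spinning without yielding. */
#include <abt.h>
#include <stdio.h>

/* Cooperative scheduling on one execution stream, so volatile suffices here. */
static volatile int flag = 0;

static void waiter(void *arg)
{
    (void)arg;
    while (!flag) {
        /* Hand the execution stream to other ULTs (the "yield when idle"
         * behavior the blocking path needs). */
        ABT_thread_yield();
    }
    printf("waiter: flag observed\n");
}

static void setter(void *arg)
{
    (void)arg;
    flag = 1;
}

int main(int argc, char **argv)
{
    ABT_xstream xstream;
    ABT_pool pool;
    ABT_thread threads[2];

    ABT_init(argc, argv);
    ABT_xstream_self(&xstream);
    ABT_xstream_get_main_pools(xstream, 1, &pool);

    ABT_thread_create(pool, waiter, NULL, ABT_THREAD_ATTR_NULL, &threads[0]);
    ABT_thread_create(pool, setter, NULL, ABT_THREAD_ATTR_NULL, &threads[1]);

    ABT_thread_join(threads[0]);
    ABT_thread_join(threads[1]);
    ABT_thread_free(&threads[0]);
    ABT_thread_free(&threads[1]);
    ABT_finalize();
    return 0;
}
```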
In MPICH, all the possible busy-wait loops in the runtime call a yield function (i.e., `ABT_thread_yield()` for Argobots or `sched_yield()` for Pthreads). Blocking system calls might be problematic in Open MPI, though I haven't encountered this issue in MPICH.
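Roughly along the lines of the following sketch (hypothetical names and a hypothetical `HAVE_ARGOBOTS` guard, not the actual MPICH or Open MPI symbols): every busy wait goes through a single yield hook that is bound once at initialization time.

```c
/* Sketch of a yield indirection for busy waits: bound to ABT_thread_yield()
 * when running over Argobots, to sched_yield() otherwise. */
#include <sched.h>
#ifdef HAVE_ARGOBOTS   /* hypothetical configure-time guard */
#include <abt.h>
#endif

typedef void (*idle_yield_fn)(void);

static void yield_pthreads(void) { sched_yield(); }
#ifdef HAVE_ARGOBOTS
static void yield_argobots(void) { ABT_thread_yield(); }
#endif

/* Selected once during library initialization. */
static idle_yield_fn idle_yield = yield_pthreads;

void threading_init(int use_argobots)
{
#ifdef HAVE_ARGOBOTS
    if (use_argobots)
        idle_yield = yield_argobots;
#else
    (void)use_argobots;
#endif
}

/* Every busy wait in the runtime then spins like this: */
void wait_until(volatile int *done)
{
    while (!*done)
        idle_yield();
}
```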
There are a few pieces to fix:
libevent
We need a ULT-aware (in this case, Argobots-aware) libevent, which was dropped from #6578 and which I am working on separately. It is still being debugged (https://github.com/shintaro-iwasaki/libevent/tree/2.0.22-abt) -- it will be pushed somewhere other than my personal repository once the prototype gets stable enough.
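The general shape of the problem, as I understand it (a generic sketch, not necessarily what the 2.0.22-abt branch does): an event loop driven from a ULT must not park the whole execution stream in a blocking system call, so it polls non-blockingly and yields between polls.

```c
/* Generic sketch of a ULT-friendly event loop: process whatever is ready
 * without blocking, then hand the execution stream to other ULTs instead of
 * sleeping inside epoll_wait()/poll() and starving them. */
#include <abt.h>
#include <event2/event.h>

void drive_events(struct event_base *base, volatile int *shutdown)
{
    while (!*shutdown) {
        /* Run callbacks for ready events; return immediately if none. */
        event_base_loop(base, EVLOOP_NONBLOCK);
        /* Let other ULTs on this execution stream make progress. */
        ABT_thread_yield();
    }
}
```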
MCA/thread yield
If such a busy wait calls `sched_yield()`, it can be replaced with an MCA/thread yield function (this needs a PR). I suspect there are also busy waits that call neither `sched_yield()` nor a pause-equivalent operation; we need to identify them and work out a solution.