Argobots integration vs UCX PML #7702

Closed
devreal opened this issue May 6, 2020 · 2 comments · Fixed by #8037

Comments

@devreal
Contributor

devreal commented May 6, 2020

I tried to run a fairly simple benchmark (osu_latency_mt ported to use Argobots) to test the Argobots integration on an IB cluster using UCX. It appears that the UCX PML does not use the sync objects in blocking operations but instead hammers the UCX library, with occasional calls to opal_progress. Of course, there will never be a switch to another Argobots ULT in that case, which causes deadlocks if there are more ULTs than execution streams.

One way to work around this, I guess, is to call ABT_thread_yield in opal_progress, so I replaced the call to sched_yield with ABT_thread_yield. Curiously, running with --mca mpi_yield_when_idle true does not actually seem to set opal_progress_yield_when_idle to true, so that code path isn't taken (I am not familiar with the MCA parameter code and couldn't spot anything wrong at first glance). I can open a separate issue for this if people think it is fishy.

If I remove the check for opal_progress_yield_when_idle in opal_progress, things start to work. The latency now seems sensitive to pml_ucx_progress_iterations: the smaller the better (as the UCX PML then calls into opal_progress more often).
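To illustrate why the iteration count matters, here is a minimal, self-contained simulation. It is only a sketch under stated assumptions: `sim_state_t`, `sim_transport_poll`, `sim_yield`, and `sim_wait` are hypothetical names, not Open MPI or UCX symbols, and the model simply assumes that the peer ULT can make progress only when the waiting ULT yields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of this is Open MPI or UCX code. */
typedef struct {
    unsigned yields_needed;   /* scheduling turns the peer ULT needs */
    unsigned yields;          /* yields performed by the waiter so far */
    unsigned transport_polls; /* polls of the (simulated) transport */
    bool completed;           /* set once the peer has finished its work */
} sim_state_t;

/* Stand-in for polling the transport library directly. */
static void sim_transport_poll(sim_state_t *s)
{
    s->transport_polls++;
}

/* Stand-in for a progress call that yields to another ULT.
 * Assumption: the peer only advances when the waiter yields. */
static void sim_yield(sim_state_t *s)
{
    s->yields++;
    if (s->yields >= s->yields_needed) {
        s->completed = true;
    }
}

/* Blocking wait: poll the transport, and every `progress_iterations`
 * polls fall back to the yielding progress call. */
static void sim_wait(sim_state_t *s, unsigned progress_iterations)
{
    assert(progress_iterations > 0);
    unsigned iter = 0;
    while (!s->completed) {
        sim_transport_poll(s);
        if (++iter % progress_iterations == 0) {
            sim_yield(s);
        }
    }
}
```

In this model the waiter burns `progress_iterations` transport polls before each yield, so a smaller value hands the execution stream to the peer sooner. Without the yield the loop would never terminate, which corresponds to the deadlock described above.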

@hppritcha
Member

@shintaro-iwasaki - how did you handle this in MPICH?

@shintaro-iwasaki
Contributor

In MPICH, every possible busy-wait in the runtime calls a yield function (i.e., ABT_thread_yield() for Argobots or sched_yield() for Pthreads). Blocking system calls might be problematic in Open MPI, though I haven't encountered this issue in MPICH.

There are a few pieces to fix:

libevent

We need a ULT-aware (in this case, Argobots-aware) libevent. It was dropped from #6578, and I am developing it separately. It is still being debugged (https://github.com/shintaro-iwasaki/libevent/tree/2.0.22-abt); it will be pushed somewhere other than my personal repository once the prototype is stable enough.

MCA/thread yield

If a busy wait calls sched_yield(), that call can be replaced with an MCA/thread yield function (this needs a PR). I suspect there are also busy waits that call neither sched_yield() nor a pause-equivalent operation; we need to identify them and find a solution.
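As a rough sketch of what a pluggable yield could look like, the snippet below routes a busy wait through a function pointer. All names here (`yield_fn_t`, `set_yield_fn`, `wait_for_flag`, `ult_style_yield`) are hypothetical and are not the actual MCA/thread interface. The ULT backend is only a counting stand-in so the example stays self-contained; a real Argobots build would call ABT_thread_yield() there.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical yield hook type; not the real MCA/thread interface. */
typedef void (*yield_fn_t)(void);

/* Pthreads-style backend: give up the CPU to another OS thread. */
static void pthread_style_yield(void)
{
    (void)sched_yield();
}

/* Stand-in for a ULT backend. A real Argobots build would call
 * ABT_thread_yield() here; the counter keeps this sketch standalone. */
static unsigned long ult_yield_count = 0;
static void ult_style_yield(void)
{
    ult_yield_count++;
}

static yield_fn_t current_yield = pthread_style_yield;

/* Select the backend once at startup; NULL restores the default. */
static void set_yield_fn(yield_fn_t fn)
{
    current_yield = fn ? fn : pthread_style_yield;
}

/* Busy wait that yields through the hook on every unsuccessful check.
 * Returns 1 if the flag was observed set, 0 if max_spins ran out. */
static int wait_for_flag(atomic_int *flag, unsigned long max_spins)
{
    assert(flag != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load(flag)) {
            return 1;
        }
        current_yield();
    }
    return atomic_load(flag) != 0;
}
```

With this shape, each busy wait only needs to call the hook, and the threading backend decides whether that means sched_yield() or a ULT yield.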
