Description
I tried to run a rather simple benchmark (`osu_latency_mt` ported to use Argobots) to test the Argobots integration on an IB cluster using UCX. It appears that the UCX PML does not use the sync objects in blocking operations but instead hammers the UCX library, with occasional calls to `opal_progress`. Of course, there will never be a switch to another argobot in that case, causing deadlocks if there are more argobots than execution streams.
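For context, the blocking path in the UCX PML boils down to a busy-wait of roughly the following shape. This is a simplified sketch, not the actual Open MPI source; `pml_ucx_wait_sketch` is a made-up name and `progress_iterations` stands in for the value of `pml_ucx_progress_iterations`:

```c
#include <ucp/api/ucp.h>   /* ucp_worker_progress() */

void opal_progress(void);  /* prototype as in opal/runtime/opal_progress.h */

/* Simplified sketch of the blocking-wait pattern described above. */
static void pml_ucx_wait_sketch(ucp_worker_h worker, volatile int *completed,
                                int progress_iterations)
{
    int count = 0;
    while (!*completed) {
        ucp_worker_progress(worker);     /* hammer the UCX library */
        if (++count >= progress_iterations) {
            opal_progress();             /* occasional general progress */
            count = 0;
        }
        /* No yield to the Argobots scheduler anywhere in this loop:
         * an argobot blocked here never releases its execution stream,
         * so with more argobots than streams this deadlocks. */
    }
}
```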
One way to work around this, I guess, is to call `ABT_thread_yield` in `opal_progress`, so I replaced the call to `sched_yield` with `ABT_thread_yield`
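Paraphrased, the idle path in `opal/runtime/opal_progress.c` then looks something like the sketch below; `USE_ARGOBOTS_YIELD` is a hypothetical compile-time switch and the real code is structured differently:

```c
#include <stdbool.h>
#include <sched.h>   /* sched_yield() */
#include <abt.h>     /* Argobots: ABT_thread_yield() */

/* Paraphrased sketch of the idle-yield path in opal_progress();
 * 'events' is the number of events handled in this pass. */
static void progress_yield_sketch(bool yield_when_idle, int events)
{
    if (yield_when_idle && 0 == events) {
#ifdef USE_ARGOBOTS_YIELD
        /* Yield to another argobot on this execution stream so a
         * blocked ULT gets a chance to run and complete requests. */
        ABT_thread_yield();
#else
        /* Original behavior: give the core back to the OS scheduler. */
        sched_yield();
#endif
    }
}
```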
. Curiously, it seems that running with `--mca mpi_yield_when_idle true` does not actually cause `opal_progress_yield_when_idle` to be set to `true`, so that code path isn't taken (I am not familiar with the MCA parameter code and couldn't spot anything wrong at first glance). I can open a separate issue for this if people think this is fishy.
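As a sanity check (assuming a standard install), the value the runtime actually ends up with can be inspected with `ompi_info --param all all --level 9 | grep yield_when_idle`.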
If I remove the check for `opal_progress_yield_when_idle` in `opal_progress`, things start to work. Now the latency seems sensitive to `pml_ucx_progress_iterations`: the smaller the value, the better the latency (as the UCX PML then calls into `opal_progress` more often).
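For reference, that parameter can be lowered on the command line in the usual way, e.g. something like `mpirun --mca pml ucx --mca pml_ucx_progress_iterations 1 ./osu_latency_mt` (exact invocation depends on the setup).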