Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCX osc: make progress on idle worker if none are active #7632

Merged
merged 1 commit into from
Nov 13, 2020

Conversation

devreal
Copy link
Contributor

@devreal devreal commented Apr 15, 2020

When using UCX in shared memory progress is missing if one process waits in a barrier while the another process attempts to perform RMA operations. This PR adds progress on the first inactive workers if there are no active workers available.

Fixes #7631

Signed-off-by: Joseph Schuchart schuchart@hlrs.de

@devreal devreal requested review from yosefe and artpol84 April 15, 2020 10:41
@@ -308,6 +309,10 @@ opal_common_ucx_wpool_progress(opal_common_ucx_wpool_t *wpool)
}
opal_mutex_unlock(&winfo->mutex);
}
if (active_workers == 0 && opal_list_get_size(&wpool->idle_workers)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we need to progress it in any case.
It might happen that it's in the "idle" list even if we have other workers non-idling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently doing a significant refactoring of this code BTW.
But this is a good hint for us. So thank you very much.
For now, I think, we can just always progress this one. But with locking precautions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wpool->mutex is held at this point, that should be sufficient, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devreal we need to progress the default worker.

@devreal
Copy link
Contributor Author

devreal commented Jun 29, 2020

@artpol84 Is there anything left to do here?

@lanl-ompi
Copy link
Contributor

Can one of the admins verify this patch?

@devreal
Copy link
Contributor Author

devreal commented Nov 6, 2020

@artpol84 @janjust I rebased this PR onto current master. I was surprised to see that the wpool mutex is now take unconditionally (https://github.com/open-mpi/ompi/pull/7632/files#diff-4277b36854f67c21224bff1f16f6aa090af4e3905c9e00ce96dc735f05c90200R275). This seems like an unnecessary synchronization point, threads could do other things if another thread is already progressing, no?

Comment on lines 286 to 307
if (active_workers == 0 && opal_list_get_size(&wpool->idle_workers)) {
/* make sure to progress at least some */
ucp_worker_progress(wpool->dflt_worker);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (active_workers == 0 && opal_list_get_size(&wpool->idle_workers)) {
/* make sure to progress at least some */
ucp_worker_progress(wpool->dflt_worker);
}
ucp_worker_progress(wpool->dflt_worker);

@@ -282,6 +283,10 @@ opal_common_ucx_wpool_progress(opal_common_ucx_wpool_t *wpool)
} while (progressed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip the default worker here to avoid double-progress.

But we can't skip progressing the default worker if there are active workers. The default worker might still not be active, but you want to progress it.

@artpol84
Copy link
Contributor

@devreal lets sync if you still would like to discuss this.
I'm planning to merge this ASAP.

@devreal
Copy link
Contributor Author

devreal commented Nov 11, 2020

@artpol84 I made a couple of changes:

  1. Store the dflt_winfo in the wpool instead of the dflt_worker to allow for locking it in case the default worker has to be progressed.
  2. Avoid blocking on a taken mutex (which actually lead to deadlocks otherwise, presumably due to lock inversion).
  3. Avoid progressing the dflt_worker directly in osc/ucx but instead call opal_common_ucx_wpool_progress due to ensure proper locking.
  4. I found that the winfo objects are allocated using calloc but then released using OBJ_RELEASE. I don't see how that ever can have worked... I tired to fix it, please review carefully.

@bosilca
Copy link
Member

bosilca commented Nov 11, 2020

Calloc/Malloc + OBJ_CONSTRUCT is morally equivalent to OBJ_NEW. It can therefore be used with OBJ_RELEASE.

@devreal
Copy link
Contributor Author

devreal commented Nov 11, 2020

@bosilca I am aware of that, but there was no OBJ_CONSTRUCT as far as I can see. Maybe I am just blind here...?

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
@artpol84
Copy link
Contributor

@devreal as I mentioned we have a PR ready with a lot of cleanups and stability fixes. Including more active use of opal objects.
@janjust please see if you’ll be able to rebase on top of this one.
If not- maybe you can port these changes instead keeping the original authorship of @devreal

Copy link
Contributor

@artpol84 artpol84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks ok.

Though please let @janjust to check what’s the easier sequence of commits in the context of upcoming refactoring.

@@ -272,17 +278,33 @@ opal_common_ucx_wpool_progress(opal_common_ucx_wpool_t *wpool)
/* Go over all active workers and progress them
* TODO: may want to have some partitioning to progress only part of
* workers */
opal_mutex_lock(&wpool->mutex);
if (0 != opal_mutex_trylock(&wpool->mutex)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this in @janjust pr already.
Maybe we should have merged it first :(

@janjust
Copy link
Contributor

janjust commented Nov 13, 2020

@janjust please see if you’ll be able to rebase on top of this one.
If not- maybe you can port these changes instead keeping the original authorship of @devreal

@devreal @artpol84 rebasing is fine, let's get it merged

@artpol84 artpol84 merged commit f9ef4b4 into open-mpi:master Nov 13, 2020
@artpol84
Copy link
Contributor

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UCX osc: missing progress in shared memory
6 participants