Skip to content

coll/libnbc: do not recursively call opal_progress() #4711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2018

Conversation

ggouaillardet
Copy link
Contributor

instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs #3901

Signed-off-by: Gilles Gouaillardet gilles@rist.or.jp

instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs open-mpi#3901

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@ggouaillardet
Copy link
Contributor Author

@bosilca per my comments on #3901, i had the feeling that ompi_coll_libnbc_progress() being called recursively was either sub-optimal or a bad design.
This PR should fix that.

Generally speaking, is there any legitimate use case in which ompi_coll_libnbc_progress() is called recursively ? an other way to put it, is do we still need libnbc_in_progress ?

@bosilca
Copy link
Member

bosilca commented Jan 15, 2018

Your change to the code looks legit. I don't have enough knowledge about libNBC to answer your question, but keep in mind that the progress can be also called every time we reach a resource exhaustion (such as fragments for BTL/PML). So, if you post new communications from a progress function, there is an opportunity to be called recursively (if new communications were posted from a progress function).

@artpol84
Copy link
Contributor

bot:mellanox:retest

@artpol84
Copy link
Contributor

vader issue, try again
bot:mellanox:retest

@hjelmn
Copy link
Member

hjelmn commented Mar 2, 2018

Looks good to me.

@ggouaillardet
Copy link
Contributor Author

@kawashima-fj FYI

as pointed by @bosilca, MPI_Isend() might invoke opal_progress(), so there is still the possibility the coll/libnbc progress function is invoked recursively.

@hjelmn hjelmn merged commit 1a41482 into open-mpi:master Jul 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants