Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtl/ofi: Do not fail if error CQ is empty #8113

Merged
merged 1 commit into from
Oct 21, 2020

Conversation

rajachan
Copy link
Member

In multi-threaded scenarios, any thread that attempts to read a CQ
when there's a pending error CQ entry gets an -FI_EAVAIL. Without
any serialization here (which is okay, since libfabric will protect
access to critical CQ objects), all threads proceed to read from the
error CQ, but only one thread fetches the entry while others get
-FI_EAGAIN indicating an empty queue, which is not erroneous.

Signed-off-by: Raghu Raja craghun@amazon.com

In multi-threaded scenarios, any thread that attempts to read a CQ
when there's a pending error CQ entry gets an -FI_EAVAIL. Without
any serialization here (which is okay, since libfabric will protect
access to critical CQ objects), all threads proceed to read from the
error CQ, but only one thread fetches the entry while others get
-FI_EAGAIN indicating an empty queue, which is not erroneous.

Signed-off-by: Raghu Raja <craghun@amazon.com>
@bwbarrett bwbarrett merged commit 9afcb8e into open-mpi:master Oct 21, 2020
@hppritcha
Copy link
Member

@rajachan could you cherry-pick this to the 4.0.x branch as well? thanks.

@rajachan
Copy link
Member Author

@hppritcha done #8125

@rajachan rajachan deleted the empty-error-cq branch October 23, 2020 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants