use sync_send mask for ofi_create_recv_tag #8052

Merged

@hkuno hkuno commented Sep 17, 2020

This commit fixes issue #8051.

The upper 2 bits of an ompi tag encode the synchronize send and synchronize send ack. Because the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions both use ompi_mtl_ofi.sync_proto_mask instead of ompi_mtl_ofi.sync_send when generating their "ignore" masks, the recv tag-matching logic disregards the ack bit as well, so a receive may match a tag that has the ack bit set.
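The effect of the two masks can be sketched with a small stand-alone model of tagged matching, where bits set in the "ignore" mask are excluded from the comparison. The bit positions, macro names, and the `tag_matches` helper below are assumptions made for illustration; they are not the actual Open MPI definitions.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the two protocol bits are
 * placed at the top of a 64-bit match tag. The real values used by the
 * OFI MTL may differ. */
#define SYNC_SEND       (UINT64_C(1) << 62)   /* message comes from an ssend */
#define SYNC_SEND_ACK   (UINT64_C(1) << 63)   /* internal ack for an ssend   */
#define SYNC_PROTO_MASK (SYNC_SEND | SYNC_SEND_ACK)

/* Tagged matching with an ignore mask: bits set in `ignore` are not
 * compared, every other bit of the two tags must be equal. */
static int tag_matches(uint64_t recv_tag, uint64_t ignore, uint64_t send_tag)
{
    return ((recv_tag ^ send_tag) & ~ignore) == 0;
}
```

In this model, a receive for tag 42 with `SYNC_PROTO_MASK` as its ignore mask matches `42 | SYNC_SEND_ACK`, which is the behavior described above. With `SYNC_SEND` as the ignore mask it still matches `42 | SYNC_SEND` but no longer matches the ack tag.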

This is an issue because ssend is implemented internally as a send paired with a receive for the ack. So if a user has posted a matching receive before the ssend, that user's receive may end up consuming the internal ack message intended for the ssend's internal receive.

Updating the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions to use ompi_mtl_ofi.sync_send fixes this.
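The scenario and the fix can be modeled as a toy posted-receive queue that is searched in posting order. This is only a sketch under stated assumptions: the bit layout, the `posted_recv` struct, the exact-match ignore mask on the internal ack receive, and both helper functions are hypothetical and do not come from the Open MPI sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SYNC_SEND       (UINT64_C(1) << 62)
#define SYNC_SEND_ACK   (UINT64_C(1) << 63)
#define SYNC_PROTO_MASK (SYNC_SEND | SYNC_SEND_ACK)

/* One entry in a toy posted-receive queue. */
struct posted_recv {
    const char *owner;   /* label for readability only */
    uint64_t    tag;     /* tag the receive was posted with */
    uint64_t    ignore;  /* bits excluded from the comparison */
};

/* Return the index of the first receive, in posting order, that matches
 * the incoming tag, or -1 if none matches. */
static int first_match(const struct posted_recv *q, size_t n, uint64_t send_tag)
{
    for (size_t i = 0; i < n; i++) {
        if (((q[i].tag ^ send_tag) & ~q[i].ignore) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Model the sequence: a user receive is posted first, then the ssend posts
 * its internal ack receive, then the ack arrives. Returns 0 if the user's
 * receive consumes the ack, 1 if the internal receive does. */
static int ack_consumer(uint64_t user_ignore, uint64_t mpi_tag)
{
    struct posted_recv q[2] = {
        { "user recv",      mpi_tag,                 user_ignore },
        /* Assumption: the internal receive matches the ack tag exactly. */
        { "ssend ack recv", mpi_tag | SYNC_SEND_ACK, 0 },
    };
    return first_match(q, 2, mpi_tag | SYNC_SEND_ACK);
}
```

In this model, `ack_consumer(SYNC_PROTO_MASK, tag)` returns 0 (the user's receive takes the ack), while `ack_consumer(SYNC_SEND, tag)` returns 1, so the ack reaches the ssend's internal receive.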

Authored-by: John L. Byrne <john.l.byrne@hpe.com>
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>

The upper 2 bits of an ompi tag encode the synchronize send and
synchronize send ack.
Because the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag
functions both use ompi_mtl_ofi.sync_proto_mask instead of
ompi_mtl_ofi.sync_send when generating their "ignore" masks, they hide
the ack bit, so the receive also matches tags that have the ack bit
set.

This is an issue because ssend is implemented by doing a send and
receive internally.  So if there happens to be an outstanding receive
posted before the ssend, that receive may end up consuming the
internal message intended for the ssend's internal receive.

Updating mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions
to use ompi_mtl_ofi.sync_send fixes this.

Authored-by: John L. Byrne <john.l.byrne@hpe.com>

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
@jsquyres jsquyres requested a review from hppritcha September 17, 2020 14:26
@lanl-ompi
Contributor

Can one of the admins verify this patch?

@hppritcha hppritcha merged commit 487bbf3 into open-mpi:master Oct 26, 2020
@hkuno
Contributor Author

hkuno commented Oct 26, 2020

Thank you!

@hkuno hkuno deleted the john.l.byrne/ofi_create_recv_tag_mask branch October 26, 2020 21:59
@acgoldma
Contributor

acgoldma commented Feb 3, 2021

Can this issue be backported to 4.0.x (and 4.1.x)?
@hppritcha @bwbarrett
