forked from pmodels/mpich
Test ssend #20
Open
hzhou wants to merge 21 commits into main from test_ssend
Enhance the test by testing multiple communicators. This catches the incorrect context_id handling in the current ofi huge path.
Since we always call MPIDI_OFI_recv_event at completion of huge message, we do not need the indirect function pointer. Remove for cleaner code.
Make MPIDI_OFI_send_control_t internally a union to reflect the general control semantics. Replace MPIDI_OFI_do_control_send with MPIDI_NM_am_send_hdr.
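A minimal sketch of what "internally a union" could look like, assuming a control message carries a type tag followed by type-specific fields. The `demo_*` names and field choices are hypothetical and are not the actual layout of MPIDI_OFI_send_control_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control message kinds, for illustration only. */
enum demo_ctrl_type { DEMO_CTRL_HUGE = 1, DEMO_CTRL_HUGE_ACK = 2 };

/* Fields a receiver might need to pull the message body (assumed). */
struct demo_ctrl_huge {
    uint64_t msgsize;
    uint64_t rma_key;
    int32_t tag;
};

/* Fields an ack might carry back to the sender (assumed). */
struct demo_ctrl_huge_ack {
    uint64_t ackreq_handle;
};

/* One header type, with the variant selected by `type`. */
typedef struct demo_send_control {
    int16_t type;
    union {
        struct demo_ctrl_huge huge;
        struct demo_ctrl_huge_ack huge_ack;
    } u;
} demo_send_control_t;

/* Size of the variant that is actually meaningful for this message. */
static size_t demo_ctrl_payload_size(const demo_send_control_t *ctrl)
{
    assert(ctrl != NULL);
    switch (ctrl->type) {
        case DEMO_CTRL_HUGE:
            return sizeof(ctrl->u.huge);
        case DEMO_CTRL_HUGE_ACK:
            return sizeof(ctrl->u.huge_ack);
        default:
            return 0;   /* unknown type: nothing to interpret */
    }
}
```

With a tagged union, every control message shares one header type and can travel through a generic header-send path, while the receiver switches on `type` to interpret the payload.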
It is not critical to inline functions related to huge messages as they are bandwidth dominated. Move them into ofi_huge.c for better context.
The counter was not incremented for the correct nic. Add FIXME to note potential issues with multiple simultaneous fi_read.
Rather than check and fallback in MPIDI_OFI_recv_huge_event, check whether it's a huge message in event loop and dispatch accordingly.
This partially fixes test/mpi/errors/pt2pt/truncmsg1 when MPIR_CVAR_CH4_OFI_EAGER_MAX_MSG_SIZE=16384 is set. We still need to fix the case where a huge message is sent but a small buffer is posted.
The progress of a huge message depends on receiving the ctrl message. Send it first to make it less likely that the ctrl header arrives long after the message body.
The debug messages are temporary and should be removed after debugging.
When probing huge messages and control is missing, we should handle probe and mprobe differently. With probe, we can simply return not found. With mprobe, we can enqueue the rreq since the entry is guaranteed not to be double matched.
Store huge_send_mrs in the sreq so we don't need the extra global map.
Store the recv_elem pointer with rreq, so we don't need extra huge_recv_counters map.
Clean up the data structure to be more specific. Differentiate probe and mprobe. The former can be discarded when the control isn't ready. If we don't discard an unsuccessful probe and instead put it in a queue, it can cause issues when another probe or recv comes in and interferes. Mprobe, on the other hand, is guaranteed to match once, so there is no issue. Persist the remote info with the original rreq. This avoids the use of separate hash maps for lookup. It is also cleaner to track.
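The probe/mprobe distinction described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `demo_*` types, the pending queue, and the `control_arrived` flag are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_peek_kind { DEMO_PROBE, DEMO_MPROBE };
enum demo_peek_result { DEMO_NOT_FOUND, DEMO_FOUND, DEMO_DEFERRED };

/* Hypothetical request: only the fields this sketch needs. */
struct demo_req {
    enum demo_peek_kind kind;
    struct demo_req *next;
};

/* Hypothetical queue of requests waiting for their control message. */
struct demo_queue {
    struct demo_req *head;
    struct demo_req *tail;
};

/* Decide what to do when a peek matched a huge message. */
static enum demo_peek_result demo_handle_huge_peek(struct demo_queue *pending,
                                                   struct demo_req *req,
                                                   bool control_arrived)
{
    assert(pending != NULL && req != NULL);
    if (control_arrived)
        return DEMO_FOUND;      /* remote info is known, report the match */

    if (req->kind == DEMO_PROBE)
        return DEMO_NOT_FOUND;  /* discard: a later probe can simply retry */

    /* mprobe: the message is matched exactly once, so it is safe to
     * park the request until the control message shows up */
    req->next = NULL;
    if (pending->tail)
        pending->tail->next = req;
    else
        pending->head = req;
    pending->tail = req;
    return DEMO_DEFERRED;
}
```

Returning "not found" for a plain probe keeps the queue free of entries that a later probe or recv could match a second time, which is the interference the commit message warns about.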
Split the huge recv completion into a static function.
We need to handle this case so that the sender still receives the ack message.
Add safety assertions to ensure consistency.
The huge recv modifies the data_sz. Thus we need to set MPIDI_OFI_REQUEST(rreq, util.iov.iov_len) earlier to prevent setting the wrong size.
The comm_id in the huge message path is, in fact, the context_id of the communicator, so we need to use MPIR_Context_id_t as its type. Directly applying the mask to wc->tag does not yield the corresponding context_id because the shift is missing. Fix it by using rreq->comm->recvcontext_id directly.
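To illustrate why a mask without a shift yields the wrong value, here is a small sketch. The bit layout (32 tag bits below 16 context bits) is an assumption made for this example only and is not MPICH's actual match-bits layout; all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: bits 0..31 hold the user tag,
 * bits 32..47 hold the context id. */
#define DEMO_TAG_BITS 32
#define DEMO_CTX_BITS 16
#define DEMO_CTX_MASK (((UINT64_C(1) << DEMO_CTX_BITS) - 1) << DEMO_TAG_BITS)

/* Pack a context id and a tag into one 64-bit match value. */
static uint64_t demo_pack(uint32_t context_id, uint32_t tag)
{
    assert(context_id < (UINT32_C(1) << DEMO_CTX_BITS));
    return ((uint64_t) context_id << DEMO_TAG_BITS) | tag;
}

/* Mask only: the field is isolated but still sits at bit 32,
 * so the result is not the context id. */
static uint64_t demo_ctx_mask_only(uint64_t match_bits)
{
    return match_bits & DEMO_CTX_MASK;
}

/* Mask and shift: the field is moved down to bit 0. */
static uint32_t demo_ctx_mask_and_shift(uint64_t match_bits)
{
    return (uint32_t) ((match_bits & DEMO_CTX_MASK) >> DEMO_TAG_BITS);
}
```

Under this assumed layout, packing context id 5 and decoding with the mask alone gives `5 << 32` rather than 5. The commit sidesteps decoding altogether by reading the context id from the communicator attached to the receive request.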
Pull Request Description
Test trying to expose the SYNC ACK mismatch issues in ch4:ofi. Ref. pmodels#5574
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.