-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ch4: add MPIDI_NM_am_tag_send/recv #7183
Open
hzhou
wants to merge
12
commits into
pmodels:main
Choose a base branch
from
hzhou:2410_am_tag_send
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
hzhou
force-pushed
the
2410_am_tag_send
branch
9 times, most recently
from
October 26, 2024 16:58
077265c
to
07f43ac
Compare
In mpidig RNDV, if the netmod supports tag -- query via MPIDI_NM_am_can_do_tag() -- add MPIDI_NM_am_tag_{send,recv} to support direct send/recv. This not only avoids the extra copy in UCX or extra hdr message and additional RDMA read in OFI am_send_long, it also supports native GPU RDMA direct when the netmod supports it. This is how it works - * sender sends RTS * receiver posts MPIDI_NM_am_tag_recv and reply CTS * sender do MPIDI_NM_am_tag_send
Some adjustment to make part of {ofi,ucx}_{send,recv}.h available in AM_ONLY build to provide MPIDI_NM_am_tag_send and MPIDI_NM_am_tag_recv.
* remove err_flag -- we no longer propagate err_flag (ref e1c2ee8) * remove the "noreq" param -- we no longer use pre-created request * derive sender_nic, receiver_nic, match_bits in MPIDI_OFI_send and pass them down to avoid code duplication.
Remove the type argument from MPIDI_OFI_init_sendtag. Instead, always explicitly set the protocol bit to the match_bits. This makes it clear that we are using separate tag spaces. This is symmetric to what we do with MPIDI_OFI_init_recvtag.
When MPIDI_OFI_CONTEXT_BITS, MPIDI_OFI_SOURCE_BITS, and MPIDI_OFI_TAG_BITS are defined as constants, all its derived macros are compile-time constants as well. Thus, they do not need be individually defined in capability sets in order to avoid the runtime bit-calculation cost. Thus we can define these macros in ofi_types.h without distinguishing MPIDI_OFI_ENABLE_RUNTIME_CHECKS.
hzhou
force-pushed
the
2410_am_tag_send
branch
from
October 27, 2024 19:34
07f43ac
to
fa1728b
Compare
test:mpich/ch4/most ✔️ |
MPIDI_NM_am_tag_send is similar to MPIDI_NM_mpi_isend except - * already inside per-vci critical section * uses MPIDI_OFI_TAG_AM protocol bit (am tag space) * sets am_req and am_handler_id so at event completion it will invoke MPIDIG_global.tag_recv_cbs.
It depends on MPIDI_OFI_ENABLE_TAGGED.
Now we can support a true RNDV mode, where sender will not wait for receiver complete but only wait for receiver to match.
* remove the have_request parameter in MPIDI_UCX_send. We always allocate a new request.
MPIDI_NM_am_tag_{send,recv} is similar to MPIDI_NM_mpi_i{send,recv} except - * already inside per-vci critical section * uses MPIDI_UCX_TAG_AM protocol bit (am tag space)
When both netmod and shm are using active messages, we need check the is_local field in matching or they may mismatch when it is anysource recv or probe. The previous code happen to work because we reply based on the is_local from the am header, thus even with MPIDI_SHM_mpi_isend mismatched an MPIDI_NM_mpi_isend (am_only mode), the matched request switched to reply over netmod after matching. This won't work when netmod am path deviates from shm, such as when we introduce MPIDI_NM_am_tag_{send,recv}. This was fixed in 21870d1, but got lost during recvq refactor.
We should call MPIDI_anysrc_try_cancel_partner upon matching, and call MPIDI_anysrc_free_partner upon completion. * In MPIDIG_do_irecv, we only called MPIDI_anysrc_try_cancel_partner upon matching unexpected request but only for has_request branch. We should do that for both branches. * MPIDI_anysrc_free_partner need copy the status when the partner request is the user visible request. Thus we only should free partner when rreq completes. Unfortunately, we have multiple completion branches. 1. Eager unexpected path 2. recv_target_cmpl_cb 3. MPIDIG_tag_recv_complete
hzhou
force-pushed
the
2410_am_tag_send
branch
from
October 28, 2024 03:52
fa1728b
to
6c1b54e
Compare
test:mpich/ch4/ofi/more ✔️ |
4 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Pull Request Description
In mpidig RNDV, if the netmod supports tag -- query via
MPIDI_NM_am_can_do_tag() -- add MPIDI_NM_am_tag_{send,recv} to support
direct send/recv. This not only avoids the extra copy in UCX or extra
hdr message and additional RDMA read in OFI am_send_long, it also
supports native GPU RDMA direct when the netmod supports it.
This is how it works -
* sender sends RTS
* receiver posts MPIDI_NM_am_tag_recv and reply CTS
* sender do MPIDI_NM_am_tag_send
Strength
Comparisons
Existing OFI
Existing UCX
Follow-up
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your companies PR approval manager.