Test ssend #20

Open: wants to merge 21 commits into base: main

Commits on Sep 27, 2021

  1. bbd0f5c
  2. 30f1c6e
  3. test: enhance test/mpi/pt2pt/probe_unexp.c

    Enhance the test by testing multiple communicators. This catches the
    current ofi huge path's incorrect context_id handling.
    hzhou committed Sep 27, 2021
    9d5c2c9
  4. ch4/ofi: remove done_fn in MPIDI_OFI_huge_recv_t

    Since we always call MPIDI_OFI_recv_event at completion of a huge message,
    we do not need the indirect function pointer. Remove it for cleaner code.
    hzhou committed Sep 27, 2021
    f53eff1
  5. ch4/ofi: refactor sending ctrl messages

    Make MPIDI_OFI_send_control_t internally a union to reflect the general
    control semantics.
    
    Replace MPIDI_OFI_do_control_send with MPIDI_NM_am_send_hdr.
    hzhou committed Sep 27, 2021
    ac2d82b
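
The control-message refactor described in "ch4/ofi: refactor sending ctrl messages" can be pictured as a tagged union: one type field selects which payload is valid, and a single handler dispatches on it. The following is a minimal sketch in C under stated assumptions. The names `ctrl_msg_t`, `CTRL_HUGE`, `CTRL_HUGE_ACK`, `ctrl_dispatch`, and all field names are hypothetical; this is not the actual layout of MPIDI_OFI_send_control_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control message kinds; the real set in MPICH may differ. */
enum ctrl_type { CTRL_HUGE = 1, CTRL_HUGE_ACK = 2 };

/* Hypothetical payloads, one per control kind. */
struct ctrl_huge {
    uint64_t remote_addr;       /* where the receiver can read the data from */
    uint64_t msgsize;           /* total size of the huge message */
    int tag;
};

struct ctrl_huge_ack {
    uint64_t ackreq_id;         /* identifies the send request to complete */
};

/* One header type carries every control kind; `type` says which member is valid. */
typedef struct {
    int16_t type;
    union {
        struct ctrl_huge huge;
        struct ctrl_huge_ack huge_ack;
    } u;
} ctrl_msg_t;

/* Dispatch on the type field and return the value the handler would act on. */
static uint64_t ctrl_dispatch(const ctrl_msg_t *msg)
{
    switch (msg->type) {
        case CTRL_HUGE:
            return msg->u.huge.msgsize;
        case CTRL_HUGE_ACK:
            return msg->u.huge_ack.ackreq_id;
        default:
            assert(0 && "unknown control type");
            return 0;
    }
}
```

Because every control kind shares one header type, it can be handed to a generic header-send routine (the commit uses MPIDI_NM_am_send_hdr) without a dedicated send function per kind.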

Commits on Sep 28, 2021

  1. ch4/ofi: refactor huge functions to separate file

    It is not critical to inline functions related to huge messages as they
    are bandwidth dominated. Move them into ofi_huge.c for better context.
    hzhou committed Sep 28, 2021
    9d48318
  2. ch4/ofi: fix fi_read in enable_striping

    The counter was not incremented for the correct nic.
    
    Add FIXME to note potential issues with multiple simultaneous fi_read.
    hzhou committed Sep 28, 2021
    4e9d269
  3. ch4/ofi: detect normal send in event loop

    Rather than checking and falling back in MPIDI_OFI_recv_huge_event, check
    whether it's a huge message in the event loop and dispatch accordingly.
    hzhou committed Sep 28, 2021
    1e1ad68
  4. ch4/ofi: detect huge message data truncation

    This partially fixes test/mpi/errors/pt2pt/truncmsg1 when
    MPIR_CVAR_CH4_OFI_EAGER_MAX_MSG_SIZE=16384 is set. We still need to fix
    the case when a huge message is sent but a small buffer is posted.
    hzhou committed Sep 28, 2021
    4f661a7
  5. ch4/ofi: swap order of sending huge ctrl

    The progress of a huge message depends on receiving the ctrl message.
    Send it first to increase the likelihood that the ctrl header does not
    arrive too far behind the message body.
    hzhou committed Sep 28, 2021
    b63867e
  6. ch4/ofi: remove debug messages from huge path

    The debug messages are temporary and should be removed after debugging.
    hzhou committed Sep 28, 2021
    0ec4a61
  7. ch4/ofi: differentiate probe and mprobe of huge messages

    When probing huge messages and control is missing, we should handle
    probe and mprobe differently. With probe, we can simply return not
    found. With mprobe, we can enqueue the rreq since the entry is
    guaranteed not to be double matched.
    hzhou committed Sep 28, 2021
    f33a14a
  8. ch4/ofi: remove MPIDI_OFI_global.huge_send_counters

    Store huge_send_mrs in the sreq so we don't need the extra global map.
    hzhou committed Sep 28, 2021
    54b3905
  9. ch4/ofi: remove huge_recv_counters

    Store the recv_elem pointer with rreq, so we don't need extra
    huge_recv_counters map.
    hzhou committed Sep 28, 2021
    a7e5407
  10. ch4/ofi: revamp huge message handling

    Clean up the data structure to be more specific.
    
    Differentiate probe and mprobe. The former can be discarded when the
    control isn't ready. If we don't discard an unsuccessful probe and instead
    put it in a queue, it can cause issues when another probe or recv
    interferes. Mprobe, on the other hand, is guaranteed to match once, so
    there is no issue.
    
    Persist the remote info with the original rreq. This avoids the use of
    separate hash maps to look up. It is also cleaner to track.
    hzhou committed Sep 28, 2021
    1b4f505
  11. ch4/ofi: split get_huge_complete

    Split the huge recv completion into a static function.
    hzhou committed Sep 28, 2021
    81fa0ad
  12. ch4/ofi: handle when huge message sent to small buffer

    We need to handle this case so that the sender still receives the ack message.
    hzhou committed Sep 28, 2021
    8886473
  13. ch4/ofi: add vni assertions in control handler

    Add safety assertions to ensure consistency.
    hzhou committed Sep 28, 2021
    9e8c23d
  14. ch4/ofi: set recv data_sz correctly in the huge path

    The huge recv modifies the data_sz. Thus we need to set
    MPIDI_OFI_REQUEST(rreq, util.iov.iov_len) earlier to prevent setting the
    wrong size.
    hzhou committed Sep 28, 2021
    ddc0f97
  15. ch4/ofi: fix usage of comm_id

    The comm_id in the huge message path is, in fact, the context_id of the
    communicator. We need to use MPIR_Context_id_t as its type.
    
    Directly applying a mask to wc->tag won't yield the corresponding
    context_id because the shift is missing. Fix it by directly using
    rreq->comm->recvcontext_id.
    hzhou committed Sep 28, 2021
    ca8c3f3
  16. test: threads/pt2pt/ssend.c

    hzhou committed Sep 28, 2021
    4e57600
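
The "missing shift" problem noted in "ch4/ofi: fix usage of comm_id" can be illustrated with a small bit-packing sketch. The layout below (20 tag bits with a 16-bit context id above them) and the names `pack_match_bits`, `ctx_mask_only`, and `ctx_mask_and_shift` are assumptions made for illustration only; they are not MPICH's actual match-bit layout. Note that the commit itself avoids decoding altogether and reads rreq->comm->recvcontext_id instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 20 bits hold the tag, the next 16 bits the context id. */
#define TAG_BITS 20
#define CTX_BITS 16
#define TAG_MASK ((UINT64_C(1) << TAG_BITS) - 1)
#define CTX_MASK (((UINT64_C(1) << CTX_BITS) - 1) << TAG_BITS)

/* Pack a context id and a tag into one 64-bit match value. */
static uint64_t pack_match_bits(uint16_t context_id, uint32_t tag)
{
    return ((uint64_t) context_id << TAG_BITS) | ((uint64_t) tag & TAG_MASK);
}

/* Buggy decode: the mask isolates the field but leaves it in its shifted position. */
static uint64_t ctx_mask_only(uint64_t match_bits)
{
    return match_bits & CTX_MASK;
}

/* Correct decode: mask, then shift the field down to bit 0. */
static uint16_t ctx_mask_and_shift(uint64_t match_bits)
{
    return (uint16_t) ((match_bits & CTX_MASK) >> TAG_BITS);
}
```

With context id 5 and tag 42, the mask-only decode yields 5 shifted left by 20 bits rather than 5, so a comparison against the communicator's context id would fail even for a matching message.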