p2p communication overlap at interleaved pipelining #1616

Merged 1 commit on Mar 20, 2023


@erhoo82 (Contributor) commented Mar 15, 2023

  1. Overlap P2P communication with forward (FWD) and backward (BWD) computation during the 1F1B phase of pipelining. This implementation applies only to interleaved pipelining, where P2P communication adds significant performance overhead.
  2. Use individual send/recv kernels instead of batching multiple send/recv ops into a single kernel. This avoids unnecessary serialization of send/recv ops within a kernel.
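
To illustrate the two points above, here is a minimal sketch of the overlap pattern, not the code from this PR. It uses a thread pool from the Python standard library as a stand-in for asynchronous send/recv ops so that it runs without GPUs or a process group; in PyTorch the analogous calls would be `torch.distributed.isend` and `torch.distributed.irecv`, which return work handles with a `wait()` method. The names `fake_transfer`, `compute`, `step_blocking`, and `step_overlapped` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(payload, latency_s=0.02):
    """Hypothetical stand-in for one asynchronous send or recv op."""
    time.sleep(latency_s)  # mimic time spent on the wire
    return payload


def compute(x, cost_s=0.02):
    """Hypothetical stand-in for one forward or backward microbatch."""
    time.sleep(cost_s)
    return x * 2


def step_blocking(pool, activation, to_send, to_recv):
    # Non-overlapped case: each transfer is completed before compute starts.
    sent = pool.submit(fake_transfer, to_send).result()
    received = pool.submit(fake_transfer, to_recv).result()
    return compute(activation), sent, received


def step_overlapped(pool, activation, to_send, to_recv):
    # Launch each op individually; each call returns its own handle at once.
    send_handle = pool.submit(fake_transfer, to_send)
    recv_handle = pool.submit(fake_transfer, to_recv)
    out = compute(activation)  # runs while both transfers are in flight
    # The caller (the scheduler, in this PR's terms) completes the communication.
    return out, send_handle.result(), recv_handle.result()
```

Both functions return the same values; the difference is only where the wait happens. With a pool of at least two workers, the overlapped variant lets the two transfers and the compute proceed concurrently instead of one after another.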

fix comments and arguments

overlap individual send/recv ops

fix for p2p overlap with batched send recv op

remove redundancy

fix non-overlap case
@erhoo82 changed the title from "Draft: p2p communication overlap at interleaved pipelining" to "p2p communication overlap at interleaved pipelining" Mar 15, 2023
Comment on lines +91 to +96
overlap_p2p_comm: If :obj:`True`, returns cuda wait handles to scheduler instead of completing
the communication within the p2p transfer API instance. The scheduler manages the communication completion
to overlap with computation.
batch_p2p_comm: If :obj:`True`, use the batched send and receive api to conduct the communication of
a collection of send and receive operations between peer. If :obj:`False`, conduct each send and recv operation
individually.
Collaborator

noob question: Can these two arguments be the same value?

Contributor Author

The optimal option, for now, is True for overlap_p2p_comm and False for batch_p2p_comm. But, yes, they could have the same values.
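
As a rough illustration of the exchange above, the two flags can be read as independent switches, so all four combinations are expressible. The sketch below only restates the docstring as a lookup; `P2PFlags` and `describe` are hypothetical names and are not part of apex.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class P2PFlags:
    """Hypothetical container for the two flags; not a class from the PR."""
    overlap_p2p_comm: bool = True   # combination the author calls optimal for now
    batch_p2p_comm: bool = False


def describe(flags):
    """Summarize what the docstring says each flag value selects."""
    launch = (
        "batched send/recv API"
        if flags.batch_p2p_comm
        else "individual send/recv ops"
    )
    completion = (
        "wait handles returned to the scheduler"
        if flags.overlap_p2p_comm
        else "communication completed inside the p2p transfer call"
    )
    return launch, completion


# All four combinations are expressible, including the two with equal values.
ALL_COMBINATIONS = {
    (overlap, batch): describe(P2PFlags(overlap, batch))
    for overlap, batch in product([True, False], repeat=2)
}
```

The defaults mirror the combination recommended in the reply (overlap on, batching off); the other three entries in `ALL_COMBINATIONS` show that equal values are also valid selections.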

@crcrpar crcrpar added this to the 23.04 milestone Mar 17, 2023
Collaborator

@crcrpar crcrpar left a comment

_run_p2pops seems to deserve a refactoring but fine for now

@crcrpar crcrpar merged commit d8643ef into NVIDIA:master Mar 20, 2023
yuanzhedong pushed a commit to yuanzhedong/apex that referenced this pull request Jul 14, 2023