p2p communication overlap at interleaved pipelining #1616
erhoo82 commented on Mar 15, 2023
- Overlap P2P communication with forward (FWD) and backward (BWD) compute during the 1F1B phase of pipelining. This implementation applies only to interleaved pipelining, where P2P communication adds significant performance overhead.
- Use an individual send/recv kernel for each op instead of batching multiple send/recv ops into a single kernel. This avoids unnecessary serialization of send/recv ops within a kernel.
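The overlap idea in the first bullet can be sketched with the standard library alone. This is a toy model, not the code in this PR: `fake_transfer`, `step_blocking`, and `step_overlapped` are hypothetical names, and a thread-pool `Future` stands in for the wait handle of an asynchronous send/recv.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(payload):
    """Hypothetical stand-in for a P2P send/recv: returns what the peer would receive."""
    return list(payload)


def step_blocking(pool, activations, compute):
    # Non-overlapped case: start the transfer and wait for it before computing.
    received = pool.submit(fake_transfer, activations).result()
    output = compute()
    return received, output


def step_overlapped(pool, activations, compute):
    # Overlapped case: keep the handle, compute while the transfer is in flight,
    # and complete the communication only when the data is needed.
    handle = pool.submit(fake_transfer, activations)
    output = compute()
    received = handle.result()
    return received, output


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(step_blocking(pool, [1, 2, 3], lambda: sum(range(10))))
        print(step_overlapped(pool, [1, 2, 3], lambda: sum(range(10))))
```

Both functions return the same values; only the point at which the transfer is waited on differs, which is what lets the scheduler hide communication behind compute.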
fix comments and arguments; overlap individual send/recv ops; fix for p2p overlap with batched send recv op; remove redundancy; fix non-overlap case
e886b5d to 2c5d09c
overlap_p2p_comm: If :obj:`True`, returns CUDA wait handles to the scheduler instead of completing
the communication within the p2p transfer API instance. The scheduler manages the communication completion
to overlap with computation.
batch_p2p_comm: If :obj:`True`, use the batched send and receive API to conduct the communication of
a collection of send and receive operations between peers. If :obj:`False`, conduct each send and recv operation
individually.
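As a rough illustration of how the two flags described above are independent of each other, here is a minimal sketch in plain Python. Only the flag names `overlap_p2p_comm` and `batch_p2p_comm` come from the docstring; `Handle`, `launch`, and `p2p_communicate` are hypothetical and do not reflect the actual implementation in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Handle:
    """Hypothetical stand-in for a communication wait handle."""
    name: str
    done: bool = False

    def wait(self) -> None:
        self.done = True


def launch(ops: List[str], batch_p2p_comm: bool) -> List[Handle]:
    # Batched: all ops go out as one grouped launch with a single handle.
    # Individual: each send/recv op gets its own launch and its own handle.
    if batch_p2p_comm:
        return [Handle("+".join(ops))]
    return [Handle(op) for op in ops]


def p2p_communicate(
    ops: List[str], overlap_p2p_comm: bool, batch_p2p_comm: bool
) -> Optional[List[Handle]]:
    handles = launch(ops, batch_p2p_comm)
    if overlap_p2p_comm:
        # Hand the handles back; the caller (scheduler) waits on them later.
        return handles
    # Complete the communication here, inside the transfer call.
    for handle in handles:
        handle.wait()
    return None


if __name__ == "__main__":
    ops = ["send_next", "recv_prev"]
    print(p2p_communicate(ops, overlap_p2p_comm=True, batch_p2p_comm=False))
    print(p2p_communicate(ops, overlap_p2p_comm=False, batch_p2p_comm=True))
```

In this model all four flag combinations are valid; they only change how many handles exist and who waits on them.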
noob question: Can these two arguments be the same value?
The optimal option, for now, is True for overlap_p2p_comm and False for batch_p2p_comm. But, yes, they could have the same values.
_run_p2pops seems to deserve a refactoring, but it is fine for now.