[dygraph hybrid pp for interleave] The interleave scheduler for pipeline parallel #45497
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open source project!
Force-pushed from 0a095e1 to 21aa3d6
Force-pushed from 2998491 to 4931d7e
I think we may split the 1F1B pipeline and the interleave pipeline into two classes.
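A minimal sketch of what that split could look like, assuming hypothetical class names (`PipelineParallel` for 1F1B, `PipelineParallelWithInterleave` for interleave) and a shared `forward_backward_pipeline` entry point; this illustrates the reviewer's suggestion, not necessarily the merged code:

```python
# Hypothetical split suggested above: keep the 1F1B and interleave
# schedules in separate classes, with interleave reusing common plumbing.
class PipelineParallel:
    """Classic 1F1B schedule: one model chunk per pipeline stage."""

    def forward_backward_pipeline(self, data, scaler=None):
        # warmup forwards, steady-state 1F1B, cooldown backwards
        raise NotImplementedError


class PipelineParallelWithInterleave(PipelineParallel):
    """Interleave (virtual pipeline) schedule: each stage holds
    num_model_chunks chunks and alternates between them."""

    def forward_backward_pipeline(self, data, scaler=None, forward_only=False):
        # interleaved schedule over multiple model chunks per stage
        raise NotImplementedError
```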
Two review threads on python/paddle/distributed/fleet/meta_parallel/pipeline_parallel.py (outdated, resolved).
Some comments.
Review threads on:
- python/paddle/distributed/fleet/meta_parallel/parallel_layers/pp_layers.py (resolved)
- python/paddle/distributed/fleet/meta_parallel/pipeline_parallel.py (two threads, outdated, resolved)
- ...paddle/fluid/tests/unittests/collective/fleet/hybrid_parallel_pp_layer_with_virtual_stage.py (resolved)
Force-pushed from 578dcdb to 8b1ca0a
LGTM
LGTM
```python
# init some data buffers for interleave scheduler
self.input_tensors = [[] for _ in range(self.num_model_chunks)]
self.output_tensors = [[] for _ in range(self.num_model_chunks)]
self.output_tensor_grads = [[] for _ in range(self.num_model_chunks)]
```
Not quite important: `self.output_tensor_grads` is not necessary when `forward_only=True`.
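A hedged sketch of that suggestion, reusing the attribute names from the snippet above; the `forward_only` guard is the reviewer's idea and `_init_buffers` is a hypothetical helper, not the PR's actual method:

```python
def _init_buffers(self, forward_only=False):
    # one buffer list per model chunk for the interleave scheduler
    self.input_tensors = [[] for _ in range(self.num_model_chunks)]
    self.output_tensors = [[] for _ in range(self.num_model_chunks)]
    # gradient buffers are only needed when a backward pass will run
    self.output_tensor_grads = (
        None if forward_only
        else [[] for _ in range(self.num_model_chunks)]
    )
```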
Will remove the RUN_SERIAL 1 parameter in the next PR.
PR types
Others
PR changes
Others
Describe
Support the interleave scheduler.
Main changes:
- Add `_get_p2p_next_rank()` and `_get_p2p_prev_rank()` in class `HybridCommunicateGroup` to get the global rank of the previous and the next pipeline-parallel stage (see the sketch after this list).
- Use 0 or 1 to represent the model chunk (virtual pipeline stage) held on each rank.

Note that the interleave scheduler only supports eager dygraph mode.
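A minimal sketch of what those two helpers compute, assuming the simplest layout where pipeline stage ids map directly to ranks within the pipeline group; the real `HybridCommunicateGroup` resolves stage ids to global ranks through the hybrid topology:

```python
def _get_p2p_next_rank(stage_id, num_stages):
    # next stage in ring order; the last stage wraps to the first
    return (stage_id + 1) % num_stages


def _get_p2p_prev_rank(stage_id, num_stages):
    # previous stage in ring order; the first stage wraps to the last
    return (stage_id - 1) % num_stages
```

For example, with 4 stages, stage 3's next rank is 0 and stage 0's previous rank is 3, which is the wrap-around the interleave schedule's ring-style p2p communication relies on.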
All work for supporting the interleave scheduler for pipeline parallelism: