[HybridParallel]Support 1f1b for PipelineParallel #34483
Conversation
Thanks for your contribution!
LGTM
paddle.autograd.backward(
    self.scaler.scale(self.caches['outputs'][cache_id]))

input_tensor_grad = self._backward_step(input_tensor, output_tensor,
                                        output_tensor_grad)
output_tensor and output_tensor_grad are no longer needed after this call; it seems they could be released manually here.
I don't think we can just set them to None to free them: if the host releases them early, the device may not have started the computation yet.
It's fine: the GPU kernel already got the device address when it was scheduled, so as long as the memory isn't overwritten (by us or by anyone else) while the kernel runs, it's safe. Worth a try 🌚
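For reference, a minimal self-contained sketch of the early-release idea discussed above (hypothetical names, not the code merged in this PR); whether dropping references early is safe relies on the reviewer's point that the enqueued kernels already hold the device addresses:

```python
import paddle

# Simulate one micro-batch of a pipeline stage: the stage output is cached for
# backward, and the gradient of that output arrives from the next stage.
x = paddle.randn([256, 256])
x.stop_gradient = False
output_tensor = paddle.matmul(x, x)                    # cached stage output
output_tensor_grad = paddle.ones_like(output_tensor)   # grad received via p2p

# Backward for this micro-batch; the kernels are enqueued asynchronously.
paddle.autograd.backward([output_tensor], [output_tensor_grad])
input_tensor_grad = x.grad

# Early release: drop the host-side references so the allocator can reclaim
# the memory before the next micro-batch, per the discussion above.
output_tensor = None
output_tensor_grad = None
```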
paddle.distributed.send(dtype, dst=1, group=group)

def send_meta(self, tensor, group):
    if isinstance(tensor, paddle.Tensor):
A suggestion: pipeline_parallel.py also has a lot of isinstance(tensor, tuple) logic. It might be cleaner to wrap a single paddle.Tensor into a tuple and handle everything uniformly through the tuple path.
Agreed! We can make this more elegant when the code is rewritten later.
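A minimal sketch of the suggested refactor (the helper name `_to_tuple` is hypothetical and not part of this PR): normalize a lone paddle.Tensor into a one-element tuple once, so send_meta and the logic in pipeline_parallel.py only have to handle the tuple case.

```python
import paddle

def _to_tuple(tensors):
    # Normalize the p2p payload: a single paddle.Tensor becomes a 1-tuple,
    # while an existing tuple/list is passed through as a tuple.
    if isinstance(tensors, paddle.Tensor):
        return (tensors,)
    return tuple(tensors)

# Usage: every send/recv path can then iterate uniformly.
for t in _to_tuple(paddle.ones([2, 3])):
    print(t.shape)
```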
python/paddle/distributed/fleet/meta_parallel/pp_utils/p2p_communication.py (comment resolved)
LGTM
PR types
New features
PR changes
Others
Describe
[HybridParallel]Support 1f1b for PipelineParallel
This PR changes the current pipeline-parallel scheduling to the more memory-efficient 1F1B schedule, similar to Megatron's https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/schedules.py.

具体的调度图如下:
GPT-117M模型,V100-32G,PP=8, mircrobatch=2
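For readers unfamiliar with 1F1B, here is a stage-local sketch of the schedule's three phases (warmup, steady 1F1B, cooldown). The function names and signature are placeholders for illustration, not this PR's API.

```python
def run_1f1b(stage_id, num_stages, num_microbatches, forward_step, backward_step):
    # Warmup: run enough forwards so the last stage can start its first
    # backward immediately; earlier stages keep more micro-batches in flight.
    num_warmup = min(num_stages - stage_id - 1, num_microbatches)
    num_steady = num_microbatches - num_warmup

    caches = []  # activations kept alive until their backward runs
    for i in range(num_warmup):
        caches.append(forward_step(i))

    # Steady state: one forward followed by one backward (1F1B), so at most
    # num_warmup + 1 activations are cached per stage. This is where the memory
    # saving over the all-forward-then-all-backward schedule comes from.
    for i in range(num_steady):
        caches.append(forward_step(num_warmup + i))
        backward_step(caches.pop(0))

    # Cooldown: drain the remaining cached activations with backward passes.
    while caches:
        backward_step(caches.pop(0))

# Toy run: 4 stages, 8 micro-batches, viewed from stage 0.
run_1f1b(0, 4, 8,
         forward_step=lambda i: f"activation_{i}",
         backward_step=lambda act: None)
```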