
[Auto Parallel] Move reduce to opt stage #62157

Merged — 13 commits merged into PaddlePaddle:develop on Mar 5, 2024

Conversation

@AndSonder (Contributor) commented Feb 27, 2024

PR types

Performance optimization

PR changes

Others

Description

With sharding enabled under the static graph, an allreduce is performed in the backward stage of every micro batch, but communication in the backward stage performs poorly.

The sharding pass inserts reduce ops to do this communication; this PR moves those reduces to the optimizer (opt) stage.

Without gradient merging (gm): reduce(grad) -> opt(grad)
With gm:            reduce(grad) -> grad_merged = add(grad) -> opt(grad_merged)
After optimization: grad_merged = add(grad) -> reduce(grad_merged) -> opt(grad_merged)
                    (the reduce is re-inserted with role=opt)
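The op reordering above can be sketched as follows. This is a hypothetical illustration, not Paddle's actual pass code: the `Op` class, `move_reduce_to_opt_stage` helper, and op/role names are simplified stand-ins for the real IR.

```python
# Hypothetical sketch of moving the data-parallel reduce from the backward
# stage to the optimizer stage when gradient merging is enabled.
from dataclasses import dataclass

@dataclass
class Op:
    type: str  # e.g. "c_reduce_sum", "sum", "adam"
    var: str   # variable the op produces or consumes
    role: str  # "backward" or "optimize"

def move_reduce_to_opt_stage(ops):
    """Drop the per-micro-batch reduce on the raw grad and re-insert a
    single reduce on the merged grad, tagged with the optimizer role."""
    out = []
    for op in ops:
        if op.type.startswith("c_reduce") and op.role == "backward":
            continue  # skip the backward-stage reduce(grad)
        out.append(op)
        if op.type == "sum":  # grad_merged = add(grad)
            # insert reduce(grad_merged) right after accumulation, role=opt
            out.append(Op("c_reduce_sum", op.var, "optimize"))
    return out

before = [Op("c_reduce_sum", "grad", "backward"),
          Op("sum", "grad_merged", "backward"),
          Op("adam", "grad_merged", "optimize")]
after = move_reduce_to_opt_stage(before)
print([(op.type, op.var, op.role) for op in after])
# -> [('sum', 'grad_merged', 'backward'),
#     ('c_reduce_sum', 'grad_merged', 'optimize'),
#     ('adam', 'grad_merged', 'optimize')]
```

With this ordering, communication happens once per merged gradient instead of once per micro batch, which is the performance win the PR targets.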

Test environment:

  • PaddleNLP develop llama model (hidden_layer changed to 4)
  • 4-card 1080 Ti server

Testing shows that the llama model's loss matches the loss before this PR's changes.

Test cases:

  • With c_reduce_sum: tests pass, llama model loss matches ✅
  • With c_reduce_avg: tests pass, llama model loss matches ✅


paddle-bot bot commented Feb 27, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Feb 27, 2024

paddle-bot bot commented Feb 27, 2024

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

@AndSonder AndSonder changed the title [Auto Parallel] Move c_allreduce to opt stage [Auto Parallel] Move reduce to opt stage Feb 28, 2024
for idx, op in list(enumerate(main_block.ops)):
    if is_data_parallel_reduce_op(op):
        op_input_names = op.desc.input_arg_names()
        if "@RENAME" in op_input_names[0]:
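The `"@RENAME"` check above can be illustrated in isolation. This is an assumption-laden sketch: the `is_renamed_grad` helper and the example variable names are hypothetical, based on the convention that the framework renames a gradient variable (appending a `@RENAME` suffix) when a parameter receives gradients from multiple sources before they are summed.

```python
# Hypothetical helper mirroring the snippet's check: renamed gradient
# variables carry an "@RENAME" marker in their name.
def is_renamed_grad(name: str) -> bool:
    return "@RENAME" in name

print(is_renamed_grad("linear_0.w_0@GRAD@RENAME@block0@0"))  # True
print(is_renamed_grad("linear_0.w_0@GRAD"))                  # False
```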
A contributor commented:
This handles some special cases; please add a NOTE explaining it.

@From00 From00 (Contributor) left a comment:

LGTM

@From00 From00 merged commit 23e0355 into PaddlePaddle:develop Mar 5, 2024
30 checks passed
@AndSonder AndSonder deleted the move_reduce_to_opt_stage branch April 23, 2024 13:56
Labels
contributor External developers
2 participants