[Auto Parallel] Move reduce to opt stage #62157
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open source project!
❌ The PR is not created using the PR template. You can refer to this Demo.
for idx, op in list(enumerate(main_block.ops)):
    if is_data_parallel_reduce_op(op):
        op_input_names = op.desc.input_arg_names()
        if "@RENAME" in op_input_names[0]:
This handles some special cases; please add a NOTE explaining it.
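For illustration only, a minimal self-contained sketch of the special case being discussed (this is not the PR's code; the variable names and the wording of the NOTE are assumptions): gradient variables that receive contributions from several operators may carry an "@RENAME" suffix, and reduce ops whose input has that suffix need different treatment from ordinary data-parallel gradient reduces.

# NOTE (assumed wording): a gradient written by more than one op may be split into
# renamed copies such as "w_0@GRAD@RENAME@block0@0" that are summed afterwards;
# reduce ops whose input carries this suffix are the special case checked above.
def is_renamed_grad(var_name: str) -> bool:
    return "@RENAME" in var_name


if __name__ == "__main__":
    print(is_renamed_grad("linear_0.w_0@GRAD@RENAME@block0@0"))  # True
    print(is_renamed_grad("linear_0.w_0@GRAD"))                  # False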
LGTM
PR types
Performance optimization
PR changes
Others
Description
With sharding enabled in static graph mode, an allreduce is performed in the backward stage of every micro batch, but communication in the backward stage performs poorly.
The sharding pass inserts reduce ops to perform this communication; this PR moves those reduce ops to the optimizer (opt) stage.
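As a rough illustration of the description above, here is a framework-free sketch (not the PR's implementation; the Op class and op names below are invented for this example) of deferring data-parallel gradient reduces from the backward stage to the optimizer stage:

from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    type: str   # e.g. "matmul_grad", "c_reduce_sum", "adam"
    stage: str  # "forward", "backward", or "opt"


def is_data_parallel_reduce_op(op: Op) -> bool:
    # Simplified stand-in for the real check used by the sharding pass.
    return op.type in ("c_reduce_sum", "c_allreduce_sum")


def move_reduce_to_opt_stage(ops: List[Op]) -> List[Op]:
    # Collect gradient reduce ops that currently run in the backward stage.
    moved_idx = {i for i, op in enumerate(ops)
                 if op.stage == "backward" and is_data_parallel_reduce_op(op)}
    moved = [ops[i] for i in sorted(moved_idx)]
    kept = [op for i, op in enumerate(ops) if i not in moved_idx]

    # Re-insert them just before the first optimizer op, so the communication
    # happens once per step instead of in every micro batch's backward pass.
    first_opt = next((i for i, op in enumerate(kept) if op.stage == "opt"), len(kept))
    for op in moved:
        op.stage = "opt"
    return kept[:first_opt] + moved + kept[first_opt:]


if __name__ == "__main__":
    program = [
        Op("matmul", "forward"),
        Op("matmul_grad", "backward"),
        Op("c_reduce_sum", "backward"),  # communication in the backward stage
        Op("adam", "opt"),
    ]
    for op in move_reduce_to_opt_stage(program):
        print(op.type, op.stage)
    # matmul forward / matmul_grad backward / c_reduce_sum opt / adam opt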
Dependencies:
Testing shows that the loss of the llama model matches the loss before this PR's changes.
Test case: