[AutoParallel] Adapt grad clip for moe layer #74916
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
Codecov Report
❌ Patch coverage is 0.00%, below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             develop   #74916   +/-   ##
==========================================
  Coverage           ?    0.00%
==========================================
  Files              ?        1
  Lines              ?        1
  Branches           ?        0
==========================================
  Hits               ?        0
  Misses             ?        1
  Partials           ?        0
liym27 left a comment:
LGTM
|
/re-run all-failed |
3 similar comments
|
/re-run all-failed |
|
/re-run all-failed |
|
/re-run all-failed |
PR Category
Auto Parallel
PR Types
Bug fixes
Description
Background:
Running the ernie model with pp2 + num_hidden_layers=4 + moe_layer_start_index=3:
- Stage 0 contains no moe layers. All grad meshes are [0,1,2,3], so grad clip calls allreduce to sum all gradients.
- Stage 1 contains moe layers. Most grad meshes are [4,5,6,7], but the moe softmax grads have mesh [current_rank]; grad clip reshards them to [4,5,6,7], yet never calls allreduce to sum all gradients (see the sketch below).
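A minimal sketch of the mesh layout just described, with plain Python lists standing in for the per-gradient process meshes (the rank lists come from the description above; the variable names are illustrative, not Paddle internals):

```python
# Stage 0 (no moe layers): every gradient lives on the full stage mesh,
# so grad clip triggers an allreduce across [0, 1, 2, 3].
stage0_grad_meshes = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]

# Stage 1 (contains moe layers): most gradients live on [4, 5, 6, 7],
# but each moe softmax gradient lives on a single-rank mesh. Grad clip
# reshards those to [4, 5, 6, 7], yet the allreduce is never triggered.
current_rank = 5  # hypothetical: this process's rank on stage 1
stage1_grad_meshes = [
    [4, 5, 6, 7],    # ordinary gradients
    [4, 5, 6, 7],
    [current_rank],  # moe softmax gradient
]
```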
Fix:
In grad clip, flag_auto_hybrid_pp controls whether the allreduce communication is performed. This PR adapts the stage 1 moe case by setting flag_auto_hybrid_pp=True, so that stage 0 and stage 1 can both take part in the allreduce.

card-92763
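Below is a hedged sketch of the control flow this flag gates, assuming a simplified global-norm clip. Only flag_auto_hybrid_pp and the stage behavior come from this PR; the function name clip_grads_by_global_norm and the surrounding math are an illustrative reconstruction, not Paddle's actual implementation:

```python
import paddle
import paddle.distributed as dist

def clip_grads_by_global_norm(grads, max_norm, flag_auto_hybrid_pp):
    # Squared gradient norm local to this pipeline stage.
    local_sq_norm = paddle.add_n([(g * g).sum() for g in grads])

    # flag_auto_hybrid_pp gates the cross-stage communication. Before this
    # fix it stayed False on the moe stage (stage 1), so stage 0 and
    # stage 1 clipped against different "global" norms; setting it to
    # True lets both stages join the allreduce.
    if flag_auto_hybrid_pp:
        dist.all_reduce(local_sq_norm, op=dist.ReduceOp.SUM)

    global_norm = paddle.sqrt(local_sq_norm)
    clip_coef = paddle.clip(max_norm / (global_norm + 1e-6), max=1.0)
    return [g * clip_coef for g in grads]
```

With flag_auto_hybrid_pp=False on stage 1 (the pre-fix behavior), local_sq_norm would exclude stage 0's contribution, so the two stages would clip against inconsistent norms.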