[Auto Parallel] Move reduce to opt stage #62157
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open source project!
❌ The PR is not created using the PR template. You can refer to this Demo.
for idx, op in list(enumerate(main_block.ops)):
    if is_data_parallel_reduce_op(op):
        op_input_names = op.desc.input_arg_names()
        if "@RENAME" in op_input_names[0]:
This handles some special cases; please add a NOTE explaining it.
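For illustration only, a minimal self-contained sketch of the special case being discussed (this is not the PR's code; the variable names and the wording of the NOTE are assumptions): gradient variables that receive contributions from several operators may carry an "@RENAME" suffix, and reduce ops whose input has that suffix need different treatment from ordinary data-parallel gradient reduces.

# NOTE (assumed wording): a gradient written by more than one op may be split into
# renamed copies such as "w_0@GRAD@RENAME@block0@0" that are summed afterwards;
# reduce ops whose input carries this suffix are the special case checked above.
def is_renamed_grad(var_name: str) -> bool:
    return "@RENAME" in var_name


if __name__ == "__main__":
    print(is_renamed_grad("linear_0.w_0@GRAD@RENAME@block0@0"))  # True
    print(is_renamed_grad("linear_0.w_0@GRAD"))                  # False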
LGTM
PR types
Performance optimization
PR changes
Others
Description
With sharding enabled in static graph mode, an allreduce is performed in the backward stage of every micro batch, but communication in the backward stage performs poorly.
The sharding pass inserts reduce ops to perform this communication; this PR moves those reduce ops to the optimizer (opt) stage.
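As a rough illustration of the description above, here is a framework-free sketch (not the PR's implementation; the Op class and op names below are invented for this example) of deferring data-parallel gradient reduces from the backward stage to the optimizer stage:

from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    type: str   # e.g. "matmul_grad", "c_reduce_sum", "adam"
    stage: str  # "forward", "backward", or "opt"


def is_data_parallel_reduce_op(op: Op) -> bool:
    # Simplified stand-in for the real check used by the sharding pass.
    return op.type in ("c_reduce_sum", "c_allreduce_sum")


def move_reduce_to_opt_stage(ops: List[Op]) -> List[Op]:
    # Collect gradient reduce ops that currently run in the backward stage.
    moved_idx = {i for i, op in enumerate(ops)
                 if op.stage == "backward" and is_data_parallel_reduce_op(op)}
    moved = [ops[i] for i in sorted(moved_idx)]
    kept = [op for i, op in enumerate(ops) if i not in moved_idx]

    # Re-insert them just before the first optimizer op, so the communication
    # happens once per step instead of in every micro batch's backward pass.
    first_opt = next((i for i, op in enumerate(kept) if op.stage == "opt"), len(kept))
    for op in moved:
        op.stage = "opt"
    return kept[:first_opt] + moved + kept[first_opt:]


if __name__ == "__main__":
    program = [
        Op("matmul", "forward"),
        Op("matmul_grad", "backward"),
        Op("c_reduce_sum", "backward"),  # communication in the backward stage
        Op("adam", "opt"),
    ]
    for op in move_reduce_to_opt_stage(program):
        print(op.type, op.stage)
    # matmul forward / matmul_grad backward / c_reduce_sum opt / adam opt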
Dependencies:
Testing shows that the loss of the llama model matches the loss before this PR's changes.
Test case: