[3D-parallelism] Hybrid Model Parallelism #32074
Conversation
Thanks for your contribution!
✅ This PR's description meets the template requirements!
optional int32 sharding_degree = 3 [ default = 8 ];
optional int32 mp_degree = 4 [ default = 1 ];
optional string sharding_segment_strategy = 5
Please add comments documenting the enum options.
Recorded; documentation will be added to FluidDoc and FleetX.
Comments also need to be added to this code.
optional bool hybrid_dp = 7 [ default = false ];
optional int32 gradient_merge_acc_step = 8 [ default = 1 ];
optional bool optimize_offload = 9 [ default = false ];
optional bool pp_allreduce_in_optimize = 10 [ default = false ];
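To make the proto fields above concrete, here is a minimal sketch of how they might be grouped into a sharding configuration dict for a hybrid-parallel run. The field names mirror the proto definitions in this diff, but the plain-dict form and the `"segment_broadcast_MB"` value are illustrative assumptions, not a confirmed Paddle API.

```python
# Hypothetical sketch: a sharding config mirroring the proto fields above.
# The dict shape and the segment-strategy value are assumptions.
sharding_configs = {
    "sharding_segment_strategy": "segment_broadcast_MB",  # how to segment the program
    "sharding_degree": 8,               # ranks that shard optimizer states/gradients
    "mp_degree": 1,                     # megatron-style model-parallel degree
    "hybrid_dp": False,                 # add an outer data-parallel group
    "gradient_merge_acc_step": 1,       # gradient accumulation steps
    "optimize_offload": False,          # offload optimizer states to host memory
    "pp_allreduce_in_optimize": False,  # move pipeline allreduce into optimizer ops
}

# Defaults here match the proto defaults ([ default = ... ]) except
# sharding_degree, whose proto default is 8.
print(sharding_configs["sharding_degree"])
```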
Please add some comments: in 3D or 4D parallelism, allreduce_in_optimize=True reduces communication, while allreduce_in_optimize=False reduces memory usage.
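The trade-off the reviewer describes can be sketched as a small helper that picks the flag from the resource you want to optimize. The helper name and its string arguments are hypothetical, for illustration only.

```python
# Sketch (assumption): pick pp_allreduce_in_optimize from the reviewer's
# stated trade-off in 3D/4D parallelism.
def choose_pp_allreduce_in_optimize(prefer: str) -> bool:
    """Return the flag value given which resource to optimize.

    True  -> allreduce is done in the optimizer pass: fewer, larger
             communications, but gradients stay alive longer (more memory).
    False -> gradients are reduced and freed earlier: lower peak memory,
             at the cost of more communication.
    """
    if prefer == "communication":
        return True
    if prefer == "memory":
        return False
    raise ValueError(f"unknown preference: {prefer!r}")
```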
Recorded; documentation will be added to FluidDoc, FleetX, and the .py file where the feature is called.
That said, I think this should remain a feature for internal projects for now; should we avoid exposing it to users?
LGTM for backward.py
LGTM
PR types
New features

PR changes
APIs

Describe
New features and performance-related optimizations.
Example
Assume we have 4 nodes with 8 GPUs per node:
mp-sharding-pp 3D parallelism
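The 4-node, 8-GPU setup above can be sketched as a factorization of the 32 ranks across the three parallel dimensions. The specific degrees chosen here (mp within a node, sharding and pp across nodes) are illustrative assumptions, not the PR's prescribed layout.

```python
# Sketch: splitting 4 nodes x 8 GPUs across mp-sharding-pp 3D parallelism.
# The particular degree values are illustrative assumptions.
num_nodes, gpus_per_node = 4, 8
world_size = num_nodes * gpus_per_node  # 32 ranks in total

mp_degree = 8        # model parallelism inside a node (fast intra-node links)
sharding_degree = 2  # optimizer-state sharding across node groups
pp_degree = 2        # pipeline stages across the remaining ranks

# The product of all parallel degrees must cover every rank exactly once.
assert mp_degree * sharding_degree * pp_degree == world_size
print(world_size)  # 32
```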