-
Notifications
You must be signed in to change notification settings - Fork 27.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add auto_wrap option in fairscale integration #10673
Conversation
@@ -470,10 +470,10 @@ class TrainingArguments: | |||
sharded_ddp: str = field( | |||
default="", | |||
metadata={ | |||
"choices": ["simple", "zero_dp_2", "zero_dp_3", "zero_dp_2 offload", "zero_dp_3 offload"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choices started to grow longer as there are 4 more possibilities of options now. Can put it back if you really want to but I think the doc is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why do all combinations? why not just list the components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"choices": ["simple", "zero_dp_2", "zero_dp_3", "offload", "auto_wrap"],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This instruction will fail if we pass anything not in the list of choices, like `"zero_dp_2 offload".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I understand! It doesn't know of composite values and would have failed anyway if one were to pass: "offload zero_dp_2 offload" (reverse).
So it's either subclassing this argparse behavior to support compose values or removing it as you did.
I think as long as the help entry for that option lists all the choices, which it does, then it's good enough.
@@ -570,7 +570,7 @@ def __post_init__(self): | |||
"`--sharded_ddp offload` can't work on its own. It needs to be added to `--sharded_ddp zero_dp_2` or " | |||
'`--sharded_ddp zero_dp_3`. For example, `--sharded_ddp "zero_dp_2 offload"`.' | |||
) | |||
elif len(self.sharded_ddp) > 1 and ShardedDDPOption.Simple in self.sharded_ddp: | |||
elif len(self.sharded_ddp) > 1 and ShardedDDPOption.SIMPLE in self.sharded_ddp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Let's merge this and hopefully I will start testing tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, it's cool that they introduced this!
* Add auto_wrap option in fairscale integration * Style
What does this PR do?
This PR adds support for the
auto_wrap
function added by fairscale to automatically wrap the model's modules in theFSDP
container (necessary for ZeRO-DP3).cc @stas00 So you are informed for when you want to experiment more with fairscale.