【paddle.fleet】add auto parallel L1 implementations #27090
Conversation
test=develop
Thanks for your contribution!
✅ This PR's description meets the template requirements!
    return __impl__


is_strict_auto = wrap_decorator(__non_auto_func_called__)
This keyword is a little hard to understand. Could we name it 'reset_auto_flag', since that is what it actually does?
This is just a check of whether the user is using a strictly auto configuration. Here is a strictly auto configuration:
strategy = DistributedStrategy()
strategy.auto = True
Here is a case that is not a strictly auto configuration:
strategy = DistributedStrategy()
strategy.amp = True
strategy.auto = True
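For illustration, a minimal sketch of how such a check could work, assuming a module-level flag (wrap_decorator is taken from the diff above; the body here is a hypothetical reconstruction, not necessarily the exact implementation). Every setter other than auto is wrapped so that calling it clears the flag, so the flag stays True only for a strictly auto configuration:
non_auto_func_called = True  # stays True only if no non-auto setter runs

def __non_auto_func_called__(func):
    def __impl__(*args, **kwargs):
        global non_auto_func_called
        non_auto_func_called = False  # a non-auto setter was called
        return func(*args, **kwargs)
    return __impl__

is_strict_auto = wrap_decorator(__non_auto_func_called__)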
I think the global auto flag should not be shared with other strategies, because it is hard to control. E.g., two strategies may both have auto capability, and one requires auto while the other does not. So I suggest giving auto the highest priority, shielding the other strategies. And if one strategy can be 'auto' separately, it should have its own auto configuration, as sketched below.
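A hedged sketch of the per-strategy configuration being suggested (the "auto" key here is hypothetical and not an existing DistributedStrategy field; amp_configs is the existing per-strategy config dict):
strategy = DistributedStrategy()
strategy.amp = True
strategy.amp_configs = {"auto": True}  # hypothetical per-strategy auto flag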
It is not good to see the …
@@ -69,6 +69,10 @@ def _disable_strategy(self, dist_strategy):
        dist_strategy.dgc = False
        dist_strategy.dgc_configs = {}

    def _enable_strategy(self, dist_strategy):
        dist_strategy.dgc = True
Will auto parallel turn on every option that can be turned on? I think some options should not be turned on by default; otherwise accuracy may be lost. For example, DGC may require a warm-up that has to be set by the user, otherwise accuracy may be lost. Auto parallel needs to guarantee no loss of accuracy.
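For context, this is roughly how a user sets the DGC warm-up by hand today (a sketch; rampup_begin_step is assumed to be one of the dgc_configs options, and the step count is an arbitrary illustration):
strategy = DistributedStrategy()
strategy.dgc = True
strategy.dgc_configs = {"rampup_begin_step": 1252}  # warm-up boundary chosen by the user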
That depends on the network conditions. I think the current DGC is not intelligent; we should improve it to remove the warm-up requirement. The _enable_strategy interface does not require all meta optimizers to be valid.
The aim of auto is to make the interface easy to learn and use, not to show that we can apply different strategies automatically.
If you set localsgd together with auto, strict auto will not take effect:
strategy = DistributedStrategy()
strategy.localsgd = True
strategy.auto = True
LGTM
LGTM
PR types
New features
PR changes
APIs
Describe
Make distributed strategy easy to configure, as in the following example.
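The example itself is not preserved in this extract; below is a minimal sketch of the intended usage, assuming the paddle.distributed.fleet API of this era (optimizer is assumed to be a regular Paddle optimizer created beforehand):
import paddle.distributed.fleet as fleet

fleet.init(is_collective=True)

strategy = fleet.DistributedStrategy()
strategy.auto = True  # strict auto: let fleet pick suitable strategies

# wrap a previously created optimizer with the auto strategy
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)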