remove the AdamOptimizer, SGDOptimizer, MomentumOptimizer, ModelAverage, LookaheadOptimizer, FtrlOptimizer, DecayedAdagradOptimizer, DpsgdOptimizer in fluid and relocate the ExponentialMovingAverage, PipelineOptimizer, GradientMergeOptimizer and change optimizer base for LarsMomentumOptimizer and RecomputeOptimizer #55970
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
LGTM
Unit tests deleted.
        found_inf = self._get_auxiliary_var('found_inf')

        if found_inf:
            inputs['SkipUpdate'] = found_inf
This was caught by the unit test test_mixed_precision. Currently only fluid.optimizer.Adam and paddle.optimizer.AdamW add the strategy of detecting inf and skipping the update (see paddle.static.amp.decorator), but the Adam OP does support this input, so support for this strategy is added to paddle.optimizer.Adam here.
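For context, a minimal self-contained numpy sketch of the skip-update idea (a hypothetical helper with bias correction omitted, not Paddle's implementation; Paddle itself wires the found_inf flag into the Adam OP's SkipUpdate input, as the diff above shows):

```python
import numpy as np

def adam_step_with_skip(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                        eps=1e-8, found_inf=False):
    # When AMP has detected inf/nan in the scaled gradients, skip the whole
    # update so neither the parameter nor the moment estimates are polluted.
    if found_inf:
        return param, m, v
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    return param - lr * m / (np.sqrt(v) + eps), m, v

p, g = np.ones(3), np.array([0.1, np.inf, 0.2])
print(adam_step_with_skip(p, g, np.zeros(3), np.zeros(3),
                          found_inf=bool(np.isinf(g).any())))
```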
@@ -609,12 +609,6 @@ def test_sharding_weight_decay(self):
                'c_reduce_sum',
                'c_reduce_sum',
                'c_sync_comm_stream',
                'scale',
This is because paddle.optimizer.Momentum fuses L2Decay into the OP, so the extra scale + sum operations no longer appear.
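A rough numpy sketch of what the fusion means (a hypothetical helper, not Paddle's kernel): the decay term is folded into the gradient inside the momentum update itself, so no standalone scale/sum ops are emitted into the program.

```python
import numpy as np

def momentum_step_fused_l2(param, grad, velocity, lr=0.1, mu=0.9, coeff=1e-4):
    # L2Decay is applied as part of the same update instead of through
    # separate scale + sum ops.
    grad = grad + coeff * param
    velocity = mu * velocity + grad
    return param - lr * velocity, velocity

p, v = momentum_step_fused_l2(np.ones(4), np.full(4, 0.5), np.zeros(4))
print(p, v)
```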
        return optimizer

    def get_optimizer(self):
        optimizer = paddle.optimizer.Lamb(
The unit tests for some of the 2.0 optimizers (paddle.optimizer.xxx) are deleted here. Is that because they already exist in the v2 unit tests? What is the relationship between the two test files?
test_imperative_optimizer.py is used to test the legacy optimizers, while test_imperative_optimizer_v2.py mainly tests the 2.0 optimizers. However, when test_imperative_optimizer.py was first modified, the optimizers were only replaced with their 2.0 implementations instead of being deleted, which is why the two files have grown increasingly similar. We plan to delete test_imperative_optimizer.py later.
9.8336181640625,
8.22379207611084,
8.195695877075195,
10.508796691894531,
The 2.0 paddle.optimizer.Momentum automatically fuses L2Decay, which causes a numerical difference from fluid. With other decay types or no decay the results are identical; this unit test falls into the former case.
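For reference, a hedged usage sketch (assuming the public paddle.optimizer API) of the configuration this applies to: only the L2Decay case is fused into the Momentum OP, which is why the expected loss values in these tests shift slightly relative to fluid.

```python
import paddle

linear = paddle.nn.Linear(4, 4)
opt = paddle.optimizer.Momentum(
    learning_rate=0.1,
    momentum=0.9,
    parameters=linear.parameters(),
    weight_decay=paddle.regularizer.L2Decay(1e-4),  # fused into the OP
)
loss = linear(paddle.rand([2, 4])).mean()
loss.backward()
opt.step()
opt.clear_grad()
```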
9.569124221801758,
8.251557350158691,
8.513609886169434,
10.603094100952148,
Same as above.
9.559739112854004,
8.430597305297852,
8.109201431274414,
10.224763870239258,
Same as above.
@@ -68,7 +68,7 @@ def test_trainable(self):
         self.check_trainable(
             test_trainable,
             feed_dict,
-            op_count={'adam': 1, 'scale': 0, 'mul_grad': 0},
+            op_count={'adam': 1, 'scale': 0, 'mul_grad': 1},
In theory this should not need to change, and the adamax check below it has already been modified incorrectly. On investigation, the cause is that a static-graph Parameter does not get its stop_gradient attribute set correctly (fluid.Optimizer checks the trainable attribute rather than stop_gradient, while paddle.Optimizer does the opposite). A separate PR will be needed to fix this.
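A self-contained sketch of the mismatch described above (FakeParam and the two filter helpers are hypothetical, used only to illustrate the two conventions):

```python
class FakeParam:
    # Stand-in for a static-graph Parameter carrying both flags.
    def __init__(self, name, trainable=True, stop_gradient=False):
        self.name = name
        self.trainable = trainable
        self.stop_gradient = stop_gradient

def fluid_style_filter(params):
    # Legacy fluid optimizers skip parameters with trainable == False.
    return [p for p in params if p.trainable]

def paddle_style_filter(params):
    # 2.0 optimizers skip parameters with stop_gradient == True.
    return [p for p in params if not p.stop_gradient]

# A parameter frozen only via ``trainable`` disagrees between the two filters:
frozen = FakeParam("w", trainable=False, stop_gradient=False)
print(fluid_style_filter([frozen]))   # [] -> correctly skipped
print(paddle_style_filter([frozen]))  # [frozen] -> still treated as trainable
```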
LGTM; the trainable issue will be handled separately.
LGTM for amp
LGTM
LGTM
LGTM
Does the Chinese documentation for ExponentialMovingAverage also need to be moved? If so, please open a PR on the Chinese docs side as well @longranger2. The docs-preview CI currently has a bug, so the preview is temporarily unavailable; if there is any problem with the preview later, I will reply under this comment.
LGTM
PR types
Others
PR changes
APIs
Description
The optimizers removed are as follows: AdamOptimizer, SGDOptimizer, MomentumOptimizer, ModelAverage, LookaheadOptimizer, FtrlOptimizer, DecayedAdagradOptimizer, DpsgdOptimizer.
The optimizers relocated are as follows: ExponentialMovingAverage, PipelineOptimizer, GradientMergeOptimizer.
The optimizers whose optimizer base class is changed are as follows: LarsMomentumOptimizer, RecomputeOptimizer.
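As a hedged migration sketch (assuming the public paddle.optimizer API; keyword names such as parameters replace the older parameter_list), code that used the removed fluid optimizers can switch to their 2.0 counterparts, for example fluid.optimizer.AdamOptimizer to paddle.optimizer.Adam:

```python
import paddle

model = paddle.nn.Linear(8, 2)
# 2.0 replacement for the removed fluid.optimizer.AdamOptimizer.
opt = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())

loss = model(paddle.rand([4, 8])).mean()
loss.backward()
opt.step()        # apply the Adam update
opt.clear_grad()  # reset gradients for the next iteration
```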