
update optimizer for 2.0 #26288

Merged: MRXLT merged 35 commits into PaddlePaddle:develop from 2.0-op on Aug 23, 2020

Conversation

@MRXLT (Contributor) commented on Aug 14, 2020:

PR types

New features

PR changes

OPs

Describe

Improve the Adam, Adamax, Optimizer, and RMSProp ops.
Add a new AdamW op.

Optimizer class
The parameter_list argument is renamed to parameters.
The regularization argument is renamed to weight_decay; a float value is treated as the coefficient of L2Decay.
The set_dict API is renamed to set_state_dict.
A new step API is added for dygraph mode, replacing minimize.
The current_step_lr API is renamed to get_lr.
clear_gradients is renamed to clear_grad; the original API is kept as an alias of clear_grad.

AdamOptimizer is renamed to Adam, AdamaxOptimizer to Adamax, and RMSPropOptimizer to RMSProp; the remaining changes are the same as in the base Optimizer class.
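For illustration, a minimal dygraph sketch of the renamed 2.0 interfaces described above; the Linear layer, tensor shapes, and hyperparameter values are placeholders, not taken from this PR:

import paddle

# Assumes dygraph mode (the 2.0 default); model and data are placeholders.
linear = paddle.nn.Linear(10, 1)

# regularization -> weight_decay: a plain float acts as the L2Decay coefficient.
adam = paddle.optimizer.Adam(learning_rate=0.001,
                             parameters=linear.parameters(),
                             weight_decay=0.01)

loss = paddle.mean(linear(paddle.rand([4, 10])))
loss.backward()

adam.step()         # new dygraph interface that replaces minimize
adam.clear_grad()   # renamed from clear_gradients (old name kept as an alias)
lr = adam.get_lr()  # renamed from current_step_lr
adam.set_state_dict(adam.state_dict())  # renamed from set_dict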

New AdamW class
Inherits from DecoupledWeightDecay and Adam.
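Likewise, a minimal sketch of how the new AdamW class would be used; the weight_decay coefficient is applied as decoupled weight decay on top of the Adam update (model and values below are illustrative, not from this PR):

import paddle

linear = paddle.nn.Linear(10, 1)
adamw = paddle.optimizer.AdamW(learning_rate=0.001,
                               parameters=linear.parameters(),
                               weight_decay=0.01)

loss = paddle.mean(linear(paddle.rand([4, 10])))
loss.backward()
adamw.step()
adamw.clear_grad()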

Chinese documentation link: PaddlePaddle/docs#2424


@paddle-bot-old commented:

Hi, this is a test PR and it will not trigger CI. If you want to trigger CI, please remove "notest" from your commit message.

beta1=0.9,
beta2=0.999,
epsilon=1e-8,
parameters=None,
Collaborator:

Could the parameters argument be moved earlier in the signature? Dygraph mode depends heavily on it.

Contributor Author:

To stay consistent with the other optimizers, this argument will not be moved for now.

outputs={"ParamOut": param_and_grad[0]})
return new_param_grads, (table_param, table_grad), sgd_op

def _append_dgc_ops(self, param_and_grad):
@phlrain (Collaborator) commented on Aug 19, 2020:

Why does this API need to exist?

Contributor Author:

It is overridden and used by the DGCMomentum optimizer; it is kept here mainly to avoid errors during backward.

XiaoguangHu01 previously approved these changes on Aug 21, 2020.

@XiaoguangHu01 (Contributor) left a comment:

A few minor issues to report; this can be merged first and the fixes made afterwards.

TCChenlong previously approved these changes on Aug 21, 2020.

@TCChenlong (Contributor) left a comment:

LGTM

@MRXLT dismissed stale reviews from TCChenlong and XiaoguangHu01 via 6cc0fc2 on August 21, 2020, 09:42.

raindrops2sea previously approved these changes on Aug 21, 2020.
Related paper: `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_

Args:
learning_rate (float|LearningRateDecay, optional): The learning rate used to update ``Parameter``.
Contributor:

The type of learning_rate is float|LearningRateDecay in the English docs but float|Variable in the Chinese docs; please keep them consistent. Also, Variable should be changed to Tensor.

Contributor Author:

The English docs are the source of truth; the Chinese docs will be updated later.

The default value is 0.999.
epsilon (float, optional): A small float value for numerical stability.
The default value is 1e-08.
parameters (list, optional): List of ``Tensor`` names to update to minimize ``loss``. \
Contributor:

Please keep the argument order of parameters consistent between the English and Chinese docs.

indicate program pruning. If so, the program will be pruned by ``feed`` and
``fetch_list`` before run, see details in ``Executor``.

Examples:
Contributor:

Please implement this example with the 2.0 API.

it is added here for numerical stability to prevent the division by 0 error.

Args:
learning_rate (float|LearningRateDecay, optional): The learning rate used to update ``Parameter``.
Contributor:

Should this be float|LearningRateDecay or float|Tensor?

Contributor Author:

float|LearningRateDecay; the Chinese docs will be updated later.

@TCChenlong (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LGTM

@raindrops2sea (Collaborator) left a comment:

LGTM

@MRXLT merged commit eeda90d into PaddlePaddle:develop on Aug 23, 2020.
@MRXLT changed the title from "[WIP] update optimizer for 2.0" to "update optimizer for 2.0" on Aug 24, 2020.
@MRXLT deleted the 2.0-op branch on August 24, 2020, 06:13.
@jzhang533 (Contributor) commented:

Regarding "the parameter_list argument is renamed to parameters": SGD and Momentum were not changed, right?
