update optimizer for 2.0 #26288
Conversation
Hi, it's a test PR; it will not trigger CI. If you want to trigger CI, please remove …
beta1=0.9,
beta2=0.999,
epsilon=1e-8,
parameters=None,
Could the `parameters` argument be moved earlier in the signature? Dynamic graph mode strongly depends on it.
To stay consistent with the other optimizers, this argument will not be moved for now.
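For reference, a minimal usage sketch of the point under discussion: since dynamic graph mode always needs `parameters`, it is typically passed as a keyword argument regardless of its position in the signature. The layer and values below are illustrative, not taken from the PR.

```python
import paddle

paddle.disable_static()
linear = paddle.nn.Linear(10, 10)

# Passing `parameters` by keyword keeps the call readable even though the
# argument sits after beta1/beta2/epsilon in the signature shown above.
adam = paddle.optimizer.Adam(learning_rate=0.001,
                             parameters=linear.parameters())
```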
outputs={"ParamOut": param_and_grad[0]})
return new_param_grads, (table_param, table_grad), sgd_op

def _append_dgc_ops(self, param_and_grad):
Why does this API need to exist?
It is overridden and used in the DGCMomentum optimizer; here it mainly exists so that `backward` does not raise an error.
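A hedged sketch of the pattern described in this reply, not the actual Paddle source: the base class keeps a no-op hook so `backward` can call it unconditionally, and the DGC optimizer overrides it with real work. `_insert_dgc_op` is a hypothetical helper used only for illustration.

```python
class Optimizer:
    def _append_dgc_ops(self, param_and_grads):
        # No-op in the base class; it only exists so backward() can call it
        # without checking which optimizer is in use.
        pass


class DGCMomentumOptimizer(Optimizer):
    def _append_dgc_ops(self, param_and_grads):
        # The DGC optimizer overrides the hook to insert its compression ops.
        for param, grad in param_and_grads:
            self._insert_dgc_op(param, grad)  # hypothetical helper
```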
Feedback on a few small issues; this can be merged first and the fixes made afterwards.
python/paddle/fluid/tests/unittests/test_fleet_graph_execution_meta_optimizer.py
LGTM
Related paper: `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_

Args:
    learning_rate (float|LearningRateDecay, optional): The learning rate used to update ``Parameter``.
The type of learning_rate is documented as float|LearningRateDecay in the English docs but float|Variable in the Chinese docs; please keep them consistent. Also, Variable should become Tensor.
The English docs are authoritative; the Chinese docs will be updated later.
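For reference, a minimal sketch of the two learning_rate forms the docstring advertises (a plain float or a LearningRateDecay schedule). The schedule class used here, `fluid.dygraph.ExponentialDecay`, is an assumption about which LearningRateDecay subclass fits this API at the time of the PR.

```python
import paddle
import paddle.fluid as fluid

paddle.disable_static()
linear = paddle.nn.Linear(10, 10)

# 1) learning_rate as a plain float
adam = paddle.optimizer.Adam(learning_rate=0.001,
                             parameters=linear.parameters())

# 2) learning_rate as a LearningRateDecay schedule
decay = fluid.dygraph.ExponentialDecay(learning_rate=0.001,
                                       decay_steps=1000,
                                       decay_rate=0.9)
adam = paddle.optimizer.Adam(learning_rate=decay,
                             parameters=linear.parameters())
```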
        The default value is 0.999.
    epsilon (float, optional): A small float value for numerical stability.
        The default value is 1e-08.
    parameters (list, optional): List of ``Tensor`` names to update to minimize ``loss``. \
Please keep the order of the arguments consistent between the Chinese and English docs.
        indicate program pruning. If so, the program will be pruned by ``feed`` and
        ``fetch_list`` before run, see details in ``Executor``.

    Examples:
Please use the 2.0 API implementation here.
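As a reference for what the reviewer is asking, a hedged sketch of a 2.0-style example (paddle.optimizer.Adam in dynamic graph mode, using the step/clear_grad APIs this PR introduces); shapes and values are illustrative.

```python
import paddle

paddle.disable_static()
linear = paddle.nn.Linear(10, 10)
inp = paddle.rand([10, 10], dtype='float32')
out = linear(inp)
loss = paddle.mean(out)

adam = paddle.optimizer.Adam(learning_rate=0.1,
                             parameters=linear.parameters())
loss.backward()
adam.step()
adam.clear_grad()
```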
it is added here for numerical stability to prevent the division by 0 error.

Args:
    learning_rate (float|LearningRateDecay, optional): The learning rate used to update ``Parameter``.
Should it be float|LearningRateDecay or float|Tensor?
float|LearningRateDecay; the Chinese docs will be updated later.
LGTM
LGTM
LGTM
PR types
New features
PR changes
OPs
Describe
Improve the Adam, Adamax, Optimizer, and RMSProp ops
Add the AdamW op
Optimizer class
The parameter_list argument is renamed to parameters
The regularization argument is renamed to weight_decay; when a float is passed, it is used as the L2Decay coefficient
The set_dict API is renamed to set_state_dict
In dynamic graph mode, a new step API is added to replace minimize
The current_step_lr API is renamed to get_lr
clear_gradients is renamed to clear_grad; the original API remains as an alias of clear_grad
AdamOptimizer becomes Adam, AdamaxOptimizer becomes Adamax, and RMSPropOptimizer becomes RMSProp; the remaining changes follow the base Optimizer class.
Add the AdamW class
It inherits from DecoupledWeightDecay and Adam
Chinese documentation link: PaddlePaddle/docs#2424
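A hedged usage sketch tying together the renamed APIs listed above (parameters, weight_decay as a float L2 coefficient, step/clear_grad, get_lr, set_state_dict, and the new AdamW class). Names follow the PR description; shapes and values are illustrative and details may differ from the released API.

```python
import paddle

paddle.disable_static()
linear = paddle.nn.Linear(10, 10)

# regularization -> weight_decay; a plain float acts as the L2Decay coefficient
opt = paddle.optimizer.AdamW(learning_rate=0.001,
                             parameters=linear.parameters(),  # was parameter_list
                             weight_decay=0.01)

x = paddle.rand([4, 10], dtype='float32')
loss = paddle.mean(linear(x))
loss.backward()

opt.step()        # new in dynamic graph mode, replaces minimize()
opt.clear_grad()  # renamed from clear_gradients(), which remains as an alias

print(opt.get_lr())        # renamed from current_step_lr()
state = opt.state_dict()
opt.set_state_dict(state)  # renamed from set_dict()
```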