[2.0API] Reconstruct all API related to LR Scheduler, unify dygraph and static #26550
Conversation
Thanks for your contribution!
""" | ||
self.keys = ['last_epoch', 'last_lr'] | ||
|
||
def set_dict(self, state_dict): |
As mentioned last time, please add an alias: set_state_dict.
Done.
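For context, a minimal sketch of how such an alias could be wired up (a hypothetical simplification, not the actual Paddle code):

```python
class _LRScheduler(object):
    def __init__(self):
        # State that is serialized, matching the keys mentioned above.
        self.keys = ['last_epoch', 'last_lr']
        self.last_epoch = -1
        self.last_lr = 0.0

    def set_dict(self, state_dict):
        # Restore scheduler state (last_epoch, last_lr) from a dict.
        for key in self.keys:
            if key in state_dict:
                self.__dict__[key] = state_dict[key]

    # Alias requested in the review: both spellings refer to the same method.
    set_state_dict = set_dict
```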
Args:
    d$_{model}$(int): The dimensionality of input and output feature vector of model. It is a python float number.
Why is this written as d$_{model}$?
So that "model" renders as a subscript in the documentation.
d$_{model}$(int): The dimensionality of input and output feature vector of model. It is a python float number.
warmup_steps(Variable|int): The number of warmup steps. A super parameter. It is a python float number
learning_rate (float): The initial learning rate. It is a python float number. Default: 1.0.
last_epoch (int, optional): If ``True``, prints a message to stdout for each update. Default: -1, means initial learning rate.
Looking at the PyTorch implementation, last_epoch is meant for resuming training: you set the epoch at which training restarts and the learning rate is computed from it; when it is -1, the default learning rate is the initial learning rate.
Done
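A hedged sketch of that resume behaviour, assuming a StepLR-style scheduler under the paddle.optimizer.lr_scheduler namespace used elsewhere in this PR (the exact class name and path are assumptions):

```python
import paddle

# Fresh training: last_epoch=-1 (default) means start from the initial learning rate.
scheduler = paddle.optimizer.lr_scheduler.StepLR(
    learning_rate=0.1, step_size=30, gamma=0.1)

# Resuming training after 60 finished epochs: pass the index of the last epoch
# so the schedule continues from there instead of restarting at 0.1.
resumed = paddle.optimizer.lr_scheduler.StepLR(
    learning_rate=0.1, step_size=30, gamma=0.1, last_epoch=59)
```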
Args:
    d$_{model}$(int): The dimensionality of input and output feature vector of model. It is a python float number.
    warmup_steps(Variable|int): The number of warmup steps. A super parameter. It is a python float number
Variable->Tensor
Done
    last_epoch=last_epoch, verbose=verbose)

def get_lr(self):
This line can be removed.
Done
learning_rate (float): The initial learning rate. It is a python float number.
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * decay_rate`` .
    It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): If ``True``, prints a message to stdout for each update. Default: -1, means initial learning rate.
Same as above.
Done
python/paddle/fluid/optimizer.py
Outdated
lr_var = self._global_learning_rate()
# only create global lr_var once
if not isinstance(lr_var, framework.Variable):
    print("create global learning rate")
Please remove this log line.
Done
    persistable=True,
    stop_gradient=True,
    dtype='float32' if self._dtype is None else self._dtype)
main_prog = framework.default_main_program()
Why is main_program used here? Would there be a problem if it were not main_program?
The attribute has to be set on whichever program the optimizer op lives in. For a program with this attribute set, every executor run feeds the corresponding float learning rate into the matching Variable, then runs forward -> backward -> optimize; it follows the optimize op.
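A minimal sketch of the static-graph flow being described (the NoamLR class and the paddle.optimizer.lr_scheduler namespace follow the snippets in this PR and are assumptions; the merged API may differ). sgd.minimize(loss) places the optimize op and the learning-rate Variable into main_prog, each executor.run() feeds the current float learning rate before forward/backward/optimize, and scheduler.step() advances it for the next run:

```python
import numpy as np
import paddle

paddle.enable_static()
main_prog = paddle.static.Program()
start_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, start_prog):
    x = paddle.static.data(name='x', shape=[None, 4, 5])
    out = paddle.static.nn.fc(x, size=5)
    loss = paddle.mean(out)
    scheduler = paddle.optimizer.lr_scheduler.NoamLR(
        d_model=0.01, warmup_steps=100, verbose=True)
    sgd = paddle.optimizer.SGD(learning_rate=scheduler)
    sgd.minimize(loss)  # the optimize op lives in main_prog, so the LR feed does too

exe = paddle.static.Executor()
exe.run(start_prog)
for epoch in range(5):
    exe.run(main_prog,
            feed={'x': np.random.randn(3, 4, 5).astype('float32')},
            fetch_list=[loss])
    scheduler.step()  # updates the LR fed into main_prog on the next run
```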
force-pushed d5ab480 to 818692d
force-pushed 818692d to 6cb899b
def step(self, epoch=None):
    """
    step should be called after 'minimize' . It will Update the learning rate in optimizer according to 'epoch'.
    The new learning rate will take effect on next optimize operation.
Update->update
minimize -> step. Going forward, the optimizer also calls the step function.
Done
learning_rate = 0.1

Args:
    learning_rate (float): The initial learning rate. It is a python float number.
learning_rate does not seem to be in the __init__ parameter list.
Done
decay_steps(int): The decay step size. It determines the decay cycle.
end_lr(float, optional): The minimum final learning rate. Default: 0.0001.
power(float, optional): Power of polynomial. Default: 1.0.
cycle(bool, optional): If set true, decay the learning rate every decay_steps. Default: False.
The description of cycle is wrong; see the explanation in PolynomialDecay.
class LinearLrWarmup(_LRScheduler):
    """
An introduction to this learning rate schedule is missing here; the previous API included one.
Done
LGTM
paddle.disable_static()
x = np.random.uniform(-1, 1, [10, 10]).astype("float32")
linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.NoamLR(d_model=0.01, warmup_steps=100, verbose=True)
Distinguish it from the Optimizer namespace:
paddle.optimizer.lr_scheduler.NoamLR
Ok
out = linear(x)
loss = paddle.reduce_mean(out)
out.backward()
sgd.minimize(loss)
The original style still works, but under dygraph the new style is recommended:
sgd.step()
sgd.clear_grad()
Although minimize in static graph and minimize in dygraph share the same name, they differ quite a bit:
- in static graph, minimize is called only once; in dygraph it is called repeatedly
- static graph requires the loss argument; dygraph does not
That is why a new step function was added for dygraph.
Currently sgd and most other optimizers do not support step yet.
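A hedged sketch of the recommended dygraph style described above (the NoamLR class and the paddle.optimizer.lr_scheduler namespace follow the snippets in this PR and are assumptions; at the time of this review most optimizers did not yet support step, so minimize still works):

```python
import numpy as np
import paddle

paddle.disable_static()
x = paddle.to_tensor(np.random.uniform(-1, 1, [10, 10]).astype('float32'))
linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr_scheduler.NoamLR(
    d_model=0.01, warmup_steps=100, verbose=True)
sgd = paddle.optimizer.SGD(learning_rate=scheduler,
                           parameters=linear.parameters())

for epoch in range(3):
    out = linear(x)
    loss = paddle.mean(out)
    loss.backward()
    sgd.step()        # recommended in dygraph instead of sgd.minimize(loss)
    sgd.clear_grad()  # clears gradients for the next iteration
    scheduler.step()  # advance the learning-rate schedule once per epoch
```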
x = paddle.to_tensor(x)
out = linear(x)
loss = paddle.reduce_mean(out)
out.backward()
loss.backward()
OK
x = np.random.uniform(-1, 1, [10, 10]).astype("float32")
linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.NoamLR(d_model=0.01, warmup_steps=100, verbose=True)
sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameter_list=linear.parameters())
Use the new argument name in the optimizer:
parameter_list -> parameters
#26288
The docs will be updated uniformly in the next PR.
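A small sketch of the requested rename for reference (the scheduler class and namespace follow the snippets in this PR and are assumptions):

```python
import paddle

linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr_scheduler.NoamLR(d_model=0.01, warmup_steps=100)

# 1.x fluid-style keyword:  parameter_list=linear.parameters()
# 2.0 keyword, per #26288:
sgd = paddle.optimizer.SGD(learning_rate=scheduler,
                           parameters=linear.parameters())
```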
main_prog = paddle.static.Program()
start_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, start_prog):
    x = paddle.static.data(name='x', shape=[-1, 4, 5])
shape=[None, 4, 5]
Ok
scheduler = paddle.optimizer.NoamLR(d_model=0.01, warmup_steps=100, verbose=True)
sgd = paddle.optimizer.SGD(learning_rate=scheduler)
sgd.minimize(loss)
lr_var = sgd._global_learning_rate()
Why does this need to call an internal function here?
Done, removed.
        'x': np.random.randn(3, 4, 5).astype('float32'),
        'y': np.random.randn(3, 4, 5).astype('float32')
    },
    fetch_list=lr_var.name)
Why does lr_var need to be fetched here? The returned out does not appear to be used anywhere.
Done, removed.
self._parameter_list = list(
    parameter_list) if parameter_list is not None else None
self._name = name
if framework.in_dygraph_mode():
    if not isinstance(learning_rate, float) and \
            not isinstance(learning_rate, LearningRateDecay):
    if not isinstance(learning_rate,
Why does this change modify the paddle.fluid.optimizer.py file rather than the paddle.optimizer.optimizer.py file?
Code written against version 1.8 would then change its runtime behavior.
The new optimizer module does not yet cover most optimizers; the teammates migrating optimizers have been notified to move the fluid optimizer behavior into the paddle optimizer.
This is a backward-compatible upgrade: behavior under 1.8 does not change, but the new logic is supported.
Documentation changes will be fixed uniformly in the next PR.
Merge this first; the example code will be updated in the next PR.
LGTM, will have a follow-up PR.
Args:
    learning_rate (float): The initial learning rate. It is a python float number.
    gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
Judging from __init__, this is a required parameter, isn't it?
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
    It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
optional
Done
Args:
    learning_rate (float): The initial learning rate. It is a python float number.
    gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
Should gamma be optional?
Yes.
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
    It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Same as above: missing "optional".
learning_rate (float): The initial learning rate. It is a python float number.
lr_lambda (function): A function which computes a factor by ``epoch`` , and then multiply the initial learning rate by this factor.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Same as above: missing "optional".
ALL Done
warmup_steps(int): The number of warmup steps. A super parameter. It is a python int number
learning_rate (float): The initial learning rate. It is a python float number. Default: 1.0.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Same: missing "optional".
values(list): A list of learning rate values that will be picked during different epoch boundaries.
    The type of element in the list is python float.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Same: missing "optional".
cycle(bool, optional): Whether the learning rate rises again. If True, then the learning rate will rise when it decrease
    to ``end_lr`` . If False, the learning rate is monotone decreasing. Default: False.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Same: missing "optional".
All done.
    change of ``loss`` is ``threshold`` . Default: ``'rel'`` .
cooldown (int, optional): The number of epochs to wait before resuming normal operation. Default: 0.
min_lr (float, optional): The lower bound of the learning rate after reduction. Default: 0.
epsilon (float, optional): Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is
smaller than epsilon
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
    It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Same as above: missing "optional".
Every lr scheduler has one of these.
This feature seems quite useful.
PR types
New features
PR changes
APIs
Describe
Reconstruct all API related to lr scheduler: a total of 12 kinds of `class _LRScheduler`.

Unify dygraph to update the learning rate manually via the `.step()` function. Users should update the learning rate manually by `step()`.

Unify static graph with dygraph. Users should update the learning rate manually by `step()` after `executor.run()`; every `executor.run()` will feed the python float value of `lr_scheduler` into the global `learning_rate` variable.

Chinese docs: PaddlePaddle/docs#2459
English docs:
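To make the shared design concrete, a hedged, stripped-down sketch of the pattern described above (a hypothetical simplification, not the actual Paddle implementation): every concrete scheduler overrides only get_lr(), while the user-facing step() lives in the base class and is what both dygraph and static-graph users call.

```python
class _LRScheduler(object):
    """Hypothetical simplification of the shared scheduler base class."""

    def __init__(self, learning_rate=0.1, last_epoch=-1, verbose=False):
        self.base_lr = learning_rate
        self.last_epoch = last_epoch
        self.verbose = verbose
        self.step()  # initializes last_lr (epoch -1 -> 0)

    def get_lr(self):
        # Each of the 12 concrete schedulers overrides only this formula.
        raise NotImplementedError

    def step(self, epoch=None):
        # Called manually by the user: per epoch/iteration in dygraph,
        # or after each executor.run() in static graph.
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch
        self.last_lr = self.get_lr()
        if self.verbose:
            print('Epoch {}: set learning rate to {}.'.format(
                self.last_epoch, self.last_lr))


class StepLR(_LRScheduler):
    """Example subclass: decay the LR by `gamma` every `step_size` epochs."""

    def __init__(self, learning_rate, step_size, gamma=0.1, **kwargs):
        self.step_size = step_size
        self.gamma = gamma
        super(StepLR, self).__init__(learning_rate, **kwargs)

    def get_lr(self):
        return self.base_lr * self.gamma ** (self.last_epoch // self.step_size)


scheduler = StepLR(learning_rate=0.1, step_size=2, verbose=True)
for epoch in range(5):
    # ... train one epoch, then advance the schedule ...
    scheduler.step()
```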