| API name | paddle.optimizer.lr.CyclicLR |
| --- | --- |
| Author | Asthestarsfalll |
| Submission date | 2022-03-14 |
| Version | V1.0 |
| Required Paddle version | v2.2.0 |
| File name | 20220314_design_for_CyclicLR.md |
CyclicLR was first proposed in the paper Cyclical Learning Rates for Training Neural Networks. It cycles the learning rate between base_lr and max_lr with a fixed period, updating it once per batch (i.e., `step()` is called after every batch rather than every epoch). It also provides three built-in amplitude scaling policies: "triangular", "triangular2", and "exp_range".
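As a rough illustration of the triangular policy (a minimal NumPy sketch; the function name and parameter values are illustrative, not part of the proposal):

```python
import numpy as np

def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    # One full cycle = step_size iterations rising + step_size iterations falling.
    cycle = np.floor(1 + iteration / (2 * step_size))
    # x sweeps from 1 down to 0 and back to 1 within each cycle.
    x = np.abs(iteration / step_size - 2 * cycle + 1)
    # Learning rate climbs linearly from base_lr to max_lr, then back down.
    return base_lr + (max_lr - base_lr) * np.maximum(0, 1 - x)
```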
This proposal adds a CyclicLR learning rate scheduler to the Paddle framework, with the call path paddle.optimizer.lr.CyclicLR.
Enable Paddle to support the CyclicLR learning rate scheduler.
Paddle currently lacks an implementation of this functionality.
PyTorch provides this API: `torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1., scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)`.
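A typical usage pattern looks like the following (the training loop is schematic; `loader` and the model are placeholders); note that the scheduler is stepped once per batch:

```python
import torch

model = torch.nn.Linear(10, 2)
# cycle_momentum defaults to True, so the optimizer must support momentum.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000, mode='triangular')

for x, y in loader:  # placeholder data loader
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # called after every batch, not every epoch
```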
The PyTorch documentation describes it as follows:
Sets the learning rate of each parameter group according to cyclical learning rate policy (CLR). The policy cycles the learning rate between two boundaries with a constant frequency, as detailed in the paper Cyclical Learning Rates for Training Neural Networks. The distance between the two boundaries can be scaled on a per-iteration or per-cycle basis.
Cyclical learning rate policy changes the learning rate after every batch. step should be called after a batch has been used for training.
This class has three built-in policies, as put forth in the paper:
“triangular”: A basic triangular cycle without amplitude scaling.
“triangular2”: A basic triangular cycle that scales initial amplitude by half each cycle.
“exp_range”: A cycle that scales initial amplitude by $\text{gamma}^{\text{cycle iterations}}$ at each cycle iteration.
This implementation was adapted from the github repo: bckenstler/CLR
In terms of implementation, PyTorch implements it directly in Python (code location).
The core code is:
```python
def get_lr(self):
    """Calculates the learning rate at batch index. This function treats
    `self.last_epoch` as the last batch index.

    If `self.cycle_momentum` is ``True``, this function has a side effect of
    updating the optimizer's momentum.
    """
    if not self._get_lr_called_within_step:
        warnings.warn("To get the last learning rate computed by the scheduler, "
                      "please use `get_last_lr()`.", UserWarning)

    cycle = math.floor(1 + self.last_epoch / self.total_size)
    x = 1. + self.last_epoch / self.total_size - cycle
    if x <= self.step_ratio:
        scale_factor = x / self.step_ratio
    else:
        scale_factor = (x - 1) / (self.step_ratio - 1)

    lrs = []
    for base_lr, max_lr in zip(self.base_lrs, self.max_lrs):
        base_height = (max_lr - base_lr) * scale_factor
        if self.scale_mode == 'cycle':
            lr = base_lr + base_height * self.scale_fn(cycle)
        else:
            lr = base_lr + base_height * self.scale_fn(self.last_epoch)
        lrs.append(lr)

    if self.cycle_momentum:
        momentums = []
        for base_momentum, max_momentum in zip(self.base_momentums, self.max_momentums):
            base_height = (max_momentum - base_momentum) * scale_factor
            if self.scale_mode == 'cycle':
                momentum = max_momentum - base_height * self.scale_fn(cycle)
            else:
                momentum = max_momentum - base_height * self.scale_fn(self.last_epoch)
            momentums.append(momentum)
        for param_group, momentum in zip(self.optimizer.param_groups, momentums):
            param_group['momentum'] = momentum

    return lrs
```
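To make the indexing concrete, a worked example (values are illustrative): with `step_size_up=2000` and `step_size_down=1000`, `total_size` is 3000 and `step_ratio` is 2/3; at `last_epoch=1000` the schedule is halfway up the rising phase:

```python
import math

total_size = 2000 + 1000              # step_size_up + step_size_down
step_ratio = 2000 / total_size        # fraction of the cycle spent rising (2/3)
last_epoch = 1000

cycle = math.floor(1 + last_epoch / total_size)  # 1: still in the first cycle
x = 1. + last_epoch / total_size - cycle         # 1/3 of the cycle has elapsed
scale_factor = x / step_ratio                    # 0.5: halfway up the rising phase
# lr = base_lr + (max_lr - base_lr) * scale_factor * scale_fn(cycle)
```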
https://github.com/bckenstler/CLR
This repository provides a TensorFlow 2 implementation, described as follows:
A cyclical learning rate is a policy of learning rate adjustment that increases the learning rate off a base value in a cyclical nature. Typically the frequency of the cycle is constant, but the amplitude is often scaled dynamically at either each cycle or each mini-batch iteration.
It is implemented directly in Python (code location). The core code is:
```python
def clr(self):
    cycle = np.floor(1 + self.clr_iterations / (2 * self.step_size))
    x = np.abs(self.clr_iterations / self.step_size - 2 * cycle + 1)
    if self.scale_mode == 'cycle':
        return self.base_lr + (self.max_lr - self.base_lr) * np.maximum(0, (1 - x)) * self.scale_fn(cycle)
    else:
        return self.base_lr + (self.max_lr - self.base_lr) * np.maximum(0, (1 - x)) * self.scale_fn(self.clr_iterations)
```
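In both implementations the three built-in modes reduce to particular choices of `scale_fn`; following the formulas in the paper, the correspondence is roughly:

```python
gamma = 0.99994  # illustrative decay rate for 'exp_range'

triangular  = lambda x: 1.                   # constant amplitude; x is the cycle index
triangular2 = lambda x: 1 / (2. ** (x - 1))  # amplitude halves every cycle
exp_range   = lambda x: gamma ** x           # x is the iteration count (scale_mode='iterations')
```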
The PyTorch documentation notes that its implementation was adapted from the TensorFlow repository above. The main differences between PyTorch and that repository are:
- In the TensorFlow repository, a single parameter `step_size` gives the number of steps needed to rise from `base_lr` to `max_lr` (and likewise to fall from `max_lr` back to `base_lr`), whereas PyTorch uses two parameters, `step_size_up` and `step_size_down`, to set the lengths of the rising and falling phases separately;
- PyTorch additionally introduces a cyclical momentum policy.
The rest of the two implementations is essentially identical, and the PyTorch code is easier to follow, so this design follows the PyTorch implementation.
The API is designed as `paddle.optimizer.lr.CyclicLR(base_learning_rate, max_learning_rate, step_size_up, step_size_down, mode='triangular', exp_gamma=1., scale_fn=None, scale_mode='cycle', last_epoch=-1, verbose=False)`.
The momentum-related parameters from PyTorch are removed. In addition, to stay consistent with Paddle's other learning rate scheduler APIs, `base_lr` is renamed to `base_learning_rate` and `max_lr` to `max_learning_rate`.
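Assuming the proposed signature, usage would look roughly like the following sketch (the API does not exist yet; `loader` is a placeholder):

```python
import paddle

linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr.CyclicLR(
    base_learning_rate=0.001, max_learning_rate=0.1,
    step_size_up=2000, step_size_down=2000, mode='triangular')
sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())

for x, y in loader:  # placeholder data loader
    loss = paddle.nn.functional.mse_loss(linear(x), y)
    loss.backward()
    sgd.step()
    sgd.clear_grad()
    scheduler.step()  # stepped per batch, matching the CLR policy
```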
The scheduler is implemented directly in Python; no dedicated OP is designed.
The implementation mainly follows PyTorch and is located in `paddle/optimizer/lr.py`.
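A minimal sketch of the adapted `get_lr` (assuming the class subclasses `paddle.optimizer.lr.LRScheduler` and stores `total_size`, `step_ratio`, `max_lr`, `scale_mode`, and `scale_fn` in `__init__`; momentum handling is dropped and a single learning rate is returned; not the final implementation):

```python
import math

def get_lr(self):
    # Position within the current cycle, derived from the batch index.
    cycle = math.floor(1 + self.last_epoch / self.total_size)
    x = 1. + self.last_epoch / self.total_size - cycle
    if x <= self.step_ratio:
        scale_factor = x / self.step_ratio              # rising phase
    else:
        scale_factor = (x - 1) / (self.step_ratio - 1)  # falling phase

    base_height = (self.max_lr - self.base_lr) * scale_factor
    if self.scale_mode == 'cycle':
        return self.base_lr + base_height * self.scale_fn(cycle)
    return self.base_lr + base_height * self.scale_fn(self.last_epoch)
```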
The test cases to consider are as follows:
- consistency: results under dynamic graph and static graph modes match a NumPy reference implementation;
- error check: an error is correctly raised when `step_size_up` or `step_size_down` is less than or equal to 0 (sketched below);
- error check: an exception is correctly raised when `scale_mode` is not one of the allowed options (sketched below).
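A rough sketch of the error-check cases (the exception type is an assumption; the final implementation may raise a different error):

```python
import unittest

import paddle

class TestCyclicLRError(unittest.TestCase):
    def test_invalid_step_size(self):
        # step_size_up / step_size_down must be positive.
        with self.assertRaises(ValueError):
            paddle.optimizer.lr.CyclicLR(
                base_learning_rate=0.001, max_learning_rate=0.1,
                step_size_up=0, step_size_down=2000)

    def test_invalid_scale_mode(self):
        # scale_mode must be either 'cycle' or 'iterations'.
        with self.assertRaises(ValueError):
            paddle.optimizer.lr.CyclicLR(
                base_learning_rate=0.001, max_learning_rate=0.1,
                step_size_up=2000, step_size_down=2000,
                scale_fn=lambda x: 1., scale_mode='invalid')
```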
The proposal is implemented purely in Python, so development can be completed within the current release cycle.
This is a standalone new API and has no impact on other modules.