[MXNET-593] Adding 2 tutorials on Learning Rate Schedules #11296
Conversation
Very informative and well-written! Good job!
Some minor comments. Great tutorial as usual @thomelane.
# Learning Rate Schedules

Setting the learning rate for stochastic gradient descent (SGD) is crucially important when training neural network because it controls both the speed of convergence and the ultimate performance of the network. One of the simplest learning rate strategies is to have a fixed learning rate throughout the training process. Choosing a small learning rate allows the optimizer to find good solutions but this comes at the expense of limiting the initial speed of convergence. Changing the learning rate over time can overcome this tradeoff.
"when training neural network" -> "when training neural networks"
"good solutions but this" -> "good solutions, but this"
Corrected.
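To make the fixed-versus-decaying contrast in that paragraph concrete, here is a minimal sketch using one of MXNet's built-in schedules; the step, factor and base learning rate values are arbitrary examples, not taken from the tutorial:

```python
import mxnet as mx

# Halve the learning rate every 1000 updates.
schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)
schedule.base_lr = 0.03  # starting learning rate

# Schedules are callables that map an update count to a learning rate,
# so they can be inspected (or plotted) directly.
print([schedule(i) for i in (1, 1500, 2500, 3500)])
```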
    def __call__(self, iteration):
        if iteration <= self.cycle_length:
            unit_cycle = (1 + math.cos(iteration*math.pi/self.cycle_length))/2
nit: could add spaces around * and / operators to be consistent (and pass some linters).
Corrected throughout.
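For completeness, a self-contained sketch of a cosine decay schedule along the lines of the snippet above; the class name and the min/max parameterisation are assumptions, not necessarily the tutorial's final code:

```python
import math

class CosineAnnealingSchedule:
    """Cosine decay from max_lr down to min_lr over cycle_length iterations,
    then held at min_lr."""
    def __init__(self, min_lr, max_lr, cycle_length):
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length

    def __call__(self, iteration):
        if iteration <= self.cycle_length:
            unit_cycle = (1 + math.cos(iteration * math.pi / self.cycle_length)) / 2
            return unit_cycle * (self.max_lr - self.min_lr) + self.min_lr
        return self.min_lr

schedule = CosineAnnealingSchedule(min_lr=0.001, max_lr=0.01, cycle_length=1000)
# Starts at max_lr, reaches the halfway value at the midpoint, ends at min_lr.
print(round(schedule(0), 4), round(schedule(500), 4), round(schedule(1000), 4))
```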
Epoch: 8; Batch 0; Loss 0.039928; LR 0.000300 <!--notebook-skip-line-->
Epoch: 9; Batch 0; Loss 0.003349; LR 0.000300 <!--notebook-skip-line-->
`Epoch: 9; Batch 0; Loss 0.003349; LR 0.000300` should be `Epoch: 9; Batch 0; Loss 0.003349; LR 0.0000300`
I think right? (i.e. there's a missing 0 in LR).
Great catch! It was due to the schedule changing after the defined step, not before, essentially expecting the iteration counter to start at 1 rather than 0. Corrected this error and clarified throughout.
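A minimal illustration of the boundary issue described above, using a hypothetical step schedule (not the tutorial's code); with iteration counters starting at 0, choosing `<` versus `<=` moves the change by one iteration:

```python
class StepSchedule:
    """Hypothetical schedule that drops the learning rate once, at `step`."""
    def __init__(self, base_lr=0.03, step=500, factor=0.1):
        self.base_lr = base_lr
        self.step = step
        self.factor = factor

    def __call__(self, iteration):
        # With `iteration < self.step` the drop applies from iteration 500;
        # with `iteration <= self.step` it would only apply from iteration 501.
        if iteration < self.step:
            return self.base_lr
        return self.base_lr * self.factor

schedule = StepSchedule()
print(schedule(499), schedule(500))  # base_lr before the step, base_lr * factor after
```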
        self.inc_fraction = inc_fraction

    def __call__(self, iteration):
        if iteration <= self.cycle_length*self.inc_fraction:
Nit: again I would be consistent and always have a space between operators *, /, etc.
Same comment for code below.
Changed throughout.
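To give the snippet above some context, here is a rough, self-contained sketch of a triangular schedule; the class name, the linear ramps, and the default `inc_fraction` are assumptions rather than the tutorial's exact implementation:

```python
class TriangularSchedule:
    """Linearly ramps the learning rate from min_lr to max_lr over the first
    inc_fraction of the cycle, then linearly back down to min_lr."""
    def __init__(self, min_lr, max_lr, cycle_length, inc_fraction=0.5):
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length
        self.inc_fraction = inc_fraction

    def __call__(self, iteration):
        if iteration <= self.cycle_length * self.inc_fraction:
            unit_cycle = iteration / (self.cycle_length * self.inc_fraction)
        elif iteration <= self.cycle_length:
            unit_cycle = (self.cycle_length - iteration) / (self.cycle_length * (1 - self.inc_fraction))
        else:
            unit_cycle = 0
        return unit_cycle * (self.max_lr - self.min_lr) + self.min_lr

schedule = TriangularSchedule(min_lr=0.001, max_lr=0.01, cycle_length=1000)
# Ramps up to max_lr at the midpoint, back down to min_lr at the end of the cycle.
print([round(schedule(i), 4) for i in (0, 250, 500, 750, 1000)])
```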
#### 1-Cycle: for "Super-Convergence"

Cool-down is used in the "1-Cycle" schedule proposed by [Leslie N. Smith, Nicholay Topin (2017)](https://arxiv.org/abs/1708.07120) to train neural networks very quickly in certain circumstances (coined "super-convergence"). We implement a single and symmetric cycle of the triangular schedule above (i.e. `inc_fraction=0.5`), followed by a cool-down period of `cooldown_length` iterations.
Super-convergence is also introduced in the previous section's description. Seems a bit redundant here.
Agreed, removed.
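For readers who want to experiment before the tutorial lands, a rough sketch of a 1-Cycle style schedule as described above; the class name, `start_lr`, `finish_lr` and the linear cool-down are assumptions, not necessarily the tutorial's exact implementation:

```python
class OneCycleSchedule:
    """A single symmetric triangular cycle, followed by a linear cool-down
    from start_lr to finish_lr over cooldown_length iterations."""
    def __init__(self, start_lr, max_lr, cycle_length, cooldown_length, finish_lr):
        self.start_lr = start_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length
        self.cooldown_length = cooldown_length
        self.finish_lr = finish_lr

    def __call__(self, iteration):
        halfway = self.cycle_length / 2
        if iteration <= halfway:
            # Linear ramp up (first half of the symmetric triangular cycle).
            return self.start_lr + (self.max_lr - self.start_lr) * iteration / halfway
        if iteration <= self.cycle_length:
            # Linear ramp back down to start_lr (second half of the cycle).
            return self.max_lr - (self.max_lr - self.start_lr) * (iteration - halfway) / halfway
        # Linear cool-down from start_lr to finish_lr.
        progress = min(iteration - self.cycle_length, self.cooldown_length) / self.cooldown_length
        return self.start_lr - progress * (self.start_lr - self.finish_lr)

schedule = OneCycleSchedule(start_lr=0.01, max_lr=0.1, cycle_length=1000,
                            cooldown_length=500, finish_lr=0.001)
# Peak at the midpoint, back to start_lr at the end of the cycle, finish_lr after cool-down.
print(round(schedule(500), 4), round(schedule(1000), 4), round(schedule(1500), 4))
```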
@KellenSunderland thanks for the review and feedback! Made changes as suggested, and fixed the off-by-one.
@indhub I believe this is good to merge, since I've made the adjustments as per @KellenSunderland's feedback.
* Added two tutorials on learning rate schedules; basic and advanced.
* Correcting notebook skip line.
* Corrected cosine graph.
* Changes based on @KellenSunderland feedback.
Description
The first tutorial covers the learning rate schedules in `mxnet.lr_scheduler`, how to use them with Optimizer and Trainer in Gluon, and how to implement a custom learning rate schedule. It contains visualizations of the schedules.

The second tutorial shows how to use common learning rate schedules such as cyclical schedules, SGD with Restarts, warm-up and cool-down. Many paper references are provided. It also contains visualizations of the schedules.
Added to tutorial testing and index page.
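As a quick illustration of the usage the first tutorial describes, here is a minimal sketch of attaching a schedule to a Gluon Trainer; the network and hyperparameter values are placeholders:

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)  # placeholder network
net.initialize()

# Built-in schedule: multiply the learning rate by 0.5 every 250 updates.
schedule = mx.lr_scheduler.FactorScheduler(step=250, factor=0.5)

# Pass the schedule to the optimizer through `optimizer_params`;
# the trainer then queries it as training progresses.
trainer = gluon.Trainer(net.collect_params(), optimizer='sgd',
                        optimizer_params={'learning_rate': 0.03,
                                          'lr_scheduler': schedule})
```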
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments