[MXNET-593] Adding 2 tutorials on Learning Rate Schedules #11296
Conversation
Very informative and well-written! Good job!
Some minor comments. Great tutorial as usual @thomelane.
# Learning Rate Schedules

Setting the learning rate for stochastic gradient descent (SGD) is crucially important when training neural network because it controls both the speed of convergence and the ultimate performance of the network. One of the simplest learning rate strategies is to have a fixed learning rate throughout the training process. Choosing a small learning rate allows the optimizer to find good solutions but this comes at the expense of limiting the initial speed of convergence. Changing the learning rate over time can overcome this tradeoff.
"when training neural network" -> "when training neural networks"
"good solutions but this" -> "good solutions, but this"
Corrected.
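To make the fixed-versus-decaying contrast in that paragraph concrete, here is a minimal sketch using one of MXNet's built-in schedules; the step, factor and base learning rate values are arbitrary examples, not taken from the tutorial:

```python
import mxnet as mx

# Halve the learning rate every 1000 updates.
schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)
schedule.base_lr = 0.03  # starting learning rate

# Schedules are callables that map an update count to a learning rate,
# so they can be inspected (or plotted) directly.
print([schedule(i) for i in (1, 1500, 2500, 3500)])
```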
    def __call__(self, iteration):
        if iteration <= self.cycle_length:
            unit_cycle = (1 + math.cos(iteration*math.pi/self.cycle_length))/2
nit: could add spaces around * and / operators to be consistent (and pass some linters).
Corrected throughout.
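For completeness, a self-contained sketch of a cosine decay schedule along the lines of the snippet above; the class name and the min/max parameterisation are assumptions, not necessarily the tutorial's final code:

```python
import math

class CosineAnnealingSchedule:
    """Cosine decay from max_lr down to min_lr over cycle_length iterations,
    then held at min_lr."""
    def __init__(self, min_lr, max_lr, cycle_length):
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length

    def __call__(self, iteration):
        if iteration <= self.cycle_length:
            unit_cycle = (1 + math.cos(iteration * math.pi / self.cycle_length)) / 2
            return unit_cycle * (self.max_lr - self.min_lr) + self.min_lr
        return self.min_lr

schedule = CosineAnnealingSchedule(min_lr=0.001, max_lr=0.01, cycle_length=1000)
# Starts at max_lr, reaches the halfway value at the midpoint, ends at min_lr.
print(round(schedule(0), 4), round(schedule(500), 4), round(schedule(1000), 4))
```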
Epoch: 8; Batch 0; Loss 0.039928; LR 0.000300 <!--notebook-skip-line-->
Epoch: 9; Batch 0; Loss 0.003349; LR 0.000300 <!--notebook-skip-line-->
`Epoch: 9; Batch 0; Loss 0.003349; LR 0.000300` should be `Epoch: 9; Batch 0; Loss 0.003349; LR 0.0000300`
I think right? (i.e. there's a missing 0 in LR).
Great catch! It was due to the schedule changing after the defined step, not before, essentially expecting the iteration counter to start at 1 rather than 0. Corrected this error and clarified throughout.
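A minimal illustration of the boundary issue described above, using a hypothetical step schedule (not the tutorial's code); with iteration counters starting at 0, choosing `<` versus `<=` moves the change by one iteration:

```python
class StepSchedule:
    """Hypothetical schedule that drops the learning rate once, at `step`."""
    def __init__(self, base_lr=0.03, step=500, factor=0.1):
        self.base_lr = base_lr
        self.step = step
        self.factor = factor

    def __call__(self, iteration):
        # With `iteration < self.step` the drop applies from iteration 500;
        # with `iteration <= self.step` it would only apply from iteration 501.
        if iteration < self.step:
            return self.base_lr
        return self.base_lr * self.factor

schedule = StepSchedule()
print(schedule(499), schedule(500))  # base_lr before the step, base_lr * factor after
```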
        self.inc_fraction = inc_fraction

    def __call__(self, iteration):
        if iteration <= self.cycle_length*self.inc_fraction:
Nit: again I would be consistent and always have a space between operators *, /, etc.
Same comment for code below.
Changed throughout.
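To give the snippet above some context, here is a rough, self-contained sketch of a triangular schedule; the class name, the linear ramps, and the default `inc_fraction` are assumptions rather than the tutorial's exact implementation:

```python
class TriangularSchedule:
    """Linearly ramps the learning rate from min_lr to max_lr over the first
    inc_fraction of the cycle, then linearly back down to min_lr."""
    def __init__(self, min_lr, max_lr, cycle_length, inc_fraction=0.5):
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length
        self.inc_fraction = inc_fraction

    def __call__(self, iteration):
        if iteration <= self.cycle_length * self.inc_fraction:
            unit_cycle = iteration / (self.cycle_length * self.inc_fraction)
        elif iteration <= self.cycle_length:
            unit_cycle = (self.cycle_length - iteration) / (self.cycle_length * (1 - self.inc_fraction))
        else:
            unit_cycle = 0
        return unit_cycle * (self.max_lr - self.min_lr) + self.min_lr

schedule = TriangularSchedule(min_lr=0.001, max_lr=0.01, cycle_length=1000)
# Ramps up to max_lr at the midpoint, back down to min_lr at the end of the cycle.
print([round(schedule(i), 4) for i in (0, 250, 500, 750, 1000)])
```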
#### 1-Cycle: for "Super-Convergence"

Cool-down is used in the "1-Cycle" schedule proposed by [Leslie N. Smith, Nicholay Topin (2017)](https://arxiv.org/abs/1708.07120) to train neural networks very quickly in certain circumstances (coined "super-convergence"). We implement a single and symmetric cycle of the triangular schedule above (i.e. `inc_fraction=0.5`), followed by a cool-down period of `cooldown_length` iterations.
Super-convergence is also introduced in the previous section's description. Seems a bit redundant here.
Agreed, removed.
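For readers who want to experiment before the tutorial lands, a rough sketch of a 1-Cycle style schedule as described above; the class name, `start_lr`, `finish_lr` and the linear cool-down are assumptions, not necessarily the tutorial's exact implementation:

```python
class OneCycleSchedule:
    """A single symmetric triangular cycle, followed by a linear cool-down
    from start_lr to finish_lr over cooldown_length iterations."""
    def __init__(self, start_lr, max_lr, cycle_length, cooldown_length, finish_lr):
        self.start_lr = start_lr
        self.max_lr = max_lr
        self.cycle_length = cycle_length
        self.cooldown_length = cooldown_length
        self.finish_lr = finish_lr

    def __call__(self, iteration):
        halfway = self.cycle_length / 2
        if iteration <= halfway:
            # Linear ramp up (first half of the symmetric triangular cycle).
            return self.start_lr + (self.max_lr - self.start_lr) * iteration / halfway
        if iteration <= self.cycle_length:
            # Linear ramp back down to start_lr (second half of the cycle).
            return self.max_lr - (self.max_lr - self.start_lr) * (iteration - halfway) / halfway
        # Linear cool-down from start_lr to finish_lr.
        progress = min(iteration - self.cycle_length, self.cooldown_length) / self.cooldown_length
        return self.start_lr - progress * (self.start_lr - self.finish_lr)

schedule = OneCycleSchedule(start_lr=0.01, max_lr=0.1, cycle_length=1000,
                            cooldown_length=500, finish_lr=0.001)
# Peak at the midpoint, back to start_lr at the end of the cycle, finish_lr after cool-down.
print(round(schedule(500), 4), round(schedule(1000), 4), round(schedule(1500), 4))
```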
@KellenSunderland thanks for the review and feedback! Made changes as suggested, and fixed the off-by-one.
@indhub I believe this is good to merge, since I've made the adjustments as per @KellenSunderland's feedback.
* Added two tutorials on learning rate schedules; basic and advanced.
* Correcting notebook skip line.
* Corrected cosine graph.
* Changes based on @KellenSunderland feedback.
Description
The first tutorial covers the learning rate schedules in `mxnet.lr_scheduler`, how to use them with Optimizer and Trainer in Gluon, and how to implement a custom learning rate schedule. It contains visualizations of the schedules.

The second tutorial shows how to use common learning rate schedules such as cyclical schedules, SGD with Restarts, warm-up and cool-down. Many paper references are provided. It also contains visualizations of the schedules.
Added to tutorial testing and index page.
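As a quick illustration of the usage the first tutorial describes, here is a minimal sketch of attaching a schedule to a Gluon Trainer; the network and hyperparameter values are placeholders:

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)  # placeholder network
net.initialize()

# Built-in schedule: multiply the learning rate by 0.5 every 250 updates.
schedule = mx.lr_scheduler.FactorScheduler(step=250, factor=0.5)

# Pass the schedule to the optimizer through `optimizer_params`;
# the trainer then queries it as training progresses.
trainer = gluon.Trainer(net.collect_params(), optimizer='sgd',
                        optimizer_params={'learning_rate': 0.03,
                                          'lr_scheduler': schedule})
```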
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments