A disciplined approach to neural network parameters - Reviewing the approach for setting Hyper parameters by Leslie Smith

asvcode/1_cycle

A Disciplined Approach to Neural Network Hyper-parameters: Part 1 - Learning Rate, Batch Size, Momentum and Weight Decay

  • A review of Leslie Smith's approach to setting hyper-parameters.
  • 'Setting the hyper-parameters remains a black art that requires years of experience to acquire' - Leslie Smith

You can read the paper here: https://arxiv.org/abs/1803.09820

The 1cycle policy consists of a single cycle with 2 steps of equal length: Step 1, where the learning rate increases linearly from the minimum to the maximum, and Step 2, where it decreases linearly back to the minimum.

The learning-rate peak in the middle of the cycle (at 100 iterations) acts as a regularization method that helps prevent overfitting.
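The two-step schedule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this repository: the function name `one_cycle_lr` and the values `lr_min=0.01`, `lr_max=0.1` and `total_iters=200` are assumptions chosen so that the peak falls at iteration 100.

```python
def one_cycle_lr(iteration, total_iters, lr_min, lr_max):
    """Learning rate at `iteration` for a triangular 1cycle schedule.

    Step 1 (first half): rises linearly from lr_min to lr_max.
    Step 2 (second half): falls linearly from lr_max back to lr_min.
    """
    if total_iters < 2:
        raise ValueError("total_iters must be at least 2")
    if not 0 <= iteration <= total_iters:
        raise ValueError("iteration must lie in [0, total_iters]")
    half = total_iters / 2
    if iteration <= half:
        frac = iteration / half                  # 0 -> 1 during Step 1
    else:
        frac = (total_iters - iteration) / half  # 1 -> 0 during Step 2
    return lr_min + (lr_max - lr_min) * frac


# Illustrative values (assumptions): 200 iterations, so the peak is at 100.
schedule = [one_cycle_lr(i, 200, 0.01, 0.1) for i in range(201)]
peak_iteration = schedule.index(max(schedule))
```

Here `schedule` starts and ends at `lr_min`, and `peak_iteration` is the midpoint of the cycle, where the learning rate reaches `lr_max`.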

Batch Size and Learning Rate Analysis

A low batch size (BS) with a high learning rate (LR), as well as a high batch size with a high learning rate, produce the highest accuracy.

Training and Validation Loss Analysis Based on Weight Decay

Pictorial explanation of the tradeoff between underfitting and overfitting

  • The graphs, from left to right, portray the Training (Orange) and Validation (Blue) loss for a weight decay (wd) of 1e-5, 1e-4, 1e-3 and 1e-2.
  • The graphs show that the Training loss is above the Validation loss when the wd is 1e-5 and 1e-4, but the two losses intersect when the wd is 1e-3 and 1e-2.
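For readers unfamiliar with how weight decay enters training, the sketch below shows one plain SGD update with L2 weight decay. It is an assumption-laden illustration, not the training code behind the plots: the function name `sgd_step` and the sample values are hypothetical.

```python
def sgd_step(weights, grads, lr, wd):
    """One SGD update with L2 weight decay: w <- w - lr * (g + wd * w).

    A larger `wd` pulls every weight more strongly toward zero on each
    step, which is what makes it act as a regularizer.
    """
    return [w - lr * (g + wd * w) for w, g in zip(weights, grads)]


# Illustrative values only (assumptions): with a zero gradient, the only
# change to the weight comes from the decay term.
no_decay = sgd_step([1.0], [0.0], lr=0.1, wd=0.0)
with_decay = sgd_step([1.0], [0.0], lr=0.1, wd=1e-2)
```

With a zero gradient, `no_decay` leaves the weight unchanged while `with_decay` shrinks it slightly, so the size of `wd` controls how hard the weights are pushed toward zero.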
