Adjust learning rate when batch size changes #51

Open
lukeyeager opened this issue Apr 7, 2015 · 2 comments

@lukeyeager (Member) commented Apr 7, 2015

See discussion in #44.

As Alex Krizhevsky explains in his paper One weird trick for parallelizing convolutional neural networks, the learning rate, momentum and weight decay all depend on the batch size (see section 5, page 5). It would be nice if DIGITS handled these adjustments automatically so that users don't have to worry about them.

The issue is that different networks have different default learning rates and batch sizes. Is there a standard equation that fits all networks?
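
A minimal sketch of what such a helper could look like, assuming each network ships a default base_lr that was tuned for a reference batch size; the function name scale_hyperparams, the rule parameter, and the example numbers are hypothetical, not existing DIGITS code:

```python
def scale_hyperparams(base_lr, ref_batch_size, new_batch_size, rule="linear"):
    """Rescale the base learning rate when the batch size changes.

    ref_batch_size is the batch size the network's default base_lr was
    tuned for; `rule` selects between the linear rule Krizhevsky used
    (arXiv:1404.5997) and the sqrt rule suggested by theory.
    """
    k = float(new_batch_size) / ref_batch_size
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * k ** 0.5
    raise ValueError("unknown scaling rule: %s" % rule)


# Example: defaults tuned for batch size 128, user doubles it to 256.
print(scale_hyperparams(0.01, 128, 256))           # 0.02 (linear rule)
print(scale_hyperparams(0.01, 128, 256, "sqrt"))   # ~0.0141 (sqrt rule)
```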

@mrgloom commented Sep 20, 2016

There is also discussion of this in BVLC/caffe#430.

In theory, when you scale the batch_size by a factor of X you should scale the base_lr by a factor of sqrt(X), but Alex used a factor of X (see http://arxiv.org/abs/1404.5997).
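
For concreteness, a quick comparison of the two rules when the batch size is halved (the base_lr of 0.01 is purely illustrative, not a recommendation):

```python
base_lr, k = 0.01, 0.5                 # batch size halved, e.g. 256 -> 128

sqrt_rule_lr   = base_lr * k ** 0.5    # ~0.00707, the sqrt(X) rule from theory
linear_rule_lr = base_lr * k           #  0.005,   the factor-of-X rule Alex used
```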

@yxchng commented Aug 10, 2017

@lukeyeager @mrgloom Is this still relevant given the more recent paper https://arxiv.org/abs/1706.02677, which says we should use linear scaling, i.e. multiply base_lr by X when batch_size changes by a factor of X?
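
That paper pairs the linear scaling rule with a gradual warmup of the learning rate over the first few epochs. A minimal sketch of that schedule at epoch granularity (the paper increments per iteration), assuming a reference batch size of 256 as in the paper; the function name lr_at_epoch and the default values are illustrative only:

```python
def lr_at_epoch(epoch, base_lr=0.1, ref_batch=256, batch=8192, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (Goyal et al., arXiv:1706.02677).

    The target learning rate is base_lr * (batch / ref_batch); during the
    first warmup_epochs the rate ramps linearly from base_lr to the target.
    """
    target_lr = base_lr * batch / float(ref_batch)
    if epoch < warmup_epochs:
        # linear ramp from base_lr up to target_lr over the warmup period
        return base_lr + (target_lr - base_lr) * epoch / float(warmup_epochs)
    return target_lr
```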
