Adjust learning rate when batch size changes #51

Open
lukeyeager opened this issue Apr 7, 2015 · 2 comments

@lukeyeager (Member) commented Apr 7, 2015

See discussion in #44.

As Alex Krizhevsky explains in his paper One weird trick for parallelizing convolutional neural networks, the learning rate, momentum and weight decay all depend on the batch size (see section 5, page 5). It would be nice if DIGITS handled these adjustments automatically so that users don't have to worry about them.

The issue is that different networks have different default learning rates and batch sizes. Is there a standard equation that fits all networks?
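
A minimal sketch of what such a helper could look like, assuming each network ships a default base_lr that was tuned for a reference batch size; the function name scale_hyperparams, the rule parameter, and the example numbers are hypothetical, not existing DIGITS code:

```python
def scale_hyperparams(base_lr, ref_batch_size, new_batch_size, rule="linear"):
    """Rescale the base learning rate when the batch size changes.

    ref_batch_size is the batch size the network's default base_lr was
    tuned for; `rule` selects between the linear rule Krizhevsky used
    (arXiv:1404.5997) and the sqrt rule suggested by theory.
    """
    k = float(new_batch_size) / ref_batch_size
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * k ** 0.5
    raise ValueError("unknown scaling rule: %s" % rule)


# Example: defaults tuned for batch size 128, user doubles it to 256.
print(scale_hyperparams(0.01, 128, 256))           # 0.02 (linear rule)
print(scale_hyperparams(0.01, 128, 256, "sqrt"))   # ~0.0141 (sqrt rule)
```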

@mrgloom commented Sep 20, 2016

There is also discussion of this in BVLC/caffe#430.

In theory, when you scale the batch_size by a factor of X you should scale the base_lr by a factor of sqrt(X), but Alex used a factor of X (see http://arxiv.org/abs/1404.5997).
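
For concreteness, a quick comparison of the two rules when the batch size is halved (the base_lr of 0.01 is purely illustrative, not a recommendation):

```python
base_lr, k = 0.01, 0.5                 # batch size halved, e.g. 256 -> 128

sqrt_rule_lr   = base_lr * k ** 0.5    # ~0.00707, the sqrt(X) rule from theory
linear_rule_lr = base_lr * k           #  0.005,   the factor-of-X rule Alex used
```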

@yxchng commented Aug 10, 2017

@lukeyeager @mrgloom Is this still relevant given the more recent paper https://arxiv.org/abs/1706.02677, which says we should use linear scaling, i.e. multiply base_lr by X when batch_size changes by a factor of X?
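
That paper pairs the linear scaling rule with a gradual warmup of the learning rate over the first few epochs. A minimal sketch of that schedule at epoch granularity (the paper increments per iteration), assuming a reference batch size of 256 as in the paper; the function name lr_at_epoch and the default values are illustrative only:

```python
def lr_at_epoch(epoch, base_lr=0.1, ref_batch=256, batch=8192, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (Goyal et al., arXiv:1706.02677).

    The target learning rate is base_lr * (batch / ref_batch); during the
    first warmup_epochs the rate ramps linearly from base_lr to the target.
    """
    target_lr = base_lr * batch / float(ref_batch)
    if epoch < warmup_epochs:
        # linear ramp from base_lr up to target_lr over the warmup period
        return base_lr + (target_lr - base_lr) * epoch / float(warmup_epochs)
    return target_lr
```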
