How to set learning rate for SimCLR with multiple GPU training #373
FrankXinqi asked this question in Q&A
Answered by ananyahjha93 on Nov 18, 2020
Replies: 1 comment
@FrankXinqi For SimCLR with a batch size of 256, you can use the regular Adam optimizer with learning rates 1e-4/1e-4. LARS is preferred for larger batch sizes, 1024 and above. Also, try the updated SimCLR from the master branch; the online fine-tuning is fixed there. We'll soon have the ImageNet weights in for SimCLR as well. Finally, SwAV has the provision of a queue to run with a batch size of 256, and the authors have shown SwAV to be much more robust to small batch sizes, so that is something you might want to look at.
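For reference, here is a minimal PyTorch Lightning sketch of that optimizer setup. The class name, `encoder` argument, and defaults are illustrative assumptions, not the Bolts API; only the `configure_optimizers` logic reflects the advice above.

```python
import torch
import pytorch_lightning as pl


class SimCLRFineTuner(pl.LightningModule):
    """Hypothetical wrapper around a SimCLR backbone; the point of this
    sketch is only the optimizer choice, not the model itself."""

    def __init__(self, encoder: torch.nn.Module, learning_rate: float = 1e-4):
        super().__init__()
        self.encoder = encoder              # pretrained SimCLR backbone (assumed)
        self.learning_rate = learning_rate  # 1e-4 is the suggested LR at batch size 256

    def configure_optimizers(self):
        # At batch size 256, plain Adam at 1e-4 is sufficient; LARS-style
        # layer-wise scaling only starts to matter at large batches (~1024+).
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)
```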
Answer selected by Borda