Hello,
First, thank you for the clean implementation, especially the model code. I have some questions about training the Focal Transformer.
In main.py, the learning rate is scaled for DDP training as
linear_scaled_lr = config.TRAIN.BASE_LR * config.DATA.BATCH_SIZE * dist.get_world_size() / 512.0
In the config, BASE_LR is set to 5e-4, BATCH_SIZE (per GPU) is 128, and get_world_size() is 8 (8 NVIDIA A100 GPUs, according to the paper). So, ultimately,
linear_scaled_lr = (5e-4 * 128 * 8) / 512 = 5e-4 * 2 = 1e-3, which I assume is why the paper reports a learning rate of 1e-3.
I have two questions about this:
1. Is there any particular reason for using 512 as the normalization factor? I found it in the Swin Transformer repo and some other repos too, but couldn't find an explanation for it.
2. I am currently trying to use a variant of the Focal Transformer for my research. If I use 4 GPUs with a batch size of 128 per GPU (total batch size 512), do I need to change anything in the learning rate scaling?
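To make the second question concrete, here is a minimal sketch of how I understand the scaling rule. The function name and the `reference_batch` parameter are my own, not taken from the repo; the body only restates the formula quoted from main.py, with `dist.get_world_size()` replaced by a plain argument so it runs without a distributed setup.

```python
def linear_scaled_lr(base_lr, batch_size_per_gpu, world_size, reference_batch=512.0):
    """Scale the base learning rate by (total batch size / reference batch size).

    Hypothetical helper: it mirrors the formula quoted from main.py, with
    world_size standing in for dist.get_world_size().
    """
    total_batch = batch_size_per_gpu * world_size
    return base_lr * total_batch / reference_batch


# Setup described in the paper: 8 GPUs x 128 per GPU = total batch 1024
lr_8gpu = linear_scaled_lr(5e-4, 128, 8)

# My setup: 4 GPUs x 128 per GPU = total batch 512
lr_4gpu = linear_scaled_lr(5e-4, 128, 4)

print(f"8 GPUs: {lr_8gpu:.1e}, 4 GPUs: {lr_4gpu:.1e}")
```

If I read the formula correctly, a total batch size of 512 leaves the learning rate at BASE_LR, which is half the value reported in the paper. What I am unsure about is whether that is the intended behavior.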
I would appreciate any guidance on these questions.
Thank you.