
Warmup updates bug for LR < 1 #4384

Merged

@emilydinan merged 3 commits into main from lrschedbug on Feb 28, 2022

Conversation

@emilydinan (Contributor) commented on Feb 28, 2022

Patch description

#4242 introduced a bug in which `_is_warming_up` returns True whenever the last LR is < 1. This affects all initial LRs != 1: in particular, for models with an initial learning rate < 1, the model will be "warming up" forever, and the LR will remain constant after the warmup updates have finished rather than starting to decay according to the provided schedule.
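
For illustration, here is a minimal sketch of the faulty predicate and the counter-based fix. The class scaffolding is hypothetical; only `_number_training_updates` and `warmup_updates` are named in this PR, and ParlAI's actual scheduler code differs.

```python
# Hypothetical simplification for illustration; not ParlAI's actual code.

class SketchLRScheduler:
    def __init__(self, warmup_updates: int):
        self.warmup_updates = warmup_updates
        self._number_training_updates = 0
        self._last_lr = 0.0

    def _is_warming_up_buggy(self) -> bool:
        # BUG (from #4242): any schedule whose LR stays below 1
        # (e.g. -lr 0.00001) appears to be "warming up" forever,
        # so the post-warmup decay never kicks in.
        return self._last_lr < 1

    def _is_warming_up(self) -> bool:
        # FIX: decide based on how many training updates have run,
        # not on the magnitude of the learning rate itself.
        return self._number_training_updates < self.warmup_updates
```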

Testing steps

parlai tm -t convai2 -m transformer/generator --lr-scheduler linear --warmup-updates 10 -lstep 1 -vstep 10000000 --max-lr-steps 100 --skip-generation True --warmup-rate 0.01 -lr 0.00001 --dict-file /tmp/test123.dict -mf /tmp/test1234dsfsdf5

I had to relax restrictions to get tests to pass. If we change `self._number_training_updates < self.warmup_updates` to `self._number_training_updates <= self.warmup_updates`, we hit the exact max LR but don't quite anneal to zero. Will leave it to a follow-up PR (Jude) to test this more robustly.
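
To make the boundary condition concrete, the toy snippet below (hypothetical, not from this PR) compares the two predicates; per the comment above, tests showed the exclusive form misses the exact max LR, while the inclusive form hits it but does not quite anneal to zero.

```python
# Toy comparison of the two warmup predicates (hypothetical illustration).
warmup_updates = 10

def is_warming_up(n_updates: int, inclusive: bool = False) -> bool:
    if inclusive:
        # counts the boundary step as warmup: the LR hits the exact max,
        # but the decay phase loses a step and doesn't quite reach zero
        return n_updates <= warmup_updates
    # excludes the boundary step: decay starts one step earlier, so the
    # schedule anneals to zero but never evaluates at the exact max LR
    return n_updates < warmup_updates

for n in (9, 10, 11):
    print(n, is_warming_up(n), is_warming_up(n, inclusive=True))
# -> 9 True True / 10 False True / 11 False False
```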

@stephenroller (Contributor) left a comment:


Ty

@emilydinan (Contributor, Author):

Tests pass locally

@emilydinan merged commit 9d77adb into main on Feb 28, 2022
@emilydinan deleted the lrschedbug branch on February 28, 2022 at 22:43
@stephenroller (Contributor):

@klshuster and @jxmsML confirm this does NOT affect reduceonplateau

@stephenroller (Contributor):

@spencerp found this did NOT affect invsqrt

@juderoque mentioned this pull request on Mar 3, 2022