
Lrsched missing step #4392

Closed
wants to merge 8 commits into from

Conversation

@juderoque (Contributor) commented Mar 3, 2022

Patch description
Follow-up to #4384: un-relaxed the relaxed conditions in the tests and modified the scheduler logic to fit the un-relaxed conditions.

  • Changed self._number_training_updates < self.warmup_updates --> self._number_training_updates <= self.warmup_updates so that the warmup schedule hits the exact max_lr (see the sketch after this list).
    • In this case the lr doesn't anneal to 0, due to a missing step.
  • Modified the stopping conditions in parlai/nn/lr_scheduler.py and parlai/scripts/train.py to allow the missing step to be taken.
  • Modified test_lr_schedulers.py to step from 1 -> total_steps rather than 0 -> total_steps - 1, to match the behavior in lr_scheduler.py.
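
A minimal sketch of the warmup boundary change (hypothetical names, not the actual ParlAI scheduler class; it only assumes a linear warmup evaluated on a step counter that starts at 1):

```python
# Hypothetical linear-warmup helper, for illustration only.
def warmup_lr(step: int, warmup_updates: int, max_lr: float) -> float:
    """Ramp linearly toward max_lr over warmup_updates steps."""
    return max_lr * min(step, warmup_updates) / warmup_updates

warmup_updates, max_lr = 10, 1e-5

# Steps handled by the warmup schedule under the old (<) vs. new (<=) condition:
old_warmup_steps = [s for s in range(1, warmup_updates + 1) if s < warmup_updates]
new_warmup_steps = [s for s in range(1, warmup_updates + 1) if s <= warmup_updates]

assert warmup_lr(old_warmup_steps[-1], warmup_updates, max_lr) < max_lr   # peak LR skipped
assert warmup_lr(new_warmup_steps[-1], warmup_updates, max_lr) == max_lr  # exact max_lr is hit
```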

Testing steps

parlai tm -t convai2 -m transformer/generator --lr-scheduler linear --warmup-updates 10 -lstep 1 -vstep 10000000 --max-lr-steps 100 --skip-generation True --warmup-rate 0.01 -lr 0.00001 --dict-file /tmp/test123.dict

Running this should now show 100 steps instead of 99.
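
The exact stopping conditions live in parlai/scripts/train.py and parlai/nn/lr_scheduler.py; the toy loop below (hypothetical logic, not the real train loop) only illustrates how checking the counter before taking the update drops the final scheduler step:

```python
# Toy loop, for illustration only: the real conditions are in ParlAI's train loop.
def count_scheduler_steps(max_lr_steps: int, old_behavior: bool) -> int:
    steps_taken = 0
    step = 1  # the step counter starts at 1 (see the discussion below)
    while True:
        if old_behavior and step >= max_lr_steps:
            break             # old: stop before the final update is taken
        steps_taken += 1      # take one training update / scheduler step
        if step >= max_lr_steps:
            break             # new: stop only after the final update
        step += 1
    return steps_taken

print(count_scheduler_steps(100, old_behavior=True))   # 99
print(count_scheduler_steps(100, old_behavior=False))  # 100
```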

Other information
I'm still not sure if this is the desired behavior: is there an implicit step (0) taken?

@emilydinan (Contributor) left a comment

Thanks for the fix -- great work!!

Can you add some tests with max_lr != 1? Also, can you add additional tests to check that LR < max_lr at output[warmup_updates]?

@juderoque (Contributor, Author)

Tests added!

@juderoque (Contributor, Author) commented Mar 3, 2022

I noticed that the step counter used by this function (https://github.com/facebookresearch/ParlAI/blob/main/parlai/nn/lr_scheduler.py#L294) starts out at 1. This means the first step of the warmup schedule is one step ahead of the specified value, and the first step of the regular scheduler is also one step ahead of the specified max_lr value. In this patch I have made it so the last step of the warmup scheduler hits max_lr. An alternative behavior would be for the steps to start from 0, so that the last step of the warmup scheduler is "one step before" max_lr and the regular scheduler starts out at max_lr. This distinction matters in the case where there are 0 warmup-updates, because there we never actually hit the specified max_lr but rather start one step after it.
Thoughts @stephenroller @emilydinan ?

Edit: it is possible that PyTorch's native behavior is to start the counter at 1, in which case I wouldn't suggest we mess with this.

Edit 2: in the current patch, setting warmup-updates=1 has the behavior I'd intuitively expect from warmup-updates=0, warmup-updates=2 gives the behavior I'd expect from warmup-updates=1, etc.
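
A toy linear-decay lambda (not the actual ParlAI scheduler) showing the two conventions: with a 0-based counter the main schedule would start exactly at max_lr, while with the current 1-based counter its first value is already one step below it.

```python
# Hypothetical linear-decay schedule, for illustration only.
max_lr, total_steps = 1e-5, 100

def linear_decay(step: int) -> float:
    """Anneal linearly from max_lr at step 0 down to 0 at step total_steps."""
    return max_lr * max(total_steps - step, 0) / total_steps

assert linear_decay(0) == max_lr  # 0-based counter: starts exactly at max_lr
assert linear_decay(1) < max_lr   # 1-based counter: first value is one step past the peak
```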

@stephenroller (Contributor)

Generally I prefer backwards compatibility even if it’s esoteric.

@juderoque requested a review from emilydinan March 4, 2022 01:42
@juderoque (Contributor, Author) commented Mar 4, 2022

> Generally I prefer backwards compatibility even if it’s esoteric.

@stephenroller Do you think my adding the extra step in train_model and lr_schedulers ruins backwards compatibility?

@emilydinan (Contributor) left a comment

great job @juderoque!

@stephenroller (Contributor)

I defer to Emily. She's thought way more about this

@stephenroller (Contributor)

Bump

@stephenroller (Contributor)

(go ahead and merge main into this to fix tests, please)

@stephenroller (Contributor)

Should we reconsider this?

@github-actions bot commented May 3, 2022

This PR has not had activity in 30 days. Closing due to staleness.

github-actions bot added the stale label May 3, 2022
github-actions bot closed this May 10, 2022