Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer #31870
What does this PR do?
Adds the `cosine_with_min_lr_schedule_with_warmup_lr_rate` scheduler to `Trainer`, building on #29341.
As mentioned in the previous PR, the existing scheduler implements "a warmup period during which it increases linearly between 0 and the initial learning rate set in the optimizer." While recently investigating the DeepSpeed framework, I noticed that its scheduler supports an additional feature: a `warmup_min_ratio` parameter (see https://github.com/richardodliu/DeepSpeed/blob/master/deepspeed/runtime/lr_schedules.py#L774), which allows the warmup to start from a learning rate ratio other than 0. Since DeepSpeed is a crucial companion framework for Transformers, this PR aims to keep the learning rate scheduler implementations of the two consistent, to prevent potential confusion for users who employ both frameworks simultaneously.
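For intuition, here is roughly what the warmup ratio changes (a minimal illustrative sketch, not DeepSpeed's actual code; only the `warmup_min_ratio` and `warmup_num_steps` names are taken from the linked file):

```python
def warmup_lr(step, warmup_num_steps, base_lr, warmup_min_ratio=0.0):
    # Linear warmup that starts from warmup_min_ratio * base_lr instead of 0.
    # With warmup_min_ratio == 0 this reduces to the usual 0 -> base_lr ramp.
    start_lr = warmup_min_ratio * base_lr
    progress = min(step / max(1, warmup_num_steps), 1.0)
    return start_lr + progress * (base_lr - start_lr)
```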
Our implementation builds on the previous PR and offers several benefits:

- The method can be reused without modifying any input parameters; in that case, the behavior is equivalent to setting `warmup_lr_rate` to `1/warmup_steps`. If you prefer an explicit value, simply pass `warmup_lr_rate` as an argument; it specifies the ratio between the warmup start learning rate and the initial learning rate.
- Since it is recommended to call `optimizer.step()` before `lr_scheduler.step()`, the learning rate for the batch of the first step was zero under the previous implementation, so the parameters were not updated for that batch and it was effectively wasted. Our method fixes this. The effect is negligible for large datasets, but becomes significant for small datasets where the total number of steps is limited.
- Our implementation also ensures that the final small learning rate is actually reached during training, rather than only being set after training completes.

Overall, our approach is better suited as an improvement to the existing method than as a complete overhaul, so we implemented it as a new function, leaving users free to choose whichever implementation they prefer; see the sketch below.
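Below is a minimal sketch of the resulting schedule, expressed as a PyTorch `LambdaLR` multiplier. Names such as `warmup_lr_rate`, `min_lr_rate`, and the helper `get_schedule` are illustrative, following this PR's description rather than the merged code:

```python
import math
from functools import partial

from torch.optim.lr_scheduler import LambdaLR


def _lr_lambda(current_step, num_warmup_steps, num_training_steps,
               min_lr_rate, warmup_lr_rate):
    if current_step < num_warmup_steps:
        # Ramp linearly from warmup_lr_rate to 1.0; at step 0 the factor is
        # warmup_lr_rate, so the very first optimizer step sees a non-zero LR.
        return warmup_lr_rate + (1.0 - warmup_lr_rate) * (
            current_step / max(1, num_warmup_steps))
    # Cosine decay from the initial LR down to min_lr_rate * initial_lr,
    # reaching the minimum at the final step instead of only after training.
    progress = (current_step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_lr_rate + (1.0 - min_lr_rate) * cosine


def get_schedule(optimizer, num_warmup_steps, num_training_steps,
                 min_lr_rate=0.0, warmup_lr_rate=None):
    if warmup_lr_rate is None:
        # Default described above: equivalent to setting
        # warmup_lr_rate = 1 / warmup_steps.
        warmup_lr_rate = 1.0 / num_warmup_steps
    return LambdaLR(
        optimizer,
        partial(_lr_lambda,
                num_warmup_steps=num_warmup_steps,
                num_training_steps=num_training_steps,
                min_lr_rate=min_lr_rate,
                warmup_lr_rate=warmup_lr_rate),
    )
```

For example, `get_schedule(optimizer, num_warmup_steps=100, num_training_steps=1000, min_lr_rate=0.1)` would warm up from 1% of the initial learning rate (the `1/warmup_steps` default) and decay to 10% of it by the last step.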
Fixes: the warmup start learning rate ratio supported by DeepSpeed's `WarmupCosineLR` was missing in Transformers.
Before submitting

- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@muellerzr and @SunMarc