
Cosine learning rate schedule - minimum learning rate #1062

Merged: 4 commits into axolotl-ai-cloud:main on Jan 9, 2024

Conversation

RicardoDominguez (Contributor)

Regarding cosine learning rate decay: in the literature, the learning rate is typically decayed to some percentage of the peak learning rate rather than to 0 (e.g., Llama 2 decays the final learning rate down to 10% of the peak learning rate).

I am unsure whether axolotl currently offers a straightforward way of setting the minimum learning rate when using cosine decay. Here is a simple implementation where the user only needs to specify cosine_min_lr_ratio in the config file (e.g., cosine_min_lr_ratio=0.1 decays the final learning rate down to 10% of the peak learning rate).

[screenshot of the proposed implementation]
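For context, here is a minimal sketch of the idea: a cosine decay whose floor is min_lr_ratio times the peak learning rate, built on PyTorch's LambdaLR. This is an illustration, not the exact code merged in this PR; the function and argument names below are hypothetical stand-ins for the cosine_min_lr_ratio option.

```python
import math

import torch
from torch.optim.lr_scheduler import LambdaLR


def cosine_schedule_with_min_lr(optimizer, num_warmup_steps, num_training_steps, min_lr_ratio=0.1):
    """Linear warmup, then cosine decay from the peak LR down to min_lr_ratio * peak LR."""

    def lr_lambda(current_step: int) -> float:
        if current_step < num_warmup_steps:
            # linear warmup from 0 up to the peak learning rate
            return current_step / max(1, num_warmup_steps)
        progress = (current_step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))  # goes from 1 to 0 over training
        # rescale the [0, 1] cosine curve to [min_lr_ratio, 1], so the final LR is
        # min_lr_ratio * peak LR instead of 0
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine

    return LambdaLR(optimizer, lr_lambda)


# usage: with lr=1e-4 and min_lr_ratio=0.1, the schedule ends at 1e-5 rather than 0
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = cosine_schedule_with_min_lr(optimizer, num_warmup_steps=10, num_training_steps=100, min_lr_ratio=0.1)
```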

@winglian (Collaborator) commented on Jan 8, 2024

This is great. I think the only thing we may want to do is warn users who set cosine_min_lr_ratio and also use deepspeed that cosine_min_lr_ratio could be ignored, depending on whether they set a scheduler in their deepspeed JSON. The example deepspeed files in axolotl don't have a scheduler set, so this should be less of an issue for people who use those.

@winglian (Collaborator) commented on Jan 8, 2024

@RicardoDominguez one last thing: could you please update the README to include this option in the documented YAML options?
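For reference, a sketch of how the option might appear alongside the usual scheduler keys in an axolotl YAML config; the surrounding keys and values here follow the common README examples and are illustrative, not taken from this PR:

```yaml
learning_rate: 0.0002
lr_scheduler: cosine
cosine_min_lr_ratio: 0.1  # final LR decays to 10% of the peak LR
```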

winglian merged commit 04b978b into axolotl-ai-cloud:main on Jan 9, 2024
6 checks passed
@RicardoDominguez (Contributor, Author)

Great, thanks @winglian!
