Cosine learning rate schedule - minimum learning rate #1062

RicardoDominguez · 2024-01-08T17:44:46Z

Regarding cosine learning rate decay, in the literature the learning rate is typically decayed to some percentage of the peak learning rate rather than decayed to 0 (e.g., Llama 2 decays the final learning rate down to 10% of the peak learning rate).

I am unsure whether axolotl currently offers a straightforward way to set the minimum learning rate when using cosine decay. Here is a simple implementation where the user only needs to specify cosine_min_lr_ratio in the config file (e.g., cosine_min_lr_ratio=0.1 decays the final learning rate down to 10% of the peak learning rate).
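
For readers who want to see the arithmetic, here is a minimal sketch of a cosine schedule with a floor. It is not the code from this PR: the function name `cosine_with_min_lr_multiplier` and its parameters `num_warmup_steps` and `num_training_steps` are hypothetical, and the linear warmup is an assumption. It returns the multiplier applied to the peak learning rate, which is the shape expected by a lambda-style scheduler such as PyTorch's `LambdaLR`.

```python
import math


def cosine_with_min_lr_multiplier(
    current_step: int,
    *,
    num_warmup_steps: int,
    num_training_steps: int,
    min_lr_ratio: float,
) -> float:
    """Multiplier on the peak learning rate at `current_step` (hypothetical helper)."""
    # Linear warmup from 0 up to the peak learning rate (assumed, not from the PR).
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    progress = min(1.0, (current_step - num_warmup_steps) / decay_steps)

    # Standard cosine factor: 1.0 at the start of decay, 0.0 at the end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))

    # Rescale so the multiplier ends at min_lr_ratio instead of 0.
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
```

With `min_lr_ratio=0.1` the multiplier is 1.0 when warmup ends and 0.1 at the final step, so the learning rate finishes at 10% of its peak. Setting `min_lr_ratio=0.0` recovers the usual decay to zero.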

winglian · 2024-01-08T20:05:39Z

This is great. I think the only thing we may want to do is warn when a user sets cosine_min_lr_ratio and also uses deepspeed, since cosine_min_lr_ratio COULD be ignored depending on whether the user set a scheduler in their deepspeed JSON. The example deepspeed files in axolotl don't have this set, so it should be less of an issue if people use those.
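
A rough sketch of the kind of check described above, under stated assumptions: the training config and the deepspeed JSON are both available as plain dicts, and a scheduler in the deepspeed file appears under its top-level `scheduler` key. The helper names `min_lr_may_be_ignored` and `warn_if_min_lr_may_be_ignored` are hypothetical and are not taken from the axolotl codebase.

```python
import logging

LOG = logging.getLogger(__name__)


def min_lr_may_be_ignored(cfg: dict, deepspeed_cfg: dict) -> bool:
    """True if cosine_min_lr_ratio is set and the deepspeed config defines its own scheduler."""
    has_min_lr = cfg.get("cosine_min_lr_ratio") is not None
    has_ds_scheduler = "scheduler" in deepspeed_cfg
    return has_min_lr and has_ds_scheduler


def warn_if_min_lr_may_be_ignored(cfg: dict, deepspeed_cfg: dict) -> None:
    """Log a warning when the deepspeed scheduler could override cosine_min_lr_ratio."""
    if min_lr_may_be_ignored(cfg, deepspeed_cfg):
        LOG.warning(
            "cosine_min_lr_ratio is set, but the deepspeed config defines a "
            "scheduler; cosine_min_lr_ratio could be ignored."
        )
```

A deepspeed config without a `scheduler` entry, like the example files mentioned above, would not trigger the warning.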

winglian · 2024-01-08T20:11:36Z

@RicardoDominguez one last thing, could you update the README to include this option in the documented yaml options please?

RicardoDominguez · 2024-01-09T15:51:27Z

Great, thanks @winglian!

winglian approved these changes Jan 9, 2024

RicardoDominguez added 3 commits January 9, 2024 08:40

Cosine min lr

71341e2

Cosine min lr - warn if using deepspeed

7174ddc

cosine_min_lr_ratio readme

2796103

winglian force-pushed the cosine_min_lr branch from 2eb9f6e to 2796103 Compare January 9, 2024 13:40

chore: lint

4f37261

winglian merged commit 04b978b into axolotl-ai-cloud:main Jan 9, 2024
6 checks passed
