[FeatureRequest]Add AdEMAMixScheduleFree #46

Open
sdbds opened this issue Sep 25, 2024 · 3 comments

@sdbds

sdbds commented Sep 25, 2024

image

Code: https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch

8-bit version from bitsandbytes (bnb): https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/optim/ademamix.py

Tests have shown that AdEMAMix outperforms AdamW with little to no increase in memory usage.
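For readers unfamiliar with the optimizer, here is a rough sketch of an AdEMAMix-style update for a single scalar parameter, based on my reading of the AdEMAMix paper. It is not taken from either repository linked above. The function names (`ademamix_step`, `new_state`) and the hyperparameter values are assumptions for illustration, and the warmup schedules the paper applies to `alpha` and `beta3` are left out.

```python
import math


def new_state():
    """Fresh optimizer state for one scalar parameter (hypothetical helper)."""
    return {"t": 0, "m1": 0.0, "m2": 0.0, "nu": 0.0}


def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix-style update for a scalar parameter (illustrative only)."""
    state["t"] += 1
    t = state["t"]
    # Fast EMA of gradients, playing the role of Adam's first moment.
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    # Slow EMA of gradients: the extra buffer compared with Adam.
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    # EMA of squared gradients, as in Adam's second moment.
    state["nu"] = beta2 * state["nu"] + (1 - beta2) * grad * grad
    # Bias-correct the fast EMA and second moment; the slow EMA is used as-is.
    m1_hat = state["m1"] / (1 - beta1 ** t)
    nu_hat = state["nu"] / (1 - beta2 ** t)
    # Mix both EMAs in the numerator, then apply decoupled weight decay.
    update = (m1_hat + alpha * state["m2"]) / (math.sqrt(nu_hat) + eps)
    return theta - lr * (update + weight_decay * theta)


# Usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
x, state = 1.0, new_state()
for _ in range(2000):
    x = ademamix_step(x, 2 * x, state, lr=1e-2)
```

In this sketch the slow EMA `m2` is the only state kept beyond what Adam already stores. A schedule-free variant would additionally need the interpolation and averaging steps from schedule-free training, which are not shown here.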

@adefazio
Contributor

Cool! I'll look into this.

@araleza

araleza commented Sep 27, 2024

It's great that you're looking into this, @adefazio. Schedule-free Adam was strong, and now AdEMAMix is giving me great results too. If it turns out to be possible to combine their advantages, that would be amazing.

@And233

And233 commented Nov 17, 2024

> Cool! I'll look into this.

When will AdEMAMixScheduleFree be available? Can it be achieved by simply wrapping AdEMAMix with ScheduleFreeWrapper?


4 participants