It's great that you're looking into this, @adefazio. Schedule-free Adam was strong, and now AdEMAMix is giving me great results too. If it turns out to be possible to combine their advantages, that would be amazing.
Code: https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch
8-bit version from bitsandbytes (bnb): https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/optim/ademamix.py
Tests have shown that AdEMAMix performs better than AdamW with little to no increase in memory.
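
For anyone skimming who hasn't looked at the update rule, here is a rough single-parameter sketch of an AdEMAMix-style step as I understand it: a fast and a slow EMA of the gradient are mixed and then normalized by Adam's second moment. The names `ademamix_step` and `new_state` are made up for illustration and are not taken from either repo linked above, the default hyperparameters are only placeholders, and the warmup schedules for `alpha` and `beta3` are left out, so treat this as an approximation rather than a reference implementation.

```python
import math


def new_state():
    # Per-parameter optimizer state (hypothetical layout, scalars for brevity).
    return {"step": 0, "m1": 0.0, "m2": 0.0, "v": 0.0}


def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix-style update on a scalar parameter (illustrative only)."""
    state["step"] += 1
    t = state["step"]

    # Fast EMA of gradients (same role as Adam's first moment).
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    # Slow EMA of gradients: the additional buffer in this sketch.
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    # EMA of squared gradients (Adam's second moment).
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad

    # Bias-correct the fast EMA and the second moment; the slow EMA is not corrected.
    m1_hat = state["m1"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)

    # Mix both EMAs, normalize, and apply decoupled weight decay.
    update = (m1_hat + alpha * state["m2"]) / (math.sqrt(v_hat) + eps)
    return theta - lr * (update + weight_decay * theta)
```

Calling it as `x = ademamix_step(x, grad, state)` in a loop, with `state = new_state()` created once per parameter, is enough to try it on a toy problem. The state dict also makes it easy to see which buffers are kept per parameter (`m1`, `m2`, `v`).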