APOLLO optimizer #2175
Comments
Thanks @fblgit! We'll look into this and see how feasible this is without a reference implementation.
There's a reference implementation now: https://github.com/zhuhanqing/APOLLO/tree/main/apollo_torch
FYI, implementing an optimizer is largely independent of Axolotl; it's a Hugging Face transformers thing. Check out how I implemented grokAdamW.
Yes, but Axolotl already has some optimizers of its own that it patches into transformers itself, e.g. ADOPT, and it's probably much easier to get a PR doing that working here than dealing with upstream transformers.
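For context, a minimal sketch of what that kind of wiring looks like: build the optimizer yourself and hand it to the Hugging Face Trainer through its `optimizers` argument. The `APOLLOAdamW` import path and constructor arguments below are assumptions based on the apollo_torch reference repo, not a confirmed API.

```python
# Minimal sketch: wiring an external optimizer into a Hugging Face Trainer.
# The apollo_torch import and class name are assumptions taken from the
# reference repo layout; check the package for the real API before using it.
from transformers import Trainer, TrainingArguments
from apollo_torch import APOLLOAdamW  # hypothetical import path

def build_trainer(model, train_dataset):
    args = TrainingArguments(output_dir="out", per_device_train_batch_size=4)
    # Build the optimizer ourselves instead of letting Trainer pick one.
    optimizer = APOLLOAdamW(model.parameters(), lr=1e-4)
    # Trainer accepts a pre-built (optimizer, scheduler) pair; passing None for
    # the scheduler lets Trainer create its default learning-rate schedule.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        optimizers=(optimizer, None),
    )
```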
@fblgit @fizzAI Thank you for raising a request to add APOLLO! APOLLO has SGD-like memory cost but can achieve on-par or even better performance than AdamW, validated on both pre-training and fine-tuning (it can even pre-train a LLaMA-7B model). Please let us know if you want to integrate APOLLO and run into any issues during integration!
@zhuhanqing hai! |
This issue can be closed now that the transformers PR is merged :) it just works out of the box.
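For anyone landing here later, usage through transformers should look roughly like the snippet below. The "apollo_adamw" optimizer name and the optim_target_modules filter are assumptions about the merged PR; check the docs for your transformers release for the exact strings. In Axolotl the same choice would presumably just be an optimizer entry in the YAML config.

```python
# Sketch only: selecting APOLLO by name through TrainingArguments.
# The "apollo_adamw" string and the optim_target_modules filter are assumptions
# about the merged transformers PR, not verified against a specific release.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="apollo_adamw",                       # assumed optimizer name
    optim_target_modules=["q_proj", "v_proj"],  # assumed GaLore-style module filter
    learning_rate=1e-4,
)
```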
🔖 Feature description
Any plan or timeline for the APOLLO optimizer?
https://arxiv.org/abs/2412.05270
Looks very interesting.
✔️ Solution
Implementing https://arxiv.org/abs/2412.05270
❓ Alternatives
Buy me 4x GPUs :D
📝 Additional Context
No response