APOLLO optimizer #2175

Open · 5 tasks done
fblgit opened this issue Dec 11, 2024 · 7 comments
Labels: enhancement (New feature or request)

Comments

fblgit commented Dec 11, 2024

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Is there any plan or timeline for the APOLLO optimizer?

https://arxiv.org/abs/2412.05270

It looks very interesting.

✔️ Solution

Implementing https://arxiv.org/abs/2412.05270

❓ Alternatives

buy me 4x gpu :D

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
fblgit added the enhancement (New feature or request) label on Dec 11, 2024
winglian (Collaborator) commented:

Thanks @fblgit! We'll look into this and see how feasible it is without a reference implementation.

fizzAI commented Jan 21, 2025

There's a reference implementation now: https://github.com/zhuhanqing/APOLLO/tree/main/apollo_torch
Would be cool to see :3

ehartford (Collaborator) commented:

FYI, implementing an optimizer is quite independent from Axolotl; it's a Hugging Face Transformers thing.

Check out how I implemented grokAdamW

huggingface/transformers#32521
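
For reference, here is a minimal sketch of the generic route described above: hand the Trainer a pre-built optimizer through its `optimizers` argument, so nothing has to be registered inside Transformers or Axolotl first. The helper name and hyperparameters below are illustrative, not taken from the grokAdamW PR.

```python
# Minimal sketch (illustrative names): plugging any torch.optim.Optimizer into the
# Hugging Face Trainer without registering it inside transformers itself.
import torch
from transformers import Trainer, TrainingArguments

def build_trainer(model, train_dataset, optimizer_cls, **optim_kwargs):
    args = TrainingArguments(output_dir="out", per_device_train_batch_size=4)
    optimizer = optimizer_cls(model.parameters(), **optim_kwargs)
    # Trainer accepts a pre-built (optimizer, lr_scheduler) pair; passing None for the
    # scheduler lets the Trainer build its default schedule around this optimizer.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        optimizers=(optimizer, None),
    )

# e.g. build_trainer(model, train_dataset, torch.optim.AdamW, lr=1e-4)
```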

fizzAI commented Jan 22, 2025

Yes, but Axolotl already has some optimizers of its own that it patches into Transformers, e.g. ADOPT, and it's probably much easier to get a PR doing that merged here than to deal with upstream transformers.

zhuhanqing commented Feb 9, 2025

@fblgit @fizzAI Thank you for raising a request to add APOLLO!
@winglian I am the author of APOLLO: SGD-like Memory, AdamW-level Performance. Our optimizer has been integrated into LLaMA-Factory and FluxML, with validated performance, and integration into Hugging Face Transformers is underway.

APOLLO has an SGD-like memory cost but achieves on-par or even better performance than AdamW, validated on both pre-training and fine-tuning (including pre-training a LLaMA-7B model). Please let us know if you want to integrate APOLLO and run into any issues during integration!
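
For anyone who wants to try the reference implementation directly before any Transformers/Axolotl integration lands, a rough sketch follows. The class name `APOLLOAdamW`, the package name, the parameter-group keys, and the specific hyperparameter values are assumptions based on a reading of the apollo_torch repo and may not match the released API exactly; check the repo's README before relying on them.

```python
# Hedged sketch: using the apollo_torch reference implementation directly.
# Class name, group keys, and values below are assumptions, not a verified API.
import torch
from apollo_torch import APOLLOAdamW  # pip install apollo-torch (assumed package name)

def make_apollo_optimizer(model, lr=1e-3):
    # Apply the low-rank projection only to 2D weight matrices, as GaLore-style
    # optimizers typically do; everything else falls back to plain AdamW behavior.
    proj_params = [p for p in model.parameters() if p.requires_grad and p.ndim == 2]
    other_params = [p for p in model.parameters() if p.requires_grad and p.ndim != 2]
    param_groups = [
        {"params": other_params},
        {
            "params": proj_params,
            "rank": 1,               # illustrative rank-1 ("APOLLO-Mini" style) setting
            "proj": "random",
            "scale_type": "tensor",
            "scale": 128,
            "update_proj_gap": 200,  # how often the projection is refreshed
            "proj_type": "std",
        },
    ]
    return APOLLOAdamW(param_groups, lr=lr)
```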

fizzAI commented Feb 9, 2025

@zhuhanqing hi!
Whenever the Transformers PR gets merged, APOLLO should be supported in Axolotl too, since it pulls most of its optimizers from Transformers.

fizzAI commented Feb 14, 2025

This issue can be closed now that the Transformers PR is merged :) it just works out of the box.
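
For anyone landing here later, a minimal sketch of the out-of-the-box route via the Transformers training arguments (which Axolotl builds on). The `optim` string `"apollo_adamw"` and the use of `optim_target_modules` are assumptions based on the GaLore-style optimizer integrations in Transformers; verify both against your installed transformers version.

```python
# Hedged sketch: selecting APOLLO through TrainingArguments once the Transformers
# integration is available. The optim value and target-module handling are assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    learning_rate=1e-3,
    optim="apollo_adamw",
    # Restrict the low-rank projection to the large linear layers, GaLore-style.
    optim_target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "mlp"],
)
```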
