APOLLO optimizer #2175
Comments
Thanks @fblgit! We'll look into this and see how feasible this is without a reference implementation.
There's a reference implementation now: https://github.com/zhuhanqing/APOLLO/tree/main/apollo_torch
FYI, implementing an optimizer is largely independent of Axolotl; it's a Hugging Face transformers thing. Check out how I implemented grokAdamW.
Yes, but Axolotl already has some optimizers of its own that it patches into transformers itself, e.g. ADOPT, and it's probably much easier to get a PR doing that working here than dealing with upstream transformers.
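For context, a minimal sketch of what that kind of wiring looks like: build the optimizer yourself and hand it to the Hugging Face Trainer through its `optimizers` argument. The `APOLLOAdamW` import path and constructor arguments below are assumptions based on the apollo_torch reference repo, not a confirmed API.

```python
# Minimal sketch: wiring an external optimizer into a Hugging Face Trainer.
# The apollo_torch import and class name are assumptions taken from the
# reference repo layout; check the package for the real API before using it.
from transformers import Trainer, TrainingArguments
from apollo_torch import APOLLOAdamW  # hypothetical import path

def build_trainer(model, train_dataset):
    args = TrainingArguments(output_dir="out", per_device_train_batch_size=4)
    # Build the optimizer ourselves instead of letting Trainer pick one.
    optimizer = APOLLOAdamW(model.parameters(), lr=1e-4)
    # Trainer accepts a pre-built (optimizer, scheduler) pair; passing None for
    # the scheduler lets Trainer create its default learning-rate schedule.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        optimizers=(optimizer, None),
    )
```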
@fblgit @fizzAI Thank you for raising a request to add APOLLO! APOLLO has SGD-like memory cost but can achieve on-par or even better performance than AdamW, validated on both pre-training and fine-tuning (it can even pre-train a LLaMA-7B model). Please let us know if you want to integrate APOLLO and run into any issues during integration!
@zhuhanqing hai! |
This issue can be closed now that the transformers PR is merged :) it just works out of the box.
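For anyone landing here later, usage through transformers should look roughly like the snippet below. The "apollo_adamw" optimizer name and the optim_target_modules filter are assumptions about the merged PR; check the docs for your transformers release for the exact strings. In Axolotl the same choice would presumably just be an optimizer entry in the YAML config.

```python
# Sketch only: selecting APOLLO by name through TrainingArguments.
# The "apollo_adamw" string and the optim_target_modules filter are assumptions
# about the merged transformers PR, not verified against a specific release.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="apollo_adamw",                       # assumed optimizer name
    optim_target_modules=["q_proj", "v_proj"],  # assumed GaLore-style module filter
    learning_rate=1e-4,
)
```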
🔖 Feature description
Any plan or timeline for the APOLLO optimizer?
https://arxiv.org/abs/2412.05270
Looks very interesting.
✔️ Solution
Implementing https://arxiv.org/abs/2412.05270
❓ Alternatives
Buy me 4x GPUs :D
📝 Additional Context
No response