[MODEL] Apertus and XIELU #23068
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs initially. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add a ready label to the PR or enable auto-merge.
Code Review
This pull request introduces support for the Apertus model and its xIELU activation function. The changes include a new model definition for Apertus, which correctly implements QK-norm, and a new XIELU activation layer. The model is also added to the test suite and model registries.
My review focuses on the implementation of the new activation function. I've found a critical issue in XIELU's forward method that would prevent the use of its CUDA kernel under torch.compile, leading to performance degradation. The provided suggestion should resolve this issue. The rest of the implementation appears to be correct and follows vLLM's conventions.
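For readers outside the diff, here is a minimal, self-contained sketch of the pattern under discussion: an xIELU module using the publicly documented formulation (quadratic alpha_p·x² + beta·x for x > 0, an expm1-based branch for x ≤ 0), with the CUDA-vs-Python implementation resolved once in __init__ so forward presents a single stable path to torch.compile. This is not the PR's actual code; the class name, defaults, and the use_cuda_kernel flag are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class XIELUSketch(nn.Module):
    """Minimal xIELU sketch (assumed names/defaults, not the PR's code)."""

    def __init__(self, alpha_p_init: float = 0.8, alpha_n_init: float = 0.8,
                 beta: float = 0.5, eps: float = -1e-6,
                 use_cuda_kernel: bool = False) -> None:
        super().__init__()
        # Store pre-softplus values so the effective alphas stay positive.
        # Inverse softplus: y = log(expm1(a)) since softplus(y) = a.
        self.alpha_p = nn.Parameter(
            torch.log(torch.expm1(torch.tensor(alpha_p_init))).unsqueeze(0))
        self.alpha_n = nn.Parameter(
            torch.log(torch.expm1(torch.tensor(alpha_n_init - beta))).unsqueeze(0))
        self.register_buffer("beta", torch.tensor(beta))
        self.register_buffer("eps", torch.tensor(eps))
        # Resolve the implementation once at construction time; forward()
        # then has no Python-level branching for torch.compile to trip on.
        self._forward_impl = (self._xielu_cuda if use_cuda_kernel
                              else self._xielu_python)

    def _xielu_python(self, x: torch.Tensor) -> torch.Tensor:
        alpha_p = F.softplus(self.alpha_p)
        alpha_n = self.beta + F.softplus(self.alpha_n)
        return torch.where(
            x > 0,
            alpha_p * x * x + self.beta * x,
            (torch.expm1(torch.min(x, self.eps)) - x) * alpha_n + self.beta * x,
        )

    def _xielu_cuda(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real implementation would invoke the fused CUDA
        # kernel here; we fall back to the reference math in this sketch.
        return self._xielu_python(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self._forward_impl(x)
```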
Force-pushed from 732dd4d to e26ab90
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from f61e6c0 to 505301f
Force-pushed from 505301f to a3e809c
Thanks for contributing! I think the implementation looks good overall, but please fix pre-commit.
@DarkLight1337 I'm not sure what the errors are; does it just want to convert to comments, @AllenHaoHuang?
I think you can just assert
Just a quick comment that Hugging Face Transformers has already merged it into their main branch now, which should make next steps here easier. Also, the beta and eps additional parameters are now also saved in the HF checkpoint; is this taken into account here yet?
@martinjaggi Yes, all my implementations are the most up-to-date.
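On the beta/eps point above: a hedged sketch of how extra activation parameters saved in the HF checkpoint might be routed onto the activation module during weight loading. This is not the PR's actual loader; the checkpoint name pattern and helper are illustrative assumptions.

```python
import torch
import torch.nn as nn


def maybe_load_activation_param(act_fn: nn.Module, name: str,
                                tensor: torch.Tensor) -> bool:
    """Hypothetical helper: copy a checkpoint entry such as
    'model.layers.0.mlp.act_fn.beta' (assumed name pattern) onto the
    matching parameter/buffer of the activation module."""
    if ".act_fn." not in name:
        return False
    attr = name.rsplit(".", 1)[-1]  # e.g. "beta", "eps", "alpha_p", "alpha_n"
    target = getattr(act_fn, attr, None)
    if not isinstance(target, torch.Tensor):
        return False
    with torch.no_grad():
        target.copy_(tensor.reshape(target.shape))
    return True
```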
Co-authored-by: AllenHaoHuang <allenhuangdd@gmail.com> Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
Force-pushed from 581b566 to 4e29220
Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
Force-pushed from 4e29220 to 81ab8f7
@DarkLight1337 Not sure what the current issues are, but I fixed pre-commit.
Fastcheck is optional, so you can ignore it.
@DarkLight1337 I think the PR CI failure is just because huggingface/transformers#39381 was only recently merged.
Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
Head branch was pushed to by a user without write access
Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
Head branch was pushed to by a user without write access
@DarkLight1337 We have a small fix for activation parameter loading for the model: #24100
Pre-release of Apertus from the Swiss AI Initiative.

Main modifications from Llama:
- xIELU activation
- QK-norm

Associated Transformers PR: huggingface/transformers#39381
Associated vLLM PR: vllm-project/vllm#23068
Associated SGLang PR: sgl-project/sglang#9774

GSM8K results: two benchmark screenshots (https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed, https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08)
Co-author: @AllenHaoHuang
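Since QK-norm is one of the two modifications listed above, here is a minimal sketch, assuming the common Llama-style variant in which an RMSNorm with a learned scale is applied over the head dimension of the query and key projections before RoPE. The class name, shapes, and defaults are illustrative assumptions, not the PR's actual code.

```python
import torch
import torch.nn as nn


class HeadRMSNorm(nn.Module):
    """RMSNorm over the last (head_dim) axis, as used for QK-norm."""

    def __init__(self, head_dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(head_dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq, num_heads, head_dim]
        var = x.float().pow(2).mean(dim=-1, keepdim=True)
        x_norm = (x.float() * torch.rsqrt(var + self.eps)).type_as(x)
        return x_norm * self.weight


# Illustrative usage inside attention: normalize q and k per head before RoPE.
# q = q_norm(q.view(b, s, num_heads, head_dim))
# k = k_norm(k.view(b, s, num_kv_heads, head_dim))
```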