[Quantization] Add Gemma2 and Gemma3 text model GGUF support #14766
Conversation
|
Evaluation Results
- gemma-2-2b-it
- gemma-2-2b-it-Q4_K_M
- gemma3-1b-it-unquantized
- gemma3-1b-it-Q4_K_M
Not sure if it's because of the accuracy issue on the xformers backend, but the gsm8k scores for the unquantized and quantized gemma-3-1b-it models are both 0 on my side locally, so I switched to |
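For reference, a minimal sketch of how such a gsm8k comparison can be run with lm-evaluation-harness on top of vLLM. This is not the exact invocation used for the numbers above; the model ID and the attention-backend override are assumptions.

```python
import os

import lm_eval

# Assumption: the backend switch mentioned above is done via this env var,
# e.g. moving off xformers to FlashAttention.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

# Evaluate the unquantized model on gsm8k through lm-eval's vLLM backend.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=google/gemma-3-1b-it,dtype=bfloat16,gpu_memory_utilization=0.8",
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(results["results"]["gsm8k"])
```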
|
Hello, any update on this? |
|
I'm not sure if the error below is due to user error or because gemma-3 GGUF is not supported in 0.8.1:
edit: same error with vLLM API server version 0.8.2.dev31+g2b22290c |
|
@mpetruc You need to add --hf-config-path |
I tried the same thing with --hf-config-path added, but I still got the same error. Do you know what could be the issue?
Command used:
My environment:
Error: ValueError: GGUF model with architecture gemma3 is not supported yet. |
|
@leslliesayrus Oh, you need to also add
Besides this, you need to install
You can try using the command for serving:
Let me update this PR with more user-friendly error messages... |
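To make the workaround concrete, here is a hedged sketch of the offline-inference equivalent of the serving setup discussed above. It assumes the `--hf-config-path` and `--tokenizer` CLI flags mentioned in this thread map to `hf_config_path` and `tokenizer` keyword arguments, and the file/repo names are placeholders, not the exact command from this conversation.

```python
from vllm import LLM, SamplingParams

# Assumptions: local GGUF file name and the original (non-GGUF) repo used for
# config and tokenizer; substitute your own paths.
llm = LLM(
    model="gemma-3-1b-it-Q4_K_M.gguf",      # local GGUF checkpoint
    tokenizer="google/gemma-3-1b-it",        # original tokenizer (maps to --tokenizer)
    hf_config_path="google/gemma-3-1b-it",   # original config (maps to --hf-config-path)
)

outputs = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```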
|
Hey, can I use gemma3 GGUF with vision? Or is this text-only? |
|
@anunknowperson This is text-only. |
|
I am getting
Same command you mentioned:
transformers: 4.50.0
weights used: https://huggingface.co/unsloth/gemma-3-27b-it-GGUF/blob/main/gemma-3-27b-it-Q4_K_M.gguf |
|
This pull request has merge conflicts that must be resolved before it can be merged. |
|
What version of vLLM will this PR be merged in? |
Also interested |
Hmmm, I would like to wait for the transformers release of huggingface/transformers#37424, so that we don't need to pass --hf-config-path. Install |
|
Hi @Isotr0py, I believe the huggingface/transformers#37424 pull request has been merged. When can we expect this to be merged? Thanks a lot for your efforts. |
|
LGTM |
|
@Isotr0py, first of all, thank you very much for your efforts. I finally got Gemma3-4b-it GGUF running through vLLM without error. But there is still an issue when I run the gemma3 GGUF model. |
|
Hi @DarkLight1337 @Isotr0py, I am still facing an error when running the Gemma 3 GGUF models. Can you please help us? We intend to use these models in production and can't do it without parallel processing through vLLM. Thank you so much for your efforts. Here's the code with the error:
|
|
@akash-agr You should use the model path of your local gguf checkpoint. |
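As a concrete illustration of pointing vLLM at a local GGUF checkpoint path, here is a hedged sketch that first downloads a single GGUF file from the Hub and then loads it. The repo ID, filename, and original-model repo are assumptions based on the checkpoints mentioned in this thread, not a verified configuration.

```python
from huggingface_hub import hf_hub_download
from vllm import LLM

# Download one GGUF file locally; adjust repo_id/filename to the checkpoint
# you actually use (these are assumptions for illustration).
gguf_path = hf_hub_download(
    repo_id="unsloth/gemma-3-4b-it-GGUF",
    filename="gemma-3-4b-it-Q4_K_M.gguf",
)

# Pass the *local* GGUF path as the model, and the original repo for the
# tokenizer and config (assumed to map to --tokenizer / --hf-config-path).
llm = LLM(
    model=gguf_path,
    tokenizer="google/gemma-3-4b-it",
    hf_config_path="google/gemma-3-4b-it",
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```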
|
Sorry to bother you @Isotr0py. I have downloaded the model and am using the command below, but I am getting the following error:
I also tried the script below with the tokenizer, and I get the same "KeyError: 'general.name'" error. |
I have cloned the latest code from the main branch of vLLM. Please let me know if I need to update the transformers version somehow; since this PR has been merged, I am not sure whether I am using a compatible version. Kindly let me know if I am making some other mistake altogether. Thanks a lot for your efforts. It will help me take my app to production 🙏 |
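One way to narrow down the `KeyError: 'general.name'` above is to inspect the GGUF metadata directly with the `gguf` package and check which `general.*` keys the file actually carries. This is a hedged sketch; the file path is a placeholder.

```python
from gguf import GGUFReader

# Placeholder path to the downloaded GGUF checkpoint.
reader = GGUFReader("gemma-3-27b-it-Q4_K_M.gguf")

# Print every metadata field; a checkpoint that triggers the KeyError above
# would be missing 'general.name' in this list.
for name in reader.fields:
    print(name)

print("has general.name:", "general.name" in reader.fields)
print("has general.architecture:", "general.architecture" in reader.fields)
```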
|
FYI, with recent triton builds, one needs to modify
FWIW, even after fixing this I didn't manage to get things to work, and I ran out of time, so I will put this on the back burner for now. Hopefully this PR gets merged into the main branch at some point :) |
Hello @jayyang-zigbang, how did you manage to run it? What were your parameters? Thanks! |
@acsezen Unfortunately, I tried it a long time ago and don't remember what parameters I used, but I think going from BF16 to F16 is not easy. Any suggestions? |
|
@Isotr0py are you still working on this? |
|
Superseded by #26189 |
FIX #14753, #15480
Continue #12186 here as well, because it's a hassle to rebase it with similar modifications.