Please support the Gemma architecture #1627
Comments
Gemma support is coming soon with #1631.
Hi @NeonBohdan, have you tried Gemma with CTranslate2? Does it generate the same output as Transformers? It seems to start with a good generation and then continues with repeated words. However, I might be missing something.
@ymoslem, I haven't compiled CTranslate2 to test it yet; I'm waiting for the release. You can try using a …
Thanks, @NeonBohdan, for your response! I tried …
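For context, a conversion test along these lines would look roughly like the sketch below. The model id, output directory, and quantization level are illustrative, and it assumes the Gemma support from #1631 is present in the installed CTranslate2 build:

```python
from ctranslate2.converters import TransformersConverter

# Convert the Hugging Face checkpoint to the CTranslate2 format.
# "google/gemma-7b-it" and "gemma-7b-it-ct2" are illustrative; this
# only works with a CTranslate2 build that includes Gemma support.
converter = TransformersConverter("google/gemma-7b-it")
converter.convert("gemma-7b-it-ct2", quantization="bfloat16")
```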
@ymoslem, for sure you want to leave … It's the most stable generation case (see the llama.cpp issues for details). The problem is with quantization …
Thanks, @NeonBohdan! Are you aware of any quantization implementation that solves this issue, or is it something to do with the model itself?
Right now Gemma isn't fully fixed.
So even at bfloat16 or float16 you may see problems (float16 may be less problematic because of the embedding scaling). And if you add quantization for this model, it may be critical to leave some layers unquantized, which CTranslate2 doesn't support. So I don't really see a solution except waiting, but I like CTranslate2 the most.
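To make the precision trade-off concrete, here is a minimal sketch of comparing the same converted model under different compute types. The paths and prompt are illustrative; `compute_type` is a standard `ctranslate2.Generator` option, but whether any precision fully avoids the repetition issue for Gemma is exactly what's in question here:

```python
import ctranslate2
from transformers import AutoTokenizer

model_dir = "gemma-7b-it-ct2"  # hypothetical converted model directory
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

# CTranslate2 generators take string tokens rather than token ids.
prompt = "Write a haiku about the sea."
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

for compute_type in ("bfloat16", "int8"):
    # compute_type overrides the precision the model was converted with.
    generator = ctranslate2.Generator(
        model_dir, device="cuda", compute_type=compute_type
    )
    results = generator.generate_batch([tokens], max_length=64)
    print(compute_type, "->", tokenizer.decode(results[0].sequences_ids[0]))
```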
Thanks, @NeonBohdan, for the explanations.
It's a unique model:
a 256K tokenizer like mT5, but decoder-only.
So I'm hoping for good multilingual capabilities compared to the Llama-tokenizer models.
Hoping it will be easy enough to add (like Llama -> Mistral).
As I see it, this project is now harder to maintain,
but still better than llama.cpp or vLLM in my opinion.
https://huggingface.co/google/gemma-7b-it
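As a quick illustration of the vocabulary size mentioned above, one could check it directly with Transformers (a minimal sketch; the count is roughly 256K per the published Gemma config):

```python
from transformers import AutoTokenizer

# Illustrative check of the tokenizer size discussed above.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
print(tokenizer.vocab_size)  # roughly 256K entries, comparable to mT5's
```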
Maybe this will help:
ggerganov/llama.cpp#5631
vllm-project/vllm#2960