[Bug]: Gemma2 supports 8192 context with sliding window, but vLLM only does 4096, or fails if you try 8192 #6220
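A minimal sketch of the failure mode in the title (the model name and script are assumptions, not the reporter's actual setup):

```python
# Hedged repro sketch -- not the reporter's exact script.
from vllm import LLM, SamplingParams

# Gemma 2 is trained for an 8192-token context using sliding-window attention
# on alternating layers. On affected vLLM versions, requesting the full 8192
# either raises an error or logs a warning and caps max_model_len at the
# 4096-token sliding-window size.
llm = LLM(model="google/gemma-2-9b-it", max_model_len=8192)

params = SamplingParams(max_tokens=64)
out = llm.generate(["Summarize sliding-window attention."], params)
print(out[0].outputs[0].text)
```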
Comments
Same issue with head of main (ddc369f) after installing flashinfer.
Is it that rope scaling is not supported (#6175)? I can't be sure what the issue is.
Yes, I also have this problem.
The warning on the third line in your last output block tells you what the problem is here: you should pass
Yes, so I guess sliding-window attention support should be added.
Yes, I've encountered the same problem. I used PI (position interpolation) to extend gemma2's context length to 16K. Currently there are two issues: first, vLLM does not support inference beyond a length of 4096, and second, it does not support custom RoPE length extension.
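For context, a hedged sketch of what a PI-style RoPE extension could look like through vLLM's `rope_scaling` engine argument, on versions that support overriding it (the key names and values here are illustrative, not from this thread):

```python
# Hedged sketch: asking vLLM to apply linear (PI-style) RoPE interpolation.
# Assumes a vLLM version whose engine args accept a rope_scaling dict; the
# key naming ("type" vs. "rope_type") differs across versions.
from vllm import LLM

llm = LLM(
    model="google/gemma-2-9b-it",
    max_model_len=16384,  # 2x Gemma 2's native 8192 context
    rope_scaling={"type": "linear", "factor": 2.0},
)
```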
Is there any plan to support this in the future?
flashinfer v0.1.2 was just released with sliding-window support. I think vLLM's flashinfer.py can now be updated to use it and get the 8k context window of gemma2.
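A sketch of exercising that path (`VLLM_ATTENTION_BACKEND` is vLLM's documented backend selector; whether this actually unlocks the full 8192 depends on the vLLM and flashinfer versions installed):

```python
# Hedged sketch: force vLLM's FlashInfer attention backend, which should pick
# up flashinfer's sliding-window support once vLLM is updated to use it.
# Set the variable before vLLM initializes its attention backend.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM

llm = LLM(model="google/gemma-2-9b-it", max_model_len=8192)
```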
bump |
Even if
@noamgat |
I apologize if I made a mistake. I am using vLLM 0.5.4 and I set the environment variable
@noamgat I upgraded. @upskyy are you able to use the 8k context length?
I have the same issue with Gemma 2.
Hello! Is there any updated (and solid) workaround for this issue?
Need to add
Thank you, but what I understand from the quoted message is that with these settings I extend the context window up to 8192, but still without using the sliding window. Am I right?
@fgenie @FraFabbri are you able to run gemma2-9b with 8k context length? @noamgat, how can I do so? I upgraded flashinfer to 1.6, but the responses I get from gemma2-9b at 8k context length are still not correct; see this issue.
@hahmad2008 It worked, but only at 4k context length, not 8k.
Dear William, may we know its impact on downstream task performance? How many percentage points will it lose? Thank you very much! Best regards, Shuyue