[Bug]: RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257. When load gemma-2-9b-it using vllm #7070
Comments
+1 Seeing the same thing.
Seems like a bug in flashinfer 0.1.3.
I plan to try it out tomorrow, and judging by the number of likes, it seems to be the solution! If it resolves the issue, I will close it.
It works!
Can we re-open this? The downgrade isn't really a solution for the Docker image, right?
@Noxoomo this is not a bug in flashinfer: we found that vllm v0.5.3 didn't integrate flashinfer correctly: flashinfer-ai/flashinfer#362 (comment). As you said, downgrading flashinfer to v0.1.2 is a temporary workaround for compatibility with vllm v0.5.3, but you are encouraged to use the new wheels once the vllm integration is fixed.
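If you take the downgrade route, here is a quick sanity check that the pinned build is the one actually in the environment (a sketch; it assumes the wheel's distribution name is `flashinfer`, as in the official wheels):

```python
# Sketch: report the installed flashinfer version. v0.1.2 is the build
# reported compatible with vllm v0.5.3; v0.1.3 triggers the CHECK_EQ failure.
from importlib.metadata import version

print(version("flashinfer"))  # expect "0.1.2" for the workaround
```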
@yzh119 is the vllm integration fixed? I'm hitting many bugs running Gemma 2 with vllm. Most recently:
Edit: Solution was to upgrade
@RylanSchaeffer yes, it's fixed in the main branch; you can either wait for the next vllm release or install vllm from source from the main branch.
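Until that release lands, one way to tell whether an environment already carries the fix is to look at the installed vllm version (a sketch; any release at or below v0.5.3 predates the fix):

```python
# Sketch: a vllm version <= 0.5.3 predates the flashinfer integration fix,
# so it needs either the next release or a main-branch source build.
import vllm

print(vllm.__version__)
```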
Your current environment
🐛 Describe the bug
Hi, I found some bugs when loading a fine-tuned gemma-2-9b model using the vLLM library.
The above error was resolved by setting the environment variable as shown below.
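A sketch of the kind of setting involved; the specific variable here is an assumption, since for Gemma 2 on vllm v0.5.x the flashinfer attention backend was the commonly required one:

```python
# Assumption: the variable in question was VLLM_ATTENTION_BACKEND. vllm v0.5.x
# needed it set to FLASHINFER for Gemma 2 (logits soft-capping support).
# Set it before constructing the engine so the backend choice takes effect.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"
```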
2. `'NoneType' object is not callable (type=type_error)`

The above error was resolved by installing the `flashinfer` library; see #6445. Since then, I've encountered the following issue: `RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257`.
I've loaded and used many other models via the vLLM library, but this is the first time I've encountered this error, and I haven't found any documentation of the issue.
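For context, a minimal sketch of the kind of loading call that hits the error (the prompt and sampling settings are placeholders, not taken from this report):

```python
# Sketch: load gemma-2-9b-it with vllm. With flashinfer 0.1.3 and vllm
# v0.5.3, this setup raises the CHECK_EQ(paged_kv_indptr...) RuntimeError.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2-9b-it")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```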
I tried (ignorantly) to force the size by setting the `batch_size` variable to 256 or 0 directly in the `/home/user/anaconda3/envs/llm-api/lib/python3.10/site-packages/flashinfer/prefill.py` source file, but it only changed the numbers on both sides of the "vs". Is there a way to fix this?
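Editing `batch_size` only moves the numbers because the check compares two things vllm itself constructs: `paged_kv_indptr` is a CSR-style offset array that must hold `batch_size + 1` entries, so "1 vs 257" means an indptr built for the wrong batch was handed to flashinfer. A sketch of the invariant (illustrative only, not vllm's code; the one-page-per-request layout is made up):

```python
# Illustration of the invariant behind the failing check: for a batch of N
# requests, the CSR-style paged_kv_indptr must contain N + 1 offsets.
# "1 vs 257" means an indptr of length 1 met a batch of 256 requests.
import torch

batch_size = 256
pages_per_request = torch.ones(batch_size, dtype=torch.int32)  # assumed layout
paged_kv_indptr = torch.cat([
    torch.zeros(1, dtype=torch.int32),
    torch.cumsum(pages_per_request, dim=0).to(torch.int32),
])
assert paged_kv_indptr.size(0) == batch_size + 1  # the CHECK_EQ that fails
```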