[Bug]: InternVL2-26B infer error: Attempted to assign 7 x 256 = 1792 multimodal tokens to 506 placeholders #7996
Comments
Try setting a larger context length (`max_model_len`).
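A minimal sketch of what a larger context length could look like with vLLM's offline `LLM` API; the model name and the `8192` value are illustrative assumptions, not settings taken from this thread:

```python
from vllm import LLM

# Sketch (assumed values): give the engine a context window large enough to
# hold the text prompt plus all of the image placeholder tokens.
llm = LLM(
    model="OpenGVLab/InternVL2-26B",  # model from the issue title
    trust_remote_code=True,
    max_model_len=8192,
)
```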
Also, make sure your InternVL2 is up-to-date.
I can actually repro this error - it seems to me that something changed about this model that introduced this bug. @Isotr0py do you have bandwidth to take a look at this issue?
@DarkLight1337
OK, I will take a look at this later today.
@SovereignRemedy @MasterJanus This may be caused by the chunked prefill. You can set `enable_chunked_prefill=False` to disable it.
BTW, inference with `enable_chunked_prefill=False` works on my side.
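As a workaround sketch, chunked prefill can be turned off explicitly when constructing the engine; assuming the `enable_chunked_prefill` engine argument, this is roughly:

```python
from vllm import LLM

# Sketch (assumption): disable chunked prefill so the multimodal prompt is
# prefilled in one pass instead of being split into small chunks.
llm = LLM(
    model="OpenGVLab/InternVL2-26B",
    trust_remote_code=True,
    enable_chunked_prefill=False,
)
```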
@Isotr0py Thanks for the investigation! I guess we never ran into this issue previously since most VL models have small context window. We should definitely make a change to make sure chunked-prefill is disabled when serving a VLM (until we figure out how to make it compatible with chunked prefill)
@ywang96 After deeper investigation, it seems that chunked prefill has no real conflict with VLMs. When I increase `max_num_batched_tokens`, the error goes away. In fact, the root issue is that the default `max_num_batched_tokens` is too small to cover all of the multimodal tokens in one prefill chunk.
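To make the numbers in the issue title concrete: 7 image feature groups × 256 tokens each = 1792 multimodal tokens, while only 506 placeholder slots were scheduled, which is consistent with a small per-step prefill budget. A hedged sketch of raising that budget, assuming `max_num_batched_tokens` is the knob being discussed:

```python
from vllm import LLM

# Sketch (assumption): keep chunked prefill enabled but raise the per-step
# token budget above the number of multimodal tokens (7 * 256 = 1792).
image_tokens = 7 * 256

llm = LLM(
    model="OpenGVLab/InternVL2-26B",
    trust_remote_code=True,
    enable_chunked_prefill=True,
    max_num_batched_tokens=max(2048, image_tokens),
)
```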
I keep forgetting about chunked prefill. Indeed, we should handle this case.
Let me open a quick PR to increase the default value for VLMs.
I think the tricky part is how to properly set this number dynamically - I'm okay with setting an arbitrary default value for VLMs just for now.
Thank you for your answers. I used the startup parameters mentioned above, and inference no longer reports an error, but the output contains similar garbled characters, which I find strange. Could it be a problem with my images? My code runs on four A10 GPUs.
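For context, a four-GPU setup is typically launched with tensor parallelism; a minimal sketch (the reporter's exact arguments are not shown in the thread):

```python
from vllm import LLM

# Sketch (assumption): shard the model across four GPUs with tensor parallelism.
llm = LLM(
    model="OpenGVLab/InternVL2-26B",
    trust_remote_code=True,
    tensor_parallel_size=4,  # e.g. four A10 GPUs
)
```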
No, this is a bug about the
@Root970103 I have created #8055 to fix this. Please take a look :) |
Thanks for the reply~ I will try again.
Facing something similar for https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/ on 4 H100s (vllm installed from source today):
Can you open a separate issue for this since it's for a different model? |
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Here is my error stack trace:
If you have any questions, please feel free to contact me. I ran it exactly according to the official demo. The pictures are from my local machine.
#6321
Is only 2B supported?