[Bug]: vllm serve works incorrectly for (some) Vision LM models #10286
Comments
For easier debugging, can you try using the offline chat method (`LLM.chat`)?
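A minimal sketch of that offline path, for reference (the model name and message are illustrative; a fuller example appears later in this thread):

```python
# Minimal sketch of the offline chat path; model name and message are illustrative.
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-13b-hf")
outputs = llm.chat(messages=[{"role": "user", "content": "Hello!"}])
print(outputs[0].outputs[0].text)
```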
Thanks for the tip! It does better than with `vllm serve`.
To eliminate randomness, can you set `temperature=0`?
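For reference, a minimal sketch of what that looks like on the offline side (the values below are illustrative, not taken from the original commands):

```python
from vllm import SamplingParams

# temperature=0 switches vLLM to greedy decoding, so repeated runs give
# identical outputs; top_p/top_k then no longer have any effect.
greedy = SamplingParams(temperature=0, max_tokens=32)
```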
I set `temperature=0`.
Can you post the image here? I can't seem to access that URL locally.
@DarkLight1337 btw, I can confirm this issue is not model-specific: hosting another model leads to the same behavior.
Here, it looks like there is an extra BOS token at the start, and a newline at the end of the prompt. This might affect the result.
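A sketch of one way to check for that duplicated BOS token (the prompt string below is an illustrative LLaVA-style prompt, not copied from the actual logs):

```python
from transformers import AutoTokenizer

# Illustrative LLaVA-style prompt that already starts with a BOS token ("<s>");
# this string is an assumption, not the prompt from the server logs.
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-13b-hf")
prompt = "<s>USER: <image>\nWhat is in this image? ASSISTANT:"

with_specials = tokenizer(prompt, add_special_tokens=True).input_ids
without_specials = tokenizer(prompt, add_special_tokens=False).input_ids

# If the first list starts with two BOS ids, the prompt is getting an extra BOS.
print(tokenizer.bos_token_id, with_specials[:3], without_specials[:3])
```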
Thanks, this indeed leads to the same results! But what about those extra tokens?
Tbh I'm not completely sure about this since those tokens are logged in the request...
Just in case you know, there is another similar issue: in the prompt, the image goes before the text. This leads to much poorer results compared to when the text goes first (I can run it in such a setting locally). Do you maybe know a way to fix the order when running it via `vllm serve`?
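For clarity, the ordering in question is the order of the parts inside the message `content` list; a request with the text part first would be structured roughly like this (the URL is a placeholder):

```python
# Hypothetical request payload with the text part placed before the image part;
# the image URL is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},  # text first
            {
                "type": "image_url",                              # image second
                "image_url": {"url": "https://example.com/debug.png"},
            },
        ],
    }
]
```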
You can try out #9919, which should fix the format of the chat template.
Doesn't help, unfortunately. To verify, I also ran it locally.
Is this the logged input or the actual input to the model? Based on our discussion, there seems to be some discrepancy between the two...
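A sketch of one way to check that, assuming the `llm` and `outputs` objects from the offline `llm.chat(...)` example that appears later in this thread:

```python
# Compare the rendered prompt string with the token ids actually passed to the model.
# `llm` and `outputs` are assumed to come from the offline example further down.
out = outputs[0]
print(repr(out.prompt))           # the prompt string as rendered/logged
print(out.prompt_token_ids[:10])  # the first token ids the model actually received

tokenizer = llm.get_tokenizer()
print(repr(tokenizer.decode(out.prompt_token_ids)))  # decoding these reveals extra BOS/newlines, if any
```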
Sure, to make it clear:
Case 1:
Case 2:
Case 3:
@DarkLight1337 and another thing I've just realised: even though adding
Oh, just realized #9919 has a bug with the chat template detection - fixed. Can you try it again? It should now output the image after the text prompt.
Apologies for my late reply; I was setting up the environment. This does not help either, unfortunately. I launch the server with:
According to the logs, the image still goes first in the prompt:
This happens regardless of the value of the chat template content format option.
Does this also happen for offline chat?
Can you post the full logs?
It does; absolutely the same behavior.
Sure:
Hmm, you don't seem to be using #9919. In that PR, there should be a log message about the chat template content format.
Apologies, indeed my bad! Here is the log for your PR's version:
Again, I've launched it with:
Are you on the latest version of the branch? I'm running this code:

```python
from vllm import LLM, SamplingParams
from vllm.multimodal.utils import encode_image_base64
from PIL import Image

image = Image.open("debug.png")
base64 = encode_image_base64(image, format="PNG")

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", tensor_parallel_size=2)
sampling_params = SamplingParams(temperature=0, top_p=0.1, max_tokens=32)
outputs = llm.chat(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64}"
                    },
                },
            ],
        },
    ],
    sampling_params=sampling_params,
)

print(outputs[0].prompt)           # Image goes first here
print(outputs[0].outputs[0].text)  # Gibberish output
```

and it successfully detects the chat template format as "openai".
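For completeness, a sketch of the online counterpart of the snippet above, so the two code paths can be compared with the same image and sampling settings; the server address and the assumption that the same Qwen2-VL checkpoint is being served are not taken from the thread:

```python
from openai import OpenAI
from PIL import Image
from vllm.multimodal.utils import encode_image_base64

# Same image and sampling settings as the offline snippet above; the server
# address and model name are assumptions (a locally running `vllm serve` instance).
image = Image.open("debug.png")
base64_image = encode_image_base64(image, format="PNG")

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    temperature=0,
    top_p=0.1,
    max_tokens=32,
)
print(response.choices[0].message.content)
```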
Checked, many thanks @DarkLight1337, this indeed fixes the issue! Could you please tell me whether your PR will be included in the upcoming release?

P.S. I'm unsure whether I need to close this issue, since we revealed that the prompt from the console logs doesn't always coincide with the genuine input to the model; please tell me if I'd better close this one.
Let's create a separate issue for this. I'll edit my PR to close this one once it is merged.
The PR is basically done, so it will probably make it into the next release, provided that someone approves it.
Unfortunately, this didn't make it into that release.
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I am running a Vision LM model (llava-hf/llava-1.5-13b-hf) via `vllm serve`, and it produces weird outputs: the official script from the vLLM examples, with `top_p` somewhat "fixed" for better determinism, outputs only '\n' tokens. I launch the vLLM server according to this official script:
Crucially, running the vLLM server via a Jupyter notebook yields completely normal outputs, which coincide with the outputs obtained via HuggingFace's transformers from the official LLaVA example:
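For context, that HuggingFace reference path looks roughly like the following; this is a sketch based on the published LLaVA-1.5 usage pattern, and the image path and prompt string are illustrative:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# The image path and prompt string are illustrative; the prompt follows the
# published LLaVA-1.5 "USER: ... ASSISTANT:" format.
model_id = "llava-hf/llava-1.5-13b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("debug.png")
prompt = "USER: <image>\nWhat is in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```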
The inputs to the text encoder are completely normal, according to the logs:
Hence, I have a strong feeling there is a bug in how the image is processed when launching the vLLM server via `vllm serve`. Could you please investigate?

Before submitting a new issue...