[Bug]: Server running Florence-2 crashes when called with multimodal input #15968

### Your current environment

ok

### 🐛 Describe the bug

Running vLLM as follows:
``` 
export HF_HOME=path
export MODEL_ID=microsoft/Florence-2-large
export HUGGING_FACE_HUB_TOKEN=token


docker run \
--runtime nvidia \
-e VLLM_USE_V1=0 \
--gpus 0 \
--ipc=host \
-p "8000:8000" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
-v "${HF_HOME}:/root/.cache/huggingface" \
vllm/vllm-openai:latest \
--tensor-parallel-size 1 \
--model ${MODEL_ID} \
--tokenizer facebook/bart-large \
--max-model-len 4096 \
--gpu-memory-utilization 0.2 \
--dtype float16 \
--trust-remote-code \
--max-num-seqs 8 
```
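Loading the model takes some time after `docker run`, and a request sent before the server is listening fails with a connection error instead of reproducing this bug. A minimal sketch for waiting until the server is ready, assuming the OpenAI-compatible server answers `GET /health` on the mapped port 8000 (`wait_for_server` and its arguments are hypothetical names):

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers without an HTTP error,
# giving up after max_attempts tries spaced 5 seconds apart.
# Assumes the vLLM OpenAI-compatible server exposes GET /health.
wait_for_server() {
  local url="$1" max_attempts="${2:-60}" attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: silent; -f: fail on HTTP status 400 and above; --max-time bounds each probe
    if curl -sf --max-time 5 "$url" > /dev/null; then
      return 0
    fi
    # Skip the sleep after the final attempt
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep 5
    fi
    attempt=$((attempt + 1))
  done
  echo "server at $url not ready after $max_attempts attempts" >&2
  return 1
}
```

For example, `wait_for_server http://localhost:8000/health` returns 0 once the server responds, so it can be chained with `&&` in front of the request below.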

When calling it via curl:

```
# Step 1: Create the base64 encoded image data
base64 -w 0 "fuca.png" > image_data.txt

# Step 2: Create a JSON file with the request payload
cat > request.json << EOF
{
    "model": "microsoft/Florence-2-large",
    "prompt": "<CAPTION>",
    "multi_modal_data": {
        "image": "data:image/png;base64,$(cat image_data.txt)"
    },
    "max_tokens": 50,
    "temperature": 0.5
}
EOF

# Step 3: Send the request using the JSON file
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d @request.json
```
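The three steps above can be folded into one function that also checks the payload is well-formed JSON before it is sent. This is a sketch that rebuilds the request exactly as written (including the top-level `multi_modal_data` field, which the log below reports as ignored); `build_request` is a hypothetical name, and `python3` is assumed to be available for the JSON check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the same /v1/completions payload as steps 1-3,
# then verify the result parses as JSON before it is sent.
build_request() {
  local image_path="$1" out="$2" b64
  # -w 0 (GNU coreutils) keeps the encoding on one line; a wrapped value
  # would put raw newlines inside the JSON string and make it invalid
  b64=$(base64 -w 0 "$image_path") || return 1
  cat > "$out" <<EOF
{
    "model": "microsoft/Florence-2-large",
    "prompt": "<CAPTION>",
    "multi_modal_data": {
        "image": "data:image/png;base64,${b64}"
    },
    "max_tokens": 50,
    "temperature": 0.5
}
EOF
  # json.tool exits non-zero on malformed input
  python3 -m json.tool "$out" > /dev/null
}
```

Usage: `build_request fuca.png request.json && curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d @request.json`.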
the engine dies with a `TypeError` and the server shuts down:
```
INFO:     172.17.0.1:50586 - "POST /v1/completions HTTP/1.1" 400 Bad Request
WARNING 04-02 12:48:54 [protocol.py:70] The following fields were present in the request but ignored: {'multi_modal_data'}
INFO 04-02 12:48:54 [logger.py:39] Received request cmpl-b7d0103941f94a1ea8f91ec0af09062d-0: prompt: '<CAPTION>', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=50, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: [0, 41552, 28494, 10263, 15698, 2], lora_request: None, prompt_adapter_request: None.
WARNING 04-02 12:48:54 [preprocess.py:88] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
INFO 04-02 12:48:54 [engine.py:310] Added request cmpl-b7d0103941f94a1ea8f91ec0af09062d-0.
CRITICAL 04-02 12:48:54 [launcher.py:116] MQLLMEngine is already dead, terminating server process
INFO:     172.17.0.1:50850 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR 04-02 12:48:54 [engine.py:160] TypeError("'NoneType' object is not subscriptable")
ERROR 04-02 12:48:54 [engine.py:160] Traceback (most recent call last):
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 04-02 12:48:54 [engine.py:160]     self.run_engine_loop()
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 04-02 12:48:54 [engine.py:160]     request_outputs = self.engine_step()
ERROR 04-02 12:48:54 [engine.py:160]                       ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 04-02 12:48:54 [engine.py:160]     raise e
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 04-02 12:48:54 [engine.py:160]     return self.engine.step()
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 1434, in step
ERROR 04-02 12:48:54 [engine.py:160]     outputs = self.model_executor.execute_model(
ERROR 04-02 12:48:54 [engine.py:160]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 139, in execute_model
ERROR 04-02 12:48:54 [engine.py:160]     output = self.collective_rpc("execute_model",
ERROR 04-02 12:48:54 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-02 12:48:54 [engine.py:160]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-02 12:48:54 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2255, in run_method
ERROR 04-02 12:48:54 [engine.py:160]     return func(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 420, in execute_model
ERROR 04-02 12:48:54 [engine.py:160]     output = self.model_runner.execute_model(
ERROR 04-02 12:48:54 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-02 12:48:54 [engine.py:160]     return func(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/enc_dec_model_runner.py", line 182, in execute_model
ERROR 04-02 12:48:54 [engine.py:160]     hidden_or_intermediate_states = model_executable(
ERROR 04-02 12:48:54 [engine.py:160]                                     ^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/florence2.py", line 1090, in forward
ERROR 04-02 12:48:54 [engine.py:160]     hidden_states = self.language_model(input_ids,
ERROR 04-02 12:48:54 [engine.py:160]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/florence2.py", line 700, in forward
ERROR 04-02 12:48:54 [engine.py:160]     return self.model(input_ids,
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/florence2.py", line 647, in forward
ERROR 04-02 12:48:54 [engine.py:160]     decoder_outputs = self.decoder(
ERROR 04-02 12:48:54 [engine.py:160]                       ^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/bart.py", line 692, in forward
ERROR 04-02 12:48:54 [engine.py:160]     hidden_states = decoder_layer(
ERROR 04-02 12:48:54 [engine.py:160]                     ^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/bart.py", line 512, in forward
ERROR 04-02 12:48:54 [engine.py:160]     hidden_states = self.encoder_attn(
ERROR 04-02 12:48:54 [engine.py:160]                     ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/bart.py", line 350, in forward
ERROR 04-02 12:48:54 [engine.py:160]     attn_output = self.attn(q, k, v)
ERROR 04-02 12:48:54 [engine.py:160]                   ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 218, in forward
ERROR 04-02 12:48:54 [engine.py:160]     torch.ops.vllm.unified_attention_with_output(
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1123, in __call__
ERROR 04-02 12:48:54 [engine.py:160]     return self._op(*args, **(kwargs or {}))
ERROR 04-02 12:48:54 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 374, in unified_attention_with_output
ERROR 04-02 12:48:54 [engine.py:160]     self.impl.forward(self,
ERROR 04-02 12:48:54 [engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/backends/flash_attn.py", line 783, in forward
ERROR 04-02 12:48:54 [engine.py:160]     key = key[:num_prefill_kv_tokens]
ERROR 04-02 12:48:54 [engine.py:160]           ~~~^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] TypeError: 'NoneType' object is not subscriptable
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
```
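The `[protocol.py:70]` warning shows the server dropped `multi_modal_data` from the request, which suggests the request that crashed the engine reached Florence-2 without any image. A small sketch for spotting such dropped fields in a saved server log, assuming the container output was written to a file (for example with `docker logs <container> > server.log`; the file name and the `ignored_fields` helper are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the set of request fields that the
# OpenAI-compatible server reported as ignored, one line per warning.
ignored_fields() {
  local log_file="$1"
  # -n suppresses default output; the trailing p prints only lines where the
  # substitution matched, replaced by the captured field set
  sed -n 's/.*present in the request but ignored: \(.*\)$/\1/p' "$log_file"
}
```

Run against the log above, `ignored_fields server.log` prints `{'multi_modal_data'}`.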

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
