Closed
Labels: bug (Something isn't working)
Description
Your current environment
ok
🐛 Describe the bug
Running vLLM as follows:
export HF_HOME=path
export MODEL_ID=microsoft/Florence-2-large
export HUGGING_FACE_HUB_TOKEN=token
docker run \
--runtime nvidia \
-e VLLM_USE_V1=0 \
--gpus 0 \
--ipc=host \
-p "8000:8000" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
-v "${HF_HOME}:/root/.cache/huggingface" \
vllm/vllm-openai:latest \
--tensor-parallel-size 1 \
--model ${MODEL_ID} \
--tokenizer facebook/bart-large \
--max-model-len 4096 \
--gpu-memory-utilization 0.2 \
--dtype float16 \
--trust-remote-code \
--max-num-seqs 8
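In `docker run [OPTIONS] IMAGE [ARG...]`, options before the image name configure Docker, and everything after it is passed to the image's entrypoint, which for `vllm/vllm-openai` starts vLLM's OpenAI-compatible server. So `-e VLLM_USE_V1=0` reaches vLLM only as an environment variable, while `--model`, `--tokenizer`, `--trust-remote-code` and the rest are server flags. The sketch below splits the command from the report at the image name; the HF_HOME path and token are placeholders, and `split_docker_args` is a hypothetical helper for illustration.

```python
import shlex

# Launch command from the report. The volume path and token are
# placeholders standing in for ${HF_HOME} and ${HUGGING_FACE_HUB_TOKEN}.
COMMAND = """
docker run --runtime nvidia -e VLLM_USE_V1=0 --gpus 0 --ipc=host
  -p 8000:8000 --env HUGGING_FACE_HUB_TOKEN=token
  -v /path/to/hf_home:/root/.cache/huggingface
  vllm/vllm-openai:latest
  --tensor-parallel-size 1 --model microsoft/Florence-2-large
  --tokenizer facebook/bart-large --max-model-len 4096
  --gpu-memory-utilization 0.2 --dtype float16 --trust-remote-code
  --max-num-seqs 8
"""

IMAGE = "vllm/vllm-openai:latest"


def split_docker_args(command: str, image: str = IMAGE):
    """Split a `docker run` command into Docker options and container args."""
    argv = shlex.split(command)          # newlines are treated as whitespace
    idx = argv.index(image)              # position of the image name
    docker_opts = argv[2:idx]            # skip the leading "docker", "run"
    container_args = argv[idx + 1:]      # handed to the image's entrypoint
    return docker_opts, container_args


if __name__ == "__main__":
    docker_opts, server_args = split_docker_args(COMMAND)
    print("Docker options:", docker_opts)
    print("vLLM server args:", server_args)
```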
When I call it via curl:
# Step 1: Create the base64 encoded image data
base64 -w 0 "fuca.png" > image_data.txt
# Step 2: Create a JSON file with the request payload
cat > request.json << EOF
{
"model": "microsoft/Florence-2-large",
"prompt": "<CAPTION>",
"multi_modal_data": {
"image": "data:image/png;base64,$(cat image_data.txt)"
},
"max_tokens": 50,
"temperature": 0.5
}
EOF
# Step 3: Send the request using the JSON file
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d @request.json
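The same three steps can be done from Python with only the standard library, which avoids relying on GNU `base64 -w 0` (not every `base64` implementation accepts `-w`) and on heredoc command substitution. This is a minimal sketch that reproduces the payload exactly as sent, including the `multi_modal_data` key; it does not change what the server does with that key. The function names are hypothetical, and `fuca.png` is the file name from the report.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_request(image_bytes: bytes, model: str = "microsoft/Florence-2-large") -> dict:
    """Build the same payload as request.json (Steps 1 and 2)."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "prompt": "<CAPTION>",
        # Same key as in the report; the server log reports it as ignored.
        "multi_modal_data": {"image": f"data:image/png;base64,{image_b64}"},
        "max_tokens": 50,
        "temperature": 0.5,
    }


def send_request(payload: dict, url: str = "http://localhost:8000/v1/completions") -> dict:
    """POST the payload as JSON (Step 3); raises HTTPError on 4xx/5xx."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_request(Path("fuca.png").read_bytes())
    print(send_request(payload))
```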
the server crashes with the following log:
INFO: 172.17.0.1:50586 - "POST /v1/completions HTTP/1.1" 400 Bad Request
WARNING 04-02 12:48:54 [protocol.py:70] The following fields were present in the request but ignored: {'multi_modal_data'}
INFO 04-02 12:48:54 [logger.py:39] Received request cmpl-b7d0103941f94a1ea8f91ec0af09062d-0: prompt: '<CAPTION>', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=50, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: [0, 41552, 28494, 10263, 15698, 2], lora_request: None, prompt_adapter_request: None.
WARNING 04-02 12:48:54 [preprocess.py:88] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
INFO 04-02 12:48:54 [engine.py:310] Added request cmpl-b7d0103941f94a1ea8f91ec0af09062d-0.
CRITICAL 04-02 12:48:54 [launcher.py:116] MQLLMEngine is already dead, terminating server process
INFO: 172.17.0.1:50850 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR 04-02 12:48:54 [engine.py:160] TypeError("'NoneType' object is not subscriptable")
ERROR 04-02 12:48:54 [engine.py:160] Traceback (most recent call last):
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 04-02 12:48:54 [engine.py:160] self.run_engine_loop()
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 04-02 12:48:54 [engine.py:160] request_outputs = self.engine_step()
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 04-02 12:48:54 [engine.py:160] raise e
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 04-02 12:48:54 [engine.py:160] return self.engine.step()
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 1434, in step
ERROR 04-02 12:48:54 [engine.py:160] outputs = self.model_executor.execute_model(
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 139, in execute_model
ERROR 04-02 12:48:54 [engine.py:160] output = self.collective_rpc("execute_model",
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-02 12:48:54 [engine.py:160] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2255, in run_method
ERROR 04-02 12:48:54 [engine.py:160] return func(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 420, in execute_model
ERROR 04-02 12:48:54 [engine.py:160] output = self.model_runner.execute_model(
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-02 12:48:54 [engine.py:160] return func(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/enc_dec_model_runner.py", line 182, in execute_model
ERROR 04-02 12:48:54 [engine.py:160] hidden_or_intermediate_states = model_executable(
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/florence2.py", line 1090, in forward
ERROR 04-02 12:48:54 [engine.py:160] hidden_states = self.language_model(input_ids,
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/florence2.py", line 700, in forward
ERROR 04-02 12:48:54 [engine.py:160] return self.model(input_ids,
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/florence2.py", line 647, in forward
ERROR 04-02 12:48:54 [engine.py:160] decoder_outputs = self.decoder(
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/bart.py", line 692, in forward
ERROR 04-02 12:48:54 [engine.py:160] hidden_states = decoder_layer(
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/bart.py", line 512, in forward
ERROR 04-02 12:48:54 [engine.py:160] hidden_states = self.encoder_attn(
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/bart.py", line 350, in forward
ERROR 04-02 12:48:54 [engine.py:160] attn_output = self.attn(q, k, v)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-02 12:48:54 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-02 12:48:54 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 218, in forward
ERROR 04-02 12:48:54 [engine.py:160] torch.ops.vllm.unified_attention_with_output(
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1123, in __call__
ERROR 04-02 12:48:54 [engine.py:160] return self._op(*args, **(kwargs or {}))
ERROR 04-02 12:48:54 [engine.py:160] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 374, in unified_attention_with_output
ERROR 04-02 12:48:54 [engine.py:160] self.impl.forward(self,
ERROR 04-02 12:48:54 [engine.py:160] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/backends/flash_attn.py", line 783, in forward
ERROR 04-02 12:48:54 [engine.py:160] key = key[:num_prefill_kv_tokens]
ERROR 04-02 12:48:54 [engine.py:160] ~~~^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-02 12:48:54 [engine.py:160] TypeError: 'NoneType' object is not subscriptable
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]
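Two lines of the log can be reproduced in miniature. The `protocol.py` WARNING says `multi_modal_data` was ignored, which is what a set difference between the request's keys and a schema's known fields yields; the final `TypeError` is what Python raises when `key[:num_prefill_kv_tokens]` is evaluated with `key` set to `None`. One plausible reading, not confirmed in this report, is that once the image was dropped, the cross-attention in Florence-2's BART decoder had no encoder key tensor to slice. The sketch is hypothetical: `KNOWN_COMPLETION_FIELDS` is an assumed subset, not vLLM's real request model, and `first_kv_tokens` is an illustrative guard, not vLLM code.

```python
from typing import Optional, Sequence

# Assumed subset of accepted completion fields, for illustration only;
# vLLM's actual request model defines many more.
KNOWN_COMPLETION_FIELDS = frozenset({"model", "prompt", "max_tokens", "temperature"})


def ignored_fields(request: dict, known: frozenset = KNOWN_COMPLETION_FIELDS) -> set:
    """Return top-level request keys that the schema does not recognise."""
    return set(request) - known


def first_kv_tokens(key: Optional[Sequence], num_prefill_kv_tokens: int) -> Sequence:
    """Slice keys, failing with a clear message instead of a bare TypeError."""
    if key is None:
        raise ValueError("no encoder key tensor: the request may lack image input")
    return key[:num_prefill_kv_tokens]


if __name__ == "__main__":
    request = {
        "model": "microsoft/Florence-2-large",
        "prompt": "<CAPTION>",
        "multi_modal_data": {"image": "data:image/png;base64,..."},
        "max_tokens": 50,
        "temperature": 0.5,
    }
    print(ignored_fields(request))  # {'multi_modal_data'}

    key = None  # stands in for the missing cross-attention key tensor
    try:
        key[:3]
    except TypeError as exc:
        print(exc)  # 'NoneType' object is not subscriptable
```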