Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
🐛 Describe the bug
vllm serve Qwen/Qwen2.5-VL-3B-Instruct
INFO 04-28 02:37:03 [async_llm.py:252] Added request 13_chatcmpl-c78d39a6d1d8469f90f3bda9bd41ca6a.
INFO 04-28 02:37:03 [async_llm.py:252] Added request 14_chatcmpl-c78d39a6d1d8469f90f3bda9bd41ca6a.
INFO 04-28 02:37:03 [async_llm.py:252] Added request 15_chatcmpl-c78d39a6d1d8469f90f3bda9bd41ca6a.
ERROR 04-28 02:37:03 [core.py:398] EngineCore encountered a fatal error.
ERROR 04-28 02:37:03 [core.py:398] Traceback (most recent call last):
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 389, in run_engine_core
ERROR 04-28 02:37:03 [core.py:398] engine_core.run_busy_loop()
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 413, in run_busy_loop
ERROR 04-28 02:37:03 [core.py:398] self._process_engine_step()
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 438, in _process_engine_step
ERROR 04-28 02:37:03 [core.py:398] outputs = self.step_fn()
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 203, in step
ERROR 04-28 02:37:03 [core.py:398] output = self.model_executor.execute_model(scheduler_output)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 86, in execute_model
ERROR 04-28 02:37:03 [core.py:398] output = self.collective_rpc("execute_model",
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-28 02:37:03 [core.py:398] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/utils.py", line 2456, in run_method
ERROR 04-28 02:37:03 [core.py:398] return func(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-28 02:37:03 [core.py:398] return func(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 268, in execute_model
ERROR 04-28 02:37:03 [core.py:398] output = self.model_runner.execute_model(scheduler_output)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-28 02:37:03 [core.py:398] return func(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1092, in execute_model
ERROR 04-28 02:37:03 [core.py:398] output = self.model(
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398] return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398] return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 1106, in forward
ERROR 04-28 02:37:03 [core.py:398] hidden_states = self.language_model.model(
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 245, in __call__
ERROR 04-28 02:37:03 [core.py:398] model_output = self.forward(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 325, in forward
ERROR 04-28 02:37:03 [core.py:398] def forward(
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398] return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398] return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
ERROR 04-28 02:37:03 [core.py:398] return fn(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
ERROR 04-28 02:37:03 [core.py:398] return self._wrapped_call(self, *args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
ERROR 04-28 02:37:03 [core.py:398] raise e
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 387, in __call__
ERROR 04-28 02:37:03 [core.py:398] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398] return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398] return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "<eval_with_key>.74", line 270, in forward
ERROR 04-28 02:37:03 [core.py:398] submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
ERROR 04-28 02:37:03 [core.py:398] return self._wrapped_call(self, *args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
ERROR 04-28 02:37:03 [core.py:398] return self._wrapped_call(self, *args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
ERROR 04-28 02:37:03 [core.py:398] raise e
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 387, in __call__
ERROR 04-28 02:37:03 [core.py:398] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398] return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398] return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "<eval_with_key>.2", line 5, in forward
ERROR 04-28 02:37:03 [core.py:398]     unified_attention_with_output = torch.ops.vllm.unified_attention_with_output(query_2, key_2, value, output_3, 'language_model.model.layers.0.self_attn.attn');  query_2 = key_2 = value = output_3 = unified_attention_with_output = None
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_ops.py", line 1123, in __call__
ERROR 04-28 02:37:03 [core.py:398] return self._op(*args, **(kwargs or {}))
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/attention/layer.py", line 416, in unified_attention_with_output
ERROR 04-28 02:37:03 [core.py:398] self.impl.forward(self,
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 598, in forward
ERROR 04-28 02:37:03 [core.py:398] cascade_attention(
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 730, in cascade_attention
ERROR 04-28 02:37:03 [core.py:398] prefix_output, prefix_lse = flash_attn_varlen_func(
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 252, in flash_attn_varlen_func
ERROR 04-28 02:37:03 [core.py:398] out, softmax_lse, _, _ = torch.ops._vllm_fa3_C.fwd(
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_ops.py", line 1123, in __call__
ERROR 04-28 02:37:03 [core.py:398] return self._op(*args, **(kwargs or {}))
ERROR 04-28 02:37:03 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] RuntimeError: scheduler_metadata must have shape (metadata_size)
Process EngineCore_0:
ERROR 04-28 02:37:03 [async_llm.py:399] AsyncLLM output_handler failed.
ERROR 04-28 02:37:03 [async_llm.py:399] Traceback (most recent call last):
ERROR 04-28 02:37:03 [async_llm.py:399] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 04-28 02:37:03 [async_llm.py:399] outputs = await engine_core.get_output_async()
ERROR 04-28 02:37:03 [async_llm.py:399] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [async_llm.py:399] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 04-28 02:37:03 [async_llm.py:399] raise self._format_exception(outputs) from None
ERROR 04-28 02:37:03 [async_llm.py:399] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 04-28 02:37:03 [async_llm.py:324] Request chatcmpl-931e30a747354f9eb969d74e0917a5b8 failed (engine dead).
INFO 04-28 02:37:03 [async_llm.py:324] Request chatcmpl-bb885cde9cc64d099805f73700577ffc failed (engine dead).
INFO 04-28 02:37:03 [async_llm.py:324] Request chatcmpl-e3401e02484243b48e9e73750055b16a failed (engine dead).
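The crash occurs inside the cascade-attention path of the FA3 backend (`cascade_attention` → `flash_attn_varlen_func` in `vllm/v1/attention/backends/flash_attn.py`), which is taken when several in-flight requests share a common prompt prefix. The exact client requests are not captured in the log above; as a rough illustration only, a hypothetical client sketch that issues several concurrent chat completions against the OpenAI-compatible endpoint (assuming the default port 8000 and a placeholder image URL, neither of which comes from the original report) might look like this:

```python
# Hypothetical reproduction sketch (not from the original report): several
# concurrent chat completions against the server started by
# `vllm serve Qwen/Qwen2.5-VL-3B-Instruct`, assuming the default port 8000.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder image URL; substitute any image input of your own.
IMAGE_URL = "https://example.com/sample.jpg"


def ask(i: int) -> str:
    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-VL-3B-Instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
                {"type": "text", "text": f"Describe this image ({i})."},
            ],
        }],
    )
    return resp.choices[0].message.content


# Many concurrent requests sharing the same prompt prefix is what makes the
# cascade-attention code path in flash_attn.py relevant here.
with ThreadPoolExecutor(max_workers=16) as pool:
    for answer in pool.map(ask, range(16)):
        print(answer[:80])
```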
### Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.