- 
                Notifications
    You must be signed in to change notification settings 
- Fork 530
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Your current environment
The environment is unrelated to this issue; therefore, including environment logs is unnecessary.
🐛 Describe the bug
The following error is encountered when running the DeepSeek model:ERROR 05-07 16:27:13 [core.py:402] EngineCore encountered a fatal error.
ERROR 05-07 16:27:13 [core.py:402] Traceback (most recent call last):
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/engine/core.py", line 393, in run_engine_core
ERROR 05-07 16:27:13 [core.py:402]     engine_core.run_busy_loop()
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/engine/core.py", line 417, in run_busy_loop
ERROR 05-07 16:27:13 [core.py:402]     self._process_engine_step()
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/engine/core.py", line 442, in _process_engine_step
ERROR 05-07 16:27:13 [core.py:402]     outputs = self.step_fn()
ERROR 05-07 16:27:13 [core.py:402]               ^^^^^^^^^^^^^^
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/engine/core.py", line 205, in step
ERROR 05-07 16:27:13 [core.py:402]     output = self.model_executor.execute_model(scheduler_output)
ERROR 05-07 16:27:13 [core.py:402]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model
ERROR 05-07 16:27:13 [core.py:402]     (output, ) = self.collective_rpc("execute_model",
ERROR 05-07 16:27:13 [core.py:402]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
ERROR 05-07 16:27:13 [core.py:402]     result = get_response(w, dequeue_timeout)
ERROR 05-07 16:27:13 [core.py:402]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-07 16:27:13 [core.py:402]   File "workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
ERROR 05-07 16:27:13 [core.py:402]     raise RuntimeError(
ERROR 05-07 16:27:13 [core.py:402] RuntimeError: Worker failed with error 'call aclnnInplaceCopy failed, detail:EZ1001: [PID: 1123276] 2025-05-07-16:27:13.714.340 128 and 2048 cannot broadcast.
ERROR 05-07 16:27:13 [core.py:402]         TraceBack (most recent call last):
ERROR 05-07 16:27:13 [core.py:402]         the size of tensor self [17,2048] must match the size of tensor src [17,16,128].
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working