Closed
Your current environment
CI
🐛 Describe the bug
Processed prompts: 0% 0/100 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
ERROR 03-13 08:00:58 [core.py:337] EngineCore hit an exception: Traceback (most recent call last):
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 330, in run_engine_core
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] engine_core.run_busy_loop()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 364, in run_busy_loop
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] outputs = step_fn()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 192, in step
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] output = self.model_executor.execute_model(scheduler_output)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 80, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] output = self.collective_rpc("execute_model",
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] answer = run_method(self.driver_worker, method, args, kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/utils.py", line 2238, in run_method
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 252, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] output = self.model_runner.execute_model(scheduler_output)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1061, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] gen_lens = valid_mask.sum(dim=1).tolist()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]
[2025-03-13T08:00:58Z] CRITICAL 03-13 08:00:58 [core_client.py:260] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Buildkite link: https://buildkite.com/vllm/ci/builds/15354#01958e6b-1224-45dc-b444-85d4a30668bf
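
The log itself warns that CUDA kernel errors can be reported asynchronously, so the `valid_mask.sum(dim=1).tolist()` frame may only be where the error surfaced (`.tolist()` copies to the host and therefore synchronizes), not where the illegal access happened. A minimal sketch of re-running a reproducer with `CUDA_LAUNCH_BLOCKING=1`, as the log suggests, might look like the following. The helper names `blocking_cuda_env` and `rerun_with_blocking` are hypothetical, and the command under `__main__` is a placeholder, since the issue does not name the failing test.

```python
import os
import subprocess
import sys


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Suggested by the error message in the log: makes kernel launches
    # synchronous so the Python traceback points at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_with_blocking(cmd):
    """Run `cmd` (a list of arguments) in a child process with the variable set."""
    # The variable must be set before the CUDA context is created, so a fresh
    # child process is the simplest way to guarantee it takes effect.
    return subprocess.run(cmd, env=blocking_cuda_env(), check=False)


if __name__ == "__main__":
    # Placeholder command: replace with the actual reproducer script.
    placeholder = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])",
    ]
    result = rerun_with_blocking(placeholder)
    print("exit code:", result.returncode)
```

Running with synchronous launches is slower, so this is only meant for narrowing down the failing kernel, not for regular CI runs.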