
[Bug]: Intermittent CUDA IMA in V1 CI tests #14777

@njhill

Description

Your current environment

CI

🐛 Describe the bug

Processed prompts:   0% 0/100 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]ERROR 03-13 08:00:58 [core.py:337] EngineCore hit an exception: Traceback (most recent call last):
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 330, in run_engine_core
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     engine_core.run_busy_loop()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 364, in run_busy_loop
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     outputs = step_fn()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]               ^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 192, in step
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     output = self.model_executor.execute_model(scheduler_output)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 80, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     output = self.collective_rpc("execute_model",
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/utils.py", line 2238, in run_method
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]            ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]            ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 252, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     output = self.model_runner.execute_model(scheduler_output)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]            ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1061, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     gen_lens = valid_mask.sum(dim=1).tolist()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]                ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]
[2025-03-13T08:00:58Z] CRITICAL 03-13 08:00:58 [core_client.py:260] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.

Buildkite link: https://buildkite.com/vllm/ci/builds/15354#01958e6b-1224-45dc-b444-85d4a30668bf
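
The failing line, `valid_mask.sum(dim=1).tolist()`, forces a device-to-host copy and is therefore just the first synchronization point at which an asynchronously reported CUDA error can surface; the kernel that actually performs the illegal access most likely runs earlier in the step, as the error message itself notes. One way to localize it is to rerun the failing path with `CUDA_LAUNCH_BLOCKING=1` so each launch is checked synchronously. A rough repro sketch (the model, prompts, and sampling parameters below are placeholders, not the actual CI test inputs):

```python
# Minimal sketch, assuming a single-GPU V1 engine; model/prompts are placeholders.
import os

# Both variables must be set before torch/vllm initialize CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # report errors at the faulting kernel launch
os.environ["VLLM_USE_V1"] = "1"           # exercise the V1 engine path

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)

# Without CUDA_LAUNCH_BLOCKING, an illegal access in an earlier kernel is only
# reported at the next host sync (here, valid_mask.sum(dim=1).tolist() in
# gpu_model_runner.py); with it set, the traceback should point at the real kernel.
outputs = llm.generate(["Hello, my name is"] * 100, params)
for out in outputs:
    print(out.outputs[0].text)
```

Running the same repro under CUDA's compute-sanitizer (memcheck) is another option for pinpointing the out-of-bounds access, at the cost of much slower execution.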

Labels

bug (Something isn't working), v1
