
[Bug]: Intermittent CUDA IMA in V1 CI tests #14777

@njhill

Description

Your current environment

CI

🐛 Describe the bug

Processed prompts:   0% 0/100 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]ERROR 03-13 08:00:58 [core.py:337] EngineCore hit an exception: Traceback (most recent call last):
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 330, in run_engine_core
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     engine_core.run_busy_loop()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 364, in run_busy_loop
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     outputs = step_fn()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]               ^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 192, in step
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     output = self.model_executor.execute_model(scheduler_output)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 80, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     output = self.collective_rpc("execute_model",
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     answer = run_method(self.driver_worker, method, args, kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/utils.py", line 2238, in run_method
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]            ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]            ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 252, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     output = self.model_runner.execute_model(scheduler_output)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     return func(*args, **kwargs)
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]            ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1061, in execute_model
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]     gen_lens = valid_mask.sum(dim=1).tolist()
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]                ^^^^^^^^^^^^^^^^^^^^^
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]
[2025-03-13T08:00:58Z] ERROR 03-13 08:00:58 [core.py:337]
[2025-03-13T08:00:58Z] CRITICAL 03-13 08:00:58 [core_client.py:260] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.

Buildkite link: https://buildkite.com/vllm/ci/builds/15354#01958e6b-1224-45dc-b444-85d4a30668bf
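
The failing line, `valid_mask.sum(dim=1).tolist()`, forces a device-to-host copy and is therefore just the first synchronization point at which an asynchronously reported CUDA error can surface; the kernel that actually performs the illegal access most likely runs earlier in the step, as the error message itself notes. One way to localize it is to rerun the failing path with `CUDA_LAUNCH_BLOCKING=1` so each launch is checked synchronously. A rough repro sketch (the model, prompts, and sampling parameters below are placeholders, not the actual CI test inputs):

```python
# Minimal sketch, assuming a single-GPU V1 engine; model/prompts are placeholders.
import os

# Both variables must be set before torch/vllm initialize CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # report errors at the faulting kernel launch
os.environ["VLLM_USE_V1"] = "1"           # exercise the V1 engine path

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)

# Without CUDA_LAUNCH_BLOCKING, an illegal access in an earlier kernel is only
# reported at the next host sync (here, valid_mask.sum(dim=1).tolist() in
# gpu_model_runner.py); with it set, the traceback should point at the real kernel.
outputs = llm.generate(["Hello, my name is"] * 100, params)
for out in outputs:
    print(out.outputs[0].text)
```

Running the same repro under CUDA's compute-sanitizer (memcheck) is another option for pinpointing the out-of-bounds access, at the cost of much slower execution.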

Labels

bug (Something isn't working), v1
