Labels: bug, ci-failure
Description
Your current environment
N/A
🐛 Describe the bug
https://buildkite.com/vllm/ci/builds/20460/steps?jid=0196f343-0fdb-4d91-80da-728e0fb8174c
Summary:
[2025-05-21T16:00:09Z] FAILED lora/test_lora_functions.py::test_lora_functions_sync[True] - Exception: Call to add_lora method failed: CUDA error: an illegal memory access was encountered
[2025-05-21T16:00:09Z] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-21T16:00:09Z] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-21T16:00:09Z] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
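The hint in the summary can be followed by rerunning only the failing test with blocking kernel launches, so that the error is raised closer to the call that caused it. This is a minimal sketch: it assumes `pytest` is on the path and that the command is run from the repository's `tests/` directory. The helper names `build_repro` and `run_repro` are hypothetical and not part of vLLM.

```python
import os
import subprocess

# Test ID copied from the CI summary above.
TEST_ID = "lora/test_lora_functions.py::test_lora_functions_sync[True]"


def build_repro(test_id: str = TEST_ID) -> tuple[list[str], dict[str, str]]:
    """Return the pytest command and environment for a blocking-launch rerun."""
    env = dict(os.environ)
    # Blocking launches make CUDA report errors synchronously, as the
    # error message itself suggests for debugging.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # -x stops at the first failure; -s keeps log output visible.
    cmd = ["pytest", "-x", "-s", test_id]
    return cmd, env


def run_repro() -> int:
    """Run the single failing test and return pytest's exit code."""
    cmd, env = build_repro()
    return subprocess.run(cmd, env=env).returncode
```

Calling `run_repro()` on a GPU machine should reproduce the failure if it is deterministic. If the failure is flaky, the test may need to be run several times.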
Stack:
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] Invocation of add_lora method failed
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] Traceback (most recent call last):
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 556, in _handle_client_request
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] output.result = method(
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 314, in add_lora
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] return self.model_executor.add_lora(lora_request)
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 150, in add_lora
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] return all(self.collective_rpc("add_lora", args=(lora_request, )))
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] answer = run_method(self.driver_worker, method, args, kwargs)
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2598, in run_method
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] return func(*args, **kwargs)
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 300, in add_lora
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] return self.model_runner.add_lora(lora_request)
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 130, in add_lora
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] return self.lora_manager.add_adapter(lora_request)
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/worker_manager.py", line 235, in add_adapter
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] lora = self._load_adapter(lora_request)
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/worker_manager.py", line 141, in _load_adapter
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] raise e
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/worker_manager.py", line 117, in _load_adapter
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] lora = self._lora_model_cls.from_local_checkpoint(
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/models.py", line 290, in from_local_checkpoint
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] return cls.from_lora_tensors(
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/models.py", line 145, in from_lora_tensors
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] lora_embeddings_tensor.pin_memory())
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] RuntimeError: CUDA error: an illegal memory access was encountered
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
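Because every traceback line carries a timestamp and logger prefix, the call chain is easier to read once the frames are pulled out. The sketch below is illustrative only: it assumes the log keeps Python's standard `File "...", line N, in func` frame format, and the names `Frame` and `extract_frames` are hypothetical. As the error message warns, the innermost frame (`from_lora_tensors`, at the `pin_memory()` call) is where the error was reported, which is not necessarily where the illegal access happened.

```python
import re
from typing import NamedTuple

# Matches the standard Python traceback frame line anywhere in a log line.
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<lineno>\d+), in (?P<func>\S+)'
)


class Frame(NamedTuple):
    path: str
    lineno: int
    func: str


def extract_frames(log_text: str) -> list[Frame]:
    """Return traceback frames in the order they appear (outermost first)."""
    return [
        Frame(m["path"], int(m["lineno"]), m["func"])
        for m in FRAME_RE.finditer(log_text)
    ]


# Two frame lines copied verbatim from the stack above.
SAMPLE = '''
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/models.py", line 290, in from_local_checkpoint
[2025-05-21T15:50:19Z] ERROR 05-21 08:50:19 [core.py:559] File "/usr/local/lib/python3.12/dist-packages/vllm/lora/models.py", line 145, in from_lora_tensors
'''

for frame in extract_frames(SAMPLE):
    print(f"{frame.func} ({frame.path}:{frame.lineno})")
```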
Status: Done