Error: when a client requests a LoRA model that cannot be loaded, AsyncLLMEngine crashes with AsyncEngineDeadError, and the client's HTTP session hangs indefinitely.
Expected Behavior: vLLM should either reject the unloadable LoRA during the init phase, so the user never runs into this error, OR return a 500 error immediately.
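For the first option, here is a minimal sketch of the kind of init-time check I have in mind, assuming the adapter paths are known at startup (e.g. registered via --lora-modules) and follow the PEFT adapter_config.json layout; the function name and wiring are hypothetical, not vLLM's actual init path:

```python
import json
import os


def validate_lora_adapters(lora_paths: list[str], max_lora_rank: int) -> None:
    """Hypothetical startup check: reject adapters whose rank exceeds max_lora_rank.

    Assumes each adapter directory holds a PEFT-style adapter_config.json with an
    "r" field; this is a sketch of the idea, not vLLM's actual code.
    """
    for path in lora_paths:
        with open(os.path.join(path, "adapter_config.json")) as f:
            rank = json.load(f).get("r", 0)
        if rank > max_lora_rank:
            raise ValueError(
                f"LoRA adapter at {path} has rank {rank}, which exceeds "
                f"max_lora_rank={max_lora_rank}; refusing to start.")
```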
Stacktrace:
```
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)>
Traceback (most recent call last):
File "<packages_root>/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
task.result()
File "<packages_root>/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "<packages_root>/vllm/engine/async_llm_engine.py", line 393, in engine_step
request_outputs = await self.engine.step_async()
File "<packages_root>/vllm/engine/async_llm_engine.py", line 189, in step_async
all_outputs = await self._run_workers_async(
File "<packages_root>/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/home/<user>/Repos/hello-vllm/.conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "<packages_root>/vllm/worker/worker.py", line 223, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "<packages_root>/vllm/worker/model_runner.py", line 574, in execute_model
self.set_active_loras(lora_requests, lora_mapping)
File "<packages_root>/vllm/worker/model_runner.py", line 660, in set_active_loras
self.lora_manager.set_active_loras(lora_requests, lora_mapping)
File "<packages_root>/vllm/lora/worker_manager.py", line 112, in set_active_loras
self._apply_loras(lora_requests)
File "<packages_root>/vllm/lora/worker_manager.py", line 224, in _apply_loras
self.add_lora(lora)
File "<packages_root>/vllm/lora/worker_manager.py", line 231, in add_lora
lora = self._load_lora(lora_request)
File "<packages_root>/vllm/lora/worker_manager.py", line 153, in _load_lora
raise ValueError(
ValueError: LoRA rank 256 is greater than max_lora_rank 16.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "<packages_root>/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
raise exc
File "<packages_root>/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
```
The repro above is based on a larger-than-expected LoRA rank, but I suppose any error raised from the background loop would trigger the same unexpected behavior.
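For completeness, a minimal repro sketch using the Python API directly (the base model name and adapter path are placeholders; the adapter is assumed to be trained with rank 256 while max_lora_rank stays at 16):

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.lora.request import LoRARequest


async def main():
    # LoRA enabled, but max_lora_rank left at 16.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="meta-llama/Llama-2-7b-hf",  # placeholder base model
                        enable_lora=True,
                        max_lora_rank=16))

    # Placeholder path to an adapter trained with rank 256.
    lora = LoRARequest("rank256-adapter", 1, "/path/to/rank256-adapter")

    # The ValueError is raised inside the background engine loop, so the caller
    # never sees it directly: the loop dies with AsyncEngineDeadError and this
    # generator (and any HTTP request waiting on it) hangs.
    async for output in engine.generate("Hello",
                                        SamplingParams(max_tokens=16),
                                        request_id="repro-1",
                                        lora_request=lora):
        print(output)


asyncio.run(main())
```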
I was thinking about sending a PR to propagate the error from the background loop to the HTTP response, but I would love to confirm whether this is the right approach, or whether you have better suggestions for how this should be fixed.
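To make the intent concrete, a rough sketch of the propagation I have in mind (helper names like propagate_exception and the handler wiring are illustrative, not vLLM's actual internals):

```python
import asyncio

from fastapi import HTTPException


def _on_engine_loop_done(task: asyncio.Task, request_tracker) -> None:
    """Callback attached to the background engine-loop task (sketch only)."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is None:
        return
    # Hypothetical helper: fan the failure out to every in-flight request
    # stream so awaiting handlers fail fast instead of hanging forever.
    request_tracker.propagate_exception(exc)


async def create_completion(engine, prompt, sampling_params, request_id,
                            lora_request=None):
    """Sketch of an HTTP handler that turns engine failures into a 500."""
    try:
        final_output = None
        async for output in engine.generate(prompt, sampling_params, request_id,
                                            lora_request=lora_request):
            final_output = output
        return final_output
    except Exception as e:  # e.g. AsyncEngineDeadError, or the original ValueError
        raise HTTPException(status_code=500, detail=str(e)) from e
```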