AsyncEngineDeadError when LoRA loading fails #3310

Closed · Tracked by #5901 · May be fixed by #5173
lifuhuang opened this issue Mar 11, 2024 · 2 comments

@lifuhuang commented:
Error: when a client requests a LoRA model that cannot be loaded, AsyncLLMEngine crashes with AsyncEngineDeadError, and the client's HTTP session hangs indefinitely.
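For reference, a minimal repro sketch using the `AsyncLLMEngine` Python API directly (the OpenAI-compatible server hits the same code path). The base model and adapter path are placeholders, and this assumes the vLLM ~0.3-era API; any adapter whose rank exceeds `max_lora_rank` should reproduce it:

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.lora.request import LoRARequest


async def main():
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(
        model="meta-llama/Llama-2-7b-hf",  # placeholder base model
        enable_lora=True,
        max_lora_rank=16,                  # adapter below is rank 256
    ))
    # Placeholder path to a rank-256 adapter checkout.
    lora = LoRARequest(lora_name="bad-adapter", lora_int_id=1,
                       lora_local_path="./my-rank256-adapter")
    async for out in engine.generate("Hello", SamplingParams(max_tokens=16),
                                     request_id="r1", lora_request=lora):
        # Never reached: the background loop dies with AsyncEngineDeadError
        # and this generator hangs.
        print(out)


asyncio.run(main())
```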

Expected Behavior: vLLM should either reject unloadable LoRA adapters during the init phase, so users never run into this error, or return a 500 error immediately.
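For the first option, the check could be as simple as reading each preconfigured adapter's `adapter_config.json` at startup (PEFT stores the LoRA rank under the `"r"` key). `validate_lora_rank` and its call site below are hypothetical, not existing vLLM code:

```python
import json
import os


def validate_lora_rank(adapter_path: str, max_lora_rank: int) -> None:
    """Fail fast if an adapter's declared rank exceeds what the engine allows."""
    with open(os.path.join(adapter_path, "adapter_config.json")) as f:
        rank = json.load(f)["r"]  # PEFT convention: LoRA rank is stored as "r"
    if rank > max_lora_rank:
        raise ValueError(
            f"LoRA rank {rank} is greater than max_lora_rank {max_lora_rank}.")


# Hypothetical call site, before the engine starts serving:
# for path in configured_adapter_paths:
#     validate_lora_rank(path, engine_args.max_lora_rank)
```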

Stacktrace:

Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)>
Traceback (most recent call last):
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
    task.result()
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 393, in engine_step
    request_outputs = await self.engine.step_async()
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 189, in step_async
    all_outputs = await self._run_workers_async(
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
  File "/home/<user>/Repos/hello-vllm/.conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "<packages_root>/vllm/worker/worker.py", line 223, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "<packages_root>/vllm/worker/model_runner.py", line 574, in execute_model
    self.set_active_loras(lora_requests, lora_mapping)
  File "<packages_root>/vllm/worker/model_runner.py", line 660, in set_active_loras
    self.lora_manager.set_active_loras(lora_requests, lora_mapping)
  File "<packages_root>/vllm/lora/worker_manager.py", line 112, in set_active_loras
    self._apply_loras(lora_requests)
  File "<packages_root>/vllm/lora/worker_manager.py", line 224, in _apply_loras
    self.add_lora(lora)
  File "<packages_root>/vllm/lora/worker_manager.py", line 231, in add_lora
    lora = self._load_lora(lora_request)
  File "<packages_root>/vllm/lora/worker_manager.py", line 153, in _load_lora
    raise ValueError(
ValueError: LoRA rank 256 is greater than max_lora_rank 16.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    raise exc
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

The repro is based on a larger-than-expected LoRA rank, but I suppose any error raised from the background loop would trigger the same unexpected behavior.

I was thinking about sending a PR to propagate the error from the background loop to the HTTP response, but I would love to confirm whether this would be the ideal solution, or whether you have better suggestions for how this should be fixed. See the sketch below for the pattern I have in mind.
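To make the idea concrete, here it is sketched with plain asyncio rather than vLLM's actual internals: the background loop catches a per-request failure and sets it on that request's future, so the HTTP handler can return an error immediately instead of the whole engine dying:

```python
import asyncio


async def engine_loop(queue: asyncio.Queue) -> None:
    """Stand-in for the engine background loop."""
    while True:
        payload, fut = await queue.get()
        try:
            fut.set_result(await process(payload))
        except Exception as exc:      # propagate to the caller, don't die
            fut.set_exception(exc)


async def process(payload: dict) -> str:
    """Stand-in for engine.step_async; may raise, e.g. on a bad LoRA rank."""
    if payload.get("lora_rank", 0) > 16:
        raise ValueError("LoRA rank 256 is greater than max_lora_rank 16.")
    return "ok"


async def handle_request(queue: asyncio.Queue, payload: dict) -> str:
    """Stand-in for the HTTP handler."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((payload, fut))
    try:
        return await fut              # the ValueError surfaces here ...
    except ValueError as exc:
        return f"500: {exc}"          # ... and becomes an immediate 500


async def main():
    queue = asyncio.Queue()
    asyncio.create_task(engine_loop(queue))
    print(await handle_request(queue, {"lora_rank": 256}))


asyncio.run(main())
```

The complication in vLLM itself is that step_async processes a whole batch, so the loop would need to attribute the failure to the offending request (here, the failed LoRA load) and abort only that one rather than the entire batch.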

github-actions bot commented:

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot commented:

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions bot closed this as not planned on Nov 29, 2024.