Error: when a client requests a LoRA model that cannot be loaded, AsyncLLMEngine crashes with AsyncEngineDeadError, and the client's HTTP session hangs indefinitely.
Expected Behavior: vLLM should either reject the unloadable LoRA during the init phase, so the user never runs into this error, OR return a 500 error immediately.
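For the first option, here is a minimal sketch of the kind of init-time check I have in mind, assuming the adapter paths are known at startup (e.g. registered via --lora-modules) and follow the PEFT adapter_config.json layout; the function name and wiring are hypothetical, not vLLM's actual init path:

```python
import json
import os


def validate_lora_adapters(lora_paths: list[str], max_lora_rank: int) -> None:
    """Hypothetical startup check: reject adapters whose rank exceeds max_lora_rank.

    Assumes each adapter directory holds a PEFT-style adapter_config.json with an
    "r" field; this is a sketch of the idea, not vLLM's actual code.
    """
    for path in lora_paths:
        with open(os.path.join(path, "adapter_config.json")) as f:
            rank = json.load(f).get("r", 0)
        if rank > max_lora_rank:
            raise ValueError(
                f"LoRA adapter at {path} has rank {rank}, which exceeds "
                f"max_lora_rank={max_lora_rank}; refusing to start.")
```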
Stacktrace:
```
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)>
Traceback (most recent call last):
File "<packages_root>/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
task.result()
File "<packages_root>/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "<packages_root>/vllm/engine/async_llm_engine.py", line 393, in engine_step
request_outputs = await self.engine.step_async()
File "<packages_root>/vllm/engine/async_llm_engine.py", line 189, in step_async
all_outputs = await self._run_workers_async(
File "<packages_root>/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/home/<user>/Repos/hello-vllm/.conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "<packages_root>/vllm/worker/worker.py", line 223, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "<packages_root>/vllm/worker/model_runner.py", line 574, in execute_model
self.set_active_loras(lora_requests, lora_mapping)
File "<packages_root>/vllm/worker/model_runner.py", line 660, in set_active_loras
self.lora_manager.set_active_loras(lora_requests, lora_mapping)
File "<packages_root>/vllm/lora/worker_manager.py", line 112, in set_active_loras
self._apply_loras(lora_requests)
File "<packages_root>/vllm/lora/worker_manager.py", line 224, in _apply_loras
self.add_lora(lora)
File "<packages_root>/vllm/lora/worker_manager.py", line 231, in add_lora
lora = self._load_lora(lora_request)
File "<packages_root>/vllm/lora/worker_manager.py", line 153, in _load_lora
raise ValueError(
ValueError: LoRA rank 256 is greater than max_lora_rank 16.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "<packages_root>/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
raise exc
File "<packages_root>/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
```
The repro above is based on a larger-than-expected LoRA rank, but I suppose any error raised from the background loop would trigger the same unexpected behavior.
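For completeness, a minimal repro sketch using the Python API directly (the base model name and adapter path are placeholders; the adapter is assumed to be trained with rank 256 while max_lora_rank stays at 16):

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.lora.request import LoRARequest


async def main():
    # LoRA enabled, but max_lora_rank left at 16.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="meta-llama/Llama-2-7b-hf",  # placeholder base model
                        enable_lora=True,
                        max_lora_rank=16))

    # Placeholder path to an adapter trained with rank 256.
    lora = LoRARequest("rank256-adapter", 1, "/path/to/rank256-adapter")

    # The ValueError is raised inside the background engine loop, so the caller
    # never sees it directly: the loop dies with AsyncEngineDeadError and this
    # generator (and any HTTP request waiting on it) hangs.
    async for output in engine.generate("Hello",
                                        SamplingParams(max_tokens=16),
                                        request_id="repro-1",
                                        lora_request=lora):
        print(output)


asyncio.run(main())
```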
I was thinking about sending a PR to propagate the error from the background loop to the HTTP response, but I would love to confirm whether this is the right approach, or whether you have better suggestions for how this should be fixed.
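To make the intent concrete, a rough sketch of the propagation I have in mind (helper names like propagate_exception and the handler wiring are illustrative, not vLLM's actual internals):

```python
import asyncio

from fastapi import HTTPException


def _on_engine_loop_done(task: asyncio.Task, request_tracker) -> None:
    """Callback attached to the background engine-loop task (sketch only)."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is None:
        return
    # Hypothetical helper: fan the failure out to every in-flight request
    # stream so awaiting handlers fail fast instead of hanging forever.
    request_tracker.propagate_exception(exc)


async def create_completion(engine, prompt, sampling_params, request_id,
                            lora_request=None):
    """Sketch of an HTTP handler that turns engine failures into a 500."""
    try:
        final_output = None
        async for output in engine.generate(prompt, sampling_params, request_id,
                                            lora_request=lora_request):
            final_output = output
        return final_output
    except Exception as e:  # e.g. AsyncEngineDeadError, or the original ValueError
        raise HTTPException(status_code=500, detail=str(e)) from e
```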