[Bug]: "Engine iteration timed out. This should never happen" occurred when vLLM 0.4.1 deployed Llama 3. #4293
Comments
Looks like the same issue as #4135.

Having the same error with Mixtral-8x7B-Instruct-v0.1-GPTQ and tensor_parallel_size=2.

Would you kindly share the specifications of the GPU you used when encountering these issues? Also an A800-80G?

@ericzhou571 @JPonsa @blackblue9 @supdizh Could you try …

@ywang96 the issue persists when launching the server with …

I encountered a similar issue in version 0.4.2.

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
I'm launching the server via `python -m vllm.entrypoints.openai.api_server --port 7801 --host 0.0.0.0 --model /mnt/model/llama3-70B-instruct --served-model-name vllm_llama3_70B_instruct --tensor-parallel-size 4 --trust-remote-code`. After deploying Llama 3 we started running concurrent performance tests against the model. It replied normally at first, but about 10 minutes into the test vLLM reported the following error (I installed vLLM 0.4.1 from source):
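For reference, the concurrent benchmark described above can be sketched roughly as below. This is hypothetical code, not the actual test harness from the report: `build_payload`, `run_load_test`, the worker count, and the injected `send` callable are all illustrative; a real run would POST each payload to `http://0.0.0.0:7801/v1/chat/completions`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of a concurrent load test against the
# OpenAI-compatible endpoint started by the command above.
# Model name matches the --served-model-name flag in the report.

def build_payload(prompt: str) -> dict:
    return {
        "model": "vllm_llama3_70B_instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def run_load_test(prompts, send, workers=8):
    # Fan requests out concurrently, mimicking the benchmark
    # traffic that preceded the engine timeout.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: send(build_payload(p)), prompts))

if __name__ == "__main__":
    # Stub transport so the fan-out logic can run without a live
    # server; replace with an HTTP POST for a real test.
    echo = lambda payload: payload["messages"][0]["content"]
    print(run_load_test([f"q{i}" for i in range(4)], echo))
```

Under sustained traffic like this, a single engine step that stalls for longer than the engine's watchdog timeout is enough to trigger the error below.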
```
ERROR 04-23 16:19:04 async_llm_engine.py:499] Engine iteration timed out. This should never happen!
ERROR 04-23 16:19:04 async_llm_engine.py:43] Engine background task failed
ERROR 04-23 16:19:04 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 470, in engine_step
ERROR 04-23 16:19:04 async_llm_engine.py:43] request_outputs = await self.engine.step_async()
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
ERROR 04-23 16:19:04 async_llm_engine.py:43] output = await self.model_executor.execute_model_async(
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 424, in execute_model_async
ERROR 04-23 16:19:04 async_llm_engine.py:43] all_outputs = await self._run_workers_async(
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 414, in _run_workers_async
ERROR 04-23 16:19:04 async_llm_engine.py:43] all_outputs = await asyncio.gather(*coros)
ERROR 04-23 16:19:04 async_llm_engine.py:43] asyncio.exceptions.CancelledError
ERROR 04-23 16:19:04 async_llm_engine.py:43]
ERROR 04-23 16:19:04 async_llm_engine.py:43] During handling of the above exception, another exception occurred:
ERROR 04-23 16:19:04 async_llm_engine.py:43]
ERROR 04-23 16:19:04 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
ERROR 04-23 16:19:04 async_llm_engine.py:43] return fut.result()
ERROR 04-23 16:19:04 async_llm_engine.py:43] asyncio.exceptions.CancelledError
ERROR 04-23 16:19:04 async_llm_engine.py:43]
ERROR 04-23 16:19:04 async_llm_engine.py:43] The above exception was the direct cause of the following exception:
ERROR 04-23 16:19:04 async_llm_engine.py:43]
ERROR 04-23 16:19:04 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
ERROR 04-23 16:19:04 async_llm_engine.py:43] task.result()
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
ERROR 04-23 16:19:04 async_llm_engine.py:43] has_requests_in_progress = await asyncio.wait_for(
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
ERROR 04-23 16:19:04 async_llm_engine.py:43] raise exceptions.TimeoutError() from exc
ERROR 04-23 16:19:04 async_llm_engine.py:43] asyncio.exceptions.TimeoutError
ERROR:asyncio:Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f4124ad08b0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f412c39bac0>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f4124ad08b0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f412c39bac0>>)>
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 470, in engine_step
request_outputs = await self.engine.step_async()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 424, in execute_model_async
all_outputs = await self._run_workers_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 414, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 04-23 16:19:04 async_llm_engine.py:154] Aborted request cmpl-cb9ae6d5b74b48a28f23d9f4c323a104.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 470, in engine_step
request_outputs = await self.engine.step_async()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 424, in execute_model_async
all_outputs = await self._run_workers_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 414, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
await self.app(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 89, in create_chat_completion
generator = await openai_serving_chat.create_chat_completion(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 95, in create_chat_completion
return await self.chat_completion_full_generator(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 258, in chat_completion_full_generator
async for res in result_generator:
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 661, in generate
raise e
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 655, in generate
async for request_output in stream:
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 77, in __anext__
raise result
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
INFO 04-23 16:19:04 async_llm_engine.py:154] Aborted request cmpl-ca47d0961c59407ab90792f513566cbd.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 470, in engine_step
request_outputs = await self.engine.step_async()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 424, in execute_model_async
all_outputs = await self._run_workers_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 414, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
asyncio.exceptions.CancelledError
```
How should I solve this problem?
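For anyone reading the traceback: the `TimeoutError` is not the root cause, it is `asyncio.wait_for` cancelling the engine step after it hangs (often on a stuck collective op across tensor-parallel workers). A minimal sketch of that mechanism, not vLLM code itself (`slow_engine_step` and `engine_loop` are illustrative stand-ins):

```python
import asyncio

# Sketch of the failure mode in the log above: asyncio.wait_for
# cancels the awaited step when it exceeds the timeout, turning the
# CancelledError into a TimeoutError that the engine loop then
# treats as fatal (AsyncEngineDeadError in vLLM).

async def slow_engine_step():
    # Stands in for execute_model_async hanging indefinitely.
    await asyncio.sleep(10)

async def engine_loop(timeout: float) -> str:
    try:
        await asyncio.wait_for(slow_engine_step(), timeout=timeout)
        return "ok"
    except asyncio.TimeoutError:
        # This is where vLLM's run_engine_loop gives up.
        return "engine iteration timed out"

if __name__ == "__main__":
    print(asyncio.run(engine_loop(timeout=0.1)))
```

So the useful question is what makes a single `step_async` stall for that long under concurrent load, rather than the timeout itself.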