Your current environment
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.5
Libc version: glibc-2.35
Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100S-PCIE-32GB
Nvidia driver version: 550.54.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 15
On-line CPU(s) list: 0-14
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 15
Stepping: 7
BogoMIPS: 5786.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat vnmi umip pku ospke avx512_vnni md_clear arch_capabilities
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 480 KiB (15 instances)
L1i cache: 480 KiB (15 instances)
L2 cache: 60 MiB (15 instances)
L3 cache: 240 MiB (15 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-14
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Syscall hardening, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] transformers==4.41.2
[pip3] triton==2.3.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.0
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-14 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
🐛 Describe the bug
After making a call to POST /v1/chat/completions with the following content:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/models/Meta-Llama-3-8B-Instruct",
"logit_bias": {
"AI": -100
},
"messages": [
{
"role": "system",
"content": "You are a a helpful assistant."
},
{
"role": "user",
"content": "What can I do with AI? Provide a very short answer."
}
]
}'
(Note: the logit_bias parameter above is invalid; per the OpenAI API, the mapping should be from integer token IDs to integer bias values, not from strings to ints: https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias. The model used is https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.)
vLLM returns an error and then falls into AsyncEngineDeadError. From this point, the inference server is unable to serve any request, and /health returns an HTTP 500 Internal Server Error.
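For reference, here is a minimal sketch (not part of the original report) of how the same request could be sent with a well-formed logit_bias, using integer token IDs as keys. It assumes the vLLM server from the report is listening on localhost:8000, that the tokenizer files are readable from the same model path, and that the requests and transformers packages are installed; the choice of " AI" as the text to look up is purely illustrative.

# Sketch: the same chat request with a well-formed logit_bias, i.e.
# token IDs (string keys in JSON, as required by the JSON format)
# mapped to bias values.
import requests
from transformers import AutoTokenizer

MODEL = "/models/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Look up the token ID(s) for the text to suppress instead of passing
# the raw string "AI" as a key.
ai_token_ids = tokenizer.encode(" AI", add_special_tokens=False)
logit_bias = {str(token_id): -100 for token_id in ai_token_ids}

payload = {
    "model": MODEL,
    "logit_bias": logit_bias,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",
         "content": "What can I do with AI? Provide a very short answer."},
    ],
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.status_code, resp.json())

A request shaped like this should be accepted, since the keys can be parsed by int(); only the string-keyed variant above triggers the crash described below.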
Logs:
2024-06-25 12:27:28.192 TRACE: 172.19.0.1:56910 - HTTP connection made
2024-06-25 12:27:28.214 TRACE: 172.19.0.1:56910 - ASGI [1281] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.19.0.3', 8000), 'client': ('172.19.0.1', 56910), 'scheme': 'http', 'root_path': '', 'headers': '<...>', 'state': {}, 'method': 'POST', 'path': '/v1/chat/completions', 'raw_path': b'/v1/chat/completions', 'query_string': b''}
2024-06-25 12:27:28.221 TRACE: 172.19.0.1:56910 - ASGI [1281] Receive {'type': 'http.request', 'body': '<223 bytes>', 'more_body': False}
2024-06-25 12:27:28.236 INFO 06-25 10:27:28 async_llm_engine.py:561] Received request cmpl-dbbcd0dea34644228aab6c59085edc42: prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat can I do with AI? Provide a very short answer.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8157, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 9125, 128007, 271, 2675, 527, 264, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 3923, 649, 358, 656, 449, 15592, 30, 40665, 264, 1633, 2875, 4320, 13, 128009, 128006, 78191, 128007, 271], lora_request: None.
2024-06-25 12:27:28.238 DEBUG 06-25 10:27:28 async_llm_engine.py:524] Got new requests!
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] Engine background task failed
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] Traceback (most recent call last):
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] return_value = task.result()
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 529, in run_engine_loop
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] has_requests_in_progress = await asyncio.wait_for(
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] return fut.result()
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 503, in engine_step
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] request_outputs = await self.engine.step_async()
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] output = await self.model_executor.execute_model_async(
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] output = await make_async(self.driver_worker.execute_model
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] result = self.fn(*self.args, **self.kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] return func(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] output = self.model_runner.execute_model(seq_group_metadata_list,
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] return func(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 747, in execute_model
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] logits = self.model.compute_logits(hidden_states, sampling_metadata)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 377, in compute_logits
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] logits = self.logits_processor(self.lm_head.weight, hidden_states,
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] return self._call_impl(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] return forward_call(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 59, in forward
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] logits = _apply_logits_processors(logits, sampling_metadata)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 116, in _apply_logits_processors
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] logits_row = logits_processor(past_tokens_ids,
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/protocol.py", line 245, in logit_bias_logits_processor
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] logits[int(token_id)] += bias
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] ValueError: invalid literal for int() with base 10: 'AI'
2024-06-25 12:27:28.331 Exception in callback functools.partial(<function _log_task_completion at 0x751ff1613760>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x751fd98b9ea0>>)
2024-06-25 12:27:28.331 handle: <Handle functools.partial(<function _log_task_completion at 0x751ff1613760>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x751fd98b9ea0>>)>
2024-06-25 12:27:28.331 Traceback (most recent call last):
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
2024-06-25 12:27:28.331 return_value = task.result()
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 529, in run_engine_loop
2024-06-25 12:27:28.331 has_requests_in_progress = await asyncio.wait_for(
2024-06-25 12:27:28.331 File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
2024-06-25 12:27:28.331 return fut.result()
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 503, in engine_step
2024-06-25 12:27:28.331 request_outputs = await self.engine.step_async()
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
2024-06-25 12:27:28.331 output = await self.model_executor.execute_model_async(
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
2024-06-25 12:27:28.331 output = await make_async(self.driver_worker.execute_model
2024-06-25 12:27:28.331 File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-06-25 12:27:28.331 result = self.fn(*self.args, **self.kwargs)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.331 return func(*args, **kwargs)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
2024-06-25 12:27:28.331 output = self.model_runner.execute_model(seq_group_metadata_list,
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.331 return func(*args, **kwargs)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 747, in execute_model
2024-06-25 12:27:28.331 logits = self.model.compute_logits(hidden_states, sampling_metadata)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 377, in compute_logits
2024-06-25 12:27:28.331 logits = self.logits_processor(self.lm_head.weight, hidden_states,
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
2024-06-25 12:27:28.331 return self._call_impl(*args, **kwargs)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
2024-06-25 12:27:28.331 return forward_call(*args, **kwargs)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 59, in forward
2024-06-25 12:27:28.331 logits = _apply_logits_processors(logits, sampling_metadata)
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 116, in _apply_logits_processors
2024-06-25 12:27:28.331 logits_row = logits_processor(past_tokens_ids,
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/protocol.py", line 245, in logit_bias_logits_processor
2024-06-25 12:27:28.331 logits[int(token_id)] += bias
2024-06-25 12:27:28.331 ValueError: invalid literal for int() with base 10: 'AI'
2024-06-25 12:27:28.331
2024-06-25 12:27:28.331 The above exception was the direct cause of the following exception:
2024-06-25 12:27:28.331
2024-06-25 12:27:28.331 Traceback (most recent call last):
2024-06-25 12:27:28.331 File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
2024-06-25 12:27:28.331 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
2024-06-25 12:27:28.331 raise AsyncEngineDeadError(
2024-06-25 12:27:28.331 vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
2024-06-25 12:27:28.333 INFO 06-25 10:27:28 async_llm_engine.py:167] Aborted request cmpl-dbbcd0dea34644228aab6c59085edc42.
2024-06-25 12:27:28.335 TRACE: 172.19.0.1:56910 - ASGI [1281] Send {'type': 'http.response.start', 'status': 400, 'headers': '<...>'}
2024-06-25 12:27:28.336 INFO: 172.19.0.1:56910 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-06-25 12:27:28.337 TRACE: 172.19.0.1:56910 - ASGI [1281] Send {'type': 'http.response.body', 'body': '<124 bytes>'}
2024-06-25 12:27:28.339 TRACE: 172.19.0.1:56910 - ASGI [1281] Completed
2024-06-25 12:27:28.697 TRACE: 10.0.3.98:50040 - HTTP connection lost
2024-06-25 12:27:29.886 TRACE: 10.0.1.65:50586 - HTTP connection made
2024-06-25 12:27:29.889 TRACE: 10.0.1.65:50586 - ASGI [1282] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.19.0.3', 8000), 'client': ('10.0.1.65', 50586), 'scheme': 'http', 'root_path': '', 'headers': '<...>', 'state': {}, 'method': 'GET', 'path': '/health', 'raw_path': b'/health', 'query_string': b''}
2024-06-25 12:27:29.895 DEBUG 06-25 10:27:29 async_llm_engine.py:837] Starting health check...
2024-06-25 12:27:29.897 TRACE: 10.0.1.65:50586 - ASGI [1282] Send {'type': 'http.response.start', 'status': 500, 'headers': '<...>'}
2024-06-25 12:27:29.898 INFO: 10.0.1.65:50586 - "GET /health HTTP/1.1" 500 Internal Server Error
2024-06-25 12:27:29.899 TRACE: 10.0.1.65:50586 - ASGI [1282] Send {'type': 'http.response.body', 'body': '<21 bytes>'}
2024-06-25 12:27:29.900 TRACE: 10.0.1.65:50586 - ASGI [1282] Raised exception
2024-06-25 12:27:29.905 ERROR: Exception in ASGI application
2024-06-25 12:27:29.905 Traceback (most recent call last):
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
2024-06-25 12:27:29.905 result = await app( # type: ignore[func-returns-value]
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
2024-06-25 12:27:29.905 return await self.app(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/message_logger.py", line 84, in __call__
2024-06-25 12:27:29.905 raise exc from None
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/message_logger.py", line 80, in __call__
2024-06-25 12:27:29.905 await self.app(scope, inner_receive, inner_send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
2024-06-25 12:27:29.905 await super().__call__(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
2024-06-25 12:27:29.905 await self.middleware_stack(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
2024-06-25 12:27:29.905 raise exc
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
2024-06-25 12:27:29.905 await self.app(scope, receive, _send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
2024-06-25 12:27:29.905 await self.app(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
2024-06-25 12:27:29.905 await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-06-25 12:27:29.905 raise exc
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-06-25 12:27:29.905 await app(scope, receive, sender)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
2024-06-25 12:27:29.905 await self.middleware_stack(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
2024-06-25 12:27:29.905 await route.handle(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
2024-06-25 12:27:29.905 await self.app(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
2024-06-25 12:27:29.905 await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-06-25 12:27:29.905 raise exc
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-06-25 12:27:29.905 await app(scope, receive, sender)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
2024-06-25 12:27:29.905 response = await func(request)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
2024-06-25 12:27:29.905 raw_response = await run_endpoint_function(
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
2024-06-25 12:27:29.905 return await dependant.call(**values)
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 84, in health
2024-06-25 12:27:29.905 await openai_serving_chat.engine.check_health()
2024-06-25 12:27:29.905 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 839, in check_health
2024-06-25 12:27:29.905 raise AsyncEngineDeadError("Background loop is stopped.")
2024-06-25 12:27:29.905 vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
2024-06-25 12:27:29.905 TRACE: 10.0.1.65:50586 - HTTP connection lost
I am able to reproduce the bug 100% of the time.
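The last frame of the traceback shows the root cause: the raw string key 'AI' from the request reaches logit_bias_logits_processor unvalidated, int('AI') raises ValueError inside the engine loop, and the loop dies instead of the request being rejected at parse time. Below is a minimal sketch of the kind of up-front validation that would turn this into an ordinary 400 response; the helper name, exception type, and vocabulary size are illustrative and not vLLM's actual code.

from typing import Dict, Optional


class InvalidLogitBiasError(ValueError):
    """Raised when a logit_bias key is not an integer token ID."""


def validate_logit_bias(logit_bias: Optional[Dict[str, float]],
                        vocab_size: int) -> Dict[int, float]:
    """Convert and range-check logit_bias keys before any logits
    processor is built, so malformed input never reaches the engine."""
    if not logit_bias:
        return {}
    clean: Dict[int, float] = {}
    for key, bias in logit_bias.items():
        try:
            token_id = int(key)
        except ValueError as exc:
            raise InvalidLogitBiasError(
                f"logit_bias key {key!r} is not an integer token ID") from exc
        if not 0 <= token_id < vocab_size:
            raise InvalidLogitBiasError(
                f"token ID {token_id} is out of range for vocab size {vocab_size}")
        clean[token_id] = float(bias)
    return clean


# The request from this report would now fail fast with a clear error
# instead of taking down the whole engine.
try:
    validate_logit_bias({"AI": -100}, vocab_size=128256)
except InvalidLogitBiasError as err:
    print(f"400 Bad Request: {err}")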