[Bug]: AsyncEngineDeadError: Background loop is stopped after invalid parameter in request #5822

Closed
Tracked by #5901
guillaumerenault opened this issue Jun 25, 2024 · 1 comment · Fixed by #5963
Labels
bug Something isn't working

guillaumerenault commented Jun 25, 2024

Your current environment

Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.5
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100S-PCIE-32GB
Nvidia driver version: 550.54.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      40 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             15
On-line CPU(s) list:                0-14
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
CPU family:                         6
Model:                              85
Thread(s) per core:                 1
Core(s) per socket:                 1
Socket(s):                          15
Stepping:                           7
BogoMIPS:                           5786.40
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat vnmi umip pku ospke avx512_vnni md_clear arch_capabilities
Virtualization:                     VT-x
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          480 KiB (15 instances)
L1i cache:                          480 KiB (15 instances)
L2 cache:                           60 MiB (15 instances)
L3 cache:                           240 MiB (15 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-14
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Syscall hardening, KVM SW loop
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] transformers==4.41.2
[pip3] triton==2.3.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.0
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-14    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

After making a call to POST /v1/chat/completions with the following request body:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/Meta-Llama-3-8B-Instruct",
    "logit_bias": {
        "AI": -100
    },
    "messages": [
        {
            "role": "system",
            "content": "You are a a helpful assistant."
        },
        {
            "role": "user",
            "content": "What can I do with AI? Provide a very short answer."
        }
    ]
}'

(The logit_bias parameter is invalid: its keys must be token IDs (integers), not arbitrary strings such as "AI". See https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias. A corrected example is sketched below.)
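For reference, a request of the same shape with a valid logit_bias mapping might look like the following sketch (Python with the requests library rather than curl). The token ID 15592 is taken from the prompt_token_ids in the log further down and is assumed to correspond to the " AI" token for this model's tokenizer; it is illustrative only.

# Illustrative only: the same request with logit_bias keyed by token ID
# instead of a raw token string. 15592 is assumed (from the logged
# prompt_token_ids) to be the " AI" token for this tokenizer.
import requests

payload = {
    "model": "/models/Meta-Llama-3-8B-Instruct",
    "logit_bias": {"15592": -100},  # token ID -> bias, per the OpenAI API spec
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",
         "content": "What can I do with AI? Provide a very short answer."},
    ],
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.status_code, resp.json())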

The model used is: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

vLLM returns an error and then falls into AsyncEngineDeadError. From this point on, the inference server cannot serve any further requests, and /health returns an HTTP 500 Internal Server Error.
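To illustrate why this kills the engine rather than just the one request, below is a simplified sketch (not the exact vLLM source) of the code path shown in the traceback: the logit_bias dict from the request is wrapped in a per-request logits processor, and the int() conversion of its keys only runs when that processor is invoked during model execution, inside the background engine loop.

# Simplified sketch of the failing code path, based on the traceback below
# (vllm/entrypoints/openai/protocol.py, logit_bias_logits_processor); not the
# exact vLLM implementation.
def make_logit_bias_processor(logit_bias: dict):
    def processor(token_ids, logits):
        for token_id, bias in logit_bias.items():
            # With a key such as "AI", int("AI") raises ValueError here --
            # i.e. during model execution rather than at request validation,
            # so the exception escapes the background engine loop and the
            # engine is marked dead (AsyncEngineDeadError on later requests).
            logits[int(token_id)] += bias
        return logits
    return processor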

Logs:

2024-06-25 12:27:28.192 TRACE:    172.19.0.1:56910 - HTTP connection made
2024-06-25 12:27:28.214 TRACE:    172.19.0.1:56910 - ASGI [1281] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.19.0.3', 8000), 'client': ('172.19.0.1', 56910), 'scheme': 'http', 'root_path': '', 'headers': '<...>', 'state': {}, 'method': 'POST', 'path': '/v1/chat/completions', 'raw_path': b'/v1/chat/completions', 'query_string': b''}
2024-06-25 12:27:28.221 TRACE:    172.19.0.1:56910 - ASGI [1281] Receive {'type': 'http.request', 'body': '<223 bytes>', 'more_body': False}
2024-06-25 12:27:28.236 INFO 06-25 10:27:28 async_llm_engine.py:561] Received request cmpl-dbbcd0dea34644228aab6c59085edc42: prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat can I do with AI? Provide a very short answer.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8157, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 9125, 128007, 271, 2675, 527, 264, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 3923, 649, 358, 656, 449, 15592, 30, 40665, 264, 1633, 2875, 4320, 13, 128009, 128006, 78191, 128007, 271], lora_request: None.
2024-06-25 12:27:28.238 DEBUG 06-25 10:27:28 async_llm_engine.py:524] Got new requests!
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] Engine background task failed
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] Traceback (most recent call last):
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     return_value = task.result()
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 529, in run_engine_loop
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     has_requests_in_progress = await asyncio.wait_for(
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     return fut.result()
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 503, in engine_step
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     request_outputs = await self.engine.step_async()
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     output = await self.model_executor.execute_model_async(
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     output = await make_async(self.driver_worker.execute_model
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     result = self.fn(*self.args, **self.kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     return func(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     output = self.model_runner.execute_model(seq_group_metadata_list,
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     return func(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 747, in execute_model
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     logits = self.model.compute_logits(hidden_states, sampling_metadata)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 377, in compute_logits
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     logits = self.logits_processor(self.lm_head.weight, hidden_states,
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     return self._call_impl(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     return forward_call(*args, **kwargs)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 59, in forward
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     logits = _apply_logits_processors(logits, sampling_metadata)
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 116, in _apply_logits_processors
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     logits_row = logits_processor(past_tokens_ids,
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/protocol.py", line 245, in logit_bias_logits_processor
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52]     logits[int(token_id)] += bias
2024-06-25 12:27:28.328 ERROR 06-25 10:27:28 async_llm_engine.py:52] ValueError: invalid literal for int() with base 10: 'AI'
2024-06-25 12:27:28.331 Exception in callback functools.partial(<function _log_task_completion at 0x751ff1613760>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x751fd98b9ea0>>)
2024-06-25 12:27:28.331 handle: <Handle functools.partial(<function _log_task_completion at 0x751ff1613760>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x751fd98b9ea0>>)>
2024-06-25 12:27:28.331 Traceback (most recent call last):
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
2024-06-25 12:27:28.331     return_value = task.result()
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 529, in run_engine_loop
2024-06-25 12:27:28.331     has_requests_in_progress = await asyncio.wait_for(
2024-06-25 12:27:28.331   File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
2024-06-25 12:27:28.331     return fut.result()
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 503, in engine_step
2024-06-25 12:27:28.331     request_outputs = await self.engine.step_async()
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
2024-06-25 12:27:28.331     output = await self.model_executor.execute_model_async(
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
2024-06-25 12:27:28.331     output = await make_async(self.driver_worker.execute_model
2024-06-25 12:27:28.331   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-06-25 12:27:28.331     result = self.fn(*self.args, **self.kwargs)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.331     return func(*args, **kwargs)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
2024-06-25 12:27:28.331     output = self.model_runner.execute_model(seq_group_metadata_list,
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-06-25 12:27:28.331     return func(*args, **kwargs)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 747, in execute_model
2024-06-25 12:27:28.331     logits = self.model.compute_logits(hidden_states, sampling_metadata)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 377, in compute_logits
2024-06-25 12:27:28.331     logits = self.logits_processor(self.lm_head.weight, hidden_states,
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
2024-06-25 12:27:28.331     return self._call_impl(*args, **kwargs)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
2024-06-25 12:27:28.331     return forward_call(*args, **kwargs)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 59, in forward
2024-06-25 12:27:28.331     logits = _apply_logits_processors(logits, sampling_metadata)
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 116, in _apply_logits_processors
2024-06-25 12:27:28.331     logits_row = logits_processor(past_tokens_ids,
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/protocol.py", line 245, in logit_bias_logits_processor
2024-06-25 12:27:28.331     logits[int(token_id)] += bias
2024-06-25 12:27:28.331 ValueError: invalid literal for int() with base 10: 'AI'
2024-06-25 12:27:28.331 
2024-06-25 12:27:28.331 The above exception was the direct cause of the following exception:
2024-06-25 12:27:28.331 
2024-06-25 12:27:28.331 Traceback (most recent call last):
2024-06-25 12:27:28.331   File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
2024-06-25 12:27:28.331   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
2024-06-25 12:27:28.331     raise AsyncEngineDeadError(
2024-06-25 12:27:28.331 vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
2024-06-25 12:27:28.333 INFO 06-25 10:27:28 async_llm_engine.py:167] Aborted request cmpl-dbbcd0dea34644228aab6c59085edc42.
2024-06-25 12:27:28.335 TRACE:    172.19.0.1:56910 - ASGI [1281] Send {'type': 'http.response.start', 'status': 400, 'headers': '<...>'}
2024-06-25 12:27:28.336 INFO:     172.19.0.1:56910 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-06-25 12:27:28.337 TRACE:    172.19.0.1:56910 - ASGI [1281] Send {'type': 'http.response.body', 'body': '<124 bytes>'}
2024-06-25 12:27:28.339 TRACE:    172.19.0.1:56910 - ASGI [1281] Completed
2024-06-25 12:27:28.697 TRACE:    10.0.3.98:50040 - HTTP connection lost
2024-06-25 12:27:29.886 TRACE:    10.0.1.65:50586 - HTTP connection made
2024-06-25 12:27:29.889 TRACE:    10.0.1.65:50586 - ASGI [1282] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.19.0.3', 8000), 'client': ('10.0.1.65', 50586), 'scheme': 'http', 'root_path': '', 'headers': '<...>', 'state': {}, 'method': 'GET', 'path': '/health', 'raw_path': b'/health', 'query_string': b''}
2024-06-25 12:27:29.895 DEBUG 06-25 10:27:29 async_llm_engine.py:837] Starting health check...
2024-06-25 12:27:29.897 TRACE:    10.0.1.65:50586 - ASGI [1282] Send {'type': 'http.response.start', 'status': 500, 'headers': '<...>'}
2024-06-25 12:27:29.898 INFO:     10.0.1.65:50586 - "GET /health HTTP/1.1" 500 Internal Server Error
2024-06-25 12:27:29.899 TRACE:    10.0.1.65:50586 - ASGI [1282] Send {'type': 'http.response.body', 'body': '<21 bytes>'}
2024-06-25 12:27:29.900 TRACE:    10.0.1.65:50586 - ASGI [1282] Raised exception
2024-06-25 12:27:29.905 ERROR:    Exception in ASGI application
2024-06-25 12:27:29.905 Traceback (most recent call last):
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
2024-06-25 12:27:29.905     result = await app(  # type: ignore[func-returns-value]
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
2024-06-25 12:27:29.905     return await self.app(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/message_logger.py", line 84, in __call__
2024-06-25 12:27:29.905     raise exc from None
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/message_logger.py", line 80, in __call__
2024-06-25 12:27:29.905     await self.app(scope, inner_receive, inner_send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
2024-06-25 12:27:29.905     await super().__call__(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
2024-06-25 12:27:29.905     await self.middleware_stack(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
2024-06-25 12:27:29.905     raise exc
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
2024-06-25 12:27:29.905     await self.app(scope, receive, _send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
2024-06-25 12:27:29.905     await self.app(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
2024-06-25 12:27:29.905     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-06-25 12:27:29.905     raise exc
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-06-25 12:27:29.905     await app(scope, receive, sender)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
2024-06-25 12:27:29.905     await self.middleware_stack(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
2024-06-25 12:27:29.905     await route.handle(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
2024-06-25 12:27:29.905     await self.app(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
2024-06-25 12:27:29.905     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-06-25 12:27:29.905     raise exc
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-06-25 12:27:29.905     await app(scope, receive, sender)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
2024-06-25 12:27:29.905     response = await func(request)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
2024-06-25 12:27:29.905     raw_response = await run_endpoint_function(
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
2024-06-25 12:27:29.905     return await dependant.call(**values)
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 84, in health
2024-06-25 12:27:29.905     await openai_serving_chat.engine.check_health()
2024-06-25 12:27:29.905   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 839, in check_health
2024-06-25 12:27:29.905     raise AsyncEngineDeadError("Background loop is stopped.")
2024-06-25 12:27:29.905 vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
2024-06-25 12:27:29.905 TRACE:    10.0.1.65:50586 - HTTP connection lost

I am able to reproduce the bug 100% of the time.

@robertgshaw2-neuralmagic
Collaborator

#5903 resolves this issue
