[Bug]: meta-llama/Llama-3.2-90B-Vision-Instruct and Qwen/Qwen2-VL-72B-Instruct models fail with asyncio.exceptions.CancelledError when using wiki image URLs #10904

@atanikan

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: NVIDIA DGX Server (x86_64)
GCC version: (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)
Clang version: 17.0.6 (https://github.com/llvm/llvm-project.git 6009708b4367171ccdbf4b5905cb6a803753fe18)
CMake version: version 3.26.5
Libc version: glibc-2.34

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.6.20
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB

Nvidia driver version: 550.54.15
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.9.1.1
/usr/lib64/libcudnn_adv.so.9.1.1
/usr/lib64/libcudnn_cnn.so.9.1.1
/usr/lib64/libcudnn_engines_precompiled.so.9.1.1
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.1.1
/usr/lib64/libcudnn_graph.so.9.1.1
/usr/lib64/libcudnn_heuristic.so.9.1.1
/usr/lib64/libcudnn_ops.so.9.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      43 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             256
On-line CPU(s) list:                0-255
Vendor ID:                          AuthenticAMD
Model name:                         AMD EPYC 7742 64-Core Processor
CPU family:                         23
Model:                              49
Thread(s) per core:                 2
Core(s) per socket:                 64
Socket(s):                          2
Stepping:                           0
Frequency boost:                    enabled
CPU(s) scaling MHz:                 98%
CPU max MHz:                        2250.0000
CPU min MHz:                        1500.0000
BogoMIPS:                           4491.55
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization:                     AMD-V
L1d cache:                          4 MiB (128 instances)
L1i cache:                          4 MiB (128 instances)
L2 cache:                           64 MiB (128 instances)
L3 cache:                           512 MiB (32 instances)
NUMA node(s):                       8
NUMA node0 CPU(s):                  0-15,128-143
NUMA node1 CPU(s):                  16-31,144-159
NUMA node2 CPU(s):                  32-47,160-175
NUMA node3 CPU(s):                  48-63,176-191
NUMA node4 CPU(s):                  64-79,192-207
NUMA node5 CPU(s):                  80-95,208-223
NUMA node6 CPU(s):                  96-111,224-239
NUMA node7 CPU(s):                  112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.1.0
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.46.2
[pip3] triton==3.1.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
[conda] pyzmq                     26.1.0                   pypi_0    pypi
[conda] torch                     2.5.1                    pypi_0    pypi
[conda] torchvision               0.20.1                   pypi_0    pypi
[conda] transformers              4.46.2                   pypi_0    pypi
[conda] triton                    3.1.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	NIC1	NIC2	NIC3	NIC4	NIC5	NIC6	NIC7	NIC8	NIC9	NIC10	NIC11	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV12	NV12	NV12	NV12	NV12	NV12	NV12	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	48-63,176-191	3		N/A
GPU1	NV12	 X 	NV12	NV12	NV12	NV12	NV12	NV12	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	48-63,176-191	3		N/A
GPU2	NV12	NV12	 X 	NV12	NV12	NV12	NV12	NV12	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	16-31,144-159	1		N/A
GPU3	NV12	NV12	NV12	 X 	NV12	NV12	NV12	NV12	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	16-31,144-159	1		N/A
GPU4	NV12	NV12	NV12	NV12	 X 	NV12	NV12	NV12	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	112-127,240-255	7		N/A
GPU5	NV12	NV12	NV12	NV12	NV12	 X 	NV12	NV12	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	112-127,240-255	7		N/A
GPU6	NV12	NV12	NV12	NV12	NV12	NV12	 X 	NV12	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	80-95,208-223	5		N/A
GPU7	NV12	NV12	NV12	NV12	NV12	NV12	NV12	 X 	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	80-95,208-223	5		N/A
NIC0	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS				
NIC1	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	PXB	 X 	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS				
NIC2	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS				
NIC3	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	PXB	 X 	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS				
NIC4	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PIX	SYS	SYS	SYS	SYS	SYS	SYS				
NIC5	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	PIX	 X 	SYS	SYS	SYS	SYS	SYS	SYS				
NIC6	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PXB	SYS	SYS	SYS	SYS				
NIC7	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	PXB	 X 	SYS	SYS	SYS	SYS				
NIC8	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PXB	SYS	SYS				
NIC9	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	PXB	 X 	SYS	SYS				
NIC10	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PIX				
NIC11	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	PIX	 X 				

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8
  NIC9: mlx5_9
  NIC10: mlx5_10
  NIC11: mlx5_11

NCCL_SOCKET_IFNAME=bond0
CUDA_LAUNCH_BLOCKING=1
CUDA_PATH=/soft/compilers/cudatoolkit/cuda-12.6.0/
CUDA_TOOLKIT_BASE=/soft/compilers/cudatoolkit/cuda-12.6.0/
LD_LIBRARY_PATH=/lus/eagle/projects/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/cv2/../../lib64:/soft/compilers/cudatoolkit/cuda-12.6.0/extras/CUPTI/lib64:/soft/compilers/cudatoolkit/cuda-12.6.0/lib64:/soft/libraries/trt/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0/lib:/soft/libraries/nccl/nccl_2.22.3-1+cuda12.6_x86_64/lib:/soft/libraries/cudnn/cudnn-cuda12-linux-x64-v9.3.0.75/lib:/soft/libraries/hdf5/1.14.4.3-openmpi-5.0.3/lib:/soft/libraries/ucx/1.17.0/lib:/soft/compilers/openmpi/5.0.3/lib:/soft/compilers/clang/17.0.6/lib
OMP_NUM_THREADS=4
CUDA_HOME=/soft/compilers/cudatoolkit/cuda-12.6.0/
CUDA_HOME=/soft/compilers/cudatoolkit/cuda-12.6.0/
CUDA_MODULE_LOADING=LAZY

Model Input Dumps

No response

🐛 Describe the bug

The model is served on 8 A100 80 GB GPUs as follows.

Model Serving

setup_environment
# Define model parameters
export CUDA_LAUNCH_BLOCKING=1

model_name="Llama-3.2-90B-Vision-Instruct"
model_command="CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --host 127.0.0.1 --port 8000 \
--tensor-parallel-size 8 --gpu-memory-utilization 0.99 \
--disable-log-requests --max-model-len 32768 --enforce-eager \
--multi-step-stream-outputs False --disable-log-stats --max-num-seqs 64 --disable-frontend-multiprocessing \
--ssl-keyfile ~/certificates/mykey.key --ssl-certfile ~/certificates/mycert.crt"
log_file="$PWD/logfile_sophia_vllm_${model_name}_$(hostname).log"


# Initialize retry counter for the model
retry_counter_model_1=0

# Start the model
while true; do
    echo "Starting models sequence..."
    if ! start_model "$model_name" "$model_command" "$log_file" retry_counter_model_1; then
        continue  # Restart from the beginning if this fails
    fi
    echo "All models started successfully."
    break
done

OpenAI Client API call

from openai import OpenAI
import socket
import json
import os
import time
import httpx

# Start timing and exempt the local server from any configured proxy
start_time = time.time()
hostname = socket.gethostname()
os.environ['no_proxy'] = f"localhost,{hostname},127.0.0.1"
# Base URL of the local vLLM server
base_url = "https://127.0.0.1:8000/v1"

client = OpenAI(
    base_url=base_url,
    api_key="cxvff_xxxx",
    # Skip TLS verification for the local server certificate
    http_client=httpx.Client(verify=False),
)


data = {
    "temperature": 0.2,
    "max_tokens": 50,
    "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text", 
                    "text": "What is in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        # "url": "https://apod.nasa.gov/apod/image/2412/MarsClouds_Perseverance_2048.jpg",
                        # "url": "https://www.stockvault.net/data/2010/06/01/113952/preview16.jpg",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
}

response = client.chat.completions.create(**data)
# Print the response
if hasattr(response, "choices") and response.choices:
    print(response.choices[0].message.content)
else:
    print("No valid response received from the API.")

print("Total time",time.time()-start_time)
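
For comparison, a possible way to take the server-side download out of the picture is to fetch the image on the client and send it inline as a base64 data URL in the same `image_url` field. The sketch below is a minimal illustration under that assumption, not a confirmed fix; `to_data_url` and `build_image_message` are hypothetical helper names, and whether the server accepts data URLs for a given model should be checked against the vLLM version in use.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/jpeg") -> dict:
    """Build a chat message whose image is embedded instead of linked."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # The image travels inside the request body, so the server
                # does not need outbound access to the image host.
                "image_url": {"url": to_data_url(image_bytes, mime_type)},
            },
        ],
    }
```

With the image bytes already downloaded on the client (for example via `httpx.get(url).content`), the request above would set `data["messages"] = [build_image_message("What is in this image?", image_bytes)]`. If that variant succeeds while the URL variant fails, the problem is more likely in the server's outbound fetch than in the model.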

Running the client against the Wikimedia URL produces the following error in the server log:

INFO:     Uvicorn running on https://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:50478 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 696, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1050, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1363, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1109, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/impl.py", line 89, in start_connection
    sock, _, _ = await _staggered.staggered_race(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 160, in staggered_race
    done = await _wait_one(
           ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 41, in _wait_one
    return await wait_next
           ^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
    with collapse_excgroups():
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 347, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 155, in create_chat_completion
    ) = await self._preprocess_chat(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 470, in _preprocess_chat
    mm_data = await mm_data_future
              ^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/chat_utils.py", line 293, in all_mm_data
    items = await asyncio.gather(*self._items)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 287, in async_get_and_parse_image
    image = await async_fetch_image(
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 110, in async_fetch_image
    image_raw = await global_http_connection.async_get_bytes(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/connections.py", line 92, in async_get_bytes
    async with await self.get_async_response(url, timeout=timeout) as r:
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 1418, in __aenter__
    self._resp: _RetType = await self._coro
                           ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 602, in _request
    with timer:
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/helpers.py", line 671, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError
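
The traceback above ends in `TimeoutError`, with `CancelledError` reported as its direct cause. That is the usual shape of an asyncio timeout: the deadline expires, the in-flight connection attempt is cancelled, and the timeout is then raised to the caller. The sketch below reproduces that ordering with the standard library only; `slow_connect` and `fetch_with_timeout` are hypothetical stand-ins, not vLLM or aiohttp functions.

```python
import asyncio


async def slow_connect(log: list) -> None:
    """Stand-in for a connection attempt that never completes in time."""
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # The timeout cancels the pending operation first.
        log.append("cancelled")
        raise


async def fetch_with_timeout(timeout: float, log: list):
    """Run the stand-in connection under a deadline."""
    try:
        await asyncio.wait_for(slow_connect(log), timeout=timeout)
    except asyncio.TimeoutError:
        # Only after the cancellation does the caller see the timeout.
        log.append("timeout")
        return None
    return "ok"


def run_demo():
    log = []
    result = asyncio.run(fetch_with_timeout(0.05, log))
    return result, log
```

If this reading is right, the `CancelledError` is a symptom rather than the root cause: the server could not open a connection to the image host before its fetch timeout expired.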
INFO:     127.0.0.1:50490 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 696, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1050, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1363, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1109, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/impl.py", line 89, in start_connection
    sock, _, _ = await _staggered.staggered_race(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 160, in staggered_race
    done = await _wait_one(
           ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 41, in _wait_one
    return await wait_next
           ^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
    with collapse_excgroups():
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 347, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 155, in create_chat_completion
    ) = await self._preprocess_chat(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 470, in _preprocess_chat
    mm_data = await mm_data_future
              ^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/chat_utils.py", line 293, in all_mm_data
    items = await asyncio.gather(*self._items)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 287, in async_get_and_parse_image
    image = await async_fetch_image(
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 110, in async_fetch_image
    image_raw = await global_http_connection.async_get_bytes(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/connections.py", line 92, in async_get_bytes
    async with await self.get_async_response(url, timeout=timeout) as r:
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 1418, in __aenter__
    self._resp: _RetType = await self._coro
                           ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 602, in _request
    with timer:
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/helpers.py", line 671, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError
INFO:     127.0.0.1:33666 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 696, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1050, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1363, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1109, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/impl.py", line 89, in start_connection
    sock, _, _ = await _staggered.staggered_race(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 160, in staggered_race
    done = await _wait_one(
           ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 41, in _wait_one
    return await wait_next
           ^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
    with collapse_excgroups():
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 347, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 155, in create_chat_completion
    ) = await self._preprocess_chat(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 470, in _preprocess_chat
    mm_data = await mm_data_future
              ^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/chat_utils.py", line 293, in all_mm_data
    items = await asyncio.gather(*self._items)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 287, in async_get_and_parse_image
    image = await async_fetch_image(
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 110, in async_fetch_image
    image_raw = await global_http_connection.async_get_bytes(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/connections.py", line 92, in async_get_bytes
    async with await self.get_async_response(url, timeout=timeout) as r:
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 1418, in __aenter__
    self._resp: _RetType = await self._coro
                           ^^^^^^^^^^^^^^^^
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 602, in _request
    with timer:
  File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/helpers.py", line 671, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError
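
Both tracebacks end inside `async_fetch_image`, where the server's own aiohttp request for the image URL times out while connecting. A possible workaround sketch, assuming the client can reach the image even when the server node cannot: download the image on the client side and send it inline as a base64 data URL, so the server does not have to make the outbound HTTP request. The helper names `image_bytes_to_data_url` and `build_chat_payload` are hypothetical and only for illustration; this has not been verified against the deployment described above.

```python
import base64
import json


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat completion body with the image inlined."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # Inline data URL instead of a remote https:// URL.
                        "image_url": {"url": image_bytes_to_data_url(image_bytes)},
                    },
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice, read a file fetched on the client side.
    placeholder = b"\xff\xd8\xff\xe0 not a real image"
    payload = build_chat_payload(
        "Qwen/Qwen2-VL-72B-Instruct", "What is in this image?", placeholder
    )
    # Show the start of the JSON body that would be POSTed to /v1/chat/completions.
    print(json.dumps(payload)[:120])
```

If the same request succeeds with an inline image, that would suggest the failure is in outbound network access from the server node rather than in the models themselves.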

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
