Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: NVIDIA DGX Server (x86_64)
GCC version: (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)
Clang version: 17.0.6 (https://github.com/llvm/llvm-project.git 6009708b4367171ccdbf4b5905cb6a803753fe18)
CMake version: version 3.26.5
Libc version: glibc-2.34
Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.14.0-362.24.1.el9_3.x86_64-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.6.20
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB
Nvidia driver version: 550.54.15
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.9.1.1
/usr/lib64/libcudnn_adv.so.9.1.1
/usr/lib64/libcudnn_cnn.so.9.1.1
/usr/lib64/libcudnn_engines_precompiled.so.9.1.1
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.1.1
/usr/lib64/libcudnn_graph.so.9.1.1
/usr/lib64/libcudnn_heuristic.so.9.1.1
/usr/lib64/libcudnn_ops.so.9.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7742 64-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 98%
CPU max MHz: 2250.0000
CPU min MHz: 1500.0000
BogoMIPS: 4491.55
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization: AMD-V
L1d cache: 4 MiB (128 instances)
L1i cache: 4 MiB (128 instances)
L2 cache: 64 MiB (128 instances)
L3 cache: 512 MiB (32 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-15,128-143
NUMA node1 CPU(s): 16-31,144-159
NUMA node2 CPU(s): 32-47,160-175
NUMA node3 CPU(s): 48-63,176-191
NUMA node4 CPU(s): 64-79,192-207
NUMA node5 CPU(s): 80-95,208-223
NUMA node6 CPU(s): 96-111,224-239
NUMA node7 CPU(s): 112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.1.0
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.46.2
[pip3] triton==3.1.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
[conda] nvidia-ml-py 12.560.30 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
[conda] pyzmq 26.1.0 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi
[conda] torchvision 0.20.1 pypi_0 pypi
[conda] transformers 4.46.2 pypi_0 pypi
[conda] triton 3.1.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 NIC10 NIC11 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-63,176-191 3 N/A
GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-63,176-191 3 N/A
GPU2 NV12 NV12 X NV12 NV12 NV12 NV12 NV12 SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS 16-31,144-159 1 N/A
GPU3 NV12 NV12 NV12 X NV12 NV12 NV12 NV12 SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS 16-31,144-159 1 N/A
GPU4 NV12 NV12 NV12 NV12 X NV12 NV12 NV12 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS 112-127,240-255 7 N/A
GPU5 NV12 NV12 NV12 NV12 NV12 X NV12 NV12 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS 112-127,240-255 7 N/A
GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS 80-95,208-223 5 N/A
GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS 80-95,208-223 5 N/A
NIC0 PXB PXB SYS SYS SYS SYS SYS SYS X PXB SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC1 PXB PXB SYS SYS SYS SYS SYS SYS PXB X SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC2 SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS X PXB SYS SYS SYS SYS SYS SYS SYS SYS
NIC3 SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS PXB X SYS SYS SYS SYS SYS SYS SYS SYS
NIC4 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS X PIX SYS SYS SYS SYS SYS SYS
NIC5 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX X SYS SYS SYS SYS SYS SYS
NIC6 SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS X PXB SYS SYS SYS SYS
NIC7 SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS PXB X SYS SYS SYS SYS
NIC8 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS X PXB SYS SYS
NIC9 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS SYS SYS SYS SYS PXB X SYS SYS
NIC10 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS X PIX
NIC11 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
NIC10: mlx5_10
NIC11: mlx5_11
NCCL_SOCKET_IFNAME=bond0
CUDA_LAUNCH_BLOCKING=1
CUDA_PATH=/soft/compilers/cudatoolkit/cuda-12.6.0/
CUDA_TOOLKIT_BASE=/soft/compilers/cudatoolkit/cuda-12.6.0/
LD_LIBRARY_PATH=/lus/eagle/projects/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/cv2/../../lib64:/soft/compilers/cudatoolkit/cuda-12.6.0/extras/CUPTI/lib64:/soft/compilers/cudatoolkit/cuda-12.6.0/lib64:/soft/libraries/trt/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0/lib:/soft/libraries/nccl/nccl_2.22.3-1+cuda12.6_x86_64/lib:/soft/libraries/cudnn/cudnn-cuda12-linux-x64-v9.3.0.75/lib:/soft/libraries/hdf5/1.14.4.3-openmpi-5.0.3/lib:/soft/libraries/ucx/1.17.0/lib:/soft/compilers/openmpi/5.0.3/lib:/soft/compilers/clang/17.0.6/lib
OMP_NUM_THREADS=4
CUDA_HOME=/soft/compilers/cudatoolkit/cuda-12.6.0/
CUDA_HOME=/soft/compilers/cudatoolkit/cuda-12.6.0/
CUDA_MODULE_LOADING=LAZY
Model Input Dumps
No response
🐛 Describe the bug
The model is served as follows on 8 A100 80GB GPUs:
setup_environment
# Define model parameters
export CUDA_LAUNCH_BLOCKING=1
model_name="Llama-3.2-90B-Vision-Instruct"
model_command="CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --host 127.0.0.1 --port 8000 \
--tensor-parallel-size 8 --gpu-memory-utilization 0.99 \
--disable-log-requests --max-model-len 32768 --enforce-eager \
--multi-step-stream-outputs False --disable-log-stats --max-num-seqs 64 --disable-frontend-multiprocessing \
--ssl-keyfile ~/certificates/mykey.key --ssl-certfile ~/certificates/mycert.crt"
log_file="$PWD/logfile_sophia_vllm_${model_name}_$(hostname).log"
# Initialize retry counter for the model
retry_counter_model_1=0
# Start the model
while true; do
echo "Starting models sequence..."
if ! start_model "$model_name" "$model_command" "$log_file" retry_counter_model_1; then
continue # Restart from the beginning if this fails
fi
echo "All models started successfully."
break
done
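The traceback further below shows the timeout happening inside the server process while it downloads the image, so a first check is whether the node running vllm serve can reach the image URL at all. A minimal sketch of that check, assuming the usual http_proxy / https_proxy / no_proxy conventions on the compute node (site-specific, so the variable names are an assumption):

import os
import urllib.request

# Image URL from the failing request below.
url = ("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/"
       "Gfp-wisconsin-madison-the-nature-boardwalk.jpg/"
       "2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg")

# Show which proxy settings the server process would inherit.
for var in ("http_proxy", "https_proxy", "no_proxy"):
    print(var, "=", os.environ.get(var))

try:
    # urlopen honours http_proxy/https_proxy by default.
    with urllib.request.urlopen(url, timeout=10) as resp:
        print("fetched", len(resp.read()), "bytes")
except Exception as exc:
    print("fetch failed:", exc)

If the fetch only works through a proxy, the same proxy variables have to be exported in the environment that launches vllm serve; the no_proxy set in the client script below only affects the client process. If the node can reach the URL but slowly, raising vLLM's VLLM_IMAGE_FETCH_TIMEOUT environment variable (the default is only a few seconds) may also help.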
OpenAI Client API call
from openai import OpenAI
import socket
import json
import os
import time
import httpx
# Determine the hostname
start_time = time.time()
hostname = socket.gethostname()
os.environ['no_proxy'] = f"localhost,{hostname},127.0.0.1"
# Construct the base_url
base_url = f"https://127.0.0.1:8000/v1"
client = OpenAI(
base_url=base_url,
api_key="cxvff_xxxx",
http_client = httpx.Client(verify=False)
)
data = {
"temperature": 0.2,
"max_tokens": 50,
"model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?"
},
{
"type": "image_url",
"image_url": {
#"url": "https://apod.nasa.gov/apod/image/2412/MarsClouds_Perseverance_2048.jpg",
#"url":"https://www.stockvault.net/data/2010/06/01/113952/preview16.jpg",
"url":"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
},
},
],
}
],
}
response = client.chat.completions.create(**data)
# Print the response
if hasattr(response, "choices") and response.choices:
print(response.choices[0].message.content)
else:
print("No valid response received from the API.")
print("Total time",time.time()-start_time)I see the following error
INFO: Uvicorn running on https://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:50478 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 696, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1050, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1363, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/connector.py", line 1109, in _wrap_create_connection
sock = await aiohappyeyeballs.start_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/impl.py", line 89, in start_connection
sock, _, _ = await _staggered.staggered_race(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 160, in staggered_race
done = await _wait_one(
^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohappyeyeballs/_staggered.py", line 41, in _wait_one
return await wait_next
^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
with collapse_excgroups():
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/contextlib.py", line 158, in __exit__
self.gen.throw(typ, value, traceback)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
raise exc
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
response = await self.dispatch_func(request, call_next)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
response = await call_next(request)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
raise app_exc
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
await self.app(scope, receive_or_disconnect, send_no_error)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 347, in create_chat_completion
generator = await handler.create_chat_completion(request, raw_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 155, in create_chat_completion
) = await self._preprocess_chat(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 470, in _preprocess_chat
mm_data = await mm_data_future
^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/entrypoints/chat_utils.py", line 293, in all_mm_data
items = await asyncio.gather(*self._items)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 287, in async_get_and_parse_image
image = await async_fetch_image(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/multimodal/utils.py", line 110, in async_fetch_image
image_raw = await global_http_connection.async_get_bytes(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/vllm/connections.py", line 92, in async_get_bytes
async with await self.get_async_response(url, timeout=timeout) as r:
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 1418, in __aenter__
self._resp: _RetType = await self._coro
^^^^^^^^^^^^^^^^
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/client.py", line 602, in _request
with timer:
File "/eagle/argonne_tpc/inference-gateway/envs/vllmv0.6.4.post1/lib/python3.11/site-packages/aiohttp/helpers.py", line 671, in __exit__
raise asyncio.TimeoutError from exc_val
TimeoutError
INFO: 127.0.0.1:50490 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:33666 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(The same CancelledError → TimeoutError traceback repeats for each subsequent request.)
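Since the timeout occurs while the server itself fetches the remote URL, one workaround sketch (assuming the client host does have outbound internet access) is to download the image on the client side and pass it inline as a base64 data URL, so the server never makes an outbound request. Endpoint, API key, and model name are copied from the reproduction above:

import base64

import httpx
from openai import OpenAI

# Fetch the image client-side and inline it as a data URL.
url = ("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/"
       "Gfp-wisconsin-madison-the-nature-boardwalk.jpg/"
       "2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg")
image_bytes = httpx.get(url, follow_redirects=True).content
data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()

client = OpenAI(
    base_url="https://127.0.0.1:8000/v1",
    api_key="cxvff_xxxx",
    http_client=httpx.Client(verify=False),
)
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    temperature=0.2,
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
)
print(response.choices[0].message.content)

The same approach works with an image read from local disk if the client host has no outbound access either.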
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.