Your current environment
The output of `python collect_env.py`:
INFO 06-29 19:05:58 [__init__.py:244] Automatically detected platform cuda.
Collecting environment information...
==============================
System Info
==============================
OS : Red Hat Enterprise Linux release 8.10 (Ootpa) (x86_64)
GCC version : (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26)
Clang version : Could not collect
CMake version : Could not collect
Libc version : glibc-2.28
==============================
PyTorch Info
==============================
PyTorch version : 2.7.0+cu126
Is debug build : False
CUDA used to build PyTorch : 12.6
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.11.11 (main, Dec 9 2024, 15:32:27) [GCC 8.5.0 20210514 (Red Hat 8.5.0-22)] (64-bit runtime)
Python platform : Linux-4.18.0-553.52.1.el8_10.x86_64-x86_64-with-glibc2.28
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.2.140
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA L40S-48C
GPU 1: NVIDIA L40S-48C
GPU 2: NVIDIA L40S-48C
GPU 3: NVIDIA L40S-48C
Nvidia driver version : 535.129.03
cuDNN version : Probably one of the following:
/usr/lib64/libcudnn.so.8.9.7
/usr/lib64/libcudnn.so.9.3.0
/usr/lib64/libcudnn_adv.so.9.3.0
/usr/lib64/libcudnn_adv_infer.so.8.9.7
/usr/lib64/libcudnn_adv_train.so.8.9.7
/usr/lib64/libcudnn_cnn.so.9.3.0
/usr/lib64/libcudnn_cnn_infer.so.8.9.7
/usr/lib64/libcudnn_cnn_train.so.8.9.7
/usr/lib64/libcudnn_engines_precompiled.so.9.3.0
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.3.0
/usr/lib64/libcudnn_graph.so.9.3.0
/usr/lib64/libcudnn_heuristic.so.9.3.0
/usr/lib64/libcudnn_ops.so.9.3.0
/usr/lib64/libcudnn_ops_infer.so.8.9.7
/usr/lib64/libcudnn_ops_train.so.8.9.7
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 6
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Platinum 8462Y+
Stepping: 8
CPU MHz: 2799.999
BogoMIPS: 5599.99
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 48K
L1i cache: 32K
L2 cache: 2048K
L3 cache: 61440K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b fsrm md_clear flush_l1d arch_capabilities
==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pyzmq==27.0.0
[pip3] torch==2.7.0
[pip3] torchaudio==2.7.0
[pip3] torchvision==0.22.0
[pip3] transformers==4.53.0
[pip3] triton==3.3.0
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.9.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PIX PIX PIX 0-11 0 N/A
GPU1 PIX X PIX PIX 0-11 0 N/A
GPU2 PIX PIX X PIX 0-11 0 N/A
GPU3 PIX PIX PIX X 0-11 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
CUDACXX=/usr/local/cuda/bin/nvcc
CUDA_HOME=/usr/local/cuda-12.2
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
I am trying to run Gemma 3 27B with vllm==0.9.1, but I am getting this error:
(VllmWorker rank=1 pid=305150) DEBUG 06-29 19:19:58 [shm_broadcast.py:313] Connecting to ipc:///tmp/69ea0bbd-db87-4112-a486-0ef61eab157b
(VllmWorker rank=0 pid=305149) DEBUG 06-29 19:19:58 [shm_broadcast.py:313] Connecting to ipc:///tmp/69ea0bbd-db87-4112-a486-0ef61eab157b
(VllmWorker rank=1 pid=305150) DEBUG 06-29 19:19:58 [shm_broadcast.py:243] Binding to ipc:///tmp/9fd45adb-80db-4c70-8475-5e256abfecb0
(VllmWorker rank=0 pid=305149) DEBUG 06-29 19:19:58 [shm_broadcast.py:243] Binding to ipc:///tmp/ba3fa5c4-fda8-43d8-8abe-5a3dcb4ff26d
(VllmWorker rank=1 pid=305150) INFO 06-29 19:19:58 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_026ab1a7'), local_subscribe_addr='ipc:///tmp/9fd45adb-80db-4c70-8475-5e256abfecb0', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=0 pid=305149) INFO 06-29 19:19:58 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_e6722dfc'), local_subscribe_addr='ipc:///tmp/ba3fa5c4-fda8-43d8-8abe-5a3dcb4ff26d', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=0 pid=305149) DEBUG 06-29 19:19:59 [parallel_state.py:918] world_size=2 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:40035 backend=nccl
(VllmWorker rank=1 pid=305150) DEBUG 06-29 19:19:59 [parallel_state.py:918] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:40035 backend=nccl
(VllmWorker rank=1 pid=305150) INFO 06-29 19:19:59 [utils.py:1126] Found nccl from library libnccl.so.2
(VllmWorker rank=0 pid=305149) INFO 06-29 19:19:59 [utils.py:1126] Found nccl from library libnccl.so.2
(VllmWorker rank=1 pid=305150) INFO 06-29 19:19:59 [pynccl.py:70] vLLM is using nccl==2.26.2
(VllmWorker rank=0 pid=305149) INFO 06-29 19:19:59 [pynccl.py:70] vLLM is using nccl==2.26.2
vllm_node:305149:305149 [0] NCCL INFO Bootstrap: Using eth0:159.103.253.238<0>
vllm_node:305149:305149 [0] NCCL INFO cudaDriverVersion 12020
vllm_node:305149:305149 [0] NCCL INFO NCCL version 2.26.2+cuda12.2
vllm_node:305150:305150 [1] NCCL INFO cudaDriverVersion 12020
vllm_node:305150:305150 [1] NCCL INFO Bootstrap: Using eth0:159.103.253.238<0>
vllm_node:305150:305150 [1] NCCL INFO NCCL version 2.26.2+cuda12.2
vllm_node:305150:305150 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal net plugin.
vllm_node:305149:305149 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal net plugin.
vllm_node:305150:305150 [1] NCCL INFO NET/IB : No device found.
vllm_node:305149:305149 [0] NCCL INFO NET/IB : No device found.
vllm_node:305150:305150 [1] NCCL INFO NET/IB : Using [RO]; OOB eth0:159.103.253.238<0>
vllm_node:305149:305149 [0] NCCL INFO NET/IB : Using [RO]; OOB eth0:159.103.253.238<0>
vllm_node:305150:305150 [1] NCCL INFO NET/Socket : Using [0]eth0:159.103.253.238<0>
vllm_node:305149:305149 [0] NCCL INFO NET/Socket : Using [0]eth0:159.103.253.238<0>
vllm_node:305150:305150 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
vllm_node:305149:305149 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
vllm_node:305150:305150 [1] NCCL INFO Using network Socket
vllm_node:305149:305149 [0] NCCL INFO Using network Socket
[2025-06-29 19:19:59] vllm_node:305149:305149 [0] init.cc:416 NCCL WARN Cuda failure 'operation not supported'
vllm_node:305149:305149 [0] NCCL INFO init.cc:1397 -> 1
vllm_node:305149:305149 [0] NCCL INFO init.cc:1704 -> 1
[2025-06-29 19:19:59] vllm_node:305150:305150 [1] init.cc:416 NCCL WARN Cuda failure 'operation not supported'
vllm_node:305150:305150 [1] NCCL INFO init.cc:1397 -> 1
vllm_node:305150:305150 [1] NCCL INFO init.cc:1704 -> 1
vllm_node:305149:305149 [0] NCCL INFO init.cc:1730 -> 1
vllm_node:305150:305150 [1] NCCL INFO init.cc:1730 -> 1
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] WorkerProc failed to start.
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] Traceback (most recent call last):
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 466, in worker_main
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 362, in __init__
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.worker.init_device()
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.worker.init_device() # type: ignore
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 153, in init_device
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 370, in init_worker_distributed_environment
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 1084, in ensure_model_parallel_initialized
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] initialize_model_parallel(tensor_model_parallel_size,
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 1026, in initialize_model_parallel
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] _TP = init_model_parallel_group(group_ranks,
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 831, in init_model_parallel_group
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] return GroupCoordinator(
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 255, in __init__
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.device_communicator = device_comm_cls(
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 47, in __init__
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.pynccl_comm = PyNcclCommunicator(
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/pynccl.py", line 100, in __init__
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 278, in ncclCommInitRank
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 257, in NCCL_CHECK
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] WorkerProc failed to start.
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] raise RuntimeError(f"NCCL error: {error_str}")
(VllmWorker rank=0 pid=305149) ERROR 06-29 19:19:59 [multiproc_executor.py:492] RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] Traceback (most recent call last):
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 466, in worker_main
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 362, in __init__
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.worker.init_device()
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/worker/worker_base.py", line 606, in init_device
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.worker.init_device() # type: ignore
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 153, in init_device
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 370, in init_worker_distributed_environment
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 1084, in ensure_model_parallel_initialized
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] initialize_model_parallel(tensor_model_parallel_size,
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 1026, in initialize_model_parallel
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] _TP = init_model_parallel_group(group_ranks,
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 831, in init_model_parallel_group
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] return GroupCoordinator(
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/parallel_state.py", line 255, in __init__
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.device_communicator = device_comm_cls(
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 47, in __init__
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.pynccl_comm = PyNcclCommunicator(
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/pynccl.py", line 100, in __init__
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 278, in ncclCommInitRank
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] File "/environments/venv/lib64/python3.11/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 257, in NCCL_CHECK
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] raise RuntimeError(f"NCCL error: {error_str}")
(VllmWorker rank=1 pid=305150) ERROR 06-29 19:19:59 [multiproc_executor.py:492] RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
[rank0]:[W629 19:20:00.193655839 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
ERROR 06-29 19:20:00 [core.py:515] EngineCore failed to start.
ERROR 06-29 19:20:00 [core.py:515] Traceback (most recent call last):
ERROR 06-29 19:20:00 [core.py:515] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
ERROR 06-29 19:20:00 [core.py:515] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 06-29 19:20:00 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-29 19:20:00 [core.py:515] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
ERROR 06-29 19:20:00 [core.py:515] super().__init__(vllm_config, executor_class, log_stats,
ERROR 06-29 19:20:00 [core.py:515] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
ERROR 06-29 19:20:00 [core.py:515] self.model_executor = executor_class(vllm_config)
ERROR 06-29 19:20:00 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-29 19:20:00 [core.py:515] File "/environments/venv/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
ERROR 06-29 19:20:00 [core.py:515] self._init_executor()
ERROR 06-29 19:20:00 [core.py:515] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 98, in _init_executor
ERROR 06-29 19:20:00 [core.py:515] self.workers = WorkerProc.wait_for_ready(unready_workers)
ERROR 06-29 19:20:00 [core.py:515] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-29 19:20:00 [core.py:515] File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 427, in wait_for_ready
ERROR 06-29 19:20:00 [core.py:515] raise e from None
ERROR 06-29 19:20:00 [core.py:515] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Process EngineCore_0:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
raise e
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
engine_core = EngineCoreProc(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 390, in __init__
super().__init__(vllm_config, executor_class, log_stats,
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 76, in __init__
self.model_executor = executor_class(vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__
self._init_executor()
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 98, in _init_executor
self.workers = WorkerProc.wait_for_ready(unready_workers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 427, in wait_for_ready
raise e from None
Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Traceback (most recent call last):
File "/environments/venv/bin/vllm", line 8, in <module>
sys.exit(main())
^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 59, in main
args.dispatch_function(args)
File "/environments/venv/lib64/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 58, in cmd
uvloop.run(run_server(args))
File "/environments/venv/lib64/python3.11/site-packages/uvloop/__init__.py", line 105, in run
return runner.run(wrapper())
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/environments/venv/lib64/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File "/environments/venv/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
async with build_async_engine_client(args, client_config) as engine_client:
File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
async_llm = AsyncLLM.from_vllm_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
return cls(
^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
self.engine_core = EngineCoreClient.make_async_mp_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
return AsyncMPClient(vllm_config, executor_class, log_stats,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
super().__init__(
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
self._init_engines_direct(vllm_config, local_only,
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
self._wait_for_engine_startup(handshake_socket, input_address,
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
wait_for_engine_startup(
File "/environments/venv/lib64/python3.11/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Code to reproduce the error:
export VLLM_LOGGING_LEVEL=DEBUG
export NCCL_DEBUG=TRACE
export CUDA_VISIBLE_DEVICES=0,1,2,3
vllm serve /models/gemma-3-27b-it-FP8-Dynamic/ --tensor-parallel-size 2 --gpu-memory-utilization 0.8
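For reference, here is a minimal NCCL check outside of vLLM that should exercise roughly the same ncclCommInitRank path across two of the GPUs (a sketch only; the master address/port values are placeholders I chose, and I have not verified whether it fails the same way on this machine):
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    # Placeholder rendezvous settings for a single-node run (assumption, adjust as needed).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    # The nccl backend initializes a communicator much like vLLM's PyNcclCommunicator does.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    x = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(x)  # if NCCL init works, every rank should print world_size
    print(f"rank {rank}: all_reduce result = {x.item()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # mirrors --tensor-parallel-size 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)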
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.