Description
🐛 Describe the bug
vLLM Engine Version 0.
Parameters:
{"tensor_parallel_size": 1, "limit_mm_per_prompt": {"image": 2}, "max_seq_len_to_capture": 131072, "enable_lora": true, "max_loras":1, "max_lora_rank": 320, "lora_modules": {"vision": "vision-lora"}}
INFO 04-14 05:43:22 [model_runner.py:1146] Model loading took 9.7031 GiB and 2.520181 seconds
[rank0]: Traceback (most recent call last):
[rank0]: File "/code/score.py", line 260, in <module>
[rank0]: engine = AsyncLLMEngine.from_engine_args(
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 680, in from_engine_args
[rank0]: return async_engine_cls.from_vllm_config(
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 653, in from_vllm_config
[rank0]: return cls(
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 608, in __init__
[rank0]: self.engine = self._engine_class(*args, **kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 284, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 433, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 103, in determine_num_available_blocks
[rank0]: results = self.collective_rpc("determine_num_available_blocks")
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[rank0]: answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/utils.py", line 2347, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/worker.py", line 229, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1243, in profile_run
[rank0]: self._dummy_run(max_num_batched_tokens, max_num_seqs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1369, in _dummy_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1697, in execute_model
[rank0]: self.set_active_loras(model_input.lora_requests,
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1385, in set_active_loras
[rank0]: self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/lora/worker_manager.py", line 167, in set_active_adapters
[rank0]: set_active_adapters_worker(requests, mapping, self._apply_adapters,
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/adapter_commons/utils.py", line 54, in set_active_adapters_worker
[rank0]: apply_adapters_func(requests)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/lora/worker_manager.py", line 227, in _apply_adapters
[rank0]: self.add_adapter(lora)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/lora/worker_manager.py", line 250, in add_adapter
[rank0]: self._adapter_manager.activate_adapter(lora_request.lora_int_id)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/lora/models.py", line 752, in activate_adapter
[rank0]: self._active_adapters.touch(lora_id)
[rank0]: File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/utils.py", line 275, in touch
[rank0]: self._LRUCache__update(key) # type: ignore
[rank0]: AttributeError: 'LoRALRUCache' object has no attribute '_LRUCache__update'
INFO 04-14 05:58:52 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
[rank0]:[W414 05:58:52.627573283 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
/opt/miniconda/envs/python39/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
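The last frame of the traceback fails on `self._LRUCache__update(key)`, which is the name-mangled form of a private `__update` method on a class named `LRUCache`. The following is a minimal sketch, not vLLM's actual code, of how this kind of `AttributeError` can arise in Python when a subclass reaches into a base class's private helper and the base class does not define it under that name. The class bodies and the `__refresh` helper are hypothetical, and whether a renamed or missing base-class method is the cause in this report is an assumption.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical stand-in for a base cache class (not vLLM's code)."""

    def __init__(self):
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self.__refresh(key)

    # Names starting with two underscores are mangled by the compiler,
    # so this method is stored on the class as _LRUCache__refresh.
    def __refresh(self, key):
        self._data.move_to_end(key)


class LoRALRUCache(LRUCache):
    """Hypothetical subclass that calls the base class's private helper."""

    def touch(self, key):
        # Assumes the base class has a private method named __update.
        # In this sketch the base class only defines __refresh, so the
        # lookup of _LRUCache__update fails at call time.
        self._LRUCache__update(key)  # type: ignore


def try_touch(cache, key):
    """Return the AttributeError message, or None if touch() succeeds."""
    try:
        cache.touch(key)
    except AttributeError as exc:
        return str(exc)
    return None


cache = LoRALRUCache()
cache.put("adapter-1", object())
message = try_touch(cache, "adapter-1")
print(message)
print(hasattr(cache, "_LRUCache__refresh"), hasattr(cache, "_LRUCache__update"))
```

A check like the final `hasattr` line can help confirm which private helper the installed base class actually exposes before deciding whether the mismatch comes from a dependency version.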
Your current environment
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-1026-azure-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100 80GB PCIe
GPU 1: NVIDIA A100 80GB PCIe
GPU 2: NVIDIA A100 80GB PCIe
GPU 3: NVIDIA A100 80GB PCIe
Nvidia driver version: 550.120
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7V13 64-Core Processor
Stepping: 1
CPU MHz: 2445.441
BogoMIPS: 4890.88
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 3 MiB
L1i cache: 3 MiB
L2 cache: 48 MiB
L3 cache: 384 MiB
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves user_shstk clzero xsaveerptr rdpru arat umip vaes vpclmulqdq rdpid fsrm
Versions of relevant libraries:
[pip3] numpy==2.0.2
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.4.0
[pip3] torch==2.6.0
[pip3] torchaudio==2.6.0
[pip3] torchvision==0.21.0
[pip3] transformers==4.51.2
[pip3] triton==3.2.0
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.8.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    GPU3    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    SYS     SYS     NODE    0-23            0               N/A
GPU1    NV12     X      SYS     SYS     SYS     24-47           1               N/A
GPU2    SYS     SYS      X      NV12    SYS     48-71           2               N/A
GPU3    SYS     SYS     NV12     X      SYS     72-95           3               N/A
NIC0    NODE    SYS     SYS     SYS      X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NVIDIA_VISIBLE_DEVICES=all
CUBLAS_VERSION=12.1.0.26
NVIDIA_REQUIRE_CUDA=cuda>=9.0
CUDA_CACHE_DISABLE=1
NCCL_VERSION=2.17.1
NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
NVIDIA_PRODUCT_NAME=Triton Server
CUDA_VERSION=12.1.0.023
CUDNN_VERSION=8.8.1.3+cuda12.0
NVIDIA_TRITON_SERVER_VERSION=23.03
LD_LIBRARY_PATH=/opt/tritonserver/backends/onnxruntime:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda-11/lib64
NVIDIA_BUILD_ID=56086596
CUDA_DRIVER_VERSION=530.30.02
NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
</details>