Your current environment
==============================
Versions of relevant libraries
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-ml-py==13.580.65
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pyzmq==27.0.0
[pip3] torch==2.7.1
[pip3] torchaudio==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.55.2
[pip3] triton==3.3.1
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.6.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.5.1.17 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.0.4 pypi_0 pypi
[conda] nvidia-cufile-cu12 1.11.1.6 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.7.77 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.1.2 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.4.2 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
[conda] nvidia-ml-py 13.580.65 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.85 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.6.77 pypi_0 pypi
[conda] pyzmq 27.0.0 pypi_0 pypi
[conda] torch 2.7.1 pypi_0 pypi
[conda] torchaudio 2.7.1 pypi_0 pypi
[conda] torchvision 0.22.1 pypi_0 pypi
[conda] transformers 4.55.2 pypi_0 pypi
[conda] triton 3.3.1 pypi_0 pypi
==============================
vLLM Info
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.10.1.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PIX NODE NODE SYS SYS SYS SYS NODE NODE SYS SYS 0-23,48-71 0 N/A
GPU1 PIX X NODE NODE SYS SYS SYS SYS NODE NODE SYS SYS 0-23,48-71 0 N/A
GPU2 NODE NODE X PIX SYS SYS SYS SYS NODE NODE SYS SYS 0-23,48-71 0 N/A
GPU3 NODE NODE PIX X SYS SYS SYS SYS NODE NODE SYS SYS 0-23,48-71 0 N/A
GPU4 SYS SYS SYS SYS X PIX NODE NODE SYS SYS NODE NODE 24-47,72-95 1 N/A
GPU5 SYS SYS SYS SYS PIX X NODE NODE SYS SYS NODE NODE 24-47,72-95 1 N/A
GPU6 SYS SYS SYS SYS NODE NODE X PIX SYS SYS NODE NODE 24-47,72-95 1 N/A
GPU7 SYS SYS SYS SYS NODE NODE PIX X SYS SYS NODE NODE 24-47,72-95 1 N/A
NIC0 NODE NODE NODE NODE SYS SYS SYS SYS X PIX SYS SYS
NIC1 NODE NODE NODE NODE SYS SYS SYS SYS PIX X SYS SYS
NIC2 SYS SYS SYS SYS NODE NODE NODE NODE SYS SYS X PIX
NIC3 SYS SYS SYS SYS NODE NODE NODE NODE SYS SYS PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
==============================
Environment Variables
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.1 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526
NCCL_VERSION=2.17.1-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NVIDIA_PRODUCT_NAME=CUDA
NVIDIA_CUDA_END_OF_LIFE=1
CUDA_VERSION=12.1.0
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
In vLLM v0.10.1.1, the eagle3 speculative decoding method restricts target models to llama and qwen types via a check in vllm/config.py. However, I found that simply changing model_type to "llama" in the config.json of the qwen3-14b-eagle3 draft model lets vLLM load and run the model with eagle3 without issue. This suggests that qwen3 models are in fact compatible with eagle3 and that the current type check is overly restrictive.
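For reference, below is a minimal sketch of the workaround described above; the draft model path is hypothetical, and model_type is the standard Hugging Face config key.

import json
from pathlib import Path

# Hypothetical local path to the qwen3-14b-eagle3 draft model weights.
cfg_path = Path("/models/qwen3-14b-eagle3/config.json")

cfg = json.loads(cfg_path.read_text())
# Workaround: label the draft model as a llama-type model so that
# vLLM's eagle3 checks accept it.
cfg["model_type"] = "llama"
cfg_path.write_text(json.dumps(cfg, indent=2))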
In vllm/config.py, the eagle3 method checks the target model's model_type:
eagle3_target_supported = ["llama", "qwen"]
if self.method == "eagle3" and self.target_model_config and not any(
        supported_model in
        self.target_model_config.hf_text_config.model_type
        for supported_model in eagle3_target_supported):
    raise ValueError(
        f"Eagle3 is only supported for {eagle3_target_supported} models. "  # noqa: E501
        f"Got {self.target_model_config.hf_text_config.model_type=}")
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.