[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support #23337

PapaGoose · 2025-08-21T11:22:16Z

Purpose

This PR adds the functions that Qwen2ForCausalLM was missing (see #21835).

Test Plan

Since this is a two-line bug fix, the plan was to run it locally.

Test Result

With this change, I successfully started Qwen2ForCausalLM with Eagle3.

github-actions · 2025-08-21T11:22:27Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This PR aims to add Eagle3 support for Qwen2, but it's missing a critical piece: Qwen2ForCausalLM must inherit from the SupportsEagle3 protocol. Without this, the model will not be recognized as supporting Eagle3. Please update the class definition and add the necessary import. I've also identified an issue in get_eagle3_aux_hidden_state_layers where it can produce invalid or duplicate layer indices for smaller models, and I've provided a more robust implementation. Lastly, please be aware of a potential pre-existing bug in Qwen2Model.forward related to how aux_hidden_state_layers is indexed when pipeline parallelism is used.

gemini-code-assist · 2025-08-21T11:27:15Z

vllm/model_executor/models/qwen2.py

The current implementation of get_eagle3_aux_hidden_state_layers can produce duplicate or out-of-bounds indices for models with a small number of layers. For example, with num_layers=4, it returns (2, 2, 1), which contains a duplicate. With num_layers=2, it returns (2, 1, -1), where index 2 is out of bounds and -1 is invalid as a layer index.

To make this more robust, I suggest filtering for valid, unique layer indices. This will ensure the function behaves correctly for any model size.

def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
    num_layers = len(self.model.layers)
    if num_layers < 4:
        # For models with fewer than 4 layers, the heuristic is not applicable.
        # Returning a single middle layer is a safer default.
        return (num_layers // 2,) if num_layers > 0 else ()
    layers = (
        2,
        num_layers // 2,
        num_layers - 3,
    )
    # Filter for unique and valid layer indices.
    valid_layers = sorted(list(set(
        layer for layer in layers if 0 <= layer < num_layers
    )))
    return tuple(valid_layers)
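
The index arithmetic in this review comment can be checked in isolation. The sketch below is a standalone illustration, not code from the PR: `naive_aux_layers` and `filtered_aux_layers` are hypothetical free functions that take the layer count directly instead of reading `self.model.layers`, and the naive form is inferred from the `(2, num_layers // 2, num_layers - 3)` tuple in the suggestion.

```python
def naive_aux_layers(num_layers: int) -> tuple[int, ...]:
    """Unfiltered heuristic: an early, a middle, and a late layer index."""
    return (2, num_layers // 2, num_layers - 3)


def filtered_aux_layers(num_layers: int) -> tuple[int, ...]:
    """Same heuristic, keeping only unique, in-range indices (hypothetical helper)."""
    if num_layers < 4:
        # Too few layers for the heuristic: fall back to one middle layer.
        return (num_layers // 2,) if num_layers > 0 else ()
    candidates = (2, num_layers // 2, num_layers - 3)
    return tuple(sorted({i for i in candidates if 0 <= i < num_layers}))


for n in (2, 4, 64):
    print(n, naive_aux_layers(n), filtered_aux_layers(n))
```

For 2 and 4 layers the naive tuples are `(2, 1, -1)` and `(2, 2, 1)`, matching the values quoted in the comment, while for a 64-layer model both variants agree on `(2, 32, 61)`.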

DarkLight1337

Can you have the model explicitly inherit from SupportsEagle3 so it's easier to check which models support it?
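
As a rough illustration of why explicit inheritance helps, here is a minimal sketch of a capability protocol in plain Python. Everything in it is an assumption made for illustration: `SupportsEagle3Sketch`, `ToyQwen2`, `ToyModelWithoutEagle3`, and `supports_eagle3` are hypothetical stand-ins, not vLLM's actual definitions, and the real protocol may declare different members.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEagle3Sketch(Protocol):
    """Hypothetical, simplified capability protocol (not vLLM's definition)."""

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        ...

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
        ...


class ToyQwen2(SupportsEagle3Sketch):
    """Toy model that opts in by inheriting from the protocol explicitly."""

    def __init__(self, num_layers: int) -> None:
        self.num_layers = num_layers
        self.aux_layers: tuple[int, ...] = ()

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        self.aux_layers = layers

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
        n = self.num_layers
        return (2, n // 2, n - 3)


class ToyModelWithoutEagle3:
    """Toy model that does not implement the capability."""


def supports_eagle3(model: object) -> bool:
    # A single isinstance check is enough once the capability is a protocol.
    return isinstance(model, SupportsEagle3Sketch)


print(supports_eagle3(ToyQwen2(64)), supports_eagle3(ToyModelWithoutEagle3()))
```

With the base class listed in the class statement, a plain text search for the protocol name finds every model that opts in, which is the discoverability benefit requested here.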

yewentao256

Thanks for the work!
Could you also add E2E results: lm-eval for accuracy, and vllm bench for a performance comparison with and without Eagle3?

PapaGoose · 2025-08-22T14:13:40Z

@yewentao256, I used a non-HF model and a speculative decoding draft model that is not very good, but there is still some increase in throughput.
config.json

{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 125778,
  "eos_token_id": 125780,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 32768,
  "max_window_layers": 70,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "quantization_config": {
    "batch_size": 1,
    "bits": 8,
    "block_name_to_quantize": null,
    "cache_block_outputs": true,
    "damp_percent": 0.1,
    "desc_act": false,
    "exllama_config": {
      "version": 1
    },
    "group_size": 128,
    "max_input_length": null,
    "model_seqlen": null,
    "module_name_preceding_first_block": null,
    "modules_in_block_to_quantize": null,
    "pad_token_id": null,
    "quant_method": "gptq",
    "sym": true,
    "tokenizer": null,
    "true_sequential": true,
    "use_cuda_fp16": false,
    "use_exllama": true
  },
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "rope_scaling": {"factor":4.0,"rope_type":"yarn","original_max_position_embeddings":32768},
  "torch_dtype": "float16",
  "transformers_version": "4.48.0",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 125800
}
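
A few derived quantities make this config easier to read. The sketch below copies only the relevant fields from the JSON above. The formulas are assumptions based on common conventions (head dimension as `hidden_size / num_attention_heads`, nominal YaRN context as `factor * original_max_position_embeddings`), not something stated in this thread.

```python
import json

# Subset of the config.json above; all values are copied verbatim.
config = json.loads("""
{
  "hidden_size": 5120,
  "num_attention_heads": 40,
  "num_key_value_heads": 8,
  "num_hidden_layers": 64,
  "rope_scaling": {"factor": 4.0, "rope_type": "yarn",
                   "original_max_position_embeddings": 32768}
}
""")

# Assumed convention: per-head dimension is hidden_size / num_attention_heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: number of query heads sharing each KV head.
kv_group_size = config["num_attention_heads"] // config["num_key_value_heads"]

# Assumed convention: nominal YaRN context is factor * original context length.
rope = config["rope_scaling"]
extended_context = int(rope["factor"] * rope["original_max_position_embeddings"])

print(head_dim, kv_group_size, extended_context)
```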

lm_eval script

lm_eval --model local-completions --model_args base_url=http://localhost:8999/v1/completions,model=...,num_concurrent=256,tokenizer=... --tasks gsm8k --limit 200

vllm bench script

vllm bench throughput \
    --tensor_parallel_size=2 \
    --enforce-eager \
    --quantization gptq \
    --dataset-name=hf \
    --dataset-path=likaixin/InstructCoder \
    --model=... \
    --input-len=1000 \
    --output-len=100 \
    --num-prompts=2048 \
    --async-engine \
    --speculative-config '{"model": "...", "num_speculative_tokens": 3, "method": "eagle3"}'
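
The `--speculative-config` value is a JSON string, so shell quoting is easy to get wrong by hand. One way to avoid that is to build the argument programmatically, as in the sketch below. It uses only the Python standard library, and the `"..."` model placeholder is kept exactly as elided in the command above.

```python
import json
import shlex

# Same fields as in the command above; the model path was elided in the post.
speculative_config = {
    "model": "...",
    "num_speculative_tokens": 3,
    "method": "eagle3",
}

# Serialize once, then let shlex handle the shell quoting.
speculative_arg = json.dumps(speculative_config)
command = ["vllm", "bench", "throughput", "--speculative-config", speculative_arg]

print(shlex.join(command))
```

Passing `command` as a list to `subprocess.run` would skip shell quoting entirely; the joined string is only for pasting into a terminal.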

With SpecDec

Throughput: 14.24 requests/s, 4242.10 total tokens/s, 1423.53 output tokens/s
Total num prompt tokens: 405502
Total num output tokens: 204800

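|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.57|±  |0.0351|
|     |       |strict-match    |     5|exact_match|↑  | 0.81|±  |0.0278|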

Without SpecDec

Throughput: 14.18 requests/s, 4224.43 total tokens/s, 1417.60 output tokens/s
Total num prompt tokens: 405502
Total num output tokens: 204800

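|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.57|±  |0.0351|
|     |       |strict-match    |     5|exact_match|↑  | 0.82|±  |0.0272|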
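
To put the two runs side by side, the sketch below computes the relative throughput change and compares the gsm8k strict-match difference with its uncertainty. The numbers are copied from the results above. Treating the two standard errors as independent is an assumption made for illustration, since both runs likely used the same 200 questions.

```python
import math

# Figures copied from the "With SpecDec" and "Without SpecDec" results above.
with_spec = {"requests_per_s": 14.24, "total_tokens_per_s": 4242.10,
             "output_tokens_per_s": 1423.53}
without_spec = {"requests_per_s": 14.18, "total_tokens_per_s": 4224.43,
                "output_tokens_per_s": 1417.60}


def relative_gain(new: float, old: float) -> float:
    """Fractional change of `new` relative to `old`."""
    return (new - old) / old


gains = {k: relative_gain(with_spec[k], without_spec[k]) for k in with_spec}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2%}")

# gsm8k strict-match accuracy and reported standard errors.
acc_with, se_with = 0.81, 0.0278
acc_without, se_without = 0.82, 0.0272
diff = acc_with - acc_without
# Assumption: independent errors, so variances add.
combined_se = math.sqrt(se_with**2 + se_without**2)
print(f"accuracy diff: {diff:+.2f}, combined stderr: {combined_se:.4f}")
```

Under these assumptions the throughput gain is roughly 0.4% on every metric, and the 0.01 accuracy difference is well inside the combined standard error.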

DarkLight1337

Thanks for fixing

PapaGoose requested a review from sighingnow as a code owner August 21, 2025 11:22

mergify bot added the qwen Related to Qwen models label Aug 21, 2025

gemini-code-assist bot reviewed Aug 21, 2025

PapaGoose force-pushed the spec_eagle3_qwen2 branch from dfd959c to 9bbe7e2 Compare August 21, 2025 11:41

DarkLight1337 reviewed Aug 21, 2025

DarkLight1337 and others added 9 commits August 21, 2025 18:50

[Refactor] Simplify code for MM budget (vllm-project#23310)

410d9d3

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

Qwen2ForCausalLM updated

5cd5083

Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

SupportsEagle3 added

52cd91c

Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

[Doc] Fix batch-level DP example (vllm-project#23325)

47375bc

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

[Performance] V1 Pooling Models E2E Performance Optimization (vllm-project#23162)

276270a

Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

[V1] Remove unnecessary check for main thread (vllm-project#23298)

b5124b3

Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

[Bugfix] set system_message in phi4mini chat template (vllm-project#23309)

4115f76

Signed-off-by: zhuangqh <zhuangqhc@gmail.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

[Multimodal] Always enable hashing mm data (vllm-project#23308)

bdf8b8d

Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

[ci/build] Fix abi tag for aarch64 (vllm-project#23329)

e21887e

Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>

PapaGoose force-pushed the spec_eagle3_qwen2 branch from 58868dc to e21887e Compare August 21, 2025 15:51

PapaGoose requested review from WoosukKwon, alexm-redhat, comaniac, hmellor, houseroad, mgoin, njhill, patrickvonplaten, robertgshaw2-redhat, simon-mo, tlrmchlsmth, yewentao256, youkaichao and ywang96 as code owners August 21, 2025 15:51

Merge branch 'main' into spec_eagle3_qwen2

287d017

mergify bot removed the tpu Related to Google TPUs label Aug 21, 2025

yewentao256 reviewed Aug 21, 2025

PapaGoose requested review from DarkLight1337 and yewentao256 August 22, 2025 14:14

DarkLight1337 approved these changes Aug 22, 2025

DarkLight1337 enabled auto-merge (squash) August 22, 2025 14:18

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 22, 2025

DarkLight1337 mentioned this pull request Aug 22, 2025

[feat] Support EAGLE for Qwen2 #23158

Open

4 tasks

Merge branch 'main' into spec_eagle3_qwen2

2c29de4

DarkLight1337 merged commit 88491c1 into vllm-project:main Aug 22, 2025
40 checks passed

github-project-automation bot moved this to Done in Tool Calling Aug 22, 2025

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (vllm-project#23337)

f1d030d

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (vllm-project#23337)

36e2a87

Signed-off-by: Xiao Yu <xiao.yu@amd.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (vllm-project#23337)

555e30c

mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025

[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (vllm-project#23337)

19f9dc2

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025

[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (vllm-project#23337)

f9fb0f1

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (vllm-project#23337)

182e1b3
