[WIP][V1][Spec Decode] EAGLE tree-attention #17560
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
```python
PADDING_SLOT_ID = -1

class TreeArray:
```
Expanding the tree in real time is going to be very costly. A dynamic tree in actual production could be less effective.
Good point! That's why I pre-allocate `max_nodes` in advance. The difference from chain drafting is that the tree is larger and more tokens are passed to each forward pass. The benefit can be a longer acceptance length, which could reduce the number of forward passes in the target model.
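(For illustration, a minimal sketch of the pre-allocation idea being described here — the class name, fields, and layout below are assumptions for this comment, not the PR's actual `TreeArray`:)

```python
import torch

class TreeArraySketch:
    """Hypothetical sketch of a draft-token tree with pre-allocated storage.

    All tensors are sized for max_nodes up front, so "expanding" the tree
    during drafting only writes into existing slots instead of growing
    Python lists or reallocating tensors each step.
    """

    def __init__(self, max_nodes: int, device: str = "cpu"):
        self.max_nodes = max_nodes
        # Token id held by each node; -1 marks an unused slot.
        self.token_ids = torch.full((max_nodes,), -1, dtype=torch.long, device=device)
        # Parent node index; -1 for the root.
        self.parents = torch.full((max_nodes,), -1, dtype=torch.long, device=device)
        # Depth of each node, later usable to derive positions.
        self.depths = torch.zeros(max_nodes, dtype=torch.long, device=device)
        self.num_nodes = 0

    def add_children(self, parent: int, child_tokens: torch.Tensor) -> torch.Tensor:
        """Attach top-k children under `parent`; returns their node indices."""
        k = child_tokens.numel()
        assert self.num_nodes + k <= self.max_nodes, "tree is full"
        idx = torch.arange(self.num_nodes, self.num_nodes + k)
        self.token_ids[idx] = child_tokens
        self.parents[idx] = parent
        self.depths[idx] = (self.depths[parent] + 1) if parent >= 0 else 0
        self.num_nodes += k
        return idx
```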
I understand that. We also need logits-shifting logic for sampling, and a dynamic tree in actual use might be less efficient at drafting. Maybe we could start with support for a static tree.
Yes, the sampling logic should change as well; it is not included yet. You can see a tracker of the progress here: https://docs.google.com/document/d/1mMoSicPPMMzaE_T5Zk2SnTderw1OXRUs2T16JxfVGCQ/edit?usp=sharing
And IMO, going from a static tree to a dynamic tree won't introduce much difference (select-all vs. top-k expansion and rerank logic). The major differences come from the tree structure itself, compared with the chain draft. But I'm open to the community's opinions on which we should target first.
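(To make the static-tree option concrete — the branching factors and the `draft_step` interface below are made up for illustration, not from this PR:)

```python
import torch

# Hypothetical static tree shape: number of children per node at each depth.
# Depth 0 keeps the top-4 drafts, each expands to 2, each of those to 1.
BRANCHING = [4, 2, 1]

def draft_static_tree(draft_step, root_state):
    """One top-k expansion per depth; the tree shape is fixed up front.

    `draft_step(state) -> (logits, next_state)` is a stand-in for a single
    EAGLE forward pass, not vLLM's real interface.
    """
    frontier = [root_state]
    levels = []  # drafted token ids, one tensor per depth
    for k in BRANCHING:
        next_frontier, level_tokens = [], []
        for state in frontier:
            logits, next_state = draft_step(state)
            top_tokens = torch.topk(logits, k=k, dim=-1).indices
            level_tokens.append(top_tokens)
            # Toy simplification: children share the parent's next state;
            # a real drafter would feed each child token back in.
            next_frontier.extend([next_state] * k)
        frontier = next_frontier
        levels.append(torch.cat(level_tokens))
    return levels
```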
```python
with set_forward_context(attn_metadata,
                         self.vllm_config,
                         num_tokens=input_batch_size):
    last_hidden_states, output_hidden_states = self.model(
```
Curious: does the RoPE kernel already take into account that positions can be customized?
Thanks for bringing this up! IIUC, in rotary_embedding.py we can pass an offsets argument to the forward function.
We will have to customize the logic since different paths are mixed together; I would categorize this under "Attention metadata & attention mask" in the tracker. For now, it is only a placeholder.
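(A small illustration of what "customized positions" means for a tree — the helper below is hypothetical; the real wiring would go through rotary_embedding.py, using the positions/offsets it already accepts:)

```python
import torch

def tree_rope_positions(prefix_len: int, depths: torch.Tensor) -> torch.Tensor:
    """RoPE position for each flattened tree node (hypothetical helper).

    Siblings at the same depth share a position even though they occupy
    different slots in the flattened draft-token batch.
    """
    return prefix_len + depths

# Example: root (depth 0), two children (depth 1), one grandchild (depth 2),
# appended after a 10-token prefix.
depths = torch.tensor([0, 1, 1, 2])
positions = tree_rope_positions(10, depths)  # tensor([10, 11, 11, 12])

# These positions (or equivalently per-token offsets) would then be handed
# to the rotary embedding, e.g. rotary_emb(positions, query, key).
```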
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
```python
with set_forward_context(tree_per_layer_attn_metadata,
                         self.vllm_config,
                         num_tokens=input_batch_size):
    last_hidden_states, output_hidden_states = self.model(
```
Closing because tree attention was supported in #20401.

As mentioned in #15901, we currently only support top-1 selection from the candidates produced by the EAGLE model (we call this chain-draft). Both EAGLE and EAGLE-2 claim that selecting the top-k tokens from each forward pass improves the acceptance rate, so we want to support that as well (we call it tree-draft).
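To make the chain-draft vs. tree-draft distinction concrete (a toy example, not code from this PR):

```python
import torch

logits = torch.randn(32000)  # one draft step's logits over the vocabulary

# Chain draft: only the single best candidate survives each forward pass.
chain_token = torch.argmax(logits)             # 1 continuation

# Tree draft: the top-k candidates all survive and are expanded as
# separate branches, trading extra draft tokens for a better chance
# that the target model accepts a longer prefix.
tree_tokens = torch.topk(logits, k=4).indices  # 4 continuations
```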
As this would be a big change, I would like to work on it as a WIP PR, and I would appreciate any comments/suggestions/discussion during implementation.
Design Doc: https://docs.google.com/document/d/1mMoSicPPMMzaE_T5Zk2SnTderw1OXRUs2T16JxfVGCQ/edit?usp=sharing
cc: @LiuXiaoxuanPKU @WoosukKwon