-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support #23337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support #23337
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR aims to add Eagle3 support for Qwen2, but it's missing a critical piece: Qwen2ForCausalLM must inherit from the SupportsEagle3 protocol. Without this, the model will not be recognized as supporting Eagle3. Please update the class definition and add the necessary import. I've also identified an issue in get_eagle3_aux_hidden_state_layers where it can produce invalid or duplicate layer indices for smaller models, and I've provided a more robust implementation. Lastly, please be aware of a potential pre-existing bug in Qwen2Model.forward related to how aux_hidden_state_layers is indexed when pipeline parallelism is used.
vllm/model_executor/models/qwen2.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of get_eagle3_aux_hidden_state_layers can produce duplicate or out-of-bounds indices for models with a small number of layers. For example, with num_layers=4, it returns (2, 2, 1), which contains a duplicate. With num_layers=2, it returns (2, 1, -1), where index 2 is out of bounds and -1 is invalid as a layer index.
To make this more robust, I suggest filtering for valid, unique layer indices. This will ensure the function behaves correctly for any model size.
def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
num_layers = len(self.model.layers)
if num_layers < 4:
# For models with fewer than 4 layers, the heuristic is not applicable.
# Returning a single middle layer is a safer default.
return (num_layers // 2,) if num_layers > 0 else ()
layers = (
2,
num_layers // 2,
num_layers - 3,
)
# Filter for unique and valid layer indices.
valid_layers = sorted(list(set(
layer for layer in layers if 0 <= layer < num_layers
)))
return tuple(valid_layers)dfd959c to
9bbe7e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have the model explicitly inherit from SupportsEagle3 so it's easier to check which models support it?
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
…oject#23162) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
…3309) Signed-off-by: zhuangqh <zhuangqhc@gmail.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Danila Kirichko <d.kirichko@mts.ai>
58868dc to
e21887e
Compare
yewentao256
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work!
Could you also add E2E results? lm-eval for accuracy and vllm bench for performance comparison for with/without eagle3
|
@yewentao256, I used not hf model and spec dec that not enough good but there is some increase in throughput lm_eval script vllm bench script With SpecDecThroughput: 14.24 requests/s, 4242.10 total tokens/s, 1423.53 output tokens/s
Without SpecDecThroughput: 14.18 requests/s, 4224.43 total tokens/s, 1417.60 output tokens/s
|
DarkLight1337
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing
…roject#23337) Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Purpose
The purpose of this PR is to add missed functions for Qwen2ForCausalLM #21835
Test Plan
Since this is a two-line bug fix, the plan was to run it locally.
Test Result
With this change I successfully started Qwen2ForCausalLM with Eagle3