[Speculators][Speculative Decoding] Add Qwen Eagle3 Support #21835
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run the full CI to test the changes comprehensively before merging.
Code Review
This pull request introduces support for Qwen models in speculative decoding by adding a generic 'speculators' configuration format. The changes are extensive: they touch model configuration and argument parsing, and add new config classes for speculators.
My review has identified a couple of critical bugs:
- Incorrect loop logic in `vllm/config.py` that would lead to faulty validation of supported models.
- A potential `TypeError` in `vllm/transformers_utils/configs/speculators/base.py` due to unsafe dictionary access.
I've also pointed out a weak assertion in a new test file that should be strengthened to prevent future regressions.
Please address these issues to ensure the stability and correctness of this new feature.
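For illustration, here is a minimal sketch of the two Python-side fixes the review is asking for. The variable names, config key, and assertion targets are hypothetical stand-ins, not identifiers taken from this PR:

```python
# Sketch of the "unsafe dictionary access" fix: prefer dict.get() (or an
# explicit membership check) over direct indexing, so a missing key yields
# an actionable error instead of a bare KeyError/TypeError.
config_dict = {"speculators_model_type": "eagle3"}  # illustrative data

model_type = (config_dict or {}).get("speculators_model_type")
if model_type is None:
    raise ValueError(
        "Expected 'speculators_model_type' in the speculators config")

# Sketch of the "weak assertion" fix: assert on the generated content,
# not merely that the call returned without crashing.
outputs = ["Paris is the capital of France"]  # stand-in for model output
assert outputs, "model produced no output"
assert "Paris" in outputs[0], f"unexpected completion: {outputs[0]!r}"
```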
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from e347164 to f766c5a
mgoin left a comment:
LGTM, thanks!
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Force-pushed from 6612d13 to 95eb570
```python
self.make_empty_intermediate_tensors = (
    self.model.make_empty_intermediate_tensors)

def set_aux_hidden_state_layers(self, layers: tuple[int]) -> None:
```
Let's add an explicit interface for supporting Eagle3 (`SupportsEagle3`) in another PR.
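As a rough illustration of what such an interface could look like — the name `SupportsEagle3` comes from the comment above, but the method set here is an assumption, not the interface vLLM ultimately adopted:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEagle3(Protocol):
    """Hypothetical marker interface for target models that can feed an
    Eagle3 drafter, i.e. expose auxiliary hidden states."""

    # tuple[int, ...] is the variadic spelling; the diff above uses tuple[int].
    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        """Select which decoder layers should emit auxiliary hidden states."""
        ...


def enable_eagle3(model: object, layers: tuple[int, ...]) -> None:
    # Structural runtime check: any model implementing the method qualifies,
    # without having to inherit from a concrete base class.
    if not isinstance(model, SupportsEagle3):
        raise TypeError(f"{type(model).__name__} does not support Eagle3")
    model.set_aux_hidden_state_layers(layers)
```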
Purpose
Add Qwen Eagle3 support, building on the speculators config support added in #21345.
Test Plan
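A minimal smoke-test sketch of the kind of run this enables. The drafter checkpoint name below is a hypothetical placeholder for a speculators-format Eagle3 model, and the `speculative_config` keys reflect vLLM's offline-inference API rather than commands taken from this PR:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # target model (example choice)
    speculative_config={
        "model": "org/qwen-eagle3-drafter",  # placeholder drafter name
        "method": "eagle3",
        "num_speculative_tokens": 5,
    },
)

outputs = llm.generate(["The capital of France is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```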