[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 #24392
Conversation
Code Review
This pull request correctly fixes an IndexError that occurred with eagle3 speculative decoding models. The fix in vllm/model_executor/models/llama.py properly adds a bounds check before accessing the layer_types list, preventing a crash. The corresponding change in vllm/model_executor/models/registry.py to re-enable the LlamaForCausalLMEagle3 model is a logical consequence of this bug fix. The changes are accurate and well-implemented, and I have no further recommendations.
Shall we re-enable everything that was disabled in #22611?
Sure. Updated in the new commit. Local pytest has passed through …
ywang96
left a comment
LGTM - left a nit
vllm/model_executor/models/llama.py (outdated)
     sliding_window = None
    -if layer_types := getattr(config, "layer_types", None):
    +if (layer_types := getattr(config, "layer_types",
    +                           None)) and layer_idx < len(layer_types):
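For context, a minimal sketch of how the guarded lookup reads with the new bounds check. The helper name and the body of the inner branch (the sliding_attention check and sliding_window assignment) are assumptions pieced together from the discussion below, not a verbatim excerpt of llama.py:

```python
def resolve_sliding_window(config, layer_idx: int):
    """Sketch of the guarded lookup; hypothetical helper, not vLLM code."""
    sliding_window = None
    if (layer_types := getattr(config, "layer_types",
                               None)) and layer_idx < len(layer_types):
        # Only index layer_types when layer_idx is in range. Eagle3 draft
        # heads can carry a layer_idx past the end of the draft config's
        # list (e.g. layer_idx == 40 with layer_types == ['full_attention']).
        if layer_types[layer_idx] == "sliding_attention":
            sliding_window = getattr(config, "sliding_window", None)
    return sliding_window
```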
Should we instead have an assert that layer_idx < len(layer_types)? That would help catch bugs that silently pass the current if check and might manifest as a quality drop if layer_idx goes past layer_types due to a misconfig.
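A minimal sketch of the assert-based variant being suggested, using the same hypothetical helper shape as above; the inner branch is again an assumption, not vLLM's actual code:

```python
def resolve_sliding_window_strict(config, layer_idx: int):
    """Sketch of the suggested assert-based variant (hypothetical helper)."""
    sliding_window = None
    if layer_types := getattr(config, "layer_types", None):
        # Fail loudly on a misconfigured draft instead of silently skipping
        # the lookup, so an out-of-range layer_idx cannot slip through.
        assert layer_idx < len(layer_types), (
            f"layer_idx {layer_idx} out of range for layer_types "
            f"of length {len(layer_types)}")
        if layer_types[layer_idx] == "sliding_attention":
            sliding_window = getattr(config, "sliding_window", None)
    return sliding_window
```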
Also, which model is using sliding window in the eagle head?
In AngelSlim/Qwen3-14B_eagle3, layer_types is ['full_attention'] and layer_idx is 40. If we used an assertion, it would fail here.
Is it correct to assume that layer_idx 40 corresponds to the eagle3 draft head? And what is len(layer_types) when layer_idx is 40?
I printed the values right before the bug:

    (EngineCore_0 pid=830911) layer_types: ['full_attention']
    (EngineCore_0 pid=830911) layer_idx: 40

> which model is using sliding window in the eagle head?

None, to my understanding; the check is only there to avoid entering that code path.
The current fix won't work for a draft that has SWA: it bypasses applying SWA if the draft has layer_types: ['sliding_attention']. The foolproof fix would be to redefine layer_types in the draft as draft.layer_types = target.layer_types + draft.layer_types and pass this new value to llama.py from here.
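A hypothetical sketch of that remapping; merge_layer_types, target_config, and draft_config are placeholder names rather than the actual objects in vLLM's speculative-decoding config plumbing:

```python
def merge_layer_types(target_config, draft_config):
    """Hypothetical helper -- not vLLM's actual speculative-config code.

    Draft layer indices continue after the target model's layers, so
    prepending the target's layer_types lets the draft's (global) layer_idx
    index the list correctly, and a draft SWA layer is still honored.
    """
    draft_config.layer_types = (list(target_config.layer_types) +
                                list(draft_config.layer_types))
```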
Right. But I'm not sure whether other eagle3 models hold this assumption. Let me run some tests.
    # Eagle model name should follow naming convention of
    # LlamaForCausalLM -> EagleLlamaForCausalLM
    # LlamaForCausalLM -> Eagle3LlamaForCausalLM
    # LlamaForCausalLMEagle3 -> LlamaForCausalLMEagle3
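Purely to illustrate the convention this comment describes (not the registry's actual implementation), the candidate draft architecture names for a target architecture could be enumerated like this; eagle_arch_candidates is a hypothetical helper name:

```python
def eagle_arch_candidates(target_arch: str) -> list[str]:
    """Architecture names an EAGLE draft for `target_arch` may use in HF."""
    return [
        f"Eagle{target_arch}",    # e.g. EagleLlamaForCausalLM (eagle1)
        f"Eagle3{target_arch}",   # e.g. Eagle3LlamaForCausalLM (eagle3)
        f"{target_arch}Eagle3",   # e.g. LlamaForCausalLMEagle3, kept as-is
    ]
```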
I didn't get this part: why do we need both LlamaForCausalLMEagle3 and Eagle3LlamaForCausalLM to represent the same thing, i.e., Llama EAGLE 3?
This comment means we are able to handle both cases. There are two different names because the people who create the eagle models define them that way; we cannot control what they put in the HF repo.
But I definitely agree we should unify it (maybe in a future PR). It took me some time to understand.
I see. Just to confirm:

1. LlamaForCausalLM and LlamaForCausalLMEagle3 are model names defined in HF?
2. There is no LlamaForCausalLMEagle to define EAGLE 1 in HF?
1. Yes. Both yuhuili/EAGLE-LLaMA3.1-Instruct-8B and yuhuili/EAGLE3-LLaMA3.1-Instruct-8B used LlamaForCausalLM, even though they are eagle1 and eagle3. Newer models like AngelSlim/Qwen3-8B_eagle3 use LlamaForCausalLMEagle3.
2. None, to my understanding. If there were a popular one, there would already be a bug report.
@ekagra-ranjan Updated according to your suggestion. PTAL. I changed it a little bit by using … Tested with …
ekagra-ranjan
left a comment
Thanks for fixing SWA as well. LGTM
error: …

This PR is causing Basic Models Test to fail on main.
Hi @DarkLight1337 Do you know how to reproduce this in the same env as the CI? Directly running … on the latest main I pulled works fine e2e. 😕
Maybe you can use the error log to help debug: https://buildkite.com/vllm/ci/builds/30179#01993427-9309-4bf2-a37a-e63f8b87dc4f
Thanks! It could be related to how we use the layer_idx. I'm looking more closely at it.
Hi @ggg-s I replied to you at #23464 (comment).
My CI is failing on a PR (#24127) that doesn't touch the code here: https://buildkite.com/vllm/ci/builds/30242#01993572-2b87-4ab4-a155-8c7b1a4a6570. Can we disable the CI test again, since it's blocking forward progress for other PRs?
Just a sec, I'm doing final testing on a PR to fix it. I saw your PR is only metrics-related, so it should be okay to merge if this is the only failure. The failure is known; you can ask in Slack. Sorry for the inconvenience.
Fixing PR in: #24613

Purpose
Fix the out-of-range index triggered by qwen3 models in eagle3 speculative decoding (#23464). cc: @lec77 @22quinn
Test Result
Before PR:
After PR:
Server successfully launched.
Command:
Unit tests:
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.