[BUG] Qwen3-next MTP. Fix attn metadata build bug #26564
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Code Review
This pull request fixes a bug in attention metadata building for Qwen3-next with MTP by introducing a more robust method _get_attention_metadata_builder to find the correct attention metadata builder. The change is logical and effectively addresses the issue. I've added one high-severity comment to handle a potential IndexError in the new helper function if no draft attention layers are found.
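The IndexError concern can be illustrated with a minimal sketch. Everything here is hypothetical: `AttnGroup` is a toy stand-in rather than vLLM's real class, the layer names are invented, and builders are plain strings. The lookup loop follows the `eagle.py` excerpt quoted in the Codex review. The guard raises a descriptive `ValueError` when the draft model registered no attention layers, instead of letting `draft_layer_names[0]` fail with a bare IndexError.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttnGroup:
    """Hypothetical stand-in for a runner attention group (not vLLM's class)."""
    backend: str                   # e.g. "GDN" or "FULL_ATTN"
    layer_names: List[str]
    metadata_builders: List[str]   # one builder per ubatch

    def get_metadata_builder(self, ubatch_id: int = 0) -> str:
        return self.metadata_builders[ubatch_id]


def get_attention_metadata_builder(
    attn_groups: List[List[AttnGroup]], draft_layer_names: List[str]
) -> str:
    # Guard suggested by the review: fail with a clear message instead of
    # an IndexError when the draft model has no attention layers.
    if not draft_layer_names:
        raise ValueError("draft model has no attention layers")
    chosen_layer = draft_layer_names[0]
    # Pick the group that actually contains the draft's layer.
    for kv_cache_group in attn_groups:
        for attn_group in kv_cache_group:
            if chosen_layer in attn_group.layer_names:
                return attn_group.get_metadata_builder()
    raise ValueError(f"no attention group contains layer {chosen_layer!r}")


# Hypothetical hybrid layout: a GDN group first, then a full-attention group
# that also holds the MTP draft layer.
groups = [
    [AttnGroup("GDN", ["model.layers.0.linear_attn"], ["gdn_builder"])],
    [AttnGroup("FULL_ATTN",
               ["model.layers.3.self_attn", "mtp.layers.0.self_attn"],
               ["full_attn_builder"])],
]
builder = get_attention_metadata_builder(groups, ["mtp.layers.0.self_attn"])
```

Selecting by layer name means the draft gets the full-attention builder even though the GDN group comes first.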
💡 Codex Review
vllm/vllm/v1/spec_decode/eagle.py
Lines 1039 to 1044 in 8c913a5
```python
chosen_layer = self.attn_layer_names[0]
for kv_cache_group in self.runner.attn_groups:
    for attn_group in kv_cache_group:
        if chosen_layer in attn_group.layer_names:
            builder = attn_group.get_metadata_builder()
```
The previous implementation selected the attention metadata builder using dbo_current_ubatch_id() so each DBO ubatch used its own builder instance. _get_attention_metadata_builder() now calls attn_group.get_metadata_builder() with no argument, which defaults to ubatch_id=0. When parallel_config.enable_dbo is true (the runner allocates two builders per group), both ubatches will now reuse builder 0 and can overwrite each other’s buffers while building metadata, leading to incorrect attention metadata or races during drafting. This helper should still select the builder corresponding to the current ubatch id.
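The ubatch issue can be shown with a small sketch. Assumptions: `AttnGroup` is a toy class, not vLLM's, and the current ubatch id is passed in as a plain parameter. In vLLM, the review says this id comes from `dbo_current_ubatch_id()`, whose exact import path isn't shown here. With DBO enabled, each group holds two builders, and forwarding the id keeps the ubatches on separate builder instances.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttnGroup:
    """Hypothetical attention group holding one metadata builder per ubatch."""
    layer_names: List[str]
    metadata_builders: List[object]

    def get_metadata_builder(self, ubatch_id: int = 0) -> object:
        return self.metadata_builders[ubatch_id]


def select_builder(group: AttnGroup, current_ubatch_id: int) -> object:
    # Forward the current ubatch id explicitly. Relying on the default (0)
    # would make both DBO ubatches share builder 0 and its buffers.
    return group.get_metadata_builder(current_ubatch_id)


# With DBO enabled, assume the runner allocated two builders per group.
group = AttnGroup(["mtp.layers.0.self_attn"], [object(), object()])
b0 = select_builder(group, 0)
b1 = select_builder(group, 1)
```

Calling `get_metadata_builder()` with no argument always returns the same object as `b0`, which is the sharing the review warns about.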
Oh nice; there's already infra, LGTM!
Purpose
After the fix for #24486, Qwen3-next with FlashInfer full attention started working without MTP, but it still fails with MTP.
The reason is that we choose the wrong attention metadata type for the draft model: GDN instead of full attention.
This PR fixes that.
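Here is a toy reproduction of this failure mode. It assumes the old lookup effectively used whichever attention group came first. The layer names, backend labels, and helper functions are all hypothetical, not vLLM code. In a hybrid model where GDN layers are grouped first, "take the first group" yields GDN. Looking up the group that contains the MTP draft layer yields full attention.

```python
from typing import Dict, List, Tuple

# Hypothetical hybrid layout: GDN (linear-attention) layers mixed with
# full-attention layers, plus one MTP draft layer that uses full attention.
LAYER_BACKENDS: Dict[str, str] = {
    "model.layers.0.linear_attn": "GDN",
    "model.layers.1.linear_attn": "GDN",
    "model.layers.2.linear_attn": "GDN",
    "model.layers.3.self_attn": "FULL_ATTN",
    "mtp.layers.0.self_attn": "FULL_ATTN",
}


def build_groups(layer_backends: Dict[str, str]) -> List[Tuple[str, List[str]]]:
    """Group layers by backend in first-seen order (dicts keep insertion order)."""
    groups: Dict[str, List[str]] = {}
    for name, backend in layer_backends.items():
        groups.setdefault(backend, []).append(name)
    return list(groups.items())


def backend_of_first_group(groups: List[Tuple[str, List[str]]]) -> str:
    # Buggy pattern: ignores which layers the draft model actually owns.
    return groups[0][0]


def backend_for_layer(groups: List[Tuple[str, List[str]]], layer_name: str) -> str:
    # Fixed pattern: pick the group that contains the draft's layer.
    for backend, names in groups:
        if layer_name in names:
            return backend
    raise KeyError(layer_name)


groups = build_groups(LAYER_BACKENDS)
print(backend_of_first_group(groups))                       # GDN
print(backend_for_layer(groups, "mtp.layers.0.self_attn"))  # FULL_ATTN
```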
Test Result
Qwen3-next with MTP works now.