Skip to content

Conversation

@vadiklyutiy
Copy link
Contributor

@vadiklyutiy vadiklyutiy commented Oct 10, 2025

Purpose

After fixing #24486 Qwen3-next with FlashInfer full attn start working without MTP.
But with MTP it fails.

The reason we choose incorrect attn metadata type for draft model (choose GDN instead of full attn).

Fix it.

Test Result

Qwen3-next with MTP works now.

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug in attention metadata building for Qwen3-next with MTP by introducing a more robust method _get_attention_metadata_builder to find the correct attention metadata builder. The change is logical and effectively addresses the issue. I've added one high-severity comment to handle a potential IndexError in the new helper function if no draft attention layers are found.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

chosen_layer = self.attn_layer_names[0]
for kv_cache_group in self.runner.attn_groups:
for attn_group in kv_cache_group:
if chosen_layer in attn_group.layer_names:
builder = attn_group.get_metadata_builder()

P1 Badge Respect current ubatch when fetching attention metadata builder

The previous implementation selected the attention metadata builder using dbo_current_ubatch_id() so each DBO ubatch used its own builder instance. _get_attention_metadata_builder() now calls attn_group.get_metadata_builder() with no argument, which defaults to ubatch_id=0. When parallel_config.enable_dbo is true (the runner allocates two builders per group), both ubatches will now reuse builder 0 and can overwrite each other’s buffers while building metadata, leading to incorrect attention metadata or races during drafting. This helper should still select the builder corresponding to the current ubatch id.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Copy link
Collaborator

@LucasWilkinson LucasWilkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice; theres already infra, LGTM!

@LucasWilkinson LucasWilkinson enabled auto-merge (squash) October 10, 2025 05:12
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 10, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
auto-merge was automatically disabled October 10, 2025 15:24

Head branch was pushed to by a user without write access

@benchislett benchislett merged commit e94cfd5 into vllm-project:main Oct 10, 2025
45 checks passed
@benchislett benchislett deleted the vadim/qwen3-next-mtp-fix branch October 10, 2025 18:59
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants