Skip to content

Conversation

@MatthewBonanni
Copy link
Contributor

@MatthewBonanni MatthewBonanni commented Oct 9, 2025

Purpose

This PR implements speculative decoding support for the FlashMLA backend.

NOTE: the comment about the intermittent test failure was true prior to this PR, I just made a note of it.

cc @LucasWilkinson

Test Plan

pytest tests/v1/attention/test_mla_backends.py

Test Result

Passes


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Oct 9, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces speculative decoding support for the FlashMLA backend, which is a significant feature enhancement. The changes are well-structured, particularly the introduction of the QueryLenSupport enum to clearly define the capabilities of different backends. The test cases have been updated appropriately to cover speculative decoding scenarios. My main feedback is a minor performance improvement in the test suite by moving a repeated import out of a loop.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Copy link
Collaborator

@LucasWilkinson LucasWilkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM but lets have @benchislett look too

@LucasWilkinson LucasWilkinson enabled auto-merge (squash) October 13, 2025 01:40
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 13, 2025
@mergify
Copy link

mergify bot commented Oct 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 13, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Oct 13, 2025
Copy link
Collaborator

@benchislett benchislett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor issue flagged. otherwise LGTM, please ping me when resolved

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni
Copy link
Contributor Author

@benchislett thanks for your review! I've addressed your comments

Copy link
Collaborator

@benchislett benchislett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks!

@benchislett benchislett enabled auto-merge (squash) October 14, 2025 17:10
@benchislett benchislett merged commit 82af928 into vllm-project:main Oct 14, 2025
48 checks passed
@MatthewBonanni MatthewBonanni deleted the flashmla_spec_decode branch October 14, 2025 19:42
Jonahcb pushed a commit to Jonahcb/vllm that referenced this pull request Oct 15, 2025
…6541)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Jonah Bernard <jb2528@cornell.edu>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…6541)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…6541)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…6541)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…6541)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…6541)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants