[Attention][Spec Decode] FlashMLA spec decode support #26541
Code Review
This pull request introduces speculative decoding support for the FlashMLA backend, which is a significant feature enhancement. The changes are well-structured, particularly the introduction of the QueryLenSupport enum to clearly define the capabilities of different backends. The test cases have been updated appropriately to cover speculative decoding scenarios. My main feedback is a minor performance improvement in the test suite by moving a repeated import out of a loop.
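For readers unfamiliar with the idea, here is a minimal sketch of how an enum like QueryLenSupport could describe what query lengths a backend's decode path accepts. The member names (`SINGLE_ONLY`, `UNIFORM`, `VARLEN`) and the helper `can_use_decode_path` are assumptions for illustration only and may not match the code in this PR.

```python
from enum import Enum


class QueryLenSupport(Enum):
    """Illustrative capability levels; member names are assumed, not taken from the PR."""

    SINGLE_ONLY = "single_only"  # decode path handles exactly one query token per request
    UNIFORM = "uniform"          # every request in the batch must share one query length
    VARLEN = "varlen"            # each request may have its own query length


def can_use_decode_path(support: QueryLenSupport, query_lens: list[int]) -> bool:
    """Hypothetical helper: decide whether a batch fits the backend's decode path."""
    if not query_lens:
        return False
    if support is QueryLenSupport.SINGLE_ONLY:
        return all(q == 1 for q in query_lens)
    if support is QueryLenSupport.UNIFORM:
        return len(set(query_lens)) == 1
    return True  # VARLEN accepts any mix of lengths


# With k draft tokens per request, speculative decoding verifies k + 1 query
# tokens at once, so a single-token-only decode path no longer qualifies.
spec_batch = [3, 3, 3]
print(can_use_decode_path(QueryLenSupport.SINGLE_ONLY, spec_batch))
print(can_use_decode_path(QueryLenSupport.UNIFORM, spec_batch))
```

Under these assumptions, a backend that declares uniform support can serve speculative batches as long as every request carries the same number of draft tokens.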
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Nice! LGTM, but let's have @benchislett look too
Minor issue flagged; otherwise LGTM. Please ping me when resolved.
@benchislett thanks for your review! I've addressed your comments.
LGTM, Thanks!
…6541) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Jonah Bernard <jb2528@cornell.edu>
…6541) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…6541) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
…6541) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
This PR implements speculative decoding support for the FlashMLA backend.
NOTE: the intermittent test failure described in the comment predates this PR; I just made a note of it.
cc @LucasWilkinson
Test Plan
pytest tests/v1/attention/test_mla_backends.py
Test Result
Passes
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.