[Bugfix]: Assertion error when using FlashInfer backend #25933
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Code Review
This pull request correctly fixes an AssertionError that occurs when using the FlashInfer backend. The original assertion, assert self.block_quant is None, was incorrect as self.block_quant is a boolean value and can never be None. The change to assert not self.block_quant accurately reflects the requirement that block quantization should be disabled for this code path. The addition of a type hint for the block_quant attribute is also a good improvement for code clarity. The changes are sound and resolve the bug.
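The difference between the two assertions can be illustrated with a minimal sketch. The class and helper names below (`QuantConfigSketch`, `check_old`, `check_new`, `raises_assertion`) are hypothetical and exist only for illustration; they are not the actual code touched by this PR. The only assumption taken from the review is that `block_quant` is a `bool`.

```python
from typing import Callable


class QuantConfigSketch:
    """Hypothetical stand-in for an object with a boolean block_quant flag."""

    def __init__(self, block_quant: bool = False) -> None:
        # Typed as bool, mirroring the type hint the review mentions.
        self.block_quant: bool = block_quant

    def check_old(self) -> None:
        # Old form: a bool is never None, so this fails for both True and False.
        assert self.block_quant is None

    def check_new(self) -> None:
        # New form: passes only when block quantization is disabled.
        assert not self.block_quant


def raises_assertion(fn: Callable[[], None]) -> bool:
    """Return True if calling fn raises AssertionError."""
    try:
        fn()
    except AssertionError:
        return True
    return False


disabled = QuantConfigSketch(block_quant=False)
enabled = QuantConfigSketch(block_quant=True)

print(raises_assertion(disabled.check_old))  # old check rejects even the valid case
print(raises_assertion(disabled.check_new))  # new check accepts the valid case
print(raises_assertion(enabled.check_new))   # new check still rejects block quant
```

Note that `assert` statements are skipped when Python runs with `-O`, so this sketch assumes the default (non-optimized) mode.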
Thank you! I'm testing this.
Works on my end on L4 with
Sorry, I haven't finished the test; will continue later.
Sorry for the late response. I can confirm that the fix works.
Great to hear, then this PR should be ready for review.
yewentao256
left a comment
LGTM, thanks for the work!
…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Fixes #25928.
Test Plan
The following does not err any more:
and
with Qwen/Qwen3-4B-Instruct-2507-FP8.
Test Result
Runs successfully without error on a single L4 node.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.