[Bugfix]: Assertion error when using FlashInfer backend #25933

simondanielsson · 2025-09-30T06:54:43Z

Purpose

Test Plan

The following command no longer raises an assertion error:

OMP_NUM_THREADS=8 VLLM_USE_AITER_UNIFIED_ATTENTION=1 VLLM_ATTENTION_BACKEND=FLASHINFER VLLM_USE_FLASHINFER_MOE_FP8=1 vllm serve --async-scheduling --gpu-memory-utilization 0.8 --enable-auto-tool-choice --tool-call-parser hermes --model=Qwen/Qwen3-30B-A3B-Instruct-2507-FP8

and likewise with Qwen/Qwen3-4B-Instruct-2507-FP8 as the model.

Test Result

Runs successfully without error on a single L4 node.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

gemini-code-assist

Code Review

This pull request correctly fixes an AssertionError that occurs when using the FlashInfer backend. The original assertion, assert self.block_quant is None, was incorrect as self.block_quant is a boolean value and can never be None. The change to assert not self.block_quant accurately reflects the requirement that block quantization should be disabled for this code path. The addition of a type hint for the block_quant attribute is also a good improvement for code clarity. The changes are sound and resolve the bug.
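The difference between the two assertions can be reproduced in isolation. The sketch below is a minimal illustration only: `Fp8MethodSketch`, `check_old`, `check_new`, and `raises_assertion` are hypothetical names invented for this example and are not the actual vLLM class or methods. It assumes only what the review states, namely that `block_quant` is a boolean attribute.

```python
class Fp8MethodSketch:
    """Hypothetical stand-in for the affected class; not vLLM code."""

    def __init__(self, block_quant: bool) -> None:
        # Annotated as bool, mirroring the type hint mentioned in the review.
        self.block_quant: bool = block_quant

    def check_old(self) -> None:
        # Original form: a bool is never None, so this fails for True and False.
        assert self.block_quant is None

    def check_new(self) -> None:
        # Fixed form: passes only when block quantization is disabled.
        assert not self.block_quant


def raises_assertion(fn) -> bool:
    """Return True if calling fn() raises AssertionError."""
    try:
        fn()
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    for flag in (False, True):
        obj = Fp8MethodSketch(flag)
        print(flag, raises_assertion(obj.check_old), raises_assertion(obj.check_new))
```

With the old check, both `block_quant=False` and `block_quant=True` trip the assertion; with the new check, only `block_quant=True` does. Note that Python strips `assert` statements when run with `-O`, so neither form fires in optimized mode.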

jasl · 2025-09-30T08:42:43Z

Thank you! I'm testing this

simondanielsson · 2025-09-30T18:44:12Z

Works on my end on L4 with Qwen/Qwen3-4B-Instruct-2507-FP8 (don't have enough mem for 30B) - does it work for you @jasl?

jasl · 2025-09-30T18:46:58Z

> Works on my end on L4 with Qwen/Qwen3-4B-Instruct-2507-FP8 (don't have enough mem for 30B) - does it work for you @jasl?

Sorry, I haven't finished the test yet; I'll continue later.

jasl · 2025-09-30T21:10:04Z

Sorry for the late response. I can confirm that the fix works.
Thank you!

simondanielsson · 2025-10-01T07:56:50Z

Great to hear, then this PR should be ready for review.

yewentao256

LGTM, thanks for the work!

…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>

…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

…#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Update assertion to use boolean compare

17b81fb

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

simondanielsson requested review from mgoin, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners September 30, 2025 06:54

gemini-code-assist bot reviewed Sep 30, 2025

yewentao256 approved these changes Oct 2, 2025

yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 2, 2025

Merge branch 'main' into bugfix/flashinfer-assertion-error

6602705

DarkLight1337 merged commit 432e1cb into vllm-project:main Oct 5, 2025
52 checks passed

micah-wil mentioned this pull request Nov 3, 2025

[Bug]: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 vllm bench throughput regression on 2.9 RC on B200 #26320

Open

1 task

4 participants