Skip to content

Conversation

@mmangkad
Copy link
Contributor

@mmangkad mmangkad commented Sep 20, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request upgrades FlashInfer to version v0.4.0rc1. The version updates in the Dockerfiles and setup.py are consistent with this goal. However, there is a critical issue in vllm/v1/attention/backends/flashinfer.py where an API call for the new FlashInfer version appears to be only partially updated. This is likely to cause a runtime error. Please see the specific comment for details.


try:
# Make sure we pass exactly 15 arguments for tensor core version
# Make sure we pass exactly 18 arguments for tensor core version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

While you've correctly updated the internal _cached_module.plan call for the new flashinfer version, the corresponding public self.plan method call (at lines 1025-1043) seems to have been missed. This public method is used for the initial warm-up when CUDA graphs are enabled.

If the public plan API also changed (which is highly likely given the internal API change), this will cause a TypeError at runtime during the warm-up. Please update the call at lines 1025-1043 to include any new arguments. Based on the changes to the internal call, it's likely that arguments such as window_right and allow_fp16_qk_reduction need to be added.

@mgoin
Copy link
Member

mgoin commented Oct 9, 2025

Resolved by #26326

@mgoin mgoin closed this Oct 9, 2025
@mmangkad mmangkad deleted the flashinfer-upgrade branch October 9, 2025 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants