
Conversation


@mgoin mgoin commented Oct 15, 2025

Purpose

There have been increasing reports of correctness issues or illegal memory accesses (IMA) with FlashInfer's top-p & top-k sampling kernel (see #26480 (comment)). For instance, it seems the kernel can generate the same output even when the temperature is quite high (and no seed is set). vLLM generates different results, as expected, once the kernel is disabled.

Since flashinfer-python is now a default dependency of vLLM's CUDA build, many more users would be exercising this kernel by default. Let us disable it by default for now so that users can opt in explicitly.
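
For reference, a minimal sketch of how a user could opt back in after this change. The environment variable name and opt-in behavior are confirmed by this PR; the model choice and the assumption that vLLM parses "1" as truthy are illustrative.

```python
# Hypothetical opt-in example; set the env var before vLLM reads it at import time.
import os

os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "1"  # assumed to be parsed as truthy

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # model name is illustrative
params = SamplingParams(temperature=0.8, top_p=0.95, top_k=50)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```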

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 15, 2025
@mergify mergify bot added the v1 label Oct 15, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly disables the FlashInfer sampler by default, requiring users to opt in by setting VLLM_USE_FLASHINFER_SAMPLER=1. The change from envs.VLLM_USE_FLASHINFER_SAMPLER is not False to a direct boolean check simplifies the logic and makes the default behavior consistent. The corresponding log message is also appropriately changed from a warning to a debug message. The implementation is sound and improves code clarity.
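
For readers outside the diff, a minimal sketch of the kind of check described above; the helper name and surrounding code are assumptions for illustration, not the actual vLLM source:

```python
import logging

logger = logging.getLogger(__name__)

# Before (illustrative): an unset env var (None) was treated as enabled,
# because `None is not False` evaluates to True.
#   use_flashinfer = envs.VLLM_USE_FLASHINFER_SAMPLER is not False

# After (illustrative): the kernel is used only on explicit opt-in.
def should_use_flashinfer_sampler(envs) -> bool:
    # Hypothetical helper; `envs` stands in for vLLM's env-var module.
    if envs.VLLM_USE_FLASHINFER_SAMPLER:  # direct boolean check, default False
        logger.debug("Using FlashInfer top-p & top-k sampling kernel.")
        return True
    return False
```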

@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) October 15, 2025 00:50
@tlrmchlsmth tlrmchlsmth merged commit e66d787 into vllm-project:main Oct 15, 2025
52 of 53 checks passed
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
@jeejeelee jeejeelee mentioned this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
npanpaliya pushed a commit to odh-on-pz/vllm-cpu that referenced this pull request Oct 30, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1
