Disable FlashInfer sampler by default #26859

mgoin · 2025-10-15T00:24:31Z

Purpose

There have been increasing reports of correctness issues or illegal memory accesses (IMA) with FlashInfer's top-p and top-k sampling kernel (see #26480 (comment)). For instance, it seems it can generate the same output even when the temperature is quite high and no seed is set. vLLM generates different results, as expected, once the kernel is disabled.

Since flashinfer-python is now a default dependency of vLLM on CUDA, many more users would be using this kernel by default. Let us disable it by default for now so that users can opt in.
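
The symptom described above can be framed as a simple diversity check: with no seed and a high temperature, repeated sampling should not keep returning the same result. The following is a minimal, library-free sketch of that idea. The `sample_token` and `distinct_samples` helpers and the toy logits are hypothetical stand-ins for illustration, not vLLM or FlashInfer code.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Draw one token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def distinct_samples(logits, temperature, draws=200):
    """Return the set of token indices seen over `draws` unseeded samples."""
    rng = random.Random()  # no explicit seed, so runs are not reproducible
    return {sample_token(logits, temperature, rng) for _ in range(draws)}


# Toy logits (hypothetical); at temperature 2.0 the distribution is fairly flat,
# so a healthy sampler is expected to return more than one distinct token.
seen = distinct_samples([2.0, 1.5, 1.0, 0.5, 0.0], temperature=2.0)
print(f"distinct tokens sampled: {len(seen)}")
```

A real reproduction would compare full generations from the engine with the kernel enabled and disabled; this sketch only illustrates the expectation being violated.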

Test Plan

Test Result

Signed-off-by: mgoin <mgoin64@gmail.com>

gemini-code-assist

Code Review

This pull request correctly disables the FlashInfer sampler by default, requiring users to opt-in by setting VLLM_USE_FLASHINFER_SAMPLER=1. The change from envs.VLLM_USE_FLASHINFER_SAMPLER is not False to a direct boolean check simplifies the logic and makes the default behavior consistent. The corresponding log message is also appropriately changed from a warning to a debug message. The implementation is sound and improves code clarity.
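
The difference between the two checks mentioned in the review can be sketched as follows. This assumes the flag is tri-state (`None` when the environment variable is unset, otherwise `True` or `False`); the helper names `enabled_before` and `enabled_after` are hypothetical and are not taken from the vLLM source.

```python
from typing import Optional


def enabled_before(flag: Optional[bool]) -> bool:
    """Old-style check: anything other than an explicit False enables the sampler."""
    return flag is not False


def enabled_after(flag: Optional[bool]) -> bool:
    """New-style check: only a truthy value enables the sampler."""
    return bool(flag)


# None models an unset VLLM_USE_FLASHINFER_SAMPLER (assumed tri-state).
for flag in (None, True, False):
    print(f"{flag!r}: before={enabled_before(flag)} after={enabled_after(flag)}")
```

Under this assumption, the only case whose outcome changes is the unset one: it previously counted as enabled and now counts as disabled, which is what makes the sampler opt-in.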

Disable FlashInfer sampler by default

9bd64cf

Signed-off-by: mgoin <mgoin64@gmail.com>

mgoin requested review from 22quinn, houseroad and njhill as code owners October 15, 2025 00:24

mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 15, 2025

mergify bot added the v1 label Oct 15, 2025

gemini-code-assist bot reviewed Oct 15, 2025

tlrmchlsmth approved these changes Oct 15, 2025

tlrmchlsmth enabled auto-merge (squash) October 15, 2025 00:50

tlrmchlsmth merged commit e66d787 into vllm-project:main Oct 15, 2025
52 of 53 checks passed

mgoin mentioned this pull request Oct 15, 2025

[UX] Fallback to native implementation when flashinfer sampler failed to compile #26799

Open

5 tasks

bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

75ceedc

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>

jeejeelee mentioned this pull request Oct 16, 2025

[Kernel] Lazy import FlashInfer #26977

Merged

5 tasks

DarkLight1337 mentioned this pull request Oct 16, 2025

[Bug]: Hybrid Attention models broken after switching to flashinfer 0.4 (tested on Granite 4.0 H, Qwen3-Next, Jamba-3B, Nemotron-H-8b) #26936

Open

1 task

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

a89590d

Signed-off-by: mgoin <mgoin64@gmail.com>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

4a94dbd

Signed-off-by: mgoin <mgoin64@gmail.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

5cf31eb

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

4d281f8

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

fa36257

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

a2e43af

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

noooop mentioned this pull request Oct 28, 2025

[Installation]: FlashInfer Dependency issue due to pre-release apache-tvm-ffi #27476

Closed

1 task

npanpaliya pushed a commit to odh-on-pz/vllm-cpu that referenced this pull request Oct 30, 2025

Disable FlashInfer sampler by default (vllm-project/vllm#26859)

16f6b81

Signed-off-by: mgoin <mgoin64@gmail.com>

npanpaliya pushed a commit to odh-on-pz/vllm-cpu that referenced this pull request Oct 30, 2025

Cherry-pick: Disable FlashInfer sampler by default (vllm-project/vllm…

9b63d25

…#26859) (#295) vllm-project/vllm#26859 Signed-off-by: mgoin <mgoin64@gmail.com>

LucasWilkinson mentioned this pull request Oct 31, 2025

[Bug]: illegal memory access when there are multiple concurrent request #23814

Open

1 task

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

dbe9856

Signed-off-by: mgoin <mgoin64@gmail.com>

Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025

Disable FlashInfer sampler by default (vllm-project#26859)

001d3bc

Signed-off-by: mgoin <mgoin64@gmail.com>
