@elvischenv elvischenv commented Sep 25, 2025

Purpose

Support FP8 QKV in FlashInfer TRTLLM attention with sinks.
Note: requires FlashInfer v0.4.0rc4 (being updated in #26326).
flashinfer-ai/flashinfer#1758
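
For readers unfamiliar with the term, the sketch below illustrates what "sinks" attention means, assuming the gpt-oss-style formulation in which a learned per-head sink logit is added to the softmax denominator but contributes no value vector. It is plain Python for illustration only: the function name `attention_with_sink` and the toy inputs are hypothetical, and it does not model the FlashInfer TRTLLM kernel, FP8 quantization, score scaling, or masking.

```python
import math


def attention_with_sink(scores, values, sink_logit):
    """Single-query attention where one extra sink logit joins the softmax.

    Assumption: the sink only enlarges the denominator; it has no value
    vector, so the weights over real tokens sum to less than 1.
    """
    # Subtract the running maximum for numerical stability.
    m = max(max(scores), sink_logit)
    exp_scores = [math.exp(s - m) for s in scores]
    denom = sum(exp_scores) + math.exp(sink_logit - m)
    weights = [e / denom for e in exp_scores]
    head_dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(head_dim)]
    return weights, out


# Toy inputs (hypothetical): three keys, head dimension 2.
toy_scores = [1.0, 2.0, 0.5]
toy_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights, out = attention_with_sink(toy_scores, toy_values, sink_logit=0.0)
no_sink_weights, _ = attention_with_sink(toy_scores, toy_values, float("-inf"))
print(sum(weights), sum(no_sink_weights))
```

With a sink logit of negative infinity the sink term vanishes and the function reduces to an ordinary softmax, which is a convenient sanity check.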

Test Plan && Test Result

Kernel unit test:

tests/kernels/attention/test_flashinfer_trtllm_attention.py

===== 224 passed, 16 skipped in 30.94s ====

E2E accuracy:

kv_cache_dtype=fp8

[{'eval_name': 'gpqa', 'model_name': 'gpt-oss-120b-high_temp1.0_20250925_081936', 'metric': 0.7904040404040404}]
[{'eval_name': 'aime25', 'model_name': 'gpt-oss-120b-high_temp1.0_20250925_084243', 'metric': 0.925}]

kv_cache_dtype=auto

[{'eval_name': 'gpqa', 'model_name': 'gpt-oss-120b-high_temp1.0_20250925_090839', 'metric': 0.7910353535353535}]
[{'eval_name': 'aime25', 'model_name': 'gpt-oss-120b-high_temp1.0_20250925_093344', 'metric': 0.9125}]
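
The two accuracy runs above can be compared directly. The following is a small sketch that uses only the metrics reported in this description; the variable names are illustrative and it says nothing about run-to-run variance.

```python
# Metrics copied from the eval results above.
fp8 = {"gpqa": 0.7904040404040404, "aime25": 0.925}
auto = {"gpqa": 0.7910353535353535, "aime25": 0.9125}

# Absolute difference per benchmark (fp8 minus auto).
delta = {name: fp8[name] - auto[name] for name in fp8}

for name, diff in delta.items():
    print(f"{name}: fp8 - auto = {diff:+.4f}")
```

GPQA differs by roughly 0.0006 and AIME25 by 0.0125 in favor of fp8.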

E2E perf:

kv_cache_dtype=fp8 (6.9% higher output token throughput than kv_cache_dtype=auto)

Output token throughput (tok/s):         22427.23

kv_cache_dtype=auto

Output token throughput (tok/s):         20977.07
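
The reported gain follows from the two throughput figures above. A minimal check, assuming the gain is measured relative to the kv_cache_dtype=auto baseline (variable names are illustrative):

```python
# Output token throughput (tok/s) copied from the results above.
fp8_tput = 22427.23
auto_tput = 20977.07

# Relative gain of fp8 over the auto baseline, in percent.
gain_pct = (fp8_tput - auto_tput) / auto_tput * 100
print(f"fp8 vs auto: {gain_pct:.1f}% higher output token throughput")
```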

@mergify mergify bot added the gpt-oss (Related to GPT-OSS models) label Sep 25, 2025
@elvischenv elvischenv force-pushed the elvischenv/support-fp8-sinks-trtllm-attn branch from 63d414b to ac1611c Compare September 26, 2025 07:19
mergify bot commented Oct 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @elvischenv.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 7, 2025
@elvischenv elvischenv force-pushed the elvischenv/support-fp8-sinks-trtllm-attn branch from ac1611c to 8d6c871 Compare October 7, 2025 02:17
@mergify mergify bot removed the needs-rebase label Oct 7, 2025
@elvischenv elvischenv force-pushed the elvischenv/support-fp8-sinks-trtllm-attn branch from 8d6c871 to 9a53778 Compare October 7, 2025 09:06
@elvischenv elvischenv marked this pull request as ready for review October 7, 2025 09:12
nvpohanh commented Oct 9, 2025

@elvischenv Please fix the conflict. thanks!

@elvischenv elvischenv force-pushed the elvischenv/support-fp8-sinks-trtllm-attn branch from 9a53778 to 8bfb46f Compare October 9, 2025 07:31
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv elvischenv force-pushed the elvischenv/support-fp8-sinks-trtllm-attn branch from 8bfb46f to 868f3de Compare October 9, 2025 07:32
@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 9, 2025
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Oct 9, 2025
@mgoin mgoin merged commit 44f633d into vllm-project:main Oct 9, 2025
49 checks passed
@elvischenv elvischenv deleted the elvischenv/support-fp8-sinks-trtllm-attn branch October 10, 2025 01:33
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…on (vllm-project#25674)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…on (vllm-project#25674)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…on (vllm-project#25674)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…on (vllm-project#25674)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…on (vllm-project#25674)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…on (vllm-project#25674)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Labels

ci/build, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Done

3 participants