use combo kernel to fuse qk-norm and qk-rope #26682
Conversation
Code Review
This pull request introduces a new configuration option, use_horizontal_fusion, to enable horizontal fusion for qk-norm and qk-rope operations in PyTorch Inductor. This is achieved by setting combo_kernels and benchmark_combo_kernel in the Inductor configuration when the feature is enabled and the PyTorch version is 2.9.0.dev or newer. My main feedback is regarding the default value of the new flag. For stability, it would be safer to disable this experimental feature by default and allow users to opt-in.
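To make the mechanism concrete, here is a minimal sketch of the gating the review describes. The flag name and the two Inductor keys come from the PR; the version check and surrounding structure are illustrative assumptions, not the PR's actual code:

```python
import torch
from packaging.version import Version

use_horizontal_fusion = True  # the PR's new flag; its default is what's debated here

inductor_compile_config: dict = {}
if use_horizontal_fusion and Version(torch.__version__) >= Version("2.9.0.dev0"):
    # Combo kernels are the Inductor feature that implements horizontal fusion.
    inductor_compile_config["combo_kernels"] = True
    inductor_compile_config["benchmark_combo_kernel"] = True
```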
vllm/config/compilation.py

    since we know all keys are in a range [0, max_capture_size],
    we can optimize it to list[int] for better lookup performance."""

    use_horizontal_fusion = True
The use_horizontal_fusion flag is enabled by default. This will automatically enable the combo_kernels feature in PyTorch Inductor for users on versions 2.9.0.dev or newer. Since this relies on a feature in a development version of PyTorch, it may be unstable. It would be safer to set this to False by default to prevent potential issues for users on bleeding-edge PyTorch versions. Users can then explicitly opt-in to enable this experimental feature.
Suggested change:
-    use_horizontal_fusion = True
+    use_horizontal_fusion = False
We want to enable by default since it benefits models in general.
Main thing I think around here is testing: do we have a plan around that, or are we yolo-ing this? @BoyuanFeng @ProExpertProg
    # use horizontal fusion, which is useful for fusing qk-norm and
    # qk-rope when query and key have different shapes.
    self.inductor_compile_config["combo_kernels"] = True
    self.inductor_compile_config["benchmark_combo_kernel"] = True
I would appreciate a doc pointer to how this works so I can understand it for future work. Currently this is very opaque.
We have some docs here. I will follow up with more PyTorch docs.
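In the meantime, for anyone reading along: the two settings above are ordinary Inductor config fields, so they can be experimented with outside vLLM. A minimal sketch (the field names are real Inductor options; everything else here is illustrative):

```python
import torch._inductor.config as inductor_config

# Flip the same two knobs globally before compiling anything:
inductor_config.combo_kernels = True           # allow horizontal (combo) fusion
inductor_config.benchmark_combo_kernel = True  # benchmark the combo kernel and
                                               # keep it only if it is faster

# Running a compiled model with TORCH_LOGS="output_code" then dumps the
# generated Triton kernels, including any that Inductor combined.
```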
@zou3519 tested for
Yolo here seems acceptable; we can add tests in the future.
This PR enables horizontal fusion from Inductor. This is helpful for fusing q-norm & k-norm into one kernel, and q-rope & k-rope into one kernel. These were not fused before, since q and k have different shapes, which prevented those fusion optimizations from happening.
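To illustrate the pattern, a hedged, standalone sketch (not the PR's code): the same RMS normalization is applied to q and k, which have different head counts under GQA, so the two kernels could not previously be fused. The shapes below are illustrative:

```python
import torch

def qk_norm(q: torch.Tensor, k: torch.Tensor, eps: float = 1e-6):
    # RMS-normalize q and k independently. Under GQA, q and k have different
    # numbers of heads, so vertical fusion cannot merge these two kernels.
    q = q * torch.rsqrt(q.pow(2).mean(-1, keepdim=True) + eps)
    k = k * torch.rsqrt(k.pow(2).mean(-1, keepdim=True) + eps)
    return q, k

# With combo kernels enabled, Inductor may emit one horizontally fused kernel
# covering both normalizations instead of two separate ones.
compiled = torch.compile(
    qk_norm, options={"combo_kernels": True, "benchmark_combo_kernel": True}
)

# Illustrative GQA-style shapes (requires a CUDA device):
q = torch.randn(1, 16, 128, device="cuda")  # 16 query heads
k = torch.randn(1, 8, 128, device="cuda")   # 8 KV heads
q_out, k_out = compiled(q, k)
```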
The following trace comes from Qwen3-0.6B.
Before:
After (together w/ #26680):
[kernel trace screenshot]
After qkv_proj, we reduce from 5 kernels to 2: one for qk-norm and one for qk-rope.
Performance
Applying this PR + #26680
Qwen/Qwen3-0.6B
Before:
[benchmark screenshot]
After:
[benchmark screenshot]