[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled #27146
Conversation
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Code Review
This pull request enables the silu_mul_fp8_quant fusion pass to work even when the silu_and_mul custom operator is not enabled, by matching against the native PyTorch implementation. This is achieved by introducing a MatcherSiluAndMul utility that can trace either the custom op or the native implementation. The changes are well-structured and the tests have been updated to cover both scenarios. My review found a minor issue in the test suite where TestSiluMulNvfp4QuantModel is not correctly handled by the new test parameterization, which would cause test failures. I've provided a suggestion to fix this by adding appropriate skip conditions.
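For context, the native implementation that the new `MatcherSiluAndMul` utility has to match is the standard SiLU-and-mul pattern: split the projection output in half along the last dimension, apply SiLU to the first half, and multiply elementwise by the second half. A minimal sketch of that native pattern (function name here is illustrative, not the actual vLLM symbol):

```python
import torch
import torch.nn.functional as F

def silu_and_mul_native(x: torch.Tensor) -> torch.Tensor:
    """Native PyTorch SiluAndMul pattern: SiLU on the first half of the
    last dimension, multiplied by the second half. This is the op graph
    the fusion pass must recognize when the custom op is not enabled."""
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]
```

Because this decomposes into plain `aten` ops under `torch.compile`, the matcher can trace it directly instead of requiring the registered `silu_and_mul` custom op.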
Looks great! For tests, could you generate only the relevant tests and then skip based on support? Right now it's a little mixed up.
Great work! Could you post some E2E perf and accuracy numbers? And would you be interested in adding dynamic quant support as a follow-up?
Do you know which models use this?
Sure.
silu_mul is used by basically all models. fp8 quant is used by the -FP8 quantized models. For example, you can use
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
The error message is very strange. I don't have a Blackwell machine. Would you be able to help resolve the
…nabled (vllm-project#27146) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…nabled (vllm-project#27146) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…nabled (vllm-project#27146) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
It seems that dynamic quant has already been supported. What else can I do here?
I only see static fp8 and nvfp4. Where do you see dynamic fp8 patterns?
My mistake, I was looking at fusion.py.
Purpose
Based on #24604, this modifies the activation fusion pass to do op matching without needing to enable the custom op.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.