[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled #27146
Conversation
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Code Review
This pull request enables the silu_mul_fp8_quant fusion pass to work even when the silu_and_mul custom operator is not enabled, by matching against the native PyTorch implementation. This is achieved by introducing a MatcherSiluAndMul utility that can trace either the custom op or the native implementation. The changes are well-structured and the tests have been updated to cover both scenarios. My review found a minor issue in the test suite where TestSiluMulNvfp4QuantModel is not correctly handled by the new test parameterization, which would cause test failures. I've provided a suggestion to fix this by adding appropriate skip conditions.
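For context, the native implementation that the new `MatcherSiluAndMul` utility has to match is the standard SiLU-and-mul pattern: split the projection output in half along the last dimension, apply SiLU to the first half, and multiply elementwise by the second half. A minimal sketch of that native pattern (function name here is illustrative, not the actual vLLM symbol):

```python
import torch
import torch.nn.functional as F

def silu_and_mul_native(x: torch.Tensor) -> torch.Tensor:
    """Native PyTorch SiluAndMul pattern: SiLU on the first half of the
    last dimension, multiplied by the second half. This is the op graph
    the fusion pass must recognize when the custom op is not enabled."""
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]
```

Because this decomposes into plain `aten` ops under `torch.compile`, the matcher can trace it directly instead of requiring the registered `silu_and_mul` custom op.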
Looks great! For tests, could you generate only the relevant tests and then skip based on support? Right now it's a little mixed up.
Great work! Could you post some E2E perf and accuracy numbers? And would you be interested in adding dynamic quant support as a follow-up?
Do you know which models use this?
Sure.
silu_mul is used by basically all models. fp8 quant is used by the -FP8 quantized models. For example, you can use
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
The error message is very strange. I don't have a Blackwell machine. Would you be able to help resolve the
…nabled (vllm-project#27146) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…nabled (vllm-project#27146) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…nabled (vllm-project#27146) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
It seems that dynamic quant has already been supported. What else can I do here?
I only see static fp8 and nvfp4. Where do you see dynamic fp8 patterns?
My mistake, I was looking at fusion.py.
Purpose
Based on #24604, this modifies the activation fusion pass to do op matching without needing to enable the custom op.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.