[Feat][aiter][ROCm] Add aiter rmsnorm and quant fusion #735
Conversation
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
```python
@VllmInductorPass.time_and_log
def __call__(self, graph: fx.Graph):
    self.matched_count = self.patterns.apply(graph)
    print("Matched count:", self.matched_count)
```
@kliuae-amd can you remove this print statement?
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
LGTM.
There are issues on MI300X when using newer AITER; KF could only test the Qwen3-Coder-PTPC-FP8 model and saw improvements.
Thanks for the PR, but we should start moving all new development to upstream first.
@sunway513 The corresponding upstream PR is vllm-project#26575. Yes, for each feature we will PR directly to upstream. Since it takes a long time for upstream PRs to be merged, we port the upstream PR back to rocm/vllm and combine it with the other upstream PRs for performance verification.
Makes sense. We're in the process of moving such usage to the upstream amd_dev branch:
Got it. We will follow.
Purpose
This PR adds aiter's rmsnorm and fp8 quant fusion kernel, invoked in the rmsnorm+quant_fp8 custom fusion pass.
To use this feature, enable aiter with
VLLM_ROCM_USE_AITER=1and set--compilation-config '{"pass_config": {"enable_fusion": true, "enable_noop": true, "enable-attn-fusion": false}, "custom_ops": ["+rms_norm", "+quant_fp8"]to enable the fusion pass.Test Plan
End-to-end test using RedHatAI/Qwen3-14B-FP8-dynamic model
Server command:
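A minimal sketch of the server launch, assuming the model named in the Test Plan and the compilation config from the Purpose section; the port is a placeholder, not necessarily what was used in the test:
```bash
# Sketch only: model and compilation config come from this PR description; the port is a placeholder.
VLLM_ROCM_USE_AITER=1 vllm serve RedHatAI/Qwen3-14B-FP8-dynamic \
  --port 8000 \
  --compilation-config '{"pass_config": {"enable_fusion": true, "enable_noop": true, "enable_attn_fusion": false}, "custom_ops": ["+rms_norm", "+quant_fp8"]}'
```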
lm_eval command:
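A minimal sketch of an lm_eval run against the server above; the task choice, few-shot count, and concurrency are assumptions rather than the values used for the reported results:
```bash
# Sketch only: task, few-shot count, and concurrency are assumptions.
lm_eval --model local-completions \
  --model_args model=RedHatAI/Qwen3-14B-FP8-dynamic,base_url=http://localhost:8000/v1/completions,num_concurrent=32,tokenized_requests=False \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto
```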
Benchmark command:
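A minimal sketch of a serving benchmark run; the dataset shape and prompt count are assumptions, not the parameters behind the numbers below:
```bash
# Sketch only: dataset and request parameters are assumptions.
vllm bench serve \
  --model RedHatAI/Qwen3-14B-FP8-dynamic \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 512 \
  --num-prompts 200
```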
Test Result
lm_eval
w/o fusion
w/ fusion
Serving benchmark
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.