[Qwen3][fusion]port qknorm+rope fusion #36
base: main
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Pull request overview
This PR introduces a performance optimization by implementing a fused QK-norm and RoPE (Rotary Position Embedding) operation for the Qwen3-235B model. The fusion combines query/key normalization with rotary position embedding into a single operation, reducing computational overhead.
Key changes:
- Added environment variable `ATOM_ENABLE_QK_NORM_ROPE_FUSION` to toggle the fusion feature
- Implemented `RotaryEmbeddingQKNormFused` class that performs combined QK-norm and RoPE operations
- Modified `Qwen3MoeAttention` to conditionally use the fused implementation
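
For orientation, the two steps being fused can be sketched as an unfused, pure-Python reference. This is only an illustration under stated assumptions, not the PR's kernel or the `RotaryEmbeddingQKNormFused` class: the helper names (`fusion_enabled`, `rms_norm`, `rope_neox`, `qk_norm_rope_reference`) are hypothetical, the normalization is assumed to be RMSNorm, the rotation is assumed to use the NeoX-style half-split layout, and treating `"1"` as "enabled" is a guess at how the flag is parsed.

```python
import math
import os


def fusion_enabled(environ=os.environ):
    # Assumption: "1" means enabled; the PR page does not show the parsing.
    return environ.get("ATOM_ENABLE_QK_NORM_ROPE_FUSION", "0") == "1"


def rms_norm(x, weight, eps=1e-6):
    # Scale x by the reciprocal of its root mean square, then by the weight.
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def rope_neox(x, position, base=10000.0):
    # NeoX-style RoPE (assumed layout): element i is paired with i + half.
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2.0 * i / len(x))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out


def qk_norm_rope_reference(q_head, k_head, q_weight, k_weight, position):
    # Unfused path: normalize each head, then rotate it. A fused kernel
    # would produce the same values in a single pass over the data.
    q = rope_neox(rms_norm(q_head, q_weight), position)
    k = rope_neox(rms_norm(k_head, k_weight), position)
    return q, k
```

A reference like this is mainly useful for checking a fused implementation numerically, for example by comparing outputs within a small tolerance for a few positions.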
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| atom/utils/envs.py | Adds environment variable for enabling QK-norm+RoPE fusion |
| atom/models/qwen3_moe.py | Implements fused RoPE class and integrates it into the attention mechanism |
| atom/model_engine/arg_utils.py | Adds import for envs module (unused in diff) |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
b414a35 to 374cc77
@valarLip Could you please help review this PR?
Motivation
Port the QK-norm + RoPE fusion for Qwen3-235B.
This works together with PR ROCm/aiter#1590.
Technical Details
Test Plan
Test Result
perf without this pr: 11290.90 tok/s
perf with this pr: 11805.87 tok/s
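
As a quick check on what these two numbers imply, the relative throughput gain can be computed directly. The snippet below is just arithmetic on the figures reported above; the variable names are illustrative.

```python
# Throughput figures reported in this PR (tokens per second).
baseline_tok_s = 11290.90
fused_tok_s = 11805.87

# Relative gain of the fused path over the baseline, in percent.
gain_pct = (fused_tok_s - baseline_tok_s) / baseline_tok_s * 100.0
print(f"throughput gain: {gain_pct:.2f}%")  # about 4.56%
```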
Accuracy result: