【0.11.0-dev】optimization of kimi-k2 in cann8.3 #4555
base: v0.11.0-dev
@@ -322,7 +322,9 @@ def apply(
         assert router_logits.shape[
             1] == global_num_experts - global_redundant_expert_num, "Number of global experts mismatch (excluding redundancy)"

-        if global_num_experts == 256:
+        # NOTE: now npu_moe_gating_top_k can support `group_count=256` pattern, and `group_count=384` pattern in cann8.3
+        if global_num_experts == 256 or (global_num_experts == 384 and
+                                         torch.version.cann.startswith("8.3")):
Comment on lines +326 to +327
Contributor
There's an inconsistency in how the model type is determined here. This file checks
Suggested change
             topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k(
                 router_logits,
                 k=top_k,  # topk currently is 8
The logic that identifies `deepseek_v3_r1` and `kimi` models by magic numbers (256, 384), together with the check for CANN version `8.3`, is duplicated across multiple files (`experts_selector.py`, `torchair_fused_moe.py`, `torchair_w8a8_dynamic.py`, and `torchair_w4a8_dynamic.py`). This makes the code harder to maintain and increases the risk of inconsistencies when support for new models or CANN versions is added. Consider centralizing this logic in a helper function or a configuration object for better maintainability and readability.
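
One way to centralize the check might look like the sketch below. The constant names and the helper `supports_npu_moe_gating_top_k` are hypothetical and do not exist in the repository. The helper takes the CANN version as a plain string (the diff reads it from `torch.version.cann`), so the sketch runs without `torch_npu` installed.

```python
from typing import Optional

# Hypothetical names for illustration only; they are not part of the project.
# The expert counts are the magic numbers that appear in the diff.
DEEPSEEK_V3_R1_NUM_EXPERTS = 256
KIMI_NUM_EXPERTS = 384


def supports_npu_moe_gating_top_k(global_num_experts: int,
                                  cann_version: Optional[str]) -> bool:
    """Return True when the fused gating path from the diff would be taken.

    Mirrors the condition in the diff: the 256-expert pattern is always
    accepted, the 384-expert pattern only when the CANN version string
    starts with "8.3".
    """
    if global_num_experts == DEEPSEEK_V3_R1_NUM_EXPERTS:
        return True
    if global_num_experts == KIMI_NUM_EXPERTS:
        # Treat a missing or empty version string as "not supported"
        # instead of raising AttributeError on None.
        return bool(cann_version) and cann_version.startswith("8.3")
    return False
```

Each of the four call sites could then replace its inline condition with something like `if supports_npu_moe_gating_top_k(global_num_experts, torch.version.cann):`, so a new model or CANN release would only need a change in one place.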