[0.9.1][Perf]Remove NZ of kv_b_proj in Deepseek MLA. #1872

whx-sjtu · 2025-07-18T06:46:53Z

This PR removes the NZ transformation of the kv_b_proj weights. This matmul weight is not quantized, so at runtime it falls back to ND computation (float BMM in NZ format is currently not supported in the torchair graph). The fallback causes two redundant transData operations that convert the weight from NZ back to ND. Removing these two operations saves about 40 us per layer.

Signed-off-by: whx-sjtu <2952154980@qq.com>
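The reasoning in the description can be sketched as a small decision rule. This is a toy model only, not the code changed by this PR: `WeightInfo`, `should_cast_to_nz`, `redundant_transdata_ops`, and `estimated_saving_us` are hypothetical names, and the rule "quantized weights are consumed in NZ" is an assumption inferred from the description. The count of two conversions and the 40 us figure are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightInfo:
    """Hypothetical description of a matmul weight (not a vllm-ascend type)."""
    name: str
    quantized: bool


def should_cast_to_nz(weight: WeightInfo, float_bmm_nz_supported: bool = False) -> bool:
    """Cast to NZ only if the runtime kernel will actually consume NZ.

    Assumption: quantized matmuls run in NZ, while float BMM falls back to ND
    unless the graph backend supports float BMM in NZ.
    """
    return weight.quantized or float_bmm_nz_supported


def redundant_transdata_ops(weight: WeightInfo, stored_as_nz: bool,
                            float_bmm_nz_supported: bool = False,
                            fallback_conversions: int = 2) -> int:
    """Count NZ -> ND conversions forced by the fallback.

    The default of 2 is the number of redundant transData ops reported in
    the PR description.
    """
    runs_in_nd = not should_cast_to_nz(weight, float_bmm_nz_supported)
    return fallback_conversions if (stored_as_nz and runs_in_nd) else 0


def estimated_saving_us(num_layers: int, saving_per_layer_us: float = 40.0) -> float:
    """Scale the reported ~40 us per-layer saving to a whole model."""
    return num_layers * saving_per_layer_us


if __name__ == "__main__":
    kv_b_proj = WeightInfo(name="kv_b_proj", quantized=False)
    print(should_cast_to_nz(kv_b_proj))                           # skip the NZ cast
    print(redundant_transdata_ops(kv_b_proj, stored_as_nz=True))  # cost before this PR
    print(redundant_transdata_ops(kv_b_proj, stored_as_nz=False)) # cost after this PR
    # The layer count here is illustrative, not a number from the PR.
    print(estimated_saving_us(num_layers=10))
```

Under these assumptions, keeping the float kv_b_proj weight in ND means the model never pays for a conversion that the runtime would immediately undo.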

ttanzhiqiang · 2025-07-18T09:58:56Z

#1131 This PR does this

* br_eplb_into_v091: (29 commits)
  add eplb design doc
  merge update in eplb branch
  dynamic eplb
  [0.9.1][Perf] Use fused ops npu_top_k_top_p (vllm-project#1920)
  [0.9.1][PD][Perf] Avoid performing cpu all_reduce in disaggregated-prefill scenario. (vllm-project#1644)
  [0.9.1][BugFix] Fix bug in path_decorator when engine v0 (vllm-project#1919)
  [0.9.1][Perf] apply npu_moe_gating_top_k_softmax for moe (vllm-project#1902)
  [0.9.1][bugfix] W4A8 does not currently support apply_mlp_decode (vllm-project#1910)
  [0.9.1][CI] Pin vllm version to v0.9.1 to make mypy check passed (vllm-project#1904)
  [0.9.1][Dist][Bugfix] Fix mc2 process group to resolve self.cpu_group is None (vllm-project#1831)
  [0.9.1][Perf]Remove NZ of kv_b_proj in Deepseek MLA. (vllm-project#1872)
  [0.9.1][bugfix] V0.9.1 fix rope accruracy bug for deepseek model (vllm-project#1887)
  [0.9.1] Fix wheel glibc version incompatibility (vllm-project#1808)
  [BUGFIX][v0.9.1] repair moe error when set multistream. (vllm-project#1882)
  [BUGFIX][v0.9.1] ep_group is not equal to word_size in some cases. (vllm-project#1862)
  [BUGFIX][v0.9.1] fix enable_multistream_moe bug when DBO is enabled (… (vllm-project#1827)
  [0.9.1]optmize rope in qwen2 (vllm-project#1782)
  [BugFix] Fix flashcomm_v1 when engine v0 (vllm-project#1859)
  [BugFix] Fix decorator patch (vllm-project#1858)
  [0.9.1][Fix] Fix DeepSeek OOM issue in extreme `--gpu-memory-utilization` scenario (vllm-project#1829)
  ...

remove NZ of kv_b_proj

8aabff7

Signed-off-by: whx-sjtu <2952154980@qq.com>

whx-sjtu force-pushed the nz_opt_091 branch from ae97262 to 8aabff7 on July 18, 2025 06:52

ganyi1996ppo merged commit 5be1d8c into vllm-project:v0.9.1-dev Jul 19, 2025
16 checks passed

wangxiyuan added the no-main label Jul 21, 2025

whx-sjtu deleted the nz_opt_091 branch October 20, 2025 11:50
