[0.9.1][Perf] apply npu_moe_gating_top_k_softmax for moe #1902
Merged
Conversation
Force-pushed: b05b8fb → f7daac9, 2a5dcb7 → 562a001, aeb4f15 → 39b4c9f
Signed-off-by: huangxialu <huangxialu1@huawei.com>
ganyi1996ppo approved these changes on Jul 22, 2025.
845473182 added a commit to 845473182/vllm-ascend that referenced this pull request on Jul 23, 2025:

br_eplb_into_v091 (29 commits):
- add eplb design doc
- merge update in eplb branch
- dynamic eplb
- [0.9.1][Perf] Use fused ops npu_top_k_top_p (vllm-project#1920)
- [0.9.1][PD][Perf] Avoid performing cpu all_reduce in disaggregated-prefill scenario. (vllm-project#1644)
- [0.9.1][BugFix] Fix bug in path_decorator when engine v0 (vllm-project#1919)
- [0.9.1][Perf] apply npu_moe_gating_top_k_softmax for moe (vllm-project#1902)
- [0.9.1][bugfix] W4A8 does not currently support apply_mlp_decode (vllm-project#1910)
- [0.9.1][CI] Pin vllm version to v0.9.1 to make mypy check passed (vllm-project#1904)
- [0.9.1][Dist][Bugfix] Fix mc2 process group to resolve self.cpu_group is None (vllm-project#1831)
- [0.9.1][Perf]Remove NZ of kv_b_proj in Deepseek MLA. (vllm-project#1872)
- [0.9.1][bugfix] V0.9.1 fix rope accruracy bug for deepseek model (vllm-project#1887)
- [0.9.1] Fix wheel glibc version incompatibility (vllm-project#1808)
- [BUGFIX][v0.9.1] repair moe error when set multistream. (vllm-project#1882)
- [BUGFIX][v0.9.1] ep_group is not equal to word_size in some cases. (vllm-project#1862)
- [BUGFIX][v0.9.1] fix enable_multistream_moe bug when DBO is enabled (… (vllm-project#1827)
- [0.9.1]optmize rope in qwen2 (vllm-project#1782)
- [BugFix] Fix flashcomm_v1 when engine v0 (vllm-project#1859)
- [BugFix] Fix decorator patch (vllm-project#1858)
- [0.9.1][Fix] Fix DeepSeek OOM issue in extreme `--gpu-memory-utilization` scenario (vllm-project#1829)
- ...
jianzs pushed a commit that referenced this pull request on Jul 31, 2025:

…LECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112)
backport of v0.9.1-dev: #1902
origin main npu_moe_gating_top_k_softmax: #1355
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@055bd39
Signed-off-by: huangxialu <huangxialu1@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Sep 26, 2025 (same backport commit as above).
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025 (same backport commit as above).
What this PR does / why we need it?
Apply torch_npu.npu_moe_gating_top_k_softmax in the MoE expert-selection path when scoring_func is softmax.
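For context, here is a minimal sketch of the substitution. The function names are illustrative rather than the PR's actual code, and the fused-op signature follows the torch_npu documentation as an assumption:

```python
import torch
import torch_npu  # Ascend NPU extension providing the fused gating kernel


def select_experts_eager(router_logits: torch.Tensor, top_k: int):
    # Baseline: full softmax over the expert logits, then a separate top-k.
    scores = torch.softmax(router_logits, dim=-1)
    topk_weights, topk_ids = torch.topk(scores, top_k, dim=-1)
    return topk_weights, topk_ids


def select_experts_fused(router_logits: torch.Tensor, top_k: int):
    # Fused path: a single NPU kernel computes the softmax and gathers the
    # top-k weights and expert ids. Assumed signature (per torch_npu docs):
    #   npu_moe_gating_top_k_softmax(x, finished=None, k) -> (y, expert_idx, row_idx)
    topk_weights, topk_ids, _row_idx = torch_npu.npu_moe_gating_top_k_softmax(
        router_logits, None, k=top_k)
    return topk_weights, topk_ids
```

Collapsing the two steps into one kernel avoids materializing the intermediate softmax tensor and a second kernel launch per forward pass, which is presumably where the modest throughput gain in the benchmark below comes from.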
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests, plus the serving benchmark below.
Performance (serving benchmark):

| Metric | Base | With npu_moe_gating_top_k_softmax |
| --- | --- | --- |
| Successful requests | 200 | 200 |
| Benchmark duration (s) | 367.80 | 362.59 |
| Total input tokens | 409600 | 409600 |
| Total generated tokens | 409600 | 409600 |
| Request throughput (req/s) | 0.54 | 0.55 |
| Output token throughput (tok/s) | 1113.64 | 1129.64 |
| Total token throughput (tok/s) | 2227.28 | 2259.29 |

On an otherwise identical workload, the fused op improves total token throughput by roughly 1.4% (2259.29 vs. 2227.28 tok/s).
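As a quick sanity check, the reported rates are consistent with the token totals divided by the benchmark durations; a small Python verification using only the numbers above:

```python
# Recompute the throughput figures from the reported totals and durations.
durations = {
    "base": 367.80,   # benchmark duration (s)
    "fused": 362.59,
}
input_tok = output_tok = 409600  # identical workload in both runs
for name, duration in durations.items():
    print(f"{name}: output {output_tok / duration:.2f} tok/s, "
          f"total {(input_tok + output_tok) / duration:.2f} tok/s")
# base:  output 1113.65 tok/s, total 2227.30 tok/s
# fused: output 1129.65 tok/s, total 2259.30 tok/s
# (matches the report up to rounding of the printed durations)
```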