use npu_moe_gating_top_k_softmax #1355
Conversation
Signed-off-by: ttanzhiqiang <389825161@qq.com>
Codecov Report

@@            Coverage Diff             @@
##             main    #1355      +/-   ##
===========================================
+ Coverage   27.39%   54.54%   +27.15%
===========================================
  Files          56       80       +24
  Lines        6191     9980     +3789
===========================================
+ Hits         1696     5444     +3748
- Misses       4495     4536       +41

View full report in Codecov by Sentry.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@ganyi1996ppo Please help review.
Your screenshot seems to contain only host time; can you paste the device time of this kernel too?
# value to False to disable the optimized model.
"USE_OPTIMIZED_MODEL":
lambda: bool(int(os.getenv('USE_OPTIMIZED_MODEL', '1'))),
"SELECT_GATING_TOPK_SOTFMAX_EXPERTS":
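For context, the env-var pattern in this diff parses a `'0'`/`'1'` string into a boolean. A minimal standalone sketch of that pattern (the helper function is illustrative, not the project's actual config module):

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    """Parse a '0'/'1' environment variable into a bool.

    Mirrors the lambda pattern in the diff above; the variable name used
    below comes from the diff, but this helper itself is hypothetical.
    """
    return bool(int(os.getenv(name, default)))

os.environ["SELECT_GATING_TOPK_SOTFMAX_EXPERTS"] = "1"
print(env_flag("SELECT_GATING_TOPK_SOTFMAX_EXPERTS"))  # → True
```

Note that `bool(int(...))` raises `ValueError` on non-numeric values, so only `'0'`/`'1'` (and other integers) are accepted.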
I don't think we should add more config. Instead, how about checking the model type to decide which function will be called?
select_gating_top_k_softmax_experts is theoretically better than select_experts in the non-quantized case.
As discussed offline, we'll remove this env var once the function is stable enough.
Signed-off-by: ttanzhiqiang <389825161@qq.com>
Force-pushed from 2314772 to 77d1b16.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
    router_logits, None, k=top_k)

# # Required by npu_moe_init_routing
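For readers without NPU hardware, here is a pure-Python reference of the softmax-then-top-k semantics the fused kernel implements. This is a sketch under simplifying assumptions: `router_logits` is a list of per-token logit lists, and the kernel's additional `row_idx` output is omitted.

```python
import math

def gating_top_k_softmax(router_logits, k):
    """Reference semantics for fused MoE gating: per-token softmax over
    expert logits, then select the k largest probabilities.

    Returns (topk_weights, topk_ids) per token; a hypothetical stand-in
    for what torch_npu.npu_moe_gating_top_k_softmax computes on device.
    """
    topk_weights, topk_ids = [], []
    for logits in router_logits:
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        order = sorted(range(len(probs)), key=lambda i: probs[i],
                       reverse=True)[:k]
        topk_ids.append(order)
        topk_weights.append([probs[i] for i in order])
    return topk_weights, topk_ids

weights, ids = gating_top_k_softmax([[1.0, 3.0, 2.0, 0.5]], k=2)
print(ids)  # → [[1, 2]]
```

The NPU kernel fuses these steps into one launch, which is where the latency win reported in this PR comes from.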
Please remove the commented-out code.
…LECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112) backport of v0.9.1-dev: #1902 origin main npu_moe_gating_top_k_softmax: #1355 - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@055bd39 Signed-off-by: huangxialu <huangxialu1@huawei.com>
### What this PR does / why we need it? For non-deepseek models, select_experts is optimized by replacing the separate softmax+topk+to ops with the fused gating_topk_softmax kernel, reducing the kernel time from 37us to 14us on bf16/fp16 for qwen3-235b. - vLLM version: v0.9.2 - vLLM main: vllm-project/vllm@1a4f35e --------- Signed-off-by: ttanzhiqiang <389825161@qq.com>


What this PR does / why we need it?
For non-deepseek models, select_experts is optimized by replacing the separate softmax+topk+to ops with the fused gating_topk_softmax kernel, reducing the kernel time from 37us to 14us on bf16/fp16 for qwen3-235b.
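Why the fused swap is safe can be illustrated in plain Python (a sketch only; the real kernels operate on NPU tensors): because softmax is strictly monotonic within a row, ranking the post-softmax probabilities selects the same experts as ranking the raw logits, so a fused kernel is free to reorder the work.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def top_k_ids(values, k):
    # Indices of the k largest values, in descending order.
    return sorted(range(len(values)), key=lambda i: values[i],
                  reverse=True)[:k]

logits = [0.2, 1.7, -0.4, 0.9]
# Separate-op pipeline: full softmax, then top-k on the probabilities.
separate = top_k_ids(softmax(logits), k=2)
# A fused kernel may rank the raw logits directly and normalize only
# the selected entries, since softmax preserves per-row ordering.
fused = top_k_ids(logits, k=2)
print(separate == fused)  # → True
```

The latency gain then comes from avoiding intermediate tensors and extra kernel launches, not from changing which experts are selected.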


Does this PR introduce any user-facing change?
How was this patch tested?