Conversation

@ttanzhiqiang ttanzhiqiang commented Jun 22, 2025

What this PR does / why we need it?

This PR optimizes select_experts for non-DeepSeek models by replacing the separate softmax + topk + cast (`to`) sequence with the fused gating_topk_softmax kernel, improving this step from 37us to 14us for bf16/fp16 on Qwen3-235B.
[Screenshots: kernel profiling before and after the change, 2025-06-22]
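For readers unfamiliar with the unfused path, a minimal numpy sketch of the softmax + topk + cast sequence that the fused kernel replaces (function name, shapes, and the float32 stand-in for bf16/fp16 are illustrative, not vllm-ascend's actual code):

```python
import numpy as np

def select_experts_softmax_topk(router_logits: np.ndarray, top_k: int):
    """Unfused expert selection: softmax over experts, then top-k, then cast.

    The fused torch_npu.npu_moe_gating_top_k_softmax kernel computes the same
    result in a single launch, which is where the latency win comes from.
    """
    # Numerically stable softmax over the expert dimension.
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Top-k expert ids per token, in descending order of routing weight.
    topk_ids = np.argsort(-probs, axis=-1)[:, :top_k]
    topk_weights = np.take_along_axis(probs, topk_ids, axis=-1)

    # The trailing `.to` casts the weights back to the activation dtype
    # (bf16/fp16 on-device); float32 stands in for that here.
    return topk_weights.astype(np.float32), topk_ids.astype(np.int32)

logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.0, 3.0, 0.5, 0.2]])
weights, ids = select_experts_softmax_topk(logits, top_k=2)
print(ids[0])  # → [0 1]
```

Each of the three steps is a separate op (and kernel launch) on this path, which is what makes a single fused gating kernel attractive.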

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: ttanzhiqiang <389825161@qq.com>
codecov bot commented Jun 22, 2025

Codecov Report

❌ Patch coverage is 30.76923% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.54%. Comparing base (c30ddb8) to head (9e41126).
⚠️ Report is 613 commits behind head on main.

Files with missing lines              Patch %   Missing
vllm_ascend/ops/fused_moe.py          25.00%    6 ⚠️
vllm_ascend/ops/common_fused_moe.py   40.00%    3 ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1355       +/-   ##
===========================================
+ Coverage   27.39%   54.54%   +27.15%     
===========================================
  Files          56       80       +24     
  Lines        6191     9980     +3789     
===========================================
+ Hits         1696     5444     +3748     
- Misses       4495     4536       +41     
Flag        Coverage Δ
unittests   54.54% <30.76%> (+27.15%) ⬆️


Signed-off-by: ttanzhiqiang <389825161@qq.com>
@ttanzhiqiang
Contributor Author

@wangxiyuan @ganyi1996ppo

@ApsarasX ApsarasX added the ready read for review label Jun 24, 2025
@github-actions github-actions bot added merge-conflicts and removed ready read for review labels Jun 28, 2025
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: ttanzhiqiang <389825161@qq.com>
@ApsarasX ApsarasX added the ready read for review label Jul 1, 2025
@github-actions github-actions bot added merge-conflicts and removed ready read for review labels Jul 6, 2025
@github-actions

github-actions bot commented Jul 6, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions

github-actions bot commented Jul 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@ttanzhiqiang
Contributor Author

@wangxiyuan @Yikun

@ttanzhiqiang
Contributor Author

@ganyi1996ppo Please help review

@ganyi1996ppo
Collaborator

Your screenshot seems to only contain host time; can you paste the device time of this kernel too?

@ttanzhiqiang
Contributor Author

Your screenshot seems to only contain host time; can you paste the device time of this kernel too?

Replacing the separate softmax + topk + cast sequence with the fused gating_topk_softmax kernel for non-DeepSeek select_experts improves the device time from 78us to 28us for bf16/fp16 on Qwen3-235B.

[Screenshots: device-time profiling, 2025-07-09] Both the prefill and decode paths can use this optimization.

# value to False to disable the optimized model.
"USE_OPTIMIZED_MODEL":
lambda: bool(int(os.getenv('USE_OPTIMIZED_MODEL', '1'))),
"SELECT_GATING_TOPK_SOTFMAX_EXPERTS":
Collaborator

I don't think we should add more config. Instead, how about checking the model type to decide which function to call?

Contributor Author

select_gating_top_k_softmax_experts is theoretically better than select_experts on the non-quantized path.
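The env-gated dispatch under discussion can be sketched as follows. The flag name matches the PR's SELECT_GATING_TOPK_SOTFMAX_EXPERTS; the two backends here are stubs, not vllm-ascend's real functions, and the `quantized` guard reflects the author's claim that the fused path only wins without quantization:

```python
import os

def fused_gating_top_k_softmax(router_logits, top_k):
    return "fused"    # stand-in for torch_npu.npu_moe_gating_top_k_softmax

def generic_select_experts(router_logits, top_k):
    return "generic"  # stand-in for the existing select_experts path

def select_experts(router_logits, top_k, quantized=False):
    # Read the flag at call time so tests can toggle it.
    use_fused = bool(int(os.getenv("SELECT_GATING_TOPK_SOTFMAX_EXPERTS", "0")))
    # Fused gating is only expected to help the non-quantized bf16/fp16 path.
    if use_fused and not quantized:
        return fused_gating_top_k_softmax(router_logits, top_k)
    return generic_select_experts(router_logits, top_k)

os.environ["SELECT_GATING_TOPK_SOTFMAX_EXPERTS"] = "1"
print(select_experts(None, 8))                  # → fused
print(select_experts(None, 8, quantized=True))  # → generic
```

This keeps one extra env var alive, which is the maintainability concern raised above; dispatching on model type or quantization config instead would remove the flag entirely.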

Collaborator

As discussed offline, we'll remove this env var once the function is stable enough.

Signed-off-by: ttanzhiqiang <389825161@qq.com>
@ttanzhiqiang ttanzhiqiang force-pushed the gating_topk_softmax branch from 2314772 to 77d1b16 Compare July 10, 2025 13:43
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
    router_logits, None, k=top_k)

# # Required by npu_moe_init_routing
Collaborator

Please remove the commented-out code.
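To make the three-output contract of the fused call concrete, here is a numpy emulation. The weight/id math is ordinary softmax-then-top-k; the row_idx layout shown is an assumption for illustration (the real layout is defined by the CANN npu_moe_gating_top_k_softmax op, whose output feeds npu_moe_init_routing):

```python
import numpy as np

def emulate_gating_top_k_softmax(router_logits: np.ndarray, k: int):
    """Emulates the (topk_weights, topk_ids, row_idx) triple of the fused op."""
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    topk_ids = np.argsort(-probs, axis=-1)[:, :k].astype(np.int32)
    topk_weights = np.take_along_axis(probs, topk_ids, axis=-1)

    # Assumed row index layout: one routing slot per (token, expert) pair.
    num_tokens = router_logits.shape[0]
    row_idx = np.arange(num_tokens * k, dtype=np.int32).reshape(num_tokens, k)
    return topk_weights, topk_ids, row_idx
```

Returning row_idx alongside the weights and ids is what lets the routing step consume the result directly, without the separate index bookkeeping the commented-out code performed.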

@wangxiyuan wangxiyuan merged commit ee40d3d into vllm-project:main Jul 11, 2025
22 checks passed
jianzs pushed a commit that referenced this pull request Jul 31, 2025
…LECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112)

backport of v0.9.1-dev:
#1902

origin main npu_moe_gating_top_k_softmax:
#1355

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@055bd39

Signed-off-by: huangxialu <huangxialu1@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
### What this PR does / why we need it?
The optimization solution for non-deepseek select_experts is to replace
gating_topk_softmax with softmax+topk+to, which is optimized from 37us
to 14us on bf16/fp16 of qwen3-235b

- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@1a4f35e

---------

Signed-off-by: ttanzhiqiang <389825161@qq.com>