@loukong33 loukong33 commented Jul 21, 2025

What this PR does / why we need it?

Apply the fused Ascend operator `torch_npu.npu_moe_gating_top_k_softmax` for MoE expert routing when `scoring_func` is `softmax`, so the softmax over router logits and the top-k expert selection run in a single kernel instead of separate ops.
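For context, the fused operator computes a softmax over each token's router logits and then selects the k highest-probability experts. Below is a minimal plain-Python reference of that gating computation for a single token (a sketch only: the real `torch_npu` kernel operates on batched tensors on the NPU, this just illustrates the semantics):

```python
import math

def gating_top_k_softmax(logits, k):
    """Reference MoE gating: softmax over expert logits, then top-k.

    Illustrates what the fused npu_moe_gating_top_k_softmax kernel computes
    for one token; not the NPU implementation.
    """
    # Numerically stable softmax over the router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Select the k experts with the highest routing probability.
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in topk], topk

weights, experts = gating_top_k_softmax([0.1, 2.0, -1.0, 1.0], k=2)
# experts -> [1, 3]: the two highest-logit experts
```

Fusing these two steps avoids materializing the full probability tensor between kernels, which is where the throughput gain below comes from.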

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

performance:

| Metric | base | with `npu_moe_gating_top_k_softmax` |
|---|---|---|
| Successful requests | 200 | 200 |
| Benchmark duration (s) | 367.80 | 362.59 |
| Total input tokens | 409600 | 409600 |
| Total generated tokens | 409600 | 409600 |
| Request throughput (req/s) | 0.54 | 0.55 |
| Output token throughput (tok/s) | 1113.64 | 1129.64 |
| Total token throughput (tok/s) | 2227.28 | 2259.29 |
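The end-to-end gain works out to roughly 1.4% on total token throughput; a quick check using the numbers above:

```python
# Relative improvement of total token throughput after switching to the
# fused npu_moe_gating_top_k_softmax operator (numbers from the benchmark).
base_tok_s = 2227.28
fused_tok_s = 2259.29

speedup = fused_tok_s / base_tok_s            # throughput ratio, ~1.014x
improvement_pct = (speedup - 1.0) * 100.0     # ~1.4% end-to-end gain

print(f"speedup: {speedup:.4f}x, improvement: {improvement_pct:.2f}%")
```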

@loukong33 loukong33 force-pushed the topk_softmax branch 4 times, most recently from b05b8fb to f7daac9 on July 21, 2025 at 10:38
@loukong33 loukong33 force-pushed the topk_softmax branch 8 times, most recently from 2a5dcb7 to 562a001 on July 21, 2025 at 13:34
@loukong33 loukong33 closed this Jul 21, 2025
@loukong33 loukong33 reopened this Jul 21, 2025
@loukong33 loukong33 force-pushed the topk_softmax branch 12 times, most recently from aeb4f15 to 39b4c9f on July 22, 2025 at 08:28
@loukong33 loukong33 closed this Jul 22, 2025
@loukong33 loukong33 reopened this Jul 22, 2025
Signed-off-by: huangxialu <huangxialu1@huawei.com>
@loukong33 loukong33 changed the title [0.9.1]apply npu_moe_gating_top_k_softmax for moe [0.9.1][Perf]apply npu_moe_gating_top_k_softmax for moe Jul 22, 2025
@loukong33 loukong33 changed the title [0.9.1][Perf]apply npu_moe_gating_top_k_softmax for moe [0.9.1][Perf] apply npu_moe_gating_top_k_softmax for moe Jul 22, 2025
@ganyi1996ppo ganyi1996ppo merged commit b73c701 into vllm-project:v0.9.1-dev Jul 22, 2025
17 checks passed
845473182 added a commit to 845473182/vllm-ascend that referenced this pull request Jul 23, 2025
* br_eplb_into_v091: (29 commits)
  add eplb design doc
  merge update in eplb branch
  dynamic eplb
  [0.9.1][Perf] Use fused ops npu_top_k_top_p (vllm-project#1920)
  [0.9.1][PD][Perf] Avoid performing cpu all_reduce in disaggregated-prefill scenario. (vllm-project#1644)
  [0.9.1][BugFix] Fix bug in path_decorator when engine v0 (vllm-project#1919)
  [0.9.1][Perf] apply npu_moe_gating_top_k_softmax for moe (vllm-project#1902)
  [0.9.1][bugfix] W4A8 does not currently support apply_mlp_decode (vllm-project#1910)
  [0.9.1][CI] Pin vllm version to v0.9.1 to make mypy check passed (vllm-project#1904)
  [0.9.1][Dist][Bugfix] Fix mc2 process group to resolve self.cpu_group is None (vllm-project#1831)
  [0.9.1][Perf]Remove NZ of kv_b_proj in Deepseek MLA. (vllm-project#1872)
  [0.9.1][bugfix] V0.9.1 fix rope accruracy bug for deepseek model (vllm-project#1887)
  [0.9.1] Fix wheel glibc version incompatibility (vllm-project#1808)
  [BUGFIX][v0.9.1] repair moe error when set multistream. (vllm-project#1882)
  [BUGFIX][v0.9.1] ep_group is not equal to word_size in some cases. (vllm-project#1862)
  [BUGFIX][v0.9.1] fix enable_multistream_moe bug when DBO is enabled (… (vllm-project#1827)
  [0.9.1]optmize rope in qwen2 (vllm-project#1782)
  [BugFix] Fix flashcomm_v1 when engine v0 (vllm-project#1859)
  [BugFix] Fix decorator patch (vllm-project#1858)
  [0.9.1][Fix] Fix DeepSeek OOM issue in extreme `--gpu-memory-utilization` scenario (vllm-project#1829)
  ...
jianzs pushed a commit that referenced this pull request Jul 31, 2025
…LECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112)

backport of v0.9.1-dev:
#1902

origin main npu_moe_gating_top_k_softmax:
#1355

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@055bd39

Signed-off-by: huangxialu <huangxialu1@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…LECT_GATING_TOPK_SOTFMAX_EXPERTS (vllm-project#2112)
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…LECT_GATING_TOPK_SOTFMAX_EXPERTS (vllm-project#2112)