[0.9.1][Perf] Use fused ops npu_top_k_top_p #1920
Merged
What this PR does / why we need it?
Use the fused op torch_npu.npu_top_k_top_p(logits, p, k) when both p and k are not None; otherwise fall back to the original implementation. The replacement takes place automatically when VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1.
This patch uses npu_top_k_top_p, which requires torch_npu>=2.5.1.post1.dev20250619.
This modification is backported from https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/patch/worker/patch_common/patch_sampler.py
PR: #1308
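For reference, the semantics the fused op combines are top-k filtering followed by top-p (nucleus) filtering of the logits. Below is a minimal pure-Python sketch of that two-stage filter on a single logit vector; it is illustrative only (the actual npu_top_k_top_p kernel operates on batched tensors on the NPU), and the function name and list-based interface here are made up for the example.

```python
import math

def top_k_top_p(logits, k, p):
    """Illustrative sketch: keep the k highest logits, then keep the
    smallest prefix of those whose softmax probabilities sum to at
    least p; mask everything else to -inf."""
    # Top-k: indices of the k largest logits, highest first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:k]

    # Softmax over the kept logits (numerically stabilized).
    m = max(logits[i] for i in kept)
    exps = {i: math.exp(logits[i] - m) for i in kept}
    z = sum(exps.values())

    # Top-p: keep the smallest prefix with cumulative probability >= p.
    out = [float("-inf")] * len(logits)
    cum = 0.0
    for i in kept:
        out[i] = logits[i]
        cum += exps[i] / z
        if cum >= p:
            break
    return out

# Example: with k=3, p=0.9, only the two highest logits survive,
# since their probabilities already sum past 0.9.
print(top_k_top_p([2.0, 1.0, 0.0, -1.0], k=3, p=0.9))
```

The fused NPU kernel performs both stages in one launch, avoiding the intermediate sort/mask tensors that the unfused PyTorch path materializes.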
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests and e2e tests.