Conversation


@Pr0Wh1teGivee Pr0Wh1teGivee commented Jul 22, 2025

What this PR does / why we need it?

Use the fused op torch_npu.npu_top_k_top_p(logits, p, k) when both p and k are not None; otherwise fall back to the original implementation. The replacement takes place automatically when VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1.

This patch uses npu_top_k_top_p, which requires torch_npu>=2.5.1.post1.dev20250619.
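The dispatch logic described above can be sketched as follows. This is a minimal illustration, not the actual patch: `fused_top_k_top_p` and `original_top_k_top_p` are hypothetical stand-ins for the `torch_npu.npu_top_k_top_p` kernel and the original sampler path, which are not imported here.

```python
import os

# Hypothetical stand-in for torch_npu.npu_top_k_top_p (fused kernel).
def fused_top_k_top_p(logits, p, k):
    return ("fused", logits)

# Hypothetical stand-in for the original, unfused sampler path.
def original_top_k_top_p(logits, p, k):
    return ("fallback", logits)

def apply_top_k_top_p(logits, p, k):
    # Use the fused kernel only when the optimization is enabled via the
    # environment variable AND both p and k are provided; otherwise fall
    # back to the original implementation.
    enabled = os.environ.get("VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE") == "1"
    if enabled and p is not None and k is not None:
        return fused_top_k_top_p(logits, p, k)
    return original_top_k_top_p(logits, p, k)
```

With the flag unset, or with either `p` or `k` missing, the original path is always taken, so the optimization is strictly opt-in.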

This modification is backported from https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/patch/worker/patch_common/patch_sampler.py (PR: #1308).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests and e2e tests.

@Pr0Wh1teGivee Pr0Wh1teGivee changed the title use fused ops npu_top_k_top_p which is introduced in https://mirrors.… [Perf] Use fused ops npu_top_k_top_p Jul 22, 2025
@wangxiyuan wangxiyuan changed the title [Perf] Use fused ops npu_top_k_top_p [0.9.1][Perf] Use fused ops npu_top_k_top_p Jul 22, 2025
@ganyi1996ppo ganyi1996ppo merged commit 734eb68 into vllm-project:v0.9.1-dev Jul 23, 2025
17 checks passed
845473182 added a commit to 845473182/vllm-ascend that referenced this pull request Jul 23, 2025
* br_eplb_into_v091: (29 commits)
  add eplb design doc
  merge update in eplb branch
  dynamic eplb
  [0.9.1][Perf] Use fused ops npu_top_k_top_p (vllm-project#1920)
  [0.9.1][PD][Perf] Avoid performing cpu all_reduce in disaggregated-prefill scenario. (vllm-project#1644)
  [0.9.1][BugFix] Fix bug in path_decorator when engine v0 (vllm-project#1919)
  [0.9.1][Perf] apply npu_moe_gating_top_k_softmax for moe (vllm-project#1902)
  [0.9.1][bugfix] W4A8 does not currently support apply_mlp_decode (vllm-project#1910)
  [0.9.1][CI] Pin vllm version to v0.9.1 to make mypy check passed (vllm-project#1904)
  [0.9.1][Dist][Bugfix] Fix mc2 process group to resolve self.cpu_group is None (vllm-project#1831)
  [0.9.1][Perf]Remove NZ of kv_b_proj in Deepseek MLA. (vllm-project#1872)
  [0.9.1][bugfix] V0.9.1 fix rope accruracy bug for deepseek model (vllm-project#1887)
  [0.9.1] Fix wheel glibc version incompatibility (vllm-project#1808)
  [BUGFIX][v0.9.1] repair moe error when set multistream. (vllm-project#1882)
  [BUGFIX][v0.9.1] ep_group is not equal to word_size in some cases. (vllm-project#1862)
  [BUGFIX][v0.9.1] fix enable_multistream_moe bug when DBO is enabled (… (vllm-project#1827)
  [0.9.1]optmize rope in qwen2 (vllm-project#1782)
  [BugFix] Fix flashcomm_v1 when engine v0 (vllm-project#1859)
  [BugFix] Fix decorator patch (vllm-project#1858)
  [0.9.1][Fix] Fix DeepSeek OOM issue in extreme `--gpu-memory-utilization` scenario (vllm-project#1829)
  ...
