[perf] optimize apply_penalties & topKtopP for V0&V1 Engine #1107

linfeng-yuan · 2025-06-06T13:41:10Z

What this PR does / why we need it?

Same as pull/525, this PR optimizes the apply_penalties and top-k/top-p implementations in both the V0 and V1 engines by avoiding torch.scatter and matrix indexing operations.
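To illustrate the general idea of filtering without a scatter step, here is a minimal plain-Python sketch for a single row of logits. It is not the code in this PR: the function names (`softmax`, `mask_below`, `top_k_top_p_filter`) are hypothetical, and the real implementation works on batched tensors. The sketch sorts a copy of the row only to find cutoff values, then masks the original row by comparison, so nothing has to be written back through sort indices.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row; -inf entries get probability 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # math.exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def mask_below(row, scores, cutoff):
    """Keep row[i] where scores[i] >= cutoff, otherwise replace it with -inf."""
    return [x if s >= cutoff else -math.inf for x, s in zip(row, scores)]


def top_k_top_p_filter(logits, k, p):
    """Hypothetical helper: top-k then top-p on one row, using cutoffs only."""
    n = len(logits)

    # Top-k: the k-th largest logit is the cutoff; compare in original order.
    k = max(1, min(k, n))
    cutoff_k = sorted(logits)[n - k]
    logits = mask_below(logits, logits, cutoff_k)

    # Top-p: walk the ascending probabilities and count how many low-probability
    # entries can be dropped while their cumulative mass stays <= 1 - p.
    probs = softmax(logits)
    probs_sorted = sorted(probs)
    cum, drop = 0.0, 0
    for q in probs_sorted[:-1]:  # the most likely token is always kept
        cum += q
        if cum > 1.0 - p:
            break
        drop += 1
    cutoff_p = probs_sorted[drop]
    return mask_below(logits, probs, cutoff_p)


row = [math.log(0.15), math.log(0.5), math.log(0.05), math.log(0.3)]
filtered = top_k_top_p_filter(row, k=3, p=0.7)
kept = [i for i, x in enumerate(filtered) if x != -math.inf]
print(kept)  # [1, 3]
```

One consequence of masking by comparison is that tokens tied exactly at the cutoff are all kept, which can differ slightly from an index-based approach that keeps exactly k entries.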

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This patch was tested with vllm v0.9.0, torch-2.5.1 and torch_npu-2.5.1 (both the torch_npu release on PyPI and the newest internal beta version). At a concurrency of 58, with the post-processing parameters set to "temperature": 0.2, "top_k": 1000, "top_p": 0.92, the average sampling time in each decoding stage was reduced from 90ms to 8ms.

Signed-off-by: linfeng-yuan <1102311262@qq.com>

MengqingCao · 2025-06-06T14:20:28Z

vllm_ascend/sample/ops/ascend_topk_topp_sampler.py

+from vllm.v1.sample.ops.topk_topp_sampler import TopKTopPSampler, random_sample
+
+
+class AscendTopKTopPSampler(TopKTopPSampler):


I think this duplicates #970.

github-actions · 2025-06-11T08:35:25Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2025-06-20T09:22:46Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wangxiyuan added the ready read for review label Jun 6, 2025

[perf] optimize apply_penalties & topKtopP for V0&V1 Engine

f7d8c2b

Signed-off-by: linfeng-yuan <1102311262@qq.com>

linfeng-yuan force-pushed the sampler_optimization branch from 83094bb to f7d8c2b Compare June 6, 2025 13:58

MengqingCao reviewed Jun 6, 2025

View reviewed changes

wangxiyuan removed the ready read for review label Jun 7, 2025

github-actions bot added the merge-conflicts label Jun 11, 2025

github-actions bot removed the merge-conflicts label Jun 19, 2025

github-actions bot added the merge-conflicts label Jun 20, 2025

linfeng-yuan closed this Aug 8, 2025

3 participants