@ZhengWG
Contributor

@ZhengWG ZhengWG commented May 27, 2025

What this PR does / why we need it?

With temperature > 0, sampler time rose from 3 ms to 8 ms at a batch size of 24 per DP rank, and the scatter operation alone accounted for 6 ms of that. Reworking the scatter step along the lines of the TPU implementation brings total sampler time down to 2 ms.
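
The description does not include the code itself. One way to avoid a scatter in top-k/top-p filtering is to derive a cutoff value from the sorted values and compare the unsorted row against it, so nothing has to be written back into the original token order. The following is a minimal plain-Python sketch of that idea for a single row of logits, assuming this is the kind of change meant; `top_k_top_p_cutoff` and `softmax` are hypothetical helper names, not functions from vllm_ascend, and the real sampler would operate on batched NPU tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_cutoff(logits, k=None, p=None):
    """Mask one row of logits using cutoff thresholds instead of a scatter.

    Hypothetical helper for illustration only; not code from the PR.
    Masked positions are set to -inf and the token order is never permuted.
    """
    out = list(logits)

    if k is not None:
        # The k-th largest logit is the cutoff; anything below it is dropped.
        cutoff = sorted(out)[len(out) - k]
        out = [x if x >= cutoff else -math.inf for x in out]

    if p is not None:
        probs = softmax(out)
        probs_sorted = sorted(probs)  # ascending order
        # Count the low-probability tokens whose cumulative mass is <= 1 - p.
        cum = 0.0
        drop = 0
        for prob in probs_sorted[:-1]:  # the most likely token is always kept
            cum += prob
            if cum <= 1.0 - p:
                drop += 1
            else:
                break
        # The smallest kept probability acts as the cutoff for the unsorted row.
        cutoff = probs_sorted[drop]
        out = [x if pr >= cutoff else -math.inf for x, pr in zip(out, probs)]

    return out


row = [1.0, 3.0, 2.0, 0.0]
filtered = top_k_top_p_cutoff(row, k=2)
```

Because the mask is a comparison against a threshold, tokens that tie with the cutoff value are all kept, so the result can differ slightly from a sort-and-scatter version when there are exact ties.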

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested with vllm_ascend, verifying correct behavior in TopKTopPSampler using forward_npu.

@ZhengWG ZhengWG changed the title from "perf: speed up topk_topp_sampler" to "[Perf] speed up topk_topp_sampler" May 29, 2025
@wangxiyuan
Collaborator

#970 includes this change, and you have been added there as co-author. Closing this PR now.

@wangxiyuan wangxiyuan closed this Jun 5, 2025