optimize the function of computing topk and topp in sampler. #970
Conversation
MengqingCao
left a comment
- Please add a PR description
- Run `bash format.sh` locally to fix lint failures
# Sort by local expert IDs
- sort_indices = torch.argsort(filtered_experts)
+ sort_indices = torch.argsort(filtered_experts.view(torch.float32))
Why do we change the dtype of filtered_experts to float32? And view will change the metadata of filtered_experts instead of creating a new tensor. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort can run on AICore under float32, which gives better performance.
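For context, a minimal sketch of the bit-cast trick being discussed, assuming filtered_experts is a small non-negative int32 tensor (the actual dtype is not shown in this hunk). The view reinterprets the same storage rather than converting values, and the ordering is preserved only because the expert IDs are small non-negative integers whose float32 bit patterns sort in the same order.

```python
import torch

filtered_experts = torch.tensor([3, 0, 7, 2, 5], dtype=torch.int32)

# Bitwise reinterpretation of the int32 buffer as float32 (no copy, no value cast).
sort_indices = torch.argsort(filtered_experts.view(torch.float32))

# For small non-negative IDs the permutation matches a plain integer argsort.
assert torch.equal(sort_indices, torch.argsort(filtered_experts))
```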
vllm_ascend/ops/utils.py
Outdated
weight: torch.Tensor,
bias: Optional[torch.Tensor] = None):
import torch_npu
if torch_npu.get_npu_format(weight) != 29:
What format does 29 refer to? Let's add a comment on it.
fixed
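For reference, a hedged sketch of how the magic number could be named and documented; 29 mapping to the FRACTAL_NZ layout follows the ACL format enumeration used by torch_npu, and the helper name and body are illustrative rather than the exact code added by this PR.

```python
from typing import Optional

import torch
import torch_npu

# In the ACL format enumeration used by torch_npu, 29 denotes the FRACTAL_NZ weight layout.
ACL_FORMAT_FRACTAL_NZ = 29


def npu_matmul_add(x: torch.Tensor,
                   weight: torch.Tensor,
                   bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Cast the weight to the NZ layout once; subsequent calls skip the conversion.
    if torch_npu.get_npu_format(weight) != ACL_FORMAT_FRACTAL_NZ:
        weight.data = torch_npu.npu_format_cast(weight.data, ACL_FORMAT_FRACTAL_NZ)
    output = torch.matmul(x, weight.t())
    if bias is not None:
        output = output + bias
    return output
```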
vllm_ascend/ops/utils.py
Outdated
return rocm_unquantized_gemm
return npu_matmul_add

unquantized_gemm = dispatch_unquantized_gemm
I guess you want to patch vllm.model_executor.layers.utils.dispatch_unquantized_gemm into a custom one?
If so, let's do this in vllm_ascend/patch and describe the details in vllm_ascend/patch/__init__.py
fixed
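A minimal sketch of the patch-module approach being requested, assuming the custom GEMM lives in vllm_ascend/ops/utils.py under the name npu_matmul_add; the exact module layout under vllm_ascend/patch is illustrative, not the PR's final structure.

```python
import vllm.model_executor.layers.utils as vllm_layer_utils

from vllm_ascend.ops.utils import npu_matmul_add  # assumed location of the custom GEMM


def dispatch_unquantized_gemm():
    # On Ascend NPUs, always route the unquantized GEMM to the NZ-aware kernel.
    return npu_matmul_add


# Monkey-patch the vLLM dispatcher so unquantized linear layers pick up the NPU
# implementation; the rationale would be documented in vllm_ascend/patch/__init__.py.
vllm_layer_utils.dispatch_unquantized_gemm = dispatch_unquantized_gemm
```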
vllm_ascend/sample/sampler.py
Outdated
s1.apply_min_p = apply_min_p
if envs.VLLM_ENABLE_TOPK_OPTIMZE:
    TopKTopPSampler.forward_native = topk_topp_forward_native
Ditto. Let's do this in vllm_ascend/patch and describe the details in vllm_ascend/patch/__init__.py
Please add a precision UT to check that _apply_top_k_top_p and apply_min_p calculate correctly.
fixed
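A sketch of the kind of precision UT being asked for, assuming apply_min_p is importable from vllm_ascend.sample.sampler and takes (self, logits, min_p); both the import path and the signature are guesses based on the diff above, not the final test added by this PR.

```python
import torch


def reference_min_p(logits: torch.Tensor, min_p: torch.Tensor) -> torch.Tensor:
    # Straightforward reference: mask out tokens whose probability is below
    # min_p times the per-row maximum probability.
    probs = torch.softmax(logits, dim=-1)
    top_probs = probs.max(dim=-1, keepdim=True).values
    tokens_to_remove = probs < (min_p.unsqueeze(-1) * top_probs)
    return logits.masked_fill(tokens_to_remove, -float("inf"))


def test_apply_min_p_matches_reference():
    from vllm_ascend.sample.sampler import apply_min_p  # assumed import path

    torch.manual_seed(0)
    logits = torch.randn(4, 128)
    min_p = torch.full((4, ), 0.1)

    expected = reference_min_p(logits.clone(), min_p)
    actual = apply_min_p(None, logits.clone(), min_p)  # assumed (self, logits, min_p) signature

    torch.testing.assert_close(actual, expected)
```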
vllm_ascend/sample/sampler.py
Outdated
# Convert logits to probability distribution
probability_values = torch.nn.functional.softmax(logits, dim=-1)
# Calculate maximum probabilities per sequence
max_probabilities = torch.amax(probability_values,
Do torch.nn.functional.softmax and torch.amax bring any performance gain compared with torch.softmax and torch.Tensor.max?
This function changes index_put to masked_fill to achieve the performance optimization. The operations mentioned in the comment are not modified.
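A minimal sketch of the index_put-to-masked_fill change the author describes; the tensor names follow the diff above, while the "before" line is my reconstruction of the indexing-based version, not a quote from the old code.

```python
import torch

logits = torch.randn(4, 128)
min_p = torch.full((4, ), 0.05)

# Convert logits to probability distribution (unchanged, per the reply above).
probability_values = torch.nn.functional.softmax(logits, dim=-1)
# Calculate maximum probabilities per sequence (unchanged, per the reply above).
max_probabilities = torch.amax(probability_values, dim=-1, keepdim=True)
tokens_to_remove = probability_values < (min_p.unsqueeze(-1) * max_probabilities)

# Before: advanced indexing, which lowers to index_put and is comparatively slow on NPU.
# logits[tokens_to_remove] = -float("inf")

# After: a single masked_fill over the boolean mask.
logits = logits.masked_fill(tokens_to_remove, -float("inf"))
```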
wangxiyuan
left a comment
Please add a DeepSeek test for multicard as well. Example: https://github.com/vllm-project/vllm-ascend/blob/main/tests/multicard/test_offline_inference_distributed.py#L49
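A sketch of what such a multicard DeepSeek test could look like, modeled on the linked test_offline_inference_distributed.py; the VllmRunner helper, the model id, and the parallel settings are assumptions for illustration, not the test that was ultimately added.

```python
import os

from tests.conftest import VllmRunner  # assumed test helper from the vllm-ascend suite


def test_models_distributed_topk() -> None:
    example_prompts = [
        "vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.",
    ]
    os.environ["VLLM_ENABLE_TOPK_OPTIMZE"] = "1"

    with VllmRunner(
            "deepseek-ai/DeepSeek-V2-Lite",  # placeholder model id
            dtype="half",
            tensor_parallel_size=4,
            distributed_executor_backend="mp",
    ) as vllm_model:
        vllm_model.generate_greedy(example_prompts, max_tokens=64)
```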
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Yikun
left a comment
LGTM except the env. Do we have any plan to remove VLLM_ASCEND_ENABLE_TOPK_OPTIMZE and enable it by default in the future?
max_tokens=64)


@patch.dict(os.environ, {"VLLM_ENABLE_TOPK_OPTIMZE": "1"})
Suggested change:
- @patch.dict(os.environ, {"VLLM_ENABLE_TOPK_OPTIMZE": "1"})
+ @patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_TOPK_OPTIMZE": "1"})
vllm_ascend/envs.py
Outdated
| "VLLM_ENABLE_TOPK_OPTIMZE": | ||
| lambda: bool(int(os.getenv("VLLM_ENABLE_TOPK_OPTIMZE", '0'))), |
| "VLLM_ENABLE_TOPK_OPTIMZE": | |
| lambda: bool(int(os.getenv("VLLM_ENABLE_TOPK_OPTIMZE", '0'))), | |
| "VLLM_ASCEND_ENABLE_TOPK_OPTIMZE": | |
| lambda: bool(int(os.getenv("VLLM_ASCEND_ENABLE_TOPK_OPTIMZE", '0'))), |
After testing, the tpu_apply_top_k_top_p function achieves optimal performance. Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com> Co-authored-by: ZhengWG <zwg0606@gmail.com>
What this PR does / why we need it?
Optimize the performance of the calculation logic in the sampler and deepseekv2.
Does this PR introduce any user-facing change?
Added the VLLM_ENABLE_TOPK_OPTIMZE config in the sampler.
How was this patch tested?
pytest test_sampler.py
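A minimal usage sketch of the new switch for offline inference; the model name is a placeholder, and the variable keeps the spelling used in this PR (before the suggested rename to VLLM_ASCEND_ENABLE_TOPK_OPTIMZE).

```python
import os

os.environ["VLLM_ENABLE_TOPK_OPTIMZE"] = "1"  # enable the optimized top-k/top-p path

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.8, top_p=0.95, top_k=40, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```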