【Inference Optimize】Support setting environment variables to enable stream_k #74317

chang-wenbin · 2025-07-30T09:22:10Z

PR Category

Inference

PR Types

Performance

Description

Pcard-71500
1. Support enabling stream_k through an environment variable: export CUTLASS_GEMM_STREAM_K=1
2. stream_k can accelerate the wint8/wint4 dense gemm operators.
3. In our tests, we measured performance gains of about 15% for wint4 and about 30% for wint8.
4. This PR also adds a new unit test that actually runs the operator. The previous unit test did not execute the operator.
5. To reproduce the performance gain, use the following command: ncu --set full -o profile -k Kernel2 python ./Paddle/test/quantization/test_weight_only_linear.py::WeightOnlyLinear_stream_k_TestCase
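
For an A/B comparison of the setting described above, one option is to launch the workload in a child process so that the variable is already present when the process starts. The following is a minimal sketch, not part of this PR: the helper names `stream_k_env` and `run_with_stream_k` are hypothetical, and it assumes that leaving CUTLASS_GEMM_STREAM_K unset keeps the default (non-stream_k) path.

```python
import os
import subprocess

# Variable name taken from the PR description.
STREAM_K_ENV = "CUTLASS_GEMM_STREAM_K"


def stream_k_env(enabled, base=None):
    """Return a copy of the environment with stream_k switched on or off."""
    env = dict(os.environ if base is None else base)
    if enabled:
        env[STREAM_K_ENV] = "1"
    else:
        # Assumption: an unset variable means the default kernel path is used.
        env.pop(STREAM_K_ENV, None)
    return env


def run_with_stream_k(cmd, enabled):
    """Run `cmd` (a list of arguments) in a child process with the chosen setting."""
    return subprocess.run(cmd, env=stream_k_env(enabled), check=False)
```

A call such as `run_with_stream_k(["python", "my_benchmark.py"], enabled=True)` would then run a benchmark script with stream_k requested; `my_benchmark.py` is a placeholder for whatever script exercises the wint8/wint4 operators.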

paddle-bot · 2025-07-30T09:22:15Z

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

wanghuancoder

LGTM for skip

chang-wenbin added 13 commits July 22, 2025 11:18

update cutlass3.8

5de0dea

support cutlass v3.8.0

a1dae76

update

6c853c2

Merge remote-tracking branch 'origin/develop' into develop

224beb1

update third_party_tag recursive

f232672

update third_party_tag recursive

9d9a93f

Merge remote-tracking branch 'origin/develop' into develop

e56ca34

Merge remote-tracking branch 'origin/develop' into develop

79354b9

add fpA_intB assert

16b26b6

Impact of updating cuda arch

6fe68c9

Impact of updating cuda arch

cbc98d7

Merge remote-tracking branch 'origin/develop' into develop

c70ac12

Support setting environment variables to enable stream_k

e08b20f

zhoutianzi666 approved these changes Jul 30, 2025


wanghuancoder approved these changes Jul 31, 2025


zhoutianzi666 merged commit 9a7ec49 into PaddlePaddle:develop Jul 31, 2025
51 of 53 checks passed
