@chang-wenbin (Contributor) commented Jul 30, 2025

PR Category

Inference

PR Types

Performance

Description

Pcard-71500
1. Add support for stream_k, which is enabled through the environment variable CUTLASS_GEMM_STREAM_K.
2. stream_k speeds up the wint8/wint4 dense GEMM operators.
3. In our tests, enabling stream_k improved performance by about 15% for wint4 and about 30% for wint8.
4. Add a new unit test that actually runs the operator. The previous unit test did not execute it.
5. To turn the acceleration on, run export CUTLASS_GEMM_STREAM_K=1 before launching your program.
6. To reproduce the performance gain, use the following command: ncu --set full -o profile -k Kernel2 python ./Paddle/test/quantization/test_weight_only_linear.py::WeightOnlyLinear_stream_k_TestCase
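
For A/B comparisons it can be convenient to toggle the switch per run instead of exporting it in the shell. The following is a minimal sketch, not part of this PR: the helper names stream_k_env and run_with_stream_k are hypothetical, and it assumes that leaving the variable unset keeps stream_k disabled. Only the variable name CUTLASS_GEMM_STREAM_K and the value 1 come from the description above.

```python
import os
import subprocess
import sys

# Variable name taken from the PR description.
STREAM_K_ENV = "CUTLASS_GEMM_STREAM_K"


def stream_k_env(enabled, base_env=None):
    """Return a copy of the environment with the stream_k switch set or removed.

    Assumption: an unset variable means stream_k stays disabled.
    """
    env = dict(os.environ if base_env is None else base_env)
    if enabled:
        env[STREAM_K_ENV] = "1"
    else:
        env.pop(STREAM_K_ENV, None)
    return env


def run_with_stream_k(cmd, enabled):
    """Run cmd in a child process so the variable is in place from process start."""
    return subprocess.run(cmd, env=stream_k_env(enabled), check=False)


if __name__ == "__main__":
    # Hypothetical usage: run the same script with and without stream_k,
    # e.g. python toggle.py my_benchmark.py
    for enabled in (False, True):
        result = run_with_stream_k([sys.executable, *sys.argv[1:]], enabled)
        print(f"stream_k={enabled}: exit code {result.returncode}")
```

Launching a child process avoids depending on when the library reads the variable, since the value is already present when the process starts.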

paddle-bot bot commented Jul 30, 2025

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

@wanghuancoder (Contributor) left a comment

LGTM for skip

@zhoutianzi666 merged commit 9a7ec49 into PaddlePaddle:develop on Jul 31, 2025
51 of 53 checks passed
3 participants