@A-nnonymous
Contributor

PR Category

Operator Mechanism

PR Types

Performance

Description

Add an fp8_transpose fast path.

Benchmark results:
[Fused]Transpose 2D(7168,16384) with paddle.float8_e4m3fn
Average time over 1000 runs: 0.0883 ms
Throughput: 2477.46 GB/s

[Fused]Transpose 3D(8,7168,4096) with paddle.float8_e4m3fn
Average time over 1000 runs: 0.1660 ms
Throughput: 2634.87 GB/s

[Fused]Transpose 3D(8,2048,7168) with paddle.float8_e4m3fn
Average time over 1000 runs: 0.0849 ms
Throughput: 2578.05 GB/s

[Framework] Transpose 2D(7168,16384) with paddle.float8_e4m3fn
Average time over 1000 runs: 0.2231 ms
Throughput: 980.63 GB/s

[Framework] Transpose 3D(8,7168,4096) with paddle.float8_e4m3fn
Average time over 1000 runs: 0.4424 ms
Throughput: 988.88 GB/s

[Framework] Transpose 3D(8,2048,7168) with paddle.float8_e4m3fn
Average time over 1000 runs: 0.2230 ms
Throughput: 981.06 GB/s
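
As a sanity check on the figures above, here is a minimal sketch of how the throughput values can be reproduced from the shapes and average times. It rests on two assumptions that are not stated in the PR: that the benchmark counts one read plus one write of every 1-byte FP8 element, and that "GB" means 2^30 bytes. Under those assumptions the recomputed values land within about 0.1% of the reported ones, and the fused path comes out roughly 2.5x to 2.7x faster than the framework path. The function name `transpose_throughput_gib_s` is hypothetical.

```python
def transpose_throughput_gib_s(shape, avg_ms, bytes_per_elem=1):
    """Effective bandwidth of an out-of-place transpose, in GiB/s.

    Assumes (not confirmed by the PR) that every element is read once
    and written once, and that FP8 elements occupy one byte each.
    """
    numel = 1
    for dim in shape:
        numel *= dim
    bytes_moved = 2 * numel * bytes_per_elem  # one read + one write
    seconds = avg_ms * 1e-3
    return bytes_moved / seconds / 2**30


# (shape, fused avg ms, framework avg ms), copied from the benchmark output.
results = [
    ((7168, 16384), 0.0883, 0.2231),
    ((8, 7168, 4096), 0.1660, 0.4424),
    ((8, 2048, 7168), 0.0849, 0.2230),
]

for shape, fused_ms, framework_ms in results:
    fused = transpose_throughput_gib_s(shape, fused_ms)
    framework = transpose_throughput_gib_s(shape, framework_ms)
    speedup = framework_ms / fused_ms
    print(f"{shape}: fused {fused:.1f} GiB/s, "
          f"framework {framework:.1f} GiB/s, speedup {speedup:.2f}x")
```

Small differences from the reported numbers are expected because the average times are printed with only four decimal places.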

pcard-91067

@paddle-bot

paddle-bot bot commented Aug 26, 2025

Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
Please wait for the results of the automated CI tests. See the Paddle CI Manual for details.

@A-nnonymous
Contributor Author

/re-run all-failed

Contributor

@luotao1 luotao1 left a comment

LGTM for unittest.skipIf. Reason: because this change involves FP8 features, the unit test uses unittest.skipIf to skip devices older than the Hopper architecture.
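
For readers unfamiliar with the pattern, the skip condition described in this review comment might look roughly like the sketch below. This is not the test added by the PR: the helper `cuda_compute_capability`, the class name, and the test body are hypothetical, and the sketch assumes that "Hopper" corresponds to a CUDA compute capability major version of 9 and that `paddle.transpose` and `astype` accept `paddle.float8_e4m3fn` on such devices.

```python
import unittest

HOPPER_MAJOR = 9  # Hopper GPUs report compute capability 9.x (sm_90)


def cuda_compute_capability():
    """Return (major, minor) of the current GPU, or (0, 0) if unavailable.

    Hypothetical helper: falls back to (0, 0) when Paddle is missing,
    built without CUDA, or no GPU can be queried.
    """
    try:
        import paddle

        if paddle.is_compiled_with_cuda():
            major, minor = paddle.device.cuda.get_device_capability()
            return (major, minor)
    except Exception:
        pass
    return (0, 0)


@unittest.skipIf(
    cuda_compute_capability()[0] < HOPPER_MAJOR,
    "FP8 transpose needs a Hopper-or-newer GPU",
)
class TestFP8TransposeSketch(unittest.TestCase):
    def test_2d_transpose_matches_float32_reference(self):
        import paddle

        x = paddle.randn([64, 128], dtype="float32")
        x_fp8 = x.astype(paddle.float8_e4m3fn)

        # Transposing only moves bytes, so the FP8 result should equal the
        # transpose of the same values viewed in float32.
        out = paddle.transpose(x_fp8, perm=[1, 0])
        expected = paddle.transpose(x_fp8.astype("float32"), perm=[1, 0])

        self.assertEqual(out.shape, [128, 64])
        self.assertTrue(
            bool(paddle.equal_all(out.astype("float32"), expected))
        )
```

Run with `python -m unittest`; on machines without a suitable GPU the whole class is reported as skipped rather than failed, which is the behavior the reviewer is approving.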

@A-nnonymous
Contributor Author

/re-run all-failed

@zhangbo9674 zhangbo9674 merged commit bb68c90 into PaddlePaddle:develop Aug 28, 2025
105 of 118 checks passed