[PaddlePaddle Hackathon 3 No.33] Optimize the GPU performance of the erfinv op for Paddle #45057

Merged
merged 4 commits into from
Aug 23, 2022

thunder95
Contributor

PR types

Performance optimization

PR changes

OPs

Describe

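The GPU implementation of Paddle's erfinv operator is currently composed from Eigen expressions and has no dedicated GPU kernel, so its performance is relatively poor. Building it on Paddle's existing KPS API can yield a considerable performance gain.
Design document: PaddlePaddle/community#199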

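  • Development environment:
  1. Device: RTX 2070s
  2. Environment: CUDA 10.2, cuDNN 7
  • Optimization method:
  1. (Scheme A) Following Eigen, first implement the ndtri function in the CUDA operator, then build erfinv on top of it.
  2. (Scheme B) Develop directly on top of the built-in API functions provided by CUDA.
  3. Build on the elementwise kernel already implemented by the Paddle team, which gives a clear performance gain.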
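
For illustration only, Scheme B can be sketched as a small functor that forwards to CUDA's built-in `erfinvf` / `erfinv` and is applied element by element. This is a minimal sketch, not the merged code: the functor body is not shown on this page, and the names `ErfinvFunctorSketch`, `ErfinvImpl`, `ErfinvHostFallback` and `ApplyElementwiseOnHost` are hypothetical. The `HOSTDEVICE` definition is an assumption, and the host fallback (Newton iteration on `std::erf`) is a stand-in added so the sketch also compiles without nvcc; it is not part of the PR. The host loop merely stands in for Paddle's elementwise kernel.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed definition: HOSTDEVICE mirrors the macro name in the reviewed
// snippet so this sketch compiles with or without nvcc.
#if defined(__CUDACC__)
#define HOSTDEVICE __host__ __device__
#else
#define HOSTDEVICE
#endif

// Host-only stand-in (NOT from the PR): solve erf(y) = x by Newton iteration.
inline double ErfinvHostFallback(double x) {
  if (std::isnan(x) || x < -1.0 || x > 1.0) {
    return std::numeric_limits<double>::quiet_NaN();
  }
  if (x == 1.0) return std::numeric_limits<double>::infinity();
  if (x == -1.0) return -std::numeric_limits<double>::infinity();
  const double two_over_sqrt_pi = 1.1283791670955126;  // d/dy erf(y) at y = 0
  double y = 0.0;
  for (int i = 0; i < 100; ++i) {
    const double step =
        (std::erf(y) - x) / (two_over_sqrt_pi * std::exp(-y * y));
    y -= step;
    if (std::fabs(step) <= 1e-15 * std::fabs(y)) break;
  }
  return y;
}

// In the device compilation pass, call the CUDA math API built-ins;
// on the host, use the fallback above.
HOSTDEVICE inline float ErfinvImpl(float x) {
#if defined(__CUDA_ARCH__)
  return erfinvf(x);
#else
  return static_cast<float>(ErfinvHostFallback(static_cast<double>(x)));
#endif
}

HOSTDEVICE inline double ErfinvImpl(double x) {
#if defined(__CUDA_ARCH__)
  return erfinv(x);
#else
  return ErfinvHostFallback(x);
#endif
}

// Hypothetical functor in the shape discussed in the review: no explicit
// default constructor, a single call operator.
template <typename T>
struct ErfinvFunctorSketch {
  HOSTDEVICE inline T operator()(const T x) const { return ErfinvImpl(x); }
};

// Host stand-in for an elementwise kernel: apply the functor to every element.
template <typename T, typename Functor>
std::vector<T> ApplyElementwiseOnHost(const std::vector<T>& in, Functor f) {
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);
  return out;
}
```

Keeping the per-element math in a functor leaves the launch configuration and vectorized loads to the shared elementwise kernel, so the operator itself reduces to a single call.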

Forward inference performance after the optimization, compared with Paddle before the optimization:

| Scheme | Case No. | input_shape | paddle Perf (ms) | old_paddle Perf (ms) | ratio |
|---|---|---|---|---|---|
| A | 0 | [16, 204800] | 0.1556 | 0.1302 | 0.8368 x |
| A | 1 | [10, 20, 30, 40, 5, 6] | 8.6268 | 7.9096 | 0.9169 x |
| B | 0 | [16, 204800] | 0.067831 | 0.1302 | 2.2939 x |
| B | 1 | [10, 20, 30, 40, 5, 6] | 2.76477 | 7.9096 | 3.1202 x |

Forward inference performance after the optimization, compared with PyTorch:

| Scheme | Case No. | input_shape | paddle Perf (ms) | pytorch Perf (ms) | ratio |
|---|---|---|---|---|---|
| A | 0 | [16, 204800] | 0.1556 | 0.0832 | 0.5347 x |
| A | 1 | [10, 20, 30, 40, 5, 6] | 8.6268 | 2.7898 | 0.3234 x |
| B | 0 | [16, 204800] | 0.067831 | 0.0832 | 1.2266 x |
| B | 1 | [10, 20, 30, 40, 5, 6] | 2.76477 | 2.7898 | 1.0091 x |
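
The ratio column in the PyTorch comparison above is consistent with the reference time divided by the Paddle time, so a value above 1 means Paddle is faster. A minimal sketch of that arithmetic follows; the helper names `SpeedupRatio` and `RoundTo4` are hypothetical and not part of the PR.

```cpp
#include <cmath>

// Reference time divided by the new Paddle time; > 1 means Paddle is faster.
inline double SpeedupRatio(double reference_ms, double paddle_ms) {
  return reference_ms / paddle_ms;
}

// Round to four decimal places, the precision used in the table.
inline double RoundTo4(double v) { return std::round(v * 1e4) / 1e4; }
```

For example, Scheme B on case 0 gives 0.0832 / 0.067831, which rounds to the 1.2266 reported in the table.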

Scheme A is more complex to implement yet performs worse, so this PR adopts Scheme B.

paddle-bot bot commented Aug 10, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

namespace phi {

template <typename T>
struct ErfinvCUDAFunctor {
Contributor

Just call it ErfinvFunctor.

Contributor Author

OK, changed.


template <typename T>
struct ErfinvCUDAFunctor {
HOSTDEVICE inline ErfinvCUDAFunctor() {}
Contributor

An empty default constructor can be omitted.

Contributor Author

Thanks for the suggestion; removed.

Contributor Author

@ZzSean Could you please take another look?

Contributor

@ZzSean ZzSean left a comment

LGTM

@ZzSean ZzSean merged commit 0e384ad into PaddlePaddle:develop Aug 23, 2022