[Hackathon No.33] Optimize the GPU computing performance of the erfinv op for Paddle #199

thunder95 · 2022-08-09T06:00:00Z

Optimize the GPU computing performance of the erfinv op for Paddle
Task: PaddlePaddle/Paddle#44072 (comment)

JamesLim-sy · 2022-08-11T06:52:18Z

rfcs/OPs-Perf/20220805_erfinv_op_optimization.md

+| Case No. | device | input_shape | input_type | Paddle Perf(ms) |
+|---|---|---|---|---|
+| 1 | RTX 2070s | [-1L, 204800L] | float32 | 0.1438 | 
+| 2 | RTX 2070s |[10L, 20L, 30L, 40L, 5L, 6L] | float64 8| 8.6485 |


The `float64 8` entry here looks wrong.

Typo, fixed.

JamesLim-sy · 2022-08-11T06:53:49Z

rfcs/OPs-Perf/20220805_erfinv_op_optimization.md

+
+PyTorch implements the Erfinv operator on the GPU. Overall forward performance is as follows (based on PyTorch v1.12):
+
+| Case No. | device | input_shape | input_type | Paddle Perf(ms) |


Shouldn't `Paddle Perf(ms)` here be `Pytorch Perf(ms)`?

Typo, fixed.

JamesLim-sy · 2022-08-11T06:55:49Z

rfcs/OPs-Perf/20220805_erfinv_op_optimization.md

+
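+## 2.1 Key modules and performance improvements
+
+The computation uses Paddle's internal Elementwise Kernel. The operator is optimized through vectorized reads, vectorized writes, and the thread configuration method in gpu_launch_config.h, with an expected speedup of 1.2x.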


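Even with the estimated 1.2x speedup, the numbers would still be behind torch's performance. It may be worth checking whether the two implementations differ at the underlying C++ level.

@JamesLim-sy I tried torch's C++ implementation and also an implementation based on the ndtri function; neither gave a noticeable performance gain. In the end I used the CUDA built-in function, which gave an improvement of more than 2x, and more than 1x compared with torch.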
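
For readers unfamiliar with the ndtri route mentioned above: ndtri is the standard normal quantile function, and erfinv can be derived from it through the identity erfinv(x) = ndtri((x + 1) / 2) / sqrt(2). The following is a minimal CPU-side sketch of that identity using only the Python standard library. It is not the code from this PR or from torch; the function name `erfinv_via_ndtri` is hypothetical, and `statistics.NormalDist.inv_cdf` stands in for ndtri as an assumption made for illustration.

```python
import math
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
# Its inv_cdf plays the role of ndtri in this sketch (an assumption).
_STD_NORMAL = NormalDist()


def erfinv_via_ndtri(x: float) -> float:
    """Hypothetical helper: inverse error function via the normal quantile."""
    if math.isnan(x) or x < -1.0 or x > 1.0:
        return math.nan  # erfinv is only defined on [-1, 1]
    if x == 1.0:
        return math.inf
    if x == -1.0:
        return -math.inf
    # erf(z) = 2 * Phi(z * sqrt(2)) - 1, so z = Phi^{-1}((x + 1) / 2) / sqrt(2)
    return _STD_NORMAL.inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


if __name__ == "__main__":
    # Round-trip check: erf(erfinv(x)) should reproduce x.
    for x in (-0.9, -0.5, 0.1, 0.5, 0.99):
        z = erfinv_via_ndtri(x)
        print(f"x={x:+.2f}  erfinv(x)={z:+.6f}  erf(erfinv(x))={math.erf(z):+.6f}")
```

A reference like this could be used to spot-check the output of a GPU kernel on a handful of values; it says nothing about GPU performance.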

ZzSean

LGTM

thunder95 added 6 commits August 9, 2022 13:09

erfinv

233daf8

false commit

36f32a3

Merge branch 'master' of https://github.com/PaddlePaddle/community

33a8a1f

erfinv

23da46b

false commit

e3dc6b8

erfinv

42c4c79

thunder95 mentioned this pull request Aug 9, 2022

[PaddlePaddle Hackathon Phase 3] Task overview PaddlePaddle/Paddle#43938

Closed

luotao1 assigned luotao1 and JamesLim-sy Aug 9, 2022

luotao1 added the contributor label Aug 9, 2022

luotao1 assigned Ligoml and luotao1 and unassigned luotao1 Aug 10, 2022

thunder95 mentioned this pull request Aug 10, 2022

[PaddlePaddle Hackathon 3 No.33] Optimize the GPU computing performance of the erfinv op for Paddle PaddlePaddle/Paddle#45057

Merged

JamesLim-sy reviewed Aug 11, 2022


fix typo

68c9789

luotao1 assigned ZzSean Aug 16, 2022

ZzSean approved these changes Aug 17, 2022


ZzSean merged commit 433c68b into PaddlePaddle:master Aug 17, 2022
