Implement the grad and enhance the cache of norm_convolution fusion ops. #36168
Conversation
Thanks for your contribution!
    handle, args_.filter_desc.desc(), args_.out_desc.desc(),
    args_.conv_desc.desc(), args_.in_desc.desc(), dgrad_algo_,
    &workspace_size));
return RoundUp(workspace_size, 512);
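The queried workspace size is rounded up to a 512-byte boundary before being returned. A minimal sketch of such a `RoundUp` helper (the name comes from the snippet above; this particular implementation is an assumption, not necessarily Paddle's):

```cpp
#include <cstddef>

// Sketch of a RoundUp helper like the one called above: round n up to
// the nearest multiple of align using integer arithmetic. The actual
// Paddle implementation may differ in detail.
inline std::size_t RoundUp(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}
```

Rounding the allocation up to an alignment boundary lets the returned buffer satisfy cuDNN's workspace alignment requirements regardless of the raw size the query reports.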
If this were also cached, the overhead here could be eliminated as well: cache the `dweight_workspace_size` returned by `wgrad_op->GetWorkspaceSizeInBytes(ctx.cudnn_handle())` together with the `dgrad_workspace_size` obtained here, and take the maximum of the two.
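The suggestion amounts to querying (or caching) both gradient workspace sizes once and allocating a single buffer of the larger size. A hedged sketch, with variable names taken from the comment and the surrounding allocation code assumed:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of the reviewer's suggestion: given the two backward workspace
// sizes (dgrad = data gradient, dweight = filter gradient), one buffer
// sized to their maximum can serve both cuDNN calls, since the backward
// passes run sequentially and can share the allocation.
std::size_t FusedGradWorkspaceBytes(std::size_t dgrad_workspace_size,
                                    std::size_t dweight_workspace_size) {
  return std::max(dgrad_workspace_size, dweight_workspace_size);
}
```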
This part of the overhead is small, and the objects being cached are different. The fused approach needs to cache the entire CudnnFusionOp, mainly to keep the FusedOpsPlan alive. We can verify later whether this part affects performance, and add further caching if it does.
Brilliant work!
PR types
Performance optimization

PR changes
OPs

Describe
Following the cudnnFusedOpsPlan_t-based fused-computation style of CudnnNormConvolution, this PR implements CudnnNormConvolutionGrad. It also adds a CudnnFusionOpCache that caches the generated CudnnFusionOp, avoiding the large CPU overhead of calling cudnnMakeFusedOpsPlan on every invocation.
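The caching idea can be sketched as a key-to-op map that builds the expensive plan only on the first lookup. The class and member names below are hypothetical illustrations, not the actual Paddle interface:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for CudnnFusionOp: the real class owns a
// cudnnFusedOpsPlan_t, whose creation via cudnnMakeFusedOpsPlan is the
// CPU-expensive step this PR avoids repeating.
struct FusionOp {
  explicit FusionOp(int id) : plan_id(id) {}
  int plan_id;
};

// Sketch of a CudnnFusionOpCache-style cache: key each op by a string
// encoding shapes/dtypes, and build it only on the first lookup.
class FusionOpCache {
 public:
  std::shared_ptr<FusionOp> GetOrCreate(
      const std::string& key,
      const std::function<std::shared_ptr<FusionOp>()>& creator) {
    auto it = ops_.find(key);
    if (it != ops_.end()) return it->second;  // cache hit: no plan rebuild
    auto op = creator();                      // cache miss: pay the cost once
    ops_.emplace(key, op);
    return op;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<FusionOp>> ops_;
};
```

With this shape, repeated forward/backward runs on the same input configuration hit the cache and skip plan construction entirely; only a configuration change (new key) pays the cudnnMakeFusedOpsPlan cost.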