
optimize conv algo cache #41891

Merged (48 commits) on Aug 25, 2022
Conversation

phlrain (Collaborator) commented Apr 18, 2022

PR types

Performance optimization

PR changes

Others

Describe

Optimization target: the two cuDNN interfaces cudnnGetConvolutionForwardAlgorithm_v7 and cudnnGetConvolutionForwardWorkspaceSize both perform poorly; in re-scheduling scenarios they cause significant performance problems.

This PR builds on the auto-tune cache and makes the following upgrades:

  1. Cache the workspace size. The result returned by cudnnGetConvolutionForwardAlgorithm_v7 already contains the workspace size (the memory field), so it can be used directly; there is no need to query it again. The cache value is therefore upgraded from int64_t to DnnNode, which holds an int64_t and a size_t.
  2. Even when auto-tune is disabled, search results are now cached, reducing the number of calls to cudnnGetConvolutionForwardAlgorithm_v7. In a local experiment with 10,000,000 elements in the cache, an average lookup took 0.16 microseconds, while one search via cudnnGetConvolutionForwardAlgorithm_v7 costs roughly 70-100 microseconds; the lookup cost is far below the cost of a single cuDNN search.
  3. To prevent memory from blowing up, the cache holds at most 1,000,000 entries; beyond that it is forcibly cleared. Since an unordered_map consumes roughly 10x the memory of the data it actually stores, 1,000,000 entries of 16 bytes each (8 bytes for the int64_t, 8 bytes for the size_t) consume at most about 160 MB. Also, for most training jobs the number of cache entries is strongly tied to the shapes of the input images, so it is enumerable and does not grow without bound.
  4. Add groups and data_format fields to conv args. Without them, two configurations identical in every other attribute but differing in groups or data format collide, and such collisions make the conv kernel fail at execution.
  5. Redesign the cache's unordered_map. In the old design the hash key was computed externally and passed to the unordered_map, so if two different conv args produced the same key, the wrong algo and workspace were returned, causing runtime errors. In the new design the unordered_map's key is ConvCacheKey (it stores the same content as conv args; a separate struct is defined only to decouple compilation), with overloaded hash and equality functions to avoid collisions.

This optimization improves Mask R-CNN performance by roughly 20% at batch size (bs) = 1.
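The scheme in points 1, 3, and 5 can be sketched roughly as follows. This is a minimal illustration, not the actual Paddle code: the names DnnNode, ConvCacheKey, and CudnnAlgorithmsCache follow the PR description, but the member layout, the hash-combine constant, and the Find/Set interface are assumptions made for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Value cached per convolution configuration: the chosen algorithm id plus
// the workspace size already returned by cudnnGetConvolutionForwardAlgorithm_v7,
// so the size need not be queried again (point 1).
struct DnnNode {
  DnnNode() = default;
  DnnNode(int64_t a, size_t ws) : algo(a), workspace_size(ws) {}
  int64_t algo = 0;
  size_t workspace_size = 0;
};

// Key holding every attribute that distinguishes a convolution, including
// groups and data layout (point 4), to avoid collisions between configs.
struct ConvCacheKey {
  std::vector<int64_t> x_dims, w_dims, strides, paddings, dilations;
  int dtype = 0;
  int groups = 1;
  int64_t data_layout = 0;

  bool operator==(const ConvCacheKey& o) const {
    return x_dims == o.x_dims && w_dims == o.w_dims && strides == o.strides &&
           paddings == o.paddings && dilations == o.dilations &&
           dtype == o.dtype && groups == o.groups &&
           data_layout == o.data_layout;
  }
};

struct ConvCacheKeyHash {
  static void Combine(size_t& seed, size_t v) {
    // Boost-style hash combine; the constant here is an illustration choice.
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }
  size_t operator()(const ConvCacheKey& k) const {
    size_t seed = 0;
    for (auto d : k.x_dims) Combine(seed, std::hash<int64_t>()(d));
    for (auto d : k.w_dims) Combine(seed, std::hash<int64_t>()(d));
    for (auto d : k.strides) Combine(seed, std::hash<int64_t>()(d));
    for (auto d : k.paddings) Combine(seed, std::hash<int64_t>()(d));
    for (auto d : k.dilations) Combine(seed, std::hash<int64_t>()(d));
    Combine(seed, std::hash<int>()(k.dtype));
    Combine(seed, std::hash<int>()(k.groups));
    Combine(seed, std::hash<int64_t>()(k.data_layout));
    return seed;
  }
};

// The map owns the full key rather than a precomputed hash, so a hash
// collision degrades to an equality check, never to a wrong DnnNode (point 5).
class CudnnAlgorithmsCache {
 public:
  explicit CudnnAlgorithmsCache(size_t max_size = 1000000)
      : max_size_(max_size) {}

  bool Find(const ConvCacheKey& key, DnnNode* out) const {
    auto it = map_.find(key);
    if (it == map_.end()) return false;
    *out = it->second;
    return true;
  }

  void Set(const ConvCacheKey& key, const DnnNode& node) {
    // Hard cap to bound memory (point 3): clear everything when exceeded.
    if (map_.size() >= max_size_) map_.clear();
    map_[key] = node;
  }

 private:
  size_t max_size_;
  std::unordered_map<ConvCacheKey, DnnNode, ConvCacheKeyHash> map_;
};
```

Keeping the full key inside the map means two distinct configurations that happen to hash alike are still separated by the equality check, which is the collision fix point 5 describes.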

@paddle-bot-old commented:

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@phlrain changed the title from "optimize conv alog speed" to "optimize conv algo scache" on Apr 18, 2022
@phlrain changed the title from "optimize conv algo scache" to "optimize conv algo cache" on Apr 18, 2022
@JamesLim-sy (Contributor) left a comment:

Good Work.

    } else {
      result = FindAlgoHeuristic(args, ctx);
    }
    phi::autotune::DnnNode node(static_cast<int64_t>(result.algo),
                                result.workspace_size);
Contributor:

DnnNode duplicates much of SearchResult's functionality; it would be better to reuse SearchResult if possible. That said, on our side we will likely extend an AutoTuneResult type from DnnNode later.

Collaborator (Author):

I had a version that used SearchResult, but the template parameter T in SearchResult is cudnnConvolutionFwdAlgoPerf_t, which would make cache.h depend on gpu_info.h. Since cache.h is also used in CPU-only builds, that breaks compilation.

    * Value Range: int32, default=1000000
    * Example:
    */
    PADDLE_DEFINE_EXPORTED_int32(search_cache_max_number, 1000000,
Contributor:

I'd like to understand the validation range behind the 1,000,000 setting: does it cover the cuDNN versions in common use? The main concern is avoiding a version issue where the cost of Find exceeds the cost of interfaces like cudnnGetConvolutionForwardAlgorithm_v7.

Collaborator (Author):

To prevent memory from blowing up, the cache holds at most 1,000,000 entries; beyond that it is forcibly cleared. Since an unordered_map consumes roughly 10x the memory of the data it actually stores, 1,000,000 entries of 16 bytes each (8 bytes for the int64_t, 8 bytes for the size_t) consume at most about 160 MB. Also, for most training jobs the number of cache entries is strongly tied to the shapes of the input images, so it is enumerable and does not grow without bound.

The 1,000,000 setting is mainly about bounding memory usage. The entry count is strongly tied to the model's inputs and weights, and has little to do with the cuDNN version.
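As a quick sanity check, the 160 MB figure follows directly from the author's stated assumptions (16-byte entries, a roughly 10x unordered_map overhead factor). The constants below just restate that arithmetic for a typical 64-bit platform:

```cpp
#include <cstddef>
#include <cstdint>

// FLAGS_search_cache_max_number: the hard cap on cache entries.
constexpr std::size_t kMaxEntries = 1000000;

// One cached entry: int64_t algo + size_t workspace_size (8 + 8 bytes
// on a typical 64-bit platform).
constexpr std::size_t kEntryBytes = sizeof(std::int64_t) + sizeof(std::size_t);

// Assumed unordered_map overhead factor from the PR description (~10x
// the payload); this is the author's rough estimate, not a measurement.
constexpr std::size_t kOverheadFactor = 10;

// Worst-case memory bound: 1,000,000 * 16 * 10 = 160,000,000 bytes, i.e.
// about 160 MB.
constexpr std::size_t kMaxBytes = kMaxEntries * kEntryBytes * kOverheadFactor;
```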

JamesLim-sy previously approved these changes Apr 24, 2022
@JamesLim-sy (Contributor) left a comment:

LGTM

paddle-bot-old bot commented May 8, 2022

Sorry to inform you that b15a4be's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@JamesLim-sy (Contributor) left a comment:

Good work!

    };

    template <typename AlgorithmT>
    class CudnnAlgorithmsCache {
Contributor:

CudnnAlgorithmsCache is nominally specific to ConvCudnn. Could the template parameter AlgorithmT be dropped and replaced with DnnNode, for example in the Get method below:

    DnnNode Get(const ConvCacheKey& key) {
          ......
    };

Collaborator (Author):

done

    if (auto_tune_map_.find(key) == auto_tune_map_.end()) {
      AlgorithmsCacheMap cache;
      auto_tune_map_[key] = cache;
      if (algo_type == AlgorithmType::kTranspose) {
Contributor:

ConvTranspose is currently the only op that supports auto-tuning. To make it easier to add more ops later, suggest changing the condition
if (algo_type == AlgorithmType::kTranspose) { to
if (static_cast<size_t>(algo_type) >= static_cast<size_t>(AlgorithmType::kTranspose)) {
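The suggested condition can be illustrated with a minimal sketch. The enumerator names and values below are assumptions for illustration; only AlgorithmType::kTranspose appears in the quoted diff. The idea is that tunable types occupy the high end of the enum, so a single >= test admits future tunable ops without editing the condition each time:

```cpp
#include <cstddef>

// Hypothetical layout: non-tunable algorithm types come first, and every
// tunable type sits at or after kTranspose.
enum class AlgorithmType : std::size_t {
  kConvForward = 0,
  kConvBackwardData = 1,
  kConvBackwardFilter = 2,
  kTranspose = 3,
  // Future tunable ops would be appended here, e.g. kMatmul = 4.
};

// A >= comparison covers kTranspose and everything appended after it.
inline bool IsTunable(AlgorithmType t) {
  return static_cast<std::size_t>(t) >=
         static_cast<std::size_t>(AlgorithmType::kTranspose);
}
```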

Collaborator (Author):

done

    paddle::experimental::CppTypeToDataType<T>::Type());
    paddle::experimental::CppTypeToDataType<T>::Type(),
    group,
    static_cast<int64_t>(data_layout));  // (todo, hong) data layout is a
Contributor:

GetCacheKey has already been superseded by Convert2ConvCacheKey, so this can probably just be deleted.

Collaborator (Author):

done

    groups_(groups),
    data_layout_(data_layout) {}
    size_t hash_value() const {
      return ConvKey(x_dims_,
Contributor:

ConvKey has been superseded as well; suggest replacing this with the body of the original ConvKey directly:

      return GetKey(x_dims_,
                    w_dims_,
                    strides_,
                    paddings_,
                    dilations_,
                    dtype_,
                    groups_,
                    data_layout_);

Collaborator (Author):

done

@chenwhql (Contributor) left a comment:

LGTM

@JamesLim-sy (Contributor) left a comment:

LGTM

@luotao1 (Contributor) left a comment:

LGTM for 'self.assertTrue(np.allclose(...))' and 'self.assertTrue(np.array_equal(...))'. Please fix them in the next PR.

4 participants