Modify the elementwise op according to the kernel primitive API #34456
Conversation
Thanks for your contribution!
Resolved review threads (now outdated) on:
paddle/fluid/operators/kernel_primitives/datamover_primitives.h (7 threads)
paddle/fluid/operators/elementwise/elementwise_op_broadcast_api.cu.h (1 thread)
Force-pushed from 6c0f358 to bbc68cf, then from bbc68cf to 5065b23.
Removed the unnecessary template parameters.
Resolved review thread (now outdated) on paddle/fluid/operators/elementwise/elementwise_op_broadcast.cu.h
LGTM
2 similar comments
LGTM
LGTM
namespace paddle {
namespace operators {

#define MAX_INPUT_NUM 3  // the max num of ET for BroadcastConfig
Define this inside ElementwiseType; a final enumerator kMaxArity = 4 can be added at the end.
    LoadVectorizedDataByDivmod(args[j], tid, j);
  }
}
template <typename T, int VecSize, int ShapeSize, bool IsBoundary = false>
Suggest renaming ShapeSize to Rank.
}
template <ElementwiseType ET, typename InT, typename OutT, int ShapeSize,
          int VecSize, typename Functor, bool IsBoundary = false>
__device__ void DealSegment(
The function name is not appropriate.
  broadcast_wrapper.LoadVectorizedData(args, tid);
template <typename InT, typename OutT, ElementwiseType ET, int VecSize,
          int Size, typename Functor>
void LaunchKernel(const platform::CUDADeviceContext &ctx,
How about LaunchBroadcastKernel, to distinguish the function name?
                  framework::Tensor *out, Functor func,
                  DimensionsTransform merge_dims) {
  int numel = out->numel();
  const int threads = 256;
The thread count was originally controlled by GetThreadsConfig, which can effectively tune the thread configuration for some small cases.
  OutT *out_data = out->data<OutT>();

  framework::Array<kps::details::BroadcastConfig<Size>, MAX_INPUT_NUM>
      configlists;
configlists -> config_list
  InT args[ET][VecSize];
  broadcast_wrapper.LoadVectorizedData(args, tid);
template <typename InT, typename OutT, ElementwiseType ET, int VecSize,
          int Size, typename Functor>
Isn't Size here also Rank?
inline __device__ void LoadScalarizedData(InT args[], int tid) {
template <ElementwiseType ET, int VecSize, typename InT, typename OutT,
          typename Functor, bool IsBoundary>
__device__ void DealSegment(
The function name likewise needs to be changed. Also, the same-dims version of this function and the broadcast version are nearly identical; consider merging them.
PR types
Function optimization
PR changes
APIs
Describe
Modify the elementwise op according to the kernel primitive API
1. Split ElementwiseVectorKernel in elementwise_op_impl.cu.h into 3 CUDA kernels according to the ET type, to adapt it to the kernel primitive API.
2. Split ElementwiseBroadcastKernel in elementwise_op_broadcast.cu.h into 3 CUDA kernels according to the ET type, to adapt it to the kernel primitive API.
3. Refactor the function-call structure in elementwise_op_broadcast.cu.h: define a BroadcastConfig struct to simplify the broadcast configuration.
Performance: after the replacement, performance is on par with before, and some cases exceed the original performance.