
optimize prelu alpha grad #7600

Merged: 9 commits merged into master from dev_optinal_prelu_alpha_grad on Feb 26, 2022
Conversation

Flowingsun007 (Contributor)

No description provided.

@@ -400,10 +401,12 @@ class GpuPReluGradKernel final : public user_op::OpKernel {
alpha->dptr<T>(), dy->dptr<T>(), dx->mut_dptr<T>(),
broadcasted_alpha_diff);
}
NdarrayUtil<DeviceType::kCUDA, T>::ReduceSum(
if(alpha_requires_grad){
Contributor:

This if should be moved to line 386 instead, and when alpha does not require grad, the tmp_buffer below should not be allocated at all.

Contributor:

broadcasted_alpha_diff should not be computed either; the CUDA kernel can be adapted to handle the case where broadcasted_alpha_diff is never written.

Contributor Author:

OK.

Contributor Author:

Updated.

@Flowingsun007 Flowingsun007 marked this pull request as ready for review February 25, 2022 08:35
@Flowingsun007 Flowingsun007 enabled auto-merge (squash) February 25, 2022 12:25
@github-actions:
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@Flowingsun007 Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 26, 2022 01:33
@@ -4633,6 +4633,9 @@ def OneFlow_PreluGradOp : OneFlow_BaseOp<"prelu_grad", [NoSideEffect, DeclareOpI
OneFlow_Tensor:$dx,
OneFlow_Tensor:$alpha_diff
);
let attrs = (ins
DefaultValuedAttr<BoolAttr, "false">:$alpha_requires_grad
Contributor:

Is this saying that alpha_requires_grad defaults to false? It feels like it should not have a default value at all.

Contributor Author:

Hmm, a default of false feels more intuitive?

Contributor:

There doesn't seem to be a strong reason to default to false; it might be better to require the user to pass it explicitly.

Collaborator:

> There doesn't seem to be a strong reason to default to false; it might be better to require the user to pass it explicitly.

Agreed. And if there is a default, it should be true: if the value should be false but is left unset and so defaults to true, only performance suffers; the other way around, correctness suffers.

Contributor Author:

> There doesn't seem to be a strong reason to default to false; it might be better to require the user to pass it explicitly.

OK, I'll change the default to true then. (It can be passed explicitly: whenever alpha's requires_grad=True/False is known, the value is passed in explicitly.)

@github-actions:

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12853.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 136.8ms (= 13680.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.06 (= 136.8ms / 128.5ms)

❌ OneFlow resnet50 time: 79.5ms (= 7954.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.4ms (= 8243.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 82.4ms / 79.5ms)

OneFlow resnet50 time: 50.4ms (= 10087.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.4ms (= 11482.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.14 (= 57.4ms / 50.4ms)

OneFlow resnet50 time: 43.7ms (= 8733.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.6ms (= 9121.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.04 (= 45.6ms / 43.7ms)

OneFlow resnet50 time: 38.7ms (= 7732.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.2ms (= 8045.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.04 (= 40.2ms / 38.7ms)

✔️ OneFlow resnet50 time: 142.5ms (= 14249.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.7ms (= 16074.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 160.7ms / 142.5ms)

OneFlow resnet50 time: 88.6ms (= 8858.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.9ms (= 10385.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 103.9ms / 88.6ms)

OneFlow resnet50 time: 61.5ms (= 12309.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.0ms (= 14791.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 74.0ms / 61.5ms)

OneFlow resnet50 time: 50.8ms (= 10160.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.5ms (= 13509.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 67.5ms / 50.8ms)

OneFlow resnet50 time: 48.3ms (= 9664.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.6ms (= 12525.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 62.6ms / 48.3ms)

@Flowingsun007 Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 26, 2022 03:21
@Flowingsun007 Flowingsun007 merged commit b7ad753 into master Feb 26, 2022
@Flowingsun007 Flowingsun007 deleted the dev_optinal_prelu_alpha_grad branch February 26, 2022 05:13
marigoold pushed a commit that referenced this pull request Mar 15, 2022
* optimize prelu alpha grad

* refine

* refine

* refine