optimize prelu alpha grad #7600
Conversation
oneflow/user/kernels/prelu_kernel.cu
Outdated
@@ -400,10 +401,12 @@ class GpuPReluGradKernel final : public user_op::OpKernel {
        alpha->dptr<T>(), dy->dptr<T>(), dx->mut_dptr<T>(),
        broadcasted_alpha_diff);
  }
  NdarrayUtil<DeviceType::kCUDA, T>::ReduceSum(
  if (alpha_requires_grad) {
This `if` should be moved up to line 386, and if alpha doesn't require grad, the `tmp_buffer` below doesn't need to be allocated either.
oneflow/user/kernels/prelu_kernel.cu
Outdated
@@ -400,10 +401,12 @@ class GpuPReluGradKernel final : public user_op::OpKernel {
        alpha->dptr<T>(), dy->dptr<T>(), dx->mut_dptr<T>(),
        broadcasted_alpha_diff);
`broadcasted_alpha_diff` shouldn't be computed either; the CUDA kernel can be adjusted to handle the case where `broadcasted_alpha_diff` is not written at all.
OK.
Fixed.
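The structure the two comments above are asking for can be sketched in plain C++ (a simplified single-alpha CPU analogue with hypothetical names; the real code is the CUDA kernel in oneflow/user/kernels/prelu_kernel.cu): `dx` is always produced, while the per-element alpha diff (the analogue of `tmp_buffer` / `broadcasted_alpha_diff`) is allocated, written, and reduced only when `alpha_requires_grad` is set.

```cpp
#include <cstddef>
#include <vector>

// CPU sketch of the guarded PReLU backward pass (illustrative only).
// dx[i] = x[i] > 0 ? dy[i] : dy[i] * alpha
// alpha_diff = sum over i of (x[i] > 0 ? 0 : dy[i] * x[i]), computed
// only when alpha_requires_grad is true.
void PReluBackward(const std::vector<float>& x, const std::vector<float>& dy,
                   float alpha, bool alpha_requires_grad,
                   std::vector<float>* dx, float* alpha_diff) {
  dx->resize(x.size());
  std::vector<float> elem_alpha_diff;  // analogue of tmp_buffer
  if (alpha_requires_grad) { elem_alpha_diff.resize(x.size()); }
  for (std::size_t i = 0; i < x.size(); ++i) {
    (*dx)[i] = x[i] > 0 ? dy[i] : dy[i] * alpha;
    if (alpha_requires_grad) {
      // analogue of writing broadcasted_alpha_diff in the kernel
      elem_alpha_diff[i] = x[i] > 0 ? 0.0f : dy[i] * x[i];
    }
  }
  if (alpha_requires_grad) {  // analogue of NdarrayUtil::ReduceSum
    float sum = 0.0f;
    for (float v : elem_alpha_diff) { sum += v; }
    *alpha_diff = sum;
  }
}
```

When the flag is false, the element-wise loop does only the `dx` work and both the temporary buffer and the reduction disappear, which is the performance win this PR targets.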
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
@@ -4633,6 +4633,9 @@ def OneFlow_PreluGradOp : OneFlow_BaseOp<"prelu_grad", [NoSideEffect, DeclareOpI
    OneFlow_Tensor:$dx,
    OneFlow_Tensor:$alpha_diff
  );
  let attrs = (ins
    DefaultValuedAttr<BoolAttr, "false">:$alpha_requires_grad
Does this mean `alpha_requires_grad` defaults to false? I feel it shouldn't have a default value at all.
Hmm, doesn't defaulting to false feel more intuitive?
There doesn't seem to be a strong reason to default to false. Wouldn't it be better to require the user to pass it explicitly?
There doesn't seem to be a strong reason to default to false. Wouldn't it be better to require the user to pass it explicitly?

Agreed. And if there is a default, it should be true: if the value ought to be false but is left unset and silently defaults to true, only performance suffers; the other way around, correctness is affected.
There doesn't seem to be a strong reason to default to false. Wouldn't it be better to require the user to pass it explicitly?

OK, I'll change it to true then. (It can also be passed explicitly: whenever alpha's requires_grad is True/False, the value is passed in explicitly.)
Speed stats:
* optimize prelu alpha grad
* refine
* refine
* refine
No description provided.