Conversation
Hey @Adnios, thanks for submitting the PR.
CI supported jobs: [edge, centos-gpu, sanity, unix-gpu, windows-gpu, miscellaneous, unix-cpu, centos-cpu, clang, windows-cpu, website]
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered: [unix-gpu]
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered: [unix-gpu]
@szha @bartekkuncer @bgawrych can you help review?
@@ -161,10 +161,34 @@ The storage type of ``log_sigmoid`` output is always dense

)code" ADD_FILELINE)
.set_attr<FCompute>("FCompute<cpu>", UnaryOp::Compute<cpu, mshadow_op::log_sigmoid>)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_log_sigmoid"});
The previous version looks correct with `ElemwiseGradUseIn`, which passes the input of the elementwise function as the input to the gradient function. Could you elaborate on the cases in which this would fail, and why you need to change it to `ElemwiseGradUseOut` and change the definition?
I'm not sure yet how a scalar array would trigger the problem.
Hi @szha.
The reason a scalar array triggers the problem is:
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/activation.cc#L126-L140
- If the input `x` is a scalar array, `SupportMKLDNNAct(param, inputs[0])` will return false.
- If the input `x` is a vector array, `SupportMKLDNNAct(param, inputs[0])` will return true, and the function `MKLDNNActivationBackward` makes it work correctly (a toy sketch of this dispatch difference follows this list). The following code probably takes effect:
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/mkldnn/mkldnn_act.cc#L266-L272
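To make the dispatch difference concrete, here is a toy stand-in for that check (my own illustration, not the actual `SupportMKLDNNAct` implementation; the real check also looks at dtype and the activation parameters, and I'm only assuming it keys on the number of dimensions):

```python
# Toy stand-in for the dispatch described above; NOT the real SupportMKLDNNAct.
# Assumption: the MKLDNN activation path only handles inputs with 1 to 4 dims.
def supports_mkldnn_act(ndim):
    return 1 <= ndim <= 4

print(supports_mkldnn_act(0))  # scalar array, shape ()   -> False: plain CPU backward, hits log_sigmoid_grad
print(supports_mkldnn_act(1))  # vector array, shape (n,) -> True:  MKLDNNActivationBackward is used instead
```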
There are two solutions to make a scalar array input work (a quick check that they are mathematically equivalent follows this list):
- The input of `log_sigmoid_grad` should be `y`, so we can modify the following code, which currently takes `x` as its input. This is also what I am doing in this PR.
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/mshadow_op.h#L416
MXNET_UNARY_MATH_OP(log_sigmoid_grad, 1.0f - math::exp(a));
- Since `log_sigmoid_grad` takes `x` as input, we can also change the following code so that `x` is passed as the input to `log_sigmoid_grad`.
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/activation-inl.h#L207-L210
case activation::kLogSigmoid:
  ActivationBackward<xpu, mshadow_op::log_sigmoid, mshadow_op::log_sigmoid_grad>(
      ctx, inputs[0], inputs[2], req[0], outputs[0]);
  break;
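As a quick sanity check that the two formulations are mathematically equivalent, here is a small NumPy sketch (my own, not MXNet code): since sigmoid(x) = exp(log_sigmoid(x)), the gradient 1 - sigmoid(x) can be computed either from the input `x` (solution 2) or from the output `y` (solution 1).

```python
import numpy as np

x = np.array([-3.0, 0.0, 2.5])
y = -np.log1p(np.exp(-x))                       # y = log_sigmoid(x)

grad_from_x = 1.0 - 1.0 / (1.0 + np.exp(-x))    # solution 2: 1 - sigmoid(x)
grad_from_y = 1.0 - np.exp(y)                   # solution 1: 1 - exp(y), i.e. 1.0f - math::exp(a) with a = y

assert np.allclose(grad_from_x, grad_from_y)
```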
I think solution 1 is better. For `y = log_sigmoid(x)`, it calculates `dx` based on `(dy, y)` instead of `(dy, x)`, which enables in-place operation during `y = log_sigmoid(x)` (i.e. `y` and `x` can share the same memory).
Another problem arose when I adopted solution 1: the gradient of `sym.log_sigmoid()` became wrong. The reason is that the input of `_backward_log_sigmoid` was still `x`, whereas with solution 1 the input of `_backward_log_sigmoid` should be `y`. The source code of `sym.log_sigmoid()` is the following:
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/tensor/elemwise_unary_op_basic.cc#L152-L167
So I changed it to `ElemwiseGradUseOut`, following the source code of `sym.sigmoid()`.
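To show in toy form why the old `ElemwiseGradUseIn` registration breaks once `log_sigmoid_grad` is rewritten in terms of `y` (a plain-Python mock of the wiring, not MXNet code):

```python
import numpy as np

def log_sigmoid(x):
    return -np.log1p(np.exp(-x))

def log_sigmoid_grad(a):
    # new definition from this PR: 1.0f - math::exp(a); only correct when a is the output y
    return 1.0 - np.exp(a)

x = np.array([-1.0, 0.5, 2.0])
dy = np.ones_like(x)
y = log_sigmoid(x)
true_dx = dy * (1.0 - 1.0 / (1.0 + np.exp(-x)))          # reference: 1 - sigmoid(x)

dx_use_in  = dy * log_sigmoid_grad(x)   # ElemwiseGradUseIn wiring: backward sees the input x -> wrong
dx_use_out = dy * log_sigmoid_grad(y)   # ElemwiseGradUseOut wiring: backward sees the output y -> correct

print(np.allclose(dx_use_in,  true_dx))   # False
print(np.allclose(dx_use_out, true_dx))   # True
```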
Thanks for the detailed analysis. The proposed change looks good and I have no further concern.
Description
Solves #20371.
Checklist
Essentials
Comments
- When the input's shape is (), the backward log_sigmoid activation is incorrect. For `y = log_sigmoid(x)`, the inputs for `log_sigmoid_grad` are `(dy, y)`. This problem is similar to "_backward_softsign activation is incorrect" (#10868).
- There is an error when running log_sigmoid on the GPU; see `cudnn_activation-inl.h`:
https://github.com/apache/incubator-mxnet/blob/da4ff3a4dc0bd6a54af3d75c492021d18ba1867b/src/operator/nn/cudnn/cudnn_activation-inl.h#L48-L65