ELU activation #4395
Conversation
paddle/operators/activation_op.cc (Outdated)

  ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "Input of ELU operator");
    AddOutput("Y", "Output of ELU operator");
It could be changed to AddOutput("Y", "Output of ELU operator").NotInGradient();, because y is not used in ELUGradKernel.
@tonyyang-svail Thanks for pointing that out, you are right.
However, after reexamining the math, I found a better way to rewrite the gradient formulation, one which does require using y. I'm working on it, and this PR will be updated soon.
Done.
Now the gradient of the negative part is computed as follows:

```
dy * (y + alpha) * (x < static_cast<T>(0)).template cast<T>();
```

So the value of y is now used in the gradient calculation.
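For intuition (my own sketch, not part of the PR): on the negative branch y = alpha * (exp(x) - 1), so dy/dx = alpha * exp(x) = y + alpha, which is exactly the factor in the kernel above. A quick numpy check, with an arbitrary alpha = 1.5:

```python
import numpy as np

alpha = 1.5
x = np.linspace(-4.0, -0.1, 50)      # negative branch only
y = alpha * (np.exp(x) - 1.0)        # elu(x) for x < 0

grad_from_y = y + alpha              # gradient written in terms of y
grad_from_x = alpha * np.exp(x)      # gradient written in terms of x

assert np.allclose(grad_from_y, grad_from_x)  # identical up to float error
```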
self.inputs = {'X': x}
self.attrs = {'alpha': alpha}
self.outputs = {
    'Y': np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))
There may be some non-differentiable points in elu. You can refer to #4120.
Thanks for the reference. I was a little bit confused by the line x[np.abs(x) < 0.005] = 0.02 before, and #4120 explains the motivation.
However, ELU has a quite smooth negative part, so this modification may be unnecessary here. In fact, elu(x=-0.005) gives -0.0049875208073176802, which makes the relative numeric gradient error less than 2e-5, small enough to pass the normal gradient check.
Here is the code I used to experiment:

import matplotlib.pyplot as plt
import numpy as np

def elu(x):
    # standard ELU with alpha = 1
    return np.maximum(0, x) + np.minimum(0, 1. * (np.exp(x) - 1))

x = np.linspace(-0.5, 0.5, num=100)
y = elu(x)
plt.plot(x, y)
plt.show()
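For completeness, here is a central-difference check of the error claim above (my own sketch, not code from the PR; the checker in the repo may use a different scheme and step size):

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

x0, eps = -0.005, 1e-4
numeric = (elu(x0 + eps) - elu(x0 - eps)) / (2 * eps)  # central difference
analytic = np.exp(x0)                                  # elu'(x) = alpha * e^x for x < 0
print(abs(numeric - analytic) / abs(analytic))         # far below the 2e-5 tolerance
```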
But on the other hand, I can make another PR if you think we should make this modification, x[np.abs(x) < 0.005] = 0.02, a common practice in unit tests.
Yes, elu is actually very smooth, but 0 is a non-differentiable point in elu. We'd better filter it out of our test data.
This line

x[np.abs(x) < 0.005] = 0.02

will filter these potential points and set them to 0.02 (0.02 - 0.005 = 0.015 and 0.02 + 0.005 = 0.025; both 0.015 and 0.025 are larger than 0).
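As a sketch of how that filtering looks when generating test data (illustrative only, not the PR's actual test code):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(-3, 3, [4, 4]).astype("float32")
# Move samples near the kink at 0 away from it, so a finite-difference
# gradient check perturbing by +/-0.005 never straddles x = 0.
x[np.abs(x) < 0.005] = 0.02
```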
As long as we are talking about the standard elu, i.e. alpha=1, it is differentiable at 0, since both the left derivative and the right derivative at 0 equal 1. The piecewise form of the function only makes its derivative non-differentiable at 0, i.e. it is the second-order derivative that jumps there.
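A quick numeric illustration of the one-sided derivatives (my own sketch):

```python
import numpy as np

def elu(x):
    return np.maximum(0, x) + np.minimum(0, np.exp(x) - 1)  # alpha = 1

h = 1e-7
left = (elu(0.0) - elu(-h)) / h   # one-sided derivative from the left
right = (elu(h) - elu(0.0)) / h   # one-sided derivative from the right
print(left, right)                # both approach 1, so elu is differentiable at 0
```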
@QiJune I didn't add the filtering code to the new commit, since, as we discussed, it is not necessary. However, I added a comment to explain the reason.
paddle/operators/activation_op.cc (Outdated)

 public:
  ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "Input of ELU operator");
Please follow our comment style: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md#opprotomaker-names
I think fc_op has a good comment style.
I see. fc_op provides a nice reference indeed, and I'll try to follow that style.
Done. I didn't squash all my commits because they got interleaved with others' commits. Please check commit 4436ba0 for the new comments. Sorry for the inconvenience.
(force-pushed from c2a8434 to 601e231)
LGTM.
Add ELU activation operator; resolves #4364.
SELU activation is not added to the activation_op.{h,cc,cu} files; instead, it can be trivially implemented with this ELU interface. Check #4364 for more details.