
ELU activation #4395

Merged: 5 commits merged into PaddlePaddle:develop on Oct 10, 2017
Conversation

zhouxiao-coder (Contributor, author):

Add ELU activation operator, resolving #4364.
SELU activation is not added to the activation_op.{h,cc,cu} files; instead, it can be trivially implemented on top of this ELU interface. See #4364 for more details.
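
For reference, the standard ELU forward function with parameter alpha, in the same NumPy form used by the tests below, is:

import numpy as np

def elu(x, alpha=1.0):
    # elu(x) = x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

Its derivative is 1 for x > 0 and alpha * exp(x) for x < 0; the latter equals elu(x) + alpha, which the gradient discussion below relies on.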

ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of ELU operator");
AddOutput("Y", "Output of ELU operator");
@tonyyang-svail commented on Sep 28, 2017:

It could be changed to AddOutput("Y", "Output of ELU operator").NotInGradient();, since Y is not used in ELUGradKernel.

zhouxiao-coder (Contributor, author):

@tonyyang-svail Thanks for pointing that out; you are right.

However, after re-examining the math, I found a better way to rewrite the gradient formulation, one that does require using Y. I'm working on it, and this PR will be updated soon.

zhouxiao-coder (Contributor, author):

Done.
The gradient of the negative part is now computed as:

dy * (y + alpha) * (x < static_cast<T>(0)).template cast<T>();

so the value of Y is used in the gradient calculation.
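
A minimal NumPy check (an editor's sketch, not part of this comment) of the identity behind the rewrite: for x < 0, d/dx elu(x) = alpha * exp(x) = y + alpha, so the backward pass only needs Y:

import numpy as np

alpha = 1.0
x = np.linspace(-3.0, -1e-3, 50)             # sample the negative branch
y = alpha * (np.exp(x) - 1)                  # elu(x) for x < 0
analytic = alpha * np.exp(x)                 # true derivative for x < 0
assert np.allclose(analytic, y + alpha)      # identity used by the kernel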

self.inputs = {'X': x}
self.attrs = {'alpha': alpha}
self.outputs = {
    'Y': np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))
}
QiJune (Member):

There may be a non-differentiable point in elu. You can refer to #4120.

zhouxiao-coder (Contributor, author):

Thanks for the reference. I was a little confused by the line x[np.abs(x) < 0.005] = 0.02 before, and #4120 explains the motivation.

However, ELU has a quite "smooth" negative part, so this modification may be unnecessary here. In fact, elu(x=-0.005) gives -0.0049875208073176802, which makes the relative numeric gradient error less than 2e-5, small enough to pass the normal gradient check.
Here is the code I used to experiment:

import matplotlib.pyplot as plt
import numpy as np

def elu(x):
    # ELU with alpha = 1
    return np.maximum(0, x) + np.minimum(0, 1. * (np.exp(x) - 1))

x = np.linspace(-0.5, 0.5, num=100)
y = elu(x)
plt.plot(x, y)
plt.show()

[Figure: plot of elu(x) over x in [-0.5, 0.5]]
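
A rough sketch (not from the PR) of the error estimate quoted above, assuming alpha = 1 and a central difference with step delta = 0.005:

import numpy as np

def elu(x, alpha=1.0):
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

x0, delta = -0.005, 0.005                        # delta is an assumed step size
numeric = (elu(x0 + delta) - elu(x0 - delta)) / (2 * delta)
analytic = np.exp(x0)                            # d/dx elu(x) = exp(x) for x < 0
print(abs(numeric - analytic) / abs(analytic))   # about 4e-6, below the 2e-5 bound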

zhouxiao-coder (Contributor, author):

On the other hand, I can make another PR if you think we should make the modification x[np.abs(x) < 0.005] = 0.02 a common practice in unit tests.

QiJune (Member):

Yes, elu is actually very smooth, but 0 is a non-differentiable point in elu. We'd better filter it out of our test data. This line

x[np.abs(x) < 0.005] = 0.02

will filter these potential points and set them to 0.02. (After the ±0.005 perturbation used in the numeric gradient check, 0.02 - 0.005 = 0.015 and 0.02 + 0.005 = 0.025, and both are still larger than 0.)
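
A small sketch (not from the discussion) of how this filtering is typically applied to random test inputs before the gradient check; the shape and seed are arbitrary:

import numpy as np

np.random.seed(0)                            # arbitrary seed for the sketch
x = np.random.uniform(-1, 1, [4, 4]).astype("float32")
x[np.abs(x) < 0.005] = 0.02                  # move near-zero inputs to 0.02
assert (np.abs(x) >= 0.005).all()            # nothing is within 0.005 of 0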

zhouxiao-coder (Contributor, author):

As long as we are talking about the standard elu, i.e. alpha=1, it is differentiable at 0, since both the left derivative and the right derivative at 0 equal 1. The "piecewise" form of the function only makes the derivative itself non-differentiable at 0, i.e. the second-order derivative does not exist there.
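
A quick numerical check (editor's sketch) of this point: for alpha = 1 the left and right difference quotients at 0 both approach 1:

import numpy as np

def elu(x, alpha=1.0):
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

h = 1e-6
right = (elu(h) - elu(0.0)) / h              # exactly 1 on the linear branch
left = (elu(0.0) - elu(-h)) / h              # approaches alpha * exp(0) = 1
print(right, left)                           # both approximately 1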

zhouxiao-coder (Contributor, author):

@QiJune I didn't add the filtering code in the new commit since, as we discussed, it is not necessary. However, I added a comment to explain the reason.

public:
ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of ELU operator");
zhouxiao-coder (Contributor, author):

I see. fc_op provides a nice reference indeed, and I'll try to follow that style.

zhouxiao-coder (Contributor, author):

Done. I didn't squash all my commits because they got interleaved with others' commits. Please check commit 4436ba0 for the new comments. Sorry for the inconvenience.

qingqing01 (Contributor) left a comment:

LGTM.

@zhouxiao-coder zhouxiao-coder merged commit 0d017d9 into PaddlePaddle:develop Oct 10, 2017
@zhouxiao-coder zhouxiao-coder deleted the elu-activation branch October 10, 2017 04:52
Successfully merging this pull request may close: ELU and SELU Operator (#4364)

6 participants