ELU activation #4395
Conversation
paddle/operators/activation_op.cc (Outdated)

  ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "Input of ELU operator");
    AddOutput("Y", "Output of ELU operator");
It could be changed to AddOutput("Y", "Output of ELU operator").NotInGradient();, because y is not used in ELUGradKernel.
@tonyyang-svail Thanks for pointing that out, you are right.
However, after reexamining the math, I found a better way to rewrite the gradient formulation, one which does require using y. I'm working on it, and this PR will be updated soon.
Done.
Now the gradient of the negative part is computed as follows:

```
dy * (y + alpha) * (x < static_cast<T>(0)).template cast<T>();
```

So the value of y is now used in the gradient calculation.
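For intuition (my own sketch, not part of the PR): on the negative branch y = alpha * (exp(x) - 1), so dy/dx = alpha * exp(x) = y + alpha, which is exactly the factor in the kernel above. A quick numpy check, with an arbitrary alpha = 1.5:

```python
import numpy as np

alpha = 1.5
x = np.linspace(-4.0, -0.1, 50)      # negative branch only
y = alpha * (np.exp(x) - 1.0)        # elu(x) for x < 0

grad_from_y = y + alpha              # gradient written in terms of y
grad_from_x = alpha * np.exp(x)      # gradient written in terms of x

assert np.allclose(grad_from_y, grad_from_x)  # identical up to float error
```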
self.inputs = {'X': x}
self.attrs = {'alpha': alpha}
self.outputs = {
    'Y': np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))
There may be some non-differentiable points in elu. You can refer to #4120.
Thanks for the reference. I was a little bit confused by the line x[np.abs(x) < 0.005] = 0.02 before, and #4120 explains the motivation.
However, ELU has a quite smooth negative part, so this modification may be unnecessary here. In fact, elu(x=-0.005) gives -0.0049875208073176802, which makes the relative numeric gradient error less than 2e-5, small enough to pass the normal gradient check.
Here is the code I used to experiment:

import matplotlib.pyplot as plt
import numpy as np

def elu(x):
    # standard ELU with alpha = 1
    return np.maximum(0, x) + np.minimum(0, 1. * (np.exp(x) - 1))

x = np.linspace(-0.5, 0.5, num=100)
y = elu(x)
plt.plot(x, y)
plt.show()
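For completeness, here is a central-difference check of the error claim above (my own sketch, not code from the PR; the checker in the repo may use a different scheme and step size):

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

x0, eps = -0.005, 1e-4
numeric = (elu(x0 + eps) - elu(x0 - eps)) / (2 * eps)  # central difference
analytic = np.exp(x0)                                  # elu'(x) = alpha * e^x for x < 0
print(abs(numeric - analytic) / abs(analytic))         # far below the 2e-5 tolerance
```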
But on the other hand, I can make another PR if you think we should make this modification, x[np.abs(x) < 0.005] = 0.02, a common practice in unit tests.
Yes, elu is actually very smooth, but 0 is a non-differentiable point in elu. We'd better filter it out of our test data.
This line

x[np.abs(x) < 0.005] = 0.02

will filter these potential points and set them to 0.02 (0.02 - 0.005 = 0.015 and 0.02 + 0.005 = 0.025; both 0.015 and 0.025 are larger than 0).
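As a sketch of how that filtering looks when generating test data (illustrative only, not the PR's actual test code):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(-3, 3, [4, 4]).astype("float32")
# Move samples near the kink at 0 away from it, so a finite-difference
# gradient check perturbing by +/-0.005 never straddles x = 0.
x[np.abs(x) < 0.005] = 0.02
```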
As long as we are talking about the standard elu, i.e. alpha=1, it is differentiable at 0, since both the left derivative and the right derivative at 0 equal 1. The piecewise form of the function only makes its derivative non-differentiable at 0, i.e. it is the second-order derivative that jumps there.
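A quick numeric illustration of the one-sided derivatives (my own sketch):

```python
import numpy as np

def elu(x):
    return np.maximum(0, x) + np.minimum(0, np.exp(x) - 1)  # alpha = 1

h = 1e-7
left = (elu(0.0) - elu(-h)) / h   # one-sided derivative from the left
right = (elu(h) - elu(0.0)) / h   # one-sided derivative from the right
print(left, right)                # both approach 1, so elu is differentiable at 0
```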
@QiJune I didn't add the filtering code to the new commit, since, as we discussed, it is not necessary. However, I added a comment to explain the reason.
paddle/operators/activation_op.cc (Outdated)

 public:
  ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "Input of ELU operator");
Please follow our comment style: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md#opprotomaker-names
I think fc_op has a good comment style.
I see. fc_op provides a nice reference indeed, and I'll try to follow that style.
Done. I didn't squash all my commits because they got interleaved with others' commits. Please check commit 4436ba0 for the new comments. Sorry for the inconvenience.
(force-pushed from c2a8434 to 601e231)
LGTM.
Add ELU activation operator; resolves #4364.
SELU activation is not added to the activation_op.{h,cc,cu} files; instead, it can be trivially implemented with this ELU interface. Check #4364 for more details.