This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-978] Second order gradient support for some unary operators #14613

Merged 34 commits on Jun 10, 2019
Changes shown from 25 commits.

Commits (34):
45e1502  try to add support some ops (sxjscience, Oct 14, 2018)
904adb4  Merge branch 'higher_order_sample' of https://github.com/sxjscience/m… (apeforest, Mar 7, 2019)
d5dc994  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, Mar 12, 2019)
0e69075  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, Mar 19, 2019)
0c7cf98  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, Apr 2, 2019)
492e4cd  add unit test for second order grad (apeforest, Apr 3, 2019)
45b334e  implement grad for relu and add unit test (apeforest, Apr 3, 2019)
3bbfbac  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, Apr 4, 2019)
4dc0907  fix lint (apeforest, Apr 5, 2019)
c4034b2  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, Apr 5, 2019)
3fe54e6  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 16, 2019)
76aa6ad  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 16, 2019)
8458717  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 21, 2019)
f66610b  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 23, 2019)
30ff1e9  register FGradient attribute for backward relu (apeforest, May 28, 2019)
8ecffcc  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 28, 2019)
d9ba3da  resolve conflict (apeforest, May 28, 2019)
1c93c7d  remove unused imports (apeforest, May 28, 2019)
de721bc  change gradient using set_attr (apeforest, May 30, 2019)
0ac0942  remove higher order grad test for negative(x) (apeforest, May 30, 2019)
f8e624e  fix lint (apeforest, May 30, 2019)
3315124  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 30, 2019)
8538980  reverse indent (apeforest, May 30, 2019)
1ee38b5  remove unused backward operator (apeforest, May 30, 2019)
c18f317  refactor backward for sin(x) and cos(x) (apeforest, May 30, 2019)
689cfee  change value init to list init (apeforest, May 30, 2019)
d56e132  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 30, 2019)
2207815  Merge remote-tracking branch 'upstream/master' into develop/higher_or… (apeforest, May 31, 2019)
0b6c2ef  change to list initialization (apeforest, May 31, 2019)
31f671f  generate random shape in test (apeforest, May 31, 2019)
62fcca3  fix a bug in second order backward (apeforest, Jun 3, 2019)
a0a0e75  fix lint (apeforest, Jun 3, 2019)
451c4bd  fix lint (apeforest, Jun 4, 2019)
b9b0c93  address reviewer comment and renaming (apeforest, Jun 5, 2019)
16 changes: 14 additions & 2 deletions src/operator/tensor/elemwise_unary_op_basic.cc
@@ -85,8 +85,20 @@ The storage type of ``relu`` output depends upon the input storage type:
)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseOut{"_backward_relu"});

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_relu,
unary_bwd<mshadow_op::relu_grad>);
MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_relu, unary_bwd<mshadow_op::relu_grad>)
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
std::vector<nnvm::NodeEntry> ret;
// f(x) -> f = relu
Review comment (Member): Should we be having these comments? Couldn't they be included as part of the R"code" docstring?

Author reply: This is a hidden operator, so users do not see it.

// f'(x) = 1 if x > 0 else 0
// f''(x) = 0
auto gx = nnvm::NodeEntry{n}; // f'(x)
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], gx}, nullptr, &n));
Review comment (@kshitij12345, Jun 4, 2019): Similar to what you have done below for sin and cos: gx here is actually f'(x) * head_grads (the output gradient), whereas it should only be f'(x). Explanation: gx = f'(x) * head_grads, so the gradient of gx with respect to f'(x) is head_grads, and similarly the gradient of gx with respect to head_grads is f'(x).

Author reply: Updated.

ret.emplace_back(MakeNode("zeros_like", n->attrs.name + "_backward_grad_grad_in",
{gx}, nullptr, &n));
return ret;
});
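To spell out the chain-rule point raised in the review above (a restatement in the reviewer's notation, not part of the diff), write $\bar{y}$ for head_grads; the first-order backward node computes

$$ g_x = f'(x)\,\bar{y}, \qquad \text{so} \qquad \frac{\partial g_x}{\partial \bar{y}} = f'(x), \qquad \frac{\partial g_x}{\partial f'(x)} = \bar{y}. $$

Hence the factor multiplied with ograds[0] in the first returned entry should be f'(x) alone, not g_x itself.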

// sigmoid
MXNET_OPERATOR_REGISTER_UNARY(sigmoid)
(remaining diff of elemwise_unary_op_basic.cc not shown)
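As a quick sanity check of the behaviour this registration enables, the second derivative of relu can be exercised from Python with autograd. This is a minimal sketch, assuming an MXNet 1.x build that includes this PR; the array values are arbitrary.

# Minimal sketch: the second derivative of relu is zero (away from x == 0).
# Assumes an MXNet 1.x build that includes this change.
from mxnet import nd, autograd

x = nd.array([[-1.0, 2.0], [0.5, -3.0]])
x.attach_grad()

with autograd.record():
    y = nd.relu(x)
    # First-order gradient, kept differentiable so it can be differentiated again.
    dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
dy_dx.backward()    # second-order gradient accumulates into x.grad

print(dy_dx)        # 1 where x > 0, else 0
print(x.grad)       # all zeros, since f''(x) = 0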
43 changes: 41 additions & 2 deletions src/operator/tensor/elemwise_unary_op_trig.cc
@@ -46,7 +46,26 @@ The storage type of ``sin`` output depends upon the input storage type:
)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{ "_backward_sin" });

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU_DR(_backward_sin, unary_bwd<mshadow_op::sin_grad>);
MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU_DR(_backward_sin, unary_bwd<mshadow_op::sin_grad>)
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// f(x) = sin(x)
Review comment (Member): Same as above.

Author reply: This is a hidden operator, so users do not see it.

// f'(x) = cos(x)
// f''(x) = -sin(x)
auto grad_x = nnvm::NodeEntry(n);
auto grad_grad_x_mid = MakeNode("sin", n->attrs.name + "_mid_grad_grad",
{n->inputs[1]}, nullptr, &n);
auto grad_grad_x = MakeNode("negative", n->attrs.name + "_backward_grad_grad",
{nnvm::NodeEntry(grad_grad_x_mid)}, nullptr, &n);
Review comment (Contributor): List initialisation. Otherwise LGTM.

Author reply: Updated.

std::vector<nnvm::NodeEntry> ret;
// for the backward of the _backward_sin node
// first input is the ograd and second input is x (because ElemwiseUseIn)
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], grad_x}, nullptr, &n));
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_in",
{ograds[0], nnvm::NodeEntry(grad_grad_x)}, nullptr, &n));
return ret;
});

// cos
MXNET_OPERATOR_REGISTER_UNARY_WITH_SPARSE_DR(cos, cpu, mshadow_op::cos)
@@ -63,7 +82,27 @@ The storage type of ``cos`` output is always dense
)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_cos"});

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_cos, unary_bwd<mshadow_op::cos_grad>);
MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_cos, unary_bwd<mshadow_op::cos_grad>)
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// f(x) = cos(x)
// f'(x) = -sin(x)
// f''(x) = -cos(x)
auto grad_x = nnvm::NodeEntry(n);
auto grad_grad_x_mid = MakeNode("cos", n->attrs.name + "_mid_grad_grad",
{n->inputs[1]}, nullptr, &n);
auto grad_grad_x = MakeNode("negative", n->attrs.name + "_backward_grad_grad",
{nnvm::NodeEntry(grad_grad_x_mid)}, nullptr, &n);
Review comment (Contributor): Ditto.

Author reply: Updated.

std::vector<nnvm::NodeEntry> ret;
// for the backward of the _backward_cos node
// first input is the ograd and second input is x (because ElemwiseUseIn)
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], grad_x}, nullptr, &n));
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_in",
{ograds[0], nnvm::NodeEntry(grad_grad_x)}, nullptr, &n));
Review comment (@larroy, May 30, 2019): You can move the shared_ptrs inside as you are not using them anymore. If CI passes it's OK; I can refactor it myself later. I wouldn't do another CI round for this.

Author reply: Isn't the NodeEntry constructor already doing the std::move of the shared_ptr? I thought declaring a separate variable in this scope increases readability.

Review comment (Contributor): The argument is taken by value, so it needs to be an rvalue if you want it moved both times; otherwise it is copied, then moved. So it is best to write:

  ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_in",
                            {ograds[0], nnvm::NodeEntry(std::move(grad_grad_x))}, nullptr, &n));

Taking a shared_ptr by value and then moving it into the class seems to be the accepted idiom nowadays, since it allows both move and copy semantics, depending on whether the caller passes an rvalue.

return ret;
});


// tan
MXNET_OPERATOR_REGISTER_UNARY_WITH_RSP_CSR(tan, cpu, mshadow_op::tan)
(remaining diff of elemwise_unary_op_trig.cc not shown)
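The same double-backward pattern can be used to numerically check the sin and cos registrations above. A minimal sketch, assuming an MXNet 1.x build that includes this change; the shape and value range are arbitrary.

# Sketch: numerically check the second derivatives wired up above.
# Assumes an MXNet 1.x build that includes this PR.
from mxnet import nd, autograd
from mxnet.test_utils import assert_almost_equal

def second_order_grad(op, x):
    x = x.copy()
    x.attach_grad()
    with autograd.record():
        y = op(x)
        # First-order gradient, recorded so it can be differentiated again.
        dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    dy_dx.backward()    # second-order gradient lands in x.grad
    return x.grad

x = nd.random.uniform(-3, 3, shape=(4, 5))
assert_almost_equal(second_order_grad(nd.sin, x).asnumpy(), (-nd.sin(x)).asnumpy())  # f'' = -sin
assert_almost_equal(second_order_grad(nd.cos, x).asnumpy(), (-nd.cos(x)).asnumpy())  # f'' = -cos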
41 changes: 40 additions & 1 deletion tests/python/unittest/test_higher_order_grad.py
@@ -15,13 +15,52 @@
# specific language governing permissions and limitations
# under the License.

import math
from mxnet import nd, autograd
from mxnet.test_utils import assert_almost_equal, random_arrays
from common import with_seed


@with_seed()
def test_sin():
def sin(x):
return nd.sin(x)

def grad_grad_op(x):
return -nd.sin(x)

arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
for array in arrays:
check_second_order_unary(array, sin, grad_grad_op)
Review comment (Member): I think the check_second_order_unary function should be moved to python/mxnet/test_utils.py.

Author reply: It's only used in this test. If we add a different test file, then it makes sense to do as you suggested.



@with_seed()
def test_cos():
def cos(x):
return nd.cos(x)

def grad_grad_op(x):
return -nd.cos(x)

arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
for array in arrays:
check_second_order_unary(array, cos, grad_grad_op)

Review comment: Can these check_second_order_unary checks be changed to Nth order?

Author reply (@apeforest, Jun 5, 2019): This PR only verifies second-order gradients. Can we add a test for Nth-order gradients in a separate PR?



@with_seed()
def test_relu():
def relu(x):
return nd.relu(x)

def grad_grad_op(x):
return nd.zeros_like(x)

arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
Review comment (@anirudhacharya, Member, May 31, 2019): Shouldn't we test for 1-d arrays? Not sure if it is needed here, but there is a utility to randomize the shape of an array: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L418

Author reply: Thanks for the suggestion. Updated.

for array in arrays:
check_second_order_unary(array, relu, grad_grad_op)


@with_seed()
def test_log():
def log(x):
(remaining diff of test_higher_order_grad.py not shown)
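The check_second_order_unary helper called by the tests above is defined elsewhere in this file and is not visible in this hunk. A plausible reconstruction is sketched below, shown only to make the tests easier to follow; the real helper's head-gradient handling and tolerances may differ.

# Hypothetical sketch of the helper the tests above call; the actual
# implementation lives elsewhere in test_higher_order_grad.py and may differ.
from mxnet import nd, autograd
from mxnet.test_utils import assert_almost_equal

def check_second_order_unary(array, op, grad_grad_op):
    x = nd.array(array)
    expected_grad_grad = grad_grad_op(x)          # analytic f''(x)
    x.attach_grad()

    # Random head gradients so the check is not trivially satisfied by ones.
    head_grads = nd.random.normal(shape=x.shape)

    with autograd.record():
        y = op(x)
        # First-order gradient, recorded so it stays differentiable.
        x_grad = autograd.grad(heads=y, variables=x, head_grads=head_grads,
                               create_graph=True, retain_graph=True)[0]
    x_grad.backward()                             # second order goes into x.grad

    # For an elementwise unary op, d(x_grad)/dx = f''(x) * head_grads.
    assert_almost_equal((expected_grad_grad * head_grads).asnumpy(),
                        x.grad.asnumpy())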