
[MXNET-978] Second order gradient support for some unary operators #14613

Merged
34 commits merged on Jun 10, 2019
Changes from 7 commits

Commits (34)
45e1502
try to add support some ops
sxjscience Oct 14, 2018
904adb4
Merge branch 'higher_order_sample' of https://github.com/sxjscience/m…
apeforest Mar 7, 2019
d5dc994
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest Mar 12, 2019
0e69075
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest Mar 19, 2019
0c7cf98
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest Apr 2, 2019
492e4cd
add unit test for second order grad
apeforest Apr 3, 2019
45b334e
implement grad for relu and add unit test
apeforest Apr 3, 2019
3bbfbac
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest Apr 4, 2019
4dc0907
fix lint
apeforest Apr 5, 2019
c4034b2
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest Apr 5, 2019
3fe54e6
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 16, 2019
76aa6ad
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 16, 2019
8458717
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 21, 2019
f66610b
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 23, 2019
30ff1e9
register FGradient attribute for backward relu
apeforest May 28, 2019
8ecffcc
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 28, 2019
d9ba3da
resolve conflict
apeforest May 28, 2019
1c93c7d
remove unused imports
apeforest May 28, 2019
de721bc
change gradient using set_attr
apeforest May 30, 2019
0ac0942
remove higher order grad test for negative(x)
apeforest May 30, 2019
f8e624e
fix lint
apeforest May 30, 2019
3315124
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 30, 2019
8538980
reverse indent
apeforest May 30, 2019
1ee38b5
remove unused backward operator
apeforest May 30, 2019
c18f317
refactor backward for sin(x) and cos(x)
apeforest May 30, 2019
689cfee
change value init to list init
apeforest May 30, 2019
d56e132
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 30, 2019
2207815
Merge remote-tracking branch 'upstream/master' into develop/higher_or…
apeforest May 31, 2019
0b6c2ef
change to list initialization
apeforest May 31, 2019
31f671f
generate random shape in test
apeforest May 31, 2019
62fcca3
fix a bug in second order backward
apeforest Jun 3, 2019
a0a0e75
fix lint
apeforest Jun 3, 2019
451c4bd
fix lint
apeforest Jun 4, 2019
b9b0c93
address reviewer comment and renaming
apeforest Jun 5, 2019
5 changes: 3 additions & 2 deletions src/imperative/imperative.cc
@@ -347,8 +347,9 @@ std::vector<NDArray*> Imperative::Backward(
x_reqs.push_back(info.grad_req);
info.fresh_out_grad = true;
}
CHECK_GT(xs.size(), 0)
<< "There are no inputs in computation graph that require gradients.";
if (xs.empty()) {
Contributor:
Why change from CHECK to warning?

Contributor Author:
The current backward operation requires that an operator have at least one input, because the gradient of a constant is always zero. However, the second-order gradient of some operators such as relu is actually the gradient of a constant (ones or zeros). Therefore we need to support gradients for constant operators.

Contributor:
I think we should dive deeper into this one. Does it produce the warning (or, earlier, the failure) for some of the test cases?

In the original code I think the intention is to check whether there are any input nodes that have a gradient attached. I understand your explanation, but what I don't see is where we would store the gradient for such constants. Is it because grad_req of the constant is kNullOp? The constant is just another node, right?

Contributor Author:
The root cause is the second-order gradient of the negative(x) operator: its backward graph does not require any input and therefore triggers this condition. I think if I remove the test for negative(x), then we do not need to modify this.
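
(For reference, a minimal sketch of the case described above, following the same pattern as test_negative in the new test file: the first-order gradient of negative(x) is just negative(ograd), which does not depend on x, so the second backward pass sees a graph with no inputs that require gradients.)

```python
from mxnet import nd, autograd

x = nd.array([1, 2, 3])
x.attach_grad()
with autograd.record():
    y = nd.negative(x)
    # dy/dx is the constant -1; the gradient graph only consumes the head
    # gradient, never x itself.
    y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
# Differentiating y_grad again finds no inputs that require gradients,
# which is what used to trip the CHECK_GT above and now only warns.
y_grad.backward()
print(x.grad.asnumpy())  # expected: [0. 0. 0.], as asserted in test_negative
```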

LOG(WARNING) << "There are no inputs in computation graph that require gradients.";
}
}

Graph g_graph = pass::MXGradient(
12 changes: 11 additions & 1 deletion src/operator/tensor/elemwise_binary_op_basic.cc
@@ -224,7 +224,17 @@ The storage type of ``elemwise_mul`` output depends on storage types of inputs
return std::vector<ResourceRequest>{ResourceRequest::kTempSpace};
})
.add_alias("_mul").add_alias("_Mul")
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_mul"});
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
auto lhs_grad = MakeNode("elemwise_mul", n->attrs.name + "_backward_lhs",
{ograds[0], n->inputs[1]}, nullptr, &n);
auto rhs_grad = MakeNode("elemwise_mul", n->attrs.name + "_backward_rhs",
{ograds[0], n->inputs[0]}, nullptr, &n);
std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(nnvm::NodeEntry{lhs_grad, 0, 0});
Contributor @larroy, May 24, 2019:
Can we simplify as per #14095?

ret.emplace_back(MakeNode(...));

ret.emplace_back(nnvm::NodeEntry{rhs_grad, 0, 0});
return ret;
});
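
At the NDArray level the lambda above is just the product rule: the gradient with respect to each input is the head gradient times the other input. A rough Python sketch of what the two MakeNode calls evaluate to (not part of the patch):

```python
import mxnet.ndarray as nd

ograd = nd.ones(3)
lhs, rhs = nd.array([1, 2, 3]), nd.array([4, 5, 6])
lhs_grad = nd.elemwise_mul(ograd, rhs)  # mirrors the "_backward_lhs" node
rhs_grad = nd.elemwise_mul(ograd, lhs)  # mirrors the "_backward_rhs" node
```

For y = elemwise_mul(x, x) the two contributions sum to 2x, so the second-order gradient is the constant 2 checked by test_elemwise_mul below.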

NNVM_REGISTER_OP(_backward_mul)
.set_num_inputs(3)
18 changes: 16 additions & 2 deletions src/operator/tensor/elemwise_unary_op_basic.cc
@@ -83,7 +83,15 @@ The storage type of ``relu`` output depends upon the input storage type:
- relu(csr) = csr

)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseOut{"_backward_relu"});
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
auto zero_node = MakeNode("zeros_like", n->attrs.name + "_relu_backward", {n->inputs[0]}, nullptr, &n);
auto x_grad = MakeNode("_greater", n->attrs.name + "_mid_x_grad", {n->inputs[0], nnvm::NodeEntry{zero_node, 0, 0}}, nullptr, &n);
auto in_grad = MakeNode("elemwise_mul", n->attrs.name + "_backward", {ograds[0], nnvm::NodeEntry{x_grad, 0, 0}}, nullptr, &n);
std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(nnvm::NodeEntry{in_grad, 0, 0});
return ret;
});
Contributor @larroy, May 29, 2019:
We should measure whether this causes regressions, as we discussed; otherwise we should add FGradient to _backward_relu.

I think the same applies to the other functions.


MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_relu,
unary_bwd<mshadow_op::relu_grad>);
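
The zeros_like → _greater → elemwise_mul chain registered above masks the head gradient with an indicator of x > 0. A rough NDArray sketch of the same computation (not part of the patch):

```python
import mxnet.ndarray as nd

x = nd.array([-1.0, 2.0, 3.0])
ograd = nd.ones_like(x)
mask = x > nd.zeros_like(x)             # mirrors the zeros_like + _greater nodes -> [0, 1, 1]
in_grad = nd.elemwise_mul(ograd, mask)  # mirrors the final elemwise_mul node
```

Since the mask is piecewise constant in x, differentiating in_grad once more yields zeros, which is the expect_grad used in test_relu below.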
@@ -648,7 +656,13 @@ The storage type of ``negative`` output depends upon the input storage type:
- negative(csr) = csr

)code")
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseNone{"negative"});
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
auto in_grad = MakeNode("negative", n->attrs.name + "_backward", {ograds[0]}, nullptr, &n);
std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(nnvm::NodeEntry{in_grad, 0, 0});
return ret;
});

// reciprocal
MXNET_OPERATOR_REGISTER_UNARY(reciprocal)
22 changes: 20 additions & 2 deletions src/operator/tensor/elemwise_unary_op_trig.cc
@@ -44,7 +44,15 @@ The storage type of ``sin`` output depends upon the input storage type:
- sin(csr) = csr

)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{ "_backward_sin" });
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
auto x_grad = MakeNode("cos", n->attrs.name + "_mid_x_grad", {n->inputs[0]}, nullptr, &n);
auto in_grad = MakeNode("elemwise_mul", n->attrs.name + "_backward",
{ograds[0], nnvm::NodeEntry{x_grad, 0, 0}}, nullptr, &n);
std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(nnvm::NodeEntry{in_grad, 0, 0});
return ret;
});

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU_DR(_backward_sin, unary_bwd<mshadow_op::sin_grad>);

@@ -61,7 +69,17 @@ The input should be in radians (:math:`2\pi` rad equals 360 degrees).
The storage type of ``cos`` output is always dense

)code" ADD_FILELINE)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_cos"});
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
auto x_grad = MakeNode("sin", n->attrs.name + "_mid_x_grad", {n->inputs[0]}, nullptr, &n);
auto neg_x_grad = MakeNode("negative", n->attrs.name + "_mid_neg_x_grad",
{nnvm::NodeEntry{x_grad, 0, 0}}, nullptr, &n);
auto in_grad = MakeNode("elemwise_mul", n->attrs.name + "_backward",
{ograds[0], nnvm::NodeEntry{neg_x_grad, 0, 0}}, nullptr, &n);
std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(nnvm::NodeEntry{in_grad, 0, 0});
return ret;
});
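
For reference, the two chains above compute the usual first derivatives: ograd * cos(x) for sin and ograd * (-sin(x)) for cos. Differentiating once more gives the -sin(x) and -cos(x) values used as expect_grad in the new tests. A rough NDArray sketch of what the node chains evaluate to (not part of the patch):

```python
import mxnet.ndarray as nd

x = nd.array([1.0, 2.0, 3.0])
ograd = nd.ones_like(x)
sin_in_grad = nd.elemwise_mul(ograd, nd.cos(x))               # sin: ograd * cos(x)
cos_in_grad = nd.elemwise_mul(ograd, nd.negative(nd.sin(x)))  # cos: ograd * (-sin(x))
```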

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU(_backward_cos, unary_bwd<mshadow_op::cos_grad>);

89 changes: 89 additions & 0 deletions tests/python/unittest/test_higher_order_grad.py
@@ -0,0 +1,89 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import mxnet as mx
import numpy as np
from mxnet import gluon, nd, autograd
from mxnet.test_utils import assert_almost_equal
from tests.python.unittest.common import with_seed


@with_seed()
def test_elemwise_mul():
x = nd.array([1, 2, 3])
y = nd.zeros(3)
Contributor:
Do we need this y?

x.attach_grad()
with autograd.record():
y = nd.elemwise_mul(x, x)
y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
y_grad.backward()
expect_grad = nd.array([2, 2, 2])
assert_almost_equal(expect_grad.asnumpy(), x.grad.asnumpy())


@with_seed()
def test_sin():
def sin(x):
return nd.sin(x)

x = nd.array([1, 2, 3])
expect_grad = -nd.sin(x)
check_second_order_unary(x, sin, expect_grad)


@with_seed()
def test_cos():
def cos(x):
return nd.cos(x)

x = nd.array([1, 2, 3])
Member:
Can we randomize the test arrays with random_arrays and rand_shape_2d?

Contributor:
I think for second order not using random inputs helps reason about the gradient result...
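
(For illustration, a sketch of the randomized variant being suggested, reusing the file's cos and check_second_order_unary and assuming the rand_shape_2d and random_arrays helpers from mxnet.test_utils; whether fixed inputs are preferable is the judgment call discussed above.)

```python
from mxnet.test_utils import rand_shape_2d, random_arrays

shape = rand_shape_2d()
x = nd.array(random_arrays(shape))  # random 2-D input instead of [1, 2, 3]
expect_grad = -nd.cos(x)
check_second_order_unary(x, cos, expect_grad)
```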

expect_grad = -nd.cos(x)
check_second_order_unary(x, cos, expect_grad)


@with_seed()
def test_negative():
def negative(x):
return nd.negative(x)

x = nd.array([1, 2, 3])
Member:
Same as above, and for the rest of the tests.

expect_grad = nd.zeros_like(x)
check_second_order_unary(x, negative, expect_grad)


@with_seed()
def test_relu():
def relu(x):
return nd.relu(x)

x = nd.array([1, 2, 3])
expect_grad = nd.zeros_like(x)
check_second_order_unary(x, relu, expect_grad)


def check_second_order_unary(x, op, expect_grad):
x.attach_grad()
with autograd.record():
y = op(x)
y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
y_grad.backward()
assert_almost_equal(expect_grad.asnumpy(), x.grad.asnumpy())


if __name__ == '__main__':
import nose
nose.runmodule()