This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-978] Second order gradient support for some unary operators #14613

Merged (34 commits) on Jun 10, 2019

Conversation

apeforest (Contributor)

Description

This PR adds higher order gradient support for some basic unary operators. Thanks to @sxjscience; this PR continues the work of #12821.

This PR adds support for the relu operator and fixes the issue with negative operator support in #12821. In addition, I added unit tests to verify higher order gradients.
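As a quick illustration (a minimal usage sketch in the spirit of the unit tests added here, not the exact test code), a second order gradient can be taken by keeping the first backward pass in the recorded graph:

from mxnet import nd, autograd

x = nd.array([0.5, 1.0, 2.0])
x.attach_grad()
with autograd.record():
    y = nd.sin(x)
    # create_graph=True keeps the first-order gradient differentiable
    y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
y_grad.backward()
print(x.grad)  # second order gradient of sin(x), i.e. -sin(x)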

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

  • Higher order gradients for a set of unary operators
  • Unit tests for the higher order gradients

@larroy (Contributor) left a comment:

Nice! LGTM

@@ -347,8 +347,9 @@ std::vector<NDArray*> Imperative::Backward(
     x_reqs.push_back(info.grad_req);
     info.fresh_out_grad = true;
   }
-  CHECK_GT(xs.size(), 0)
-      << "There are no inputs in computation graph that require gradients.";
+  if (xs.empty()) {
Contributor:

Why change from CHECK to warning?

Contributor Author:

The current backward operation requires that an operator have at least one input, because the gradient of a constant is always zero. However, the second order gradient of some operators such as relu is actually the gradient of a constant (ones or zeros). Therefore we need to support gradients for constant operators.
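For example (a sketch using the same autograd pattern as the tests in this PR), the first derivative of relu is piecewise constant, so its second order gradient is zero wherever it is defined:

from mxnet import nd, autograd

x = nd.array([-1.0, 2.0, 3.0])
x.attach_grad()
with autograd.record():
    y = nd.relu(x)
    y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
# y_grad is (piecewise) constant w.r.t. x, so backpropagating through it
# is exactly the "gradient of a constant" case described above.
y_grad.backward()
print(x.grad)  # all zeros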

Contributor:

I think we should dive deeper into this one. Does it produce the warning (or, earlier, the failure) for some of the test cases?

In the original code I think the intention is to check whether there are any input nodes which have a gradient attached. I understand your explanation, but what I don't see is where we would store the gradient for such constants. Is it because the grad_req of the constant is kNullOp? The constant is just another node, right?

Contributor Author:

The root cause is the second order gradient of the negative(x) operator. Its backward graph does not require any input and therefore triggers this condition. If I remove the test for negative(x), then we do not need to modify this.

@piyushghai (Contributor):

Thanks for your contributions @apeforest. Can you also look into the CI failures?

@mxnet-label-bot Add [Operator, pr-awaiting-review, Backend].

@marcoabreu added the Backend, Operator, and pr-awaiting-review labels on Apr 4, 2019
@apeforest (Contributor Author):

@sxjscience Please help to review

@sxjscience (Member):

Looks good. However, the test has not passed and we need to check what caused the error. Also, we should later add a testing utility function that checks higher order gradients by numerical differentiation.
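Such a utility might look roughly like the following (a hypothetical sketch, not part of this PR), comparing the autograd result for an elementwise unary op against a central-difference estimate f''(x) ≈ (f(x+h) - 2 f(x) + f(x-h)) / h²:

import numpy as np
from mxnet import nd, autograd

def check_second_order_numeric(op, x, eps=1e-2, rtol=1e-2, atol=1e-3):
    x = nd.array(x)
    x.attach_grad()
    with autograd.record():
        y = op(x)
        y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    y_grad.backward()
    # Central-difference estimate of the second derivative.
    numeric = (op(x + eps) - 2 * op(x) + op(x - eps)).asnumpy() / eps ** 2
    np.testing.assert_allclose(x.grad.asnumpy(), numeric, rtol=rtol, atol=atol)

# e.g. check_second_order_numeric(nd.sin, [0.1, 0.5, 1.0])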

@Roshrini (Member):

@apeforest Thanks for working on this. Can you look into build failures?

@with_seed()
def test_elemwise_mul():
    x = nd.array([1, 2, 3])
    y = nd.zeros(3)
Contributor:

Do we need this y?

def cos(x):
    return nd.cos(x)

x = nd.array([1, 2, 3])
Member:

Can we randomize the test arrays with random_arrays and rand_shape_2d?

Contributor:

I think for second order gradients, not using random inputs helps reason about the gradient result...

def negative(x):
    return nd.negative(x)

x = nd.array([1, 2, 3])
Member:

Same as above, and for the rest of the tests.

@vandanavk (Contributor):

@apeforest could you have a look at the CI failures?

auto grad_grad_x_mid = MakeNode("cos", n->attrs.name + "_mid_grad_grad",
                                {n->inputs[1]}, nullptr, &n);
auto grad_grad_x = MakeNode("negative", n->attrs.name + "_backward_grad_grad",
                            {nnvm::NodeEntry(grad_grad_x_mid)}, nullptr, &n);
Contributor:

Ditto.

Contributor Author:

Updated.

.set_attr<nnvm::FGradient>("FGradient",
    [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
      std::vector<nnvm::NodeEntry> ret;
      // f(x) -> f = relu
Member:

Should we be having these comments? Couldn't they be included as part of the R"code" docstring?

Contributor Author:

This is a hidden operator, so users do not see it.

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU_DR(_backward_sin, unary_bwd<mshadow_op::sin_grad>)
.set_attr<nnvm::FGradient>("FGradient",
    [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
      // f(x) = sin(x)
Member:

Same as above.

Contributor Author:

This is a hidden operator, so users do not see it.


arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
for array in arrays:
    check_second_order_unary(array, sin, grad_grad_op)
Member:

I think check_second_order_unary function should be moved to python/mxnet/test_utils.py

Contributor Author:

It's only used in this test. If we add a different test file, then it makes sense to move it as you suggested.

def grad_grad_op(x):
    return nd.zeros_like(x)

arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
@anirudhacharya (Member) commented May 31, 2019:

Shouldn't we test for 1-d arrays?

Not sure if it is needed here, but there is a utility to randomize the shape of an array: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L418

Contributor Author:

Thanks for the suggestion. Updated.

@anirudhacharya (Member) left a comment:

Everything else LGTM.

@apeforest (Contributor Author):

@kshitij12345 gentle ping again :) Please approve if there is no other concern. Thanks!

@kshitij12345 (Contributor) commented Jun 1, 2019:

@apeforest As mentioned in #14992 for log, I guess the check will fail with the given script. Sorry for missing this earlier. I have checked that it fails with:

from mxnet import nd, autograd
import numpy
import math

grad_grad_op = lambda x: -nd.sin(x)  # -nd.cos(x)

x = nd.random.normal(0, 1, (3, 3))
x.attach_grad()
with autograd.record():
    y = nd.sin(x)  # nd.cos(x)
    y_grad = autograd.grad(y, x, head_grads=nd.ones_like(y) * 0.5,
                           create_graph=True, retain_graph=True)[0]
y_grad.backward(nd.ones_like(y_grad) * 0.6)

numpy.testing.assert_allclose(x.grad.asnumpy(),
                              (grad_grad_op(x) * 0.5 * 0.6).asnumpy(),
                              rtol=1e-7, atol=1e-7)

This is because the gradient from the upper layer is not preserved for sin and cos.

@apeforest (Contributor Author):

@kshitij12345 Thanks for catching this bug. I have updated my sin and cos implementation with comments to help better understand the mathematics behind the second order gradient calculation. Please review again.

@apeforest (Contributor Author):

@kshitij12345 could you please take a look at the PR again? Thanks!

// f''(x) = 0
auto gx = nnvm::NodeEntry{n};  // f'(x)
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], gx}, nullptr, &n));
@kshitij12345 (Contributor) commented Jun 4, 2019:

Similar to what you have done below for sin and cos.

Here gx is actually f'(x) * {head_grads / output gradient}; it should actually be only f'(x).

Explanation: gx = f'(x) * head_grads. Therefore, the gradient of gx w.r.t. f'(x) is head_grads, and similarly the gradient of gx w.r.t. head_grads is f'(x).
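Restating the comment above in equations, for y = f(x) with incoming head gradient \bar{y}:

\[
  g_x = f'(x)\,\bar{y}, \qquad
  \frac{\partial g_x}{\partial x} = f''(x)\,\bar{y}, \qquad
  \frac{\partial g_x}{\partial \bar{y}} = f'(x),
\]

so the quantity wired into the second order graph as "f'(x)" must not already carry the \bar{y} factor.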

Contributor Author:

Updated

@kshitij12345 (Contributor):

@apeforest

Could you update check_second_order_unary as per

https://github.com/apache/incubator-mxnet/blob/37ce3b87268a8154f5c0ad97ce2522478038ee06/tests/python/unittest/test_higher_order_grad.py#L76-L103

This covers the check for the gradient of the first input argument as well. I have tested a similar PyTorch script which works (code in PR #15120).

However, do note that for PR #15120,

https://github.com/apache/incubator-mxnet/blob/37ce3b87268a8154f5c0ad97ce2522478038ee06/tests/python/unittest/test_higher_order_grad.py#L102

the assertion fails with head_grads.grad.asnumpy() being all 0's.

Please check to see if it works for you. Thank you.
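For reference, a rough sketch of the extended helper being proposed (assumed shape only; the authoritative version is the linked test file):

from mxnet import nd, autograd
from mxnet.test_utils import assert_almost_equal

def check_second_order_unary(x, op, grad_grad_op):
    x = nd.array(x)
    expect_grad_grad = grad_grad_op(x)

    x.attach_grad()
    # Random head gradients for both backward passes.
    head_grads = nd.random.normal(shape=x.shape)
    head_grad_grads = nd.random.normal(shape=x.shape)
    with autograd.record():
        y = op(x)
        x_grad = autograd.grad(y, x, head_grads=head_grads,
                               create_graph=True, retain_graph=True)[0]
    x_grad.backward(head_grad_grads)

    # x.grad should be f''(x) * head_grads * head_grad_grads.
    expected = (expect_grad_grad * head_grads * head_grad_grads).asnumpy()
    assert_almost_equal(expected, x.grad.asnumpy())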

@apeforest (Contributor Author) commented Jun 5, 2019:

@kshitij12345 I updated the test locally and it also fails for sin(x) and cos(x), for the same reason as log(x). I have not pushed it to the remote because it would then require your PR #15120 to be merged first; otherwise the test will always fail.

@apeforest (Contributor Author):

Ping @aidan-plenert-macdonald for review

@aodhan-domhnaill left a comment:

Looks great! I think the code is right, but I would like the tests to be updated slightly to catch ops that don't support Nth order gradients.

for dim in range(1, 5):
    shape = rand_shape_nd(dim)
    array = random_arrays(shape)
    check_second_order_unary(array, cos, grad_grad_op)

Can these check_second_order_unary checks be changed to Nth order?

@apeforest (Contributor Author) commented Jun 5, 2019:

This PR only verifies second order gradients. Can we add a test for Nth order gradients in a separate PR?

// f''(x) = -sin(x)
auto x_grad = MakeNode("cos", n->attrs.name + "_x_grad",
                       {n->inputs[1]}, nullptr, &n);
auto x_grad_grad = MakeNode("negative", n->attrs.name + "_x_grad_grad",

Although it appears that this permits Nth order derivatives, the _grad_grad naming concerns me, as though this is simply registering a 2nd order grad and calling that good. Assuming this does support Nth order, it worries me that someone else trying to copy this would "cheat" by only having 2 gradients.

Ideally the testing would catch this.

Contributor:

As long as the nodes are themselves differentiable, it would support additional differentiation. For some complex functions, having just the 2nd gradient should be good. In this case I would say it's N times differentiable. The _x_grad_grad is just the name of the node, as far as I understand it. Why does it concern you?

Contributor Author:

I renamed it. Also, this name indicates the gradient is built on top of the previous gradient. Users can still apply this recursively for nth order gradients.
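For illustration, a sketch of what applying it recursively looks like from the Python side (this only works end to end if every node in the chain registers a differentiable FGradient, which is the concern raised above):

from mxnet import nd, autograd

x = nd.array([0.3, 0.7, 1.1])
x.attach_grad()
with autograd.record():
    y = nd.sin(x)
    g1 = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]   # cos(x)
    g2 = autograd.grad(g1, x, create_graph=True, retain_graph=True)[0]  # -sin(x)
g2.backward()
print(x.grad)  # third order: should be -cos(x)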

@apeforest changed the title from "[MXNET-978] Higher order gradient support for some unary operators" to "[MXNET-978] Second order gradient support for some unary operators" on Jun 5, 2019
@larroy mentioned this pull request on Jun 5, 2019
@apeforest (Contributor Author) commented Jun 6, 2019:

To summarize the discussion in #15120: in MXNet we calculate input gradients only for the variables specified in the autograd.grad() API. In this case, even if a grad attribute is attached to y_grad, no gradient values will be assigned to it during the second backward() pass; only the second order gradient x.grad is assigned. This behavior is different from PyTorch, but I think it is not a bug in MXNet.
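A small sketch of the behavior being described (illustrative only, following the failing check above): only the variables passed to autograd.grad() receive gradients, so head_grads.grad stays untouched even though a grad buffer is attached:

from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()
head_grads = nd.ones_like(x)
head_grads.attach_grad()
with autograd.record():
    y = nd.sin(x)
    x_grad = autograd.grad(y, x, head_grads=head_grads,
                           create_graph=True, retain_graph=True)[0]
x_grad.backward()
print(x.grad)           # second order gradient, -sin(x) * head_grads
print(head_grads.grad)  # all zeros in MXNet (the equivalent PyTorch script populates it)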

@kshitij12345 (Contributor) commented Jun 7, 2019:

Thank you very much for the clarification.

> In MXNet we calculate input gradients only for the variables specified in the autograd.grad() API. In this case, even if a grad attribute is attached to y_grad, no gradient values will be assigned to it during the second backward() pass; only the second order gradient x.grad is assigned.

BTW, I was wondering if the above should be documented? Most probably users won't end up in this situation, but still. Your thoughts?

@apeforest (Contributor Author):

> BTW, I was wondering if the above should be documented? Most probably users won't end up in this situation, but still. Your thoughts?

Yes, I think we should document this in the autograd API. What do you think?

@apeforest (Contributor Author):

@kshitij12345 Do you think we can merge this PR now? If so, could you please approve it? Thanks!

@kshitij12345 (Contributor):

LGTM

@apeforest merged commit 3c82ce2 into apache:master on Jun 10, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
…pache#14613)

* try to add support some ops

* add unit test for second order grad

* implement grad for relu and add unit test

* fix lint

* register FGradient attribute for backward relu

* resolve conflict

* remove unused imports

* change gradient using set_attr

* remove higher order grad test for negative(x)

* fix lint

* reverse indent

* remove unused backward operator

* refactor backward for sin(x) and cos(x)

* change value init to list init

* change to list initialization

* generate random shape in test

* fix a bug in second order backward

* fix lint

* fix lint

* address reviewer comment and renaming