This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-978] Second order gradient support for some unary operators #14613

Merged (34 commits) on Jun 10, 2019

Conversation

apeforest (Contributor)

Description

This PR adds higher order gradient support for some basic unary operators. Thanks to @sxjscience; this PR continues the work of #12821.

This PR adds support for the relu operator and fixes the issue with negative operator support in #12821. In addition, I added unit tests to verify higher order gradients.
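As a quick illustration (a minimal usage sketch in the spirit of the unit tests added here, not the exact test code), a second order gradient can be taken by keeping the first backward pass in the recorded graph:

from mxnet import nd, autograd

x = nd.array([0.5, 1.0, 2.0])
x.attach_grad()
with autograd.record():
    y = nd.sin(x)
    # create_graph=True keeps the first-order gradient differentiable
    y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
y_grad.backward()
print(x.grad)  # second order gradient of sin(x), i.e. -sin(x)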

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

  • Higher order gradients for a set of unary operators
  • Unit tests for the higher order gradients

@larroy (Contributor) left a comment:

Nice! LGTM

@@ -347,8 +347,9 @@ std::vector<NDArray*> Imperative::Backward(
     x_reqs.push_back(info.grad_req);
     info.fresh_out_grad = true;
   }
-  CHECK_GT(xs.size(), 0)
-      << "There are no inputs in computation graph that require gradients.";
+  if (xs.empty()) {
Contributor:

Why change from CHECK to warning?

Contributor Author:

The current backward operation requires that an operator have at least one input, because the gradient of a constant is always zero. However, the second order gradient of some operators such as relu is actually the gradient of a constant (ones or zeros). Therefore we need to support gradients for constant operators.
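For example (a sketch using the same autograd pattern as the tests in this PR), the first derivative of relu is piecewise constant, so its second order gradient is zero wherever it is defined:

from mxnet import nd, autograd

x = nd.array([-1.0, 2.0, 3.0])
x.attach_grad()
with autograd.record():
    y = nd.relu(x)
    y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
# y_grad is (piecewise) constant w.r.t. x, so backpropagating through it
# is exactly the "gradient of a constant" case described above.
y_grad.backward()
print(x.grad)  # all zeros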

Contributor:

I think we should dive deeper into this one. Does it produce the warning (or, earlier, the failure) for some of the test cases?

In the original code I think the intention is to check whether there are any input nodes which have a gradient attached. I understand your explanation, but what I don't see is where we would store the gradient for such constants. Is it because the grad_req of the constant is kNullOp? The constant is just another node, right?

Contributor Author:

The root cause is the second order gradient of the negative(x) operator. Its backward graph does not require any input and therefore triggers this condition. If I remove the test for negative(x), then we do not need to modify this.

@piyushghai (Contributor):

Thanks for your contributions @apeforest. Can you also look into the CI failures?

@mxnet-label-bot Add [Operator, pr-awaiting-review, Backend].

@marcoabreu added the Backend, Operator, and pr-awaiting-review labels on Apr 4, 2019
@apeforest (Contributor Author):

@sxjscience Please help to review

@sxjscience (Member):

Looks good. However, the test has not passed and we need to check what caused the error. Also, we should later add a testing utility function that checks higher order gradients by numerical differentiation.
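Such a utility might look roughly like the following (a hypothetical sketch, not part of this PR), comparing the autograd result for an elementwise unary op against a central-difference estimate f''(x) ≈ (f(x+h) - 2 f(x) + f(x-h)) / h²:

import numpy as np
from mxnet import nd, autograd

def check_second_order_numeric(op, x, eps=1e-2, rtol=1e-2, atol=1e-3):
    x = nd.array(x)
    x.attach_grad()
    with autograd.record():
        y = op(x)
        y_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    y_grad.backward()
    # Central-difference estimate of the second derivative.
    numeric = (op(x + eps) - 2 * op(x) + op(x - eps)).asnumpy() / eps ** 2
    np.testing.assert_allclose(x.grad.asnumpy(), numeric, rtol=rtol, atol=atol)

# e.g. check_second_order_numeric(nd.sin, [0.1, 0.5, 1.0])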

@Roshrini (Member):

@apeforest Thanks for working on this. Can you look into build failures?

@with_seed()
def test_elemwise_mul():
    x = nd.array([1, 2, 3])
    y = nd.zeros(3)
Contributor:

Do we need this y?

def cos(x):
    return nd.cos(x)

x = nd.array([1, 2, 3])
Member:

Can we randomize the test arrays with random_arrays and rand_shape_2d?

Contributor:

I think for second order gradients, not using random inputs helps reason about the gradient result...

def negative(x):
    return nd.negative(x)

x = nd.array([1, 2, 3])
Member:

Same as above, and for the rest of the tests.

@vandanavk (Contributor):

@apeforest could you have a look at the CI failures?

auto grad_grad_x_mid = MakeNode("cos", n->attrs.name + "_mid_grad_grad",
                                {n->inputs[1]}, nullptr, &n);
auto grad_grad_x = MakeNode("negative", n->attrs.name + "_backward_grad_grad",
                            {nnvm::NodeEntry(grad_grad_x_mid)}, nullptr, &n);
Contributor:

Ditto.

Contributor Author:

Updated.

.set_attr<nnvm::FGradient>("FGradient",
    [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
      std::vector<nnvm::NodeEntry> ret;
      // f(x) -> f = relu
Member:

Should we be having these comments? Couldn't they be included as part of the R"code" docstring?

Contributor Author:

This is a hidden operator, so users do not see it.

MXNET_OPERATOR_REGISTER_BINARY_WITH_SPARSE_CPU_DR(_backward_sin, unary_bwd<mshadow_op::sin_grad>)
.set_attr<nnvm::FGradient>("FGradient",
    [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
      // f(x) = sin(x)
Member:

Same as above.

Contributor Author:

This is a hidden operator, so users do not see it.


arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
for array in arrays:
    check_second_order_unary(array, sin, grad_grad_op)
Member:

I think check_second_order_unary function should be moved to python/mxnet/test_utils.py

Contributor Author:

It's only used in this test. If we add a different test file, then it makes sense to move it as you suggested.

def grad_grad_op(x):
    return nd.zeros_like(x)

arrays = random_arrays((2, 2), (2, 3), (4, 5, 2), (3, 1, 4, 5))
@anirudhacharya (Member) commented May 31, 2019:

Shouldn't we test for 1-d arrays?

Not sure if it is needed here, but there is a utility to randomize the shape of an array: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L418

Contributor Author:

Thanks for the suggestion. Updated.

@anirudhacharya (Member) left a comment:

Everything else LGTM.

@apeforest (Contributor Author):

@kshitij12345 gentle ping again :) Please approve if there is no other concern. Thanks!

@kshitij12345 (Contributor) commented Jun 1, 2019:

@apeforest As mentioned in #14992 for log, I guess the check will fail with the given script. Sorry for missing this earlier. I have checked that it fails with:

from mxnet import nd, autograd
import numpy
import math

grad_grad_op = lambda x: -nd.sin(x)  # -nd.cos(x)

x = nd.random.normal(0, 1, (3, 3))
x.attach_grad()
with autograd.record():
    y = nd.sin(x)  # nd.cos(x)
    y_grad = autograd.grad(y, x, head_grads=nd.ones_like(y) * 0.5,
                           create_graph=True, retain_graph=True)[0]
y_grad.backward(nd.ones_like(y_grad) * 0.6)

numpy.testing.assert_allclose(x.grad.asnumpy(),
                              (grad_grad_op(x) * 0.5 * 0.6).asnumpy(),
                              rtol=1e-7, atol=1e-7)

This is because the gradient from the upper layer is not preserved for sin and cos.

@apeforest (Contributor Author):

@kshitij12345 Thanks for catching this bug. I have updated my sin and cos implementation with comments to help better understand the mathematics behind the second order gradient calculation. Please review again.

@apeforest (Contributor Author):

@kshitij12345 could you please take a look at the PR again? Thanks!

// f''(x) = 0
auto gx = nnvm::NodeEntry{n};  // f'(x)
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], gx}, nullptr, &n));
@kshitij12345 (Contributor) commented Jun 4, 2019:

Similar to what you have done below for sin and cos.

Here gx is actually f'(x) * {head_grads / output gradient}; it should actually be only f'(x).

Explanation: gx = f'(x) * head_grads. Therefore, the gradient of gx w.r.t. f'(x) is head_grads, and similarly the gradient of gx w.r.t. head_grads is f'(x).
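Restating the comment above in equations, for y = f(x) with incoming head gradient \bar{y}:

\[
  g_x = f'(x)\,\bar{y}, \qquad
  \frac{\partial g_x}{\partial x} = f''(x)\,\bar{y}, \qquad
  \frac{\partial g_x}{\partial \bar{y}} = f'(x),
\]

so the quantity wired into the second order graph as "f'(x)" must not already carry the \bar{y} factor.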

Contributor Author:

Updated

@kshitij12345 (Contributor):

@apeforest

Could you update check_second_order_unary as per

https://github.com/apache/incubator-mxnet/blob/37ce3b87268a8154f5c0ad97ce2522478038ee06/tests/python/unittest/test_higher_order_grad.py#L76-L103

This covers the check for the gradient of the first input argument as well. I have tested a similar PyTorch script which works (code in PR #15120).

However, do note that for PR #15120,

https://github.com/apache/incubator-mxnet/blob/37ce3b87268a8154f5c0ad97ce2522478038ee06/tests/python/unittest/test_higher_order_grad.py#L102

the assertion fails with head_grads.grad.asnumpy() being all 0's.

Please check to see if it works for you. Thank you.
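For reference, a rough sketch of the extended helper being proposed (assumed shape only; the authoritative version is the linked test file):

from mxnet import nd, autograd
from mxnet.test_utils import assert_almost_equal

def check_second_order_unary(x, op, grad_grad_op):
    x = nd.array(x)
    expect_grad_grad = grad_grad_op(x)

    x.attach_grad()
    # Random head gradients for both backward passes.
    head_grads = nd.random.normal(shape=x.shape)
    head_grad_grads = nd.random.normal(shape=x.shape)
    with autograd.record():
        y = op(x)
        x_grad = autograd.grad(y, x, head_grads=head_grads,
                               create_graph=True, retain_graph=True)[0]
    x_grad.backward(head_grad_grads)

    # x.grad should be f''(x) * head_grads * head_grad_grads.
    expected = (expect_grad_grad * head_grads * head_grad_grads).asnumpy()
    assert_almost_equal(expected, x.grad.asnumpy())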

@apeforest (Contributor Author) commented Jun 5, 2019:

@kshitij12345 I updated the test locally and it also fails for sin(x) and cos(x), for the same reason as log(x). I have not pushed it to the remote because it would then require your PR #15120 to be merged first; otherwise the test will always fail.

@apeforest (Contributor Author):

Ping @aidan-plenert-macdonald for review

@aodhan-domhnaill left a comment:

Looks great! I think the code is right, but I would like the tests to be updated slightly to catch ops that don't support Nth order gradients.

for dim in range(1, 5):
    shape = rand_shape_nd(dim)
    array = random_arrays(shape)
    check_second_order_unary(array, cos, grad_grad_op)

Can these check_second_order_unary checks be changed to Nth order?

@apeforest (Contributor Author) commented Jun 5, 2019:

This PR only verifies second order gradients. Can we add a test for Nth order gradients in a separate PR?

// f''(x) = -sin(x)
auto x_grad = MakeNode("cos", n->attrs.name + "_x_grad",
                       {n->inputs[1]}, nullptr, &n);
auto x_grad_grad = MakeNode("negative", n->attrs.name + "_x_grad_grad",

Although it appears that this permits Nth order derivatives, the _grad_grad naming concerns me, as though this is simply registering a 2nd order grad and calling that good. Assuming this does support Nth order, it worries me that someone else trying to copy this would "cheat" by only having 2 gradients.

Ideally the testing would catch this.

Contributor:

As long as the nodes are themselves differentiable, it would support additional differentiation. For some complex functions, having just the 2nd gradient should be good. In this case I would say it's N times differentiable. The _x_grad_grad is just the name of the node, as far as I understand it. Why does it concern you?

Contributor Author:

I renamed it. Also, this name indicates the gradient is built on top of the previous gradient. Users can still apply this recursively for nth order gradients.
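For illustration, a sketch of what applying it recursively looks like from the Python side (this only works end to end if every node in the chain registers a differentiable FGradient, which is the concern raised above):

from mxnet import nd, autograd

x = nd.array([0.3, 0.7, 1.1])
x.attach_grad()
with autograd.record():
    y = nd.sin(x)
    g1 = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]   # cos(x)
    g2 = autograd.grad(g1, x, create_graph=True, retain_graph=True)[0]  # -sin(x)
g2.backward()
print(x.grad)  # third order: should be -cos(x)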

@apeforest changed the title from "[MXNET-978] Higher order gradient support for some unary operators" to "[MXNET-978] Second order gradient support for some unary operators" on Jun 5, 2019
@larroy mentioned this pull request on Jun 5, 2019
@apeforest (Contributor Author) commented Jun 6, 2019:

To summarize the discussion in #15120: in MXNet we calculate input gradients only for the variables specified in the autograd.grad() API. In this case, even if a grad attribute is attached to y_grad, no gradient values will be assigned to it during the second backward() pass; only the second order gradient x.grad is assigned. This behavior is different from PyTorch, but I think it is not a bug in MXNet.
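A small sketch of the behavior being described (illustrative only, following the failing check above): only the variables passed to autograd.grad() receive gradients, so head_grads.grad stays untouched even though a grad buffer is attached:

from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()
head_grads = nd.ones_like(x)
head_grads.attach_grad()
with autograd.record():
    y = nd.sin(x)
    x_grad = autograd.grad(y, x, head_grads=head_grads,
                           create_graph=True, retain_graph=True)[0]
x_grad.backward()
print(x.grad)           # second order gradient, -sin(x) * head_grads
print(head_grads.grad)  # all zeros in MXNet (the equivalent PyTorch script populates it)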

@kshitij12345 (Contributor) commented Jun 7, 2019:

Thank you very much for the clarification.

> In MXNet we calculate input gradients only for the variables specified in the autograd.grad() API. In this case, even if a grad attribute is attached to y_grad, no gradient values will be assigned to it during the second backward() pass; only the second order gradient x.grad is assigned.

BTW, I was wondering if the above should be documented? Most probably users won't end up in this situation, but still. Your thoughts?

@apeforest (Contributor Author):

> BTW, I was wondering if the above should be documented? Most probably users won't end up in this situation, but still. Your thoughts?

Yes, I think we should document this in the autograd API. What do you think?

@apeforest (Contributor Author):

@kshitij12345 Do you think we can merge this PR now? If so, could you please approve it? Thanks!

@kshitij12345 (Contributor):

LGTM

@apeforest merged commit 3c82ce2 into apache:master on Jun 10, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
…pache#14613)

* try to add support some ops

* add unit test for second order grad

* implement grad for relu and add unit test

* fix lint

* register FGradient attribute for backward relu

* resolve conflict

* remove unused imports

* change gradient using set_attr

* remove higher order grad test for negative(x)

* fix lint

* reverse indent

* remove unused backward operator

* refactor backward for sin(x) and cos(x)

* change value init to list init

* change to list initialization

* generate random shape in test

* fix a bug in second order backward

* fix lint

* fix lint

* address reviewer comment and renaming