attach_grad of intermediate variables causes the gradient graph to be lost #11865
To understand this right, in scenarios such as the following:

```python
import mxnet as mx

x = mx.nd.array([0, 7], ctx=mx.cpu())
x.attach_grad()
with mx.autograd.record():
    y = (5 * (x ** 2)) + (13 * x) + 10
    y.attach_grad()
    z = 2 * y
z.backward()
print(x.grad)
```

what you want is that we should be able to get x.grad = dz/dx through the full chain x → y → z, even though a gradient has been attached to y. In the above example, would you also want the result of dz/dy in y.grad? Have I understood this right? Which of the two above situations is your issue pointing to?
Yes, that's what I'd like to have. In the current implementation, instead of marking y's gradient as one of the outputs, the above code discards the previous graph in which y resides.
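For reference, a minimal sketch of the values being asked for, obtained simply by not attaching a gradient to y so the graph stays intact (plain MXNet autograd; here dz/dx = 2 * (10*x + 13)):

```python
import mxnet as mx

x = mx.nd.array([0, 7], ctx=mx.cpu())
x.attach_grad()
with mx.autograd.record():
    y = (5 * (x ** 2)) + (13 * x) + 10
    z = 2 * y
z.backward()
print(x.grad)  # dz/dx = 20*x + 26 -> [ 26. 166.]
```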
@szha would you agree with the above, that these intermediate gradients should not be stored by default, but rather that we should provide an explicit function call to request them?
Yes, there should be an explicit mechanism to mark new outputs. Whether it's reusing attach_grad or a new method is up for debate.
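For comparison only (not MXNet): PyTorch exposes such an explicit mechanism as retain_grad(), which records the gradient of an intermediate variable without cutting or replacing the recorded graph:

```python
import torch

x = torch.tensor([0., 7.], requires_grad=True)
y = 5 * x ** 2 + 13 * x + 10
y.retain_grad()          # explicitly mark the intermediate as an output of interest
z = (2 * y).sum()
z.backward()
print(x.grad)            # dz/dx = 20*x + 26 -> [ 26., 166.]
print(y.grad)            # dz/dy = 2         -> [2., 2.]
```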
Here is another use case, where using detach together with head gradients gives unexpected results. With the following example I would expect the same gradients as in the version without head gradients further below, but that is not what I get:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3, 4])
x.attach_grad()
y = nd.array([5, 6, 7, 8])
y.attach_grad()
ag.set_recording(True)
u = x * y
v = u.detach()
v.attach_grad()
z = v * x
ag.set_recording(False)
z.backward()
u.backward(v.grad)
print(x.grad, y.grad)
```

But when I do it without using head gradients, as follows, I get the correct gradients:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3, 4])
x.attach_grad()
y = nd.array([5, 6, 7, 8])
y.attach_grad()
ag.set_recording(True)
u = x * y
z = u * x
ag.set_recording(False)
z.backward()
print(x.grad, y.grad)
```
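As a sanity check of what "correct gradients" means here: since z = (x * y) * x = x**2 * y, the full gradients are dz/dx = 2*x*y and dz/dy = x**2. Evaluated at the inputs above:

```python
from mxnet import ndarray as nd

x = nd.array([1, 2, 3, 4])
y = nd.array([5, 6, 7, 8])
print(2 * x * y)  # expected x.grad: [10. 24. 42. 64.]
print(x ** 2)     # expected y.grad: [ 1.  4.  9. 16.]
```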
And as per the autograd documentation here - https://www.d2l.ai/chapter_crashcourse/autograd.html#attach-gradients-to-internal-variables - it would seem we are expecting the computation graph to be thrown away when we execute attach_grad on an intermediate variable. We need to get a clear understanding of what the expected behavior is here.
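One possible way to recover the full dz/dx while keeping the detach/head-gradient pattern is to accumulate the two backward passes into x.grad instead of overwriting it. This is only a sketch: it assumes attach_grad(grad_req='add') accumulates gradients across separate backward calls (with manual zeroing of the buffer), which should be verified against the MXNet version in use:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3, 4])
x.attach_grad(grad_req='add')    # accumulate contributions instead of overwriting
y = nd.array([5, 6, 7, 8])
y.attach_grad()
x.grad[:] = 0                    # with 'add', zeroing is the caller's responsibility
with ag.record():
    u = x * y
    v = u.detach()
    v.attach_grad()
    z = v * x
z.backward()        # adds v (= x*y) to x.grad and fills v.grad with x
u.backward(v.grad)  # adds y * v.grad (= x*y) to x.grad and fills y.grad with x**2
print(x.grad, y.grad)            # expected: 2*x*y and x**2
```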
Unfortunately this is how it's implemented. Why do you want to attach grad to output again? detach is not the cause, as far as I understand the code.
Here are more test cases:
The task is to evaluate z = x * u = x * (x * y).

Test 1:
Test 2:

My guess of the purpose of implicitly running
Another idea of fixing it is that
A further use case is when the partial gradient