
Conversation

@Xmaster6y
Contributor

Description

Implement torch.autograd.grad support for TensorDict.

It is still in progress, but comments are welcome. Especially on:

  • How to handle it for lazy TensorDicts, if at all?
  • Should I add some tests / docs?
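
A rough sketch of the intended usage, based on the examples later in this thread (the exact call signature and return type are assumptions until the API is settled):

```python
import torch
from tensordict import TensorDict

td = TensorDict({"a": torch.randn(3), "b": torch.randn(3)}, batch_size=[3])
td.requires_grad_()
out = td + 1

# grad_outputs as a TensorDict of ones with the same structure as `out`
grads = torch.autograd.grad(out, td, grad_outputs=out.apply(torch.ones_like))
# `grads` is expected to mirror the structure of `td`
```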

Motivation and Context

Closes #1416

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • New feature (non-breaking change which adds core functionality)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@meta-cla meta-cla bot added the CLA Signed label Aug 16, 2025
@vmoens
Collaborator

vmoens commented Aug 17, 2025

Great feature!
We do need tests and docs.
Tests should cover regular TDs, lazy stacks, and tensorclasses (at least). We should check that things like persistent TDs and memory-mapped ones raise the appropriate error.

Why would lazy TDs not work? I think they should (tensors in the lazy stack can be leaves of the graph).
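
A minimal sketch of what such tests could look like (test names, the grad call itself, its return type, and the error expected for memory-mapped TDs are all assumptions, not taken from this PR):

```python
import pytest
import torch
from tensordict import TensorDict, LazyStackedTensorDict


def test_autograd_grad_td():
    td = TensorDict({"a": torch.randn(3)}, batch_size=[3])
    td.requires_grad_()
    out = td + 1
    grads = torch.autograd.grad(out, td, grad_outputs=out.apply(torch.ones_like))
    assert (grads["a"] == 1).all()


def test_autograd_grad_lazy_stack():
    td = LazyStackedTensorDict(
        TensorDict({"a": torch.randn(1)}, batch_size=[]),
        TensorDict({"a": torch.randn(1)}, batch_size=[]),
        stack_dim=0,
    )
    td.requires_grad_()
    out = td + 1
    grads = torch.autograd.grad(out, td, grad_outputs=out.apply(torch.ones_like))
    assert (grads["a"] == 1).all()


def test_autograd_grad_memmap_raises():
    td = TensorDict({"a": torch.randn(3)}, batch_size=[3]).memmap_()
    with pytest.raises(RuntimeError):  # exact error type is an assumption
        torch.autograd.grad(td + 1, td, grad_outputs=td.apply(torch.ones_like))
```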

@Xmaster6y
Contributor Author

Xmaster6y commented Aug 17, 2025

Okay, thanks, I will work on that next week! And for lazy TDs, I just don't know how they are handled, but I'll dig into it.

@vmoens vmoens added the enhancement label Aug 19, 2025
@vmoens
Collaborator

vmoens commented Sep 2, 2025

Hello,
I'm back from holiday, let me know if you need help with this.

@Xmaster6y
Contributor Author

I also took some time off, but I'll work on that on Friday. And thanks for the offer, I might need help ^^

@Xmaster6y
Contributor Author

Xmaster6y commented Sep 5, 2025

@vmoens I have an issue with LazyStackedTensorDict; what is the expected behavior with an example like this:

```python
import torch
from tensordict import TensorDict
from tensordict import LazyStackedTensorDict

td = LazyStackedTensorDict(TensorDict(), TensorDict(), stack_dim=0)
td["a"] = [torch.ones(1), torch.zeros(1)]
td.requires_grad_()
out = td + 1
torch.autograd.grad(out["a"], td["a"], torch.ones_like(out["a"]))
```

Since requires_grad is not applied to the underlying tensors, it raises. Is that expected?

@Xmaster6y
Contributor Author

I also put a small bit of docs in td.rst, but I didn't know if it was the right place.

Collaborator

@vmoens vmoens left a comment

Good progress!

Can you investigate these two errors?

FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[nested_stacked_td-device3] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[stacked_td-device4] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Also a similar test with tensorclass would be nice!
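
A minimal sketch of a tensorclass variant of the test (the class, its fields, and the assumption that the new grad override accepts tensorclass instances and returns one are all hypothetical):

```python
import torch
from tensordict import tensorclass


@tensorclass
class MyData:
    a: torch.Tensor
    b: torch.Tensor


def test_autograd_grad_tensorclass():
    data = MyData(a=torch.randn(3), b=torch.randn(3), batch_size=[3])
    data.requires_grad_()
    out = MyData(a=data.a * 2, b=data.b + 1, batch_size=[3])
    ones = MyData(a=torch.ones(3), b=torch.ones(3), batch_size=[3])
    grads = torch.autograd.grad(out, data, grad_outputs=ones)
    # assuming the result mirrors the tensorclass structure
    assert (grads.a == 2).all() and (grads.b == 1).all()
```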

@Xmaster6y
Contributor Author

Xmaster6y commented Sep 5, 2025

> Good progress!
>
> Can you investigate these two errors?
>
> FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[nested_stacked_td-device3] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
> FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[stacked_td-device4] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
>
> Also a similar test with tensorclass would be nice!

This is due to what I mentioned in my previous comment: requires_grad is not propagated to the inner (pre-stack) tensors of the LazyStackedTensorDict.

More precisely, it comes from the mixed computation graph introduced by LazyStackedTensorDict. For:

```python
td = LazyStackedTensorDict(TensorDict(), TensorDict(), stack_dim=0)
td["a"] = [torch.ones(1), torch.zeros(1)]
td.requires_grad_()
out = td + 1
torch.autograd.grad(out["a"], td["a"], torch.ones_like(out["a"]))
```

(td+1)["a"]: td[0]["a"] -> td[0]["a"]+1 ; td[1]["a"] -> td[2]["a"]+1 ; (td[0]["a"]+1, td[2]["a"]+1) ->stack(...)

But:

td["a"]: (td[0]["a"], td[2]["a"]) ->stack(...)

never appears in the previous computation graph.
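
A small self-contained sketch of that point: each access to td["a"] builds a fresh stack, so the tensor passed as inputs never took part in the graph of out["a"], whereas the per-tensordict leaves did (the .tensordicts attribute used below is an assumption):

```python
import torch
from tensordict import TensorDict, LazyStackedTensorDict

td = LazyStackedTensorDict(TensorDict(), TensorDict(), stack_dim=0)
td["a"] = [torch.ones(1), torch.zeros(1)]
td.requires_grad_()
out = td + 1

# Each access re-stacks, so the tensor handed to grad() as `inputs`
# is not the one that was used to compute out["a"]:
assert td["a"] is not td["a"]

# Differentiating w.r.t. the actual leaves (the pre-stack tensors) works:
leaves = [sub["a"] for sub in td.tensordicts]  # .tensordicts is assumed here
grads = torch.autograd.grad(out["a"], leaves, torch.ones_like(out["a"]))
```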

@vmoens
Collaborator

vmoens commented Sep 5, 2025

```python
torch.autograd.grad(out["a"], td["a"], torch.ones_like(out["a"]))
```

You could do

```python
out.get("a", as_list=True)
```
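
For instance, continuing the example above and extending the same idea to the inputs side (the as_list keyword is taken from this suggestion; its exact behavior here is an assumption):

```python
# td and out as in the example above
outs = out.get("a", as_list=True)  # per-tensordict outputs, no re-stacking
ins = td.get("a", as_list=True)    # the underlying leaf tensors
grads = torch.autograd.grad(outs, ins, [torch.ones_like(o) for o in outs])
```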

@Xmaster6y
Contributor Author

I see, thanks. I'll split all the lazy stacks then and merge them back afterwards.

@Xmaster6y
Contributor Author

@vmoens I made it work, but it feels hacky; let me know what you think.

@vmoens
Collaborator

vmoens commented Sep 5, 2025

I used a simpler approach, can you check that it makes sense?
The only problem is that if one of the inputs is a lazy stack and the others are not, you will end up in a situation where the inputs and outputs tuples will not be the same.
We can either cast everything to a lazy stack in that case (td.to_lazystack()) or just throw a more informative error. This can be in a follow-up PR if you want (if you put it in a separate PR, then make sure that in this one the docs mention that all inputs to grad must have the same TD type!).

@Xmaster6y
Contributor Author

It definitely makes sense and is much simpler! But I am not following:

> if one of the inputs is a lazy stack and the others are not

Do you want to add support for passing multiple tensordicts as inputs/outputs? Or do you mean when the fields are of different types? I think the former could be supported using nested TensorDicts (but I wanted to get the single-grad case to work first); for the latter, I don't see the problem. Could you share a quick snippet?

@vmoens
Collaborator

vmoens commented Sep 5, 2025

I mean that if you have grad_outputs=TensorDict(...) and outputs=LazyStackedTensorDict(...), then calling values or items like we do now will return different tuple lengths. In this case we should cast grad_outputs to a lazy stack.
I think it's really an edge case though! In most cases you will do torch.ones_like(outputs) or something of that kind, no?

@vmoens vmoens merged commit 2c0794c into pytorch:main Sep 5, 2025
72 of 79 checks passed
@Xmaster6y
Contributor Author

Gotcha, yes, it is an edge case, I think. I might have an example where it could be useful, though, and I'd say outputs and grad_outputs do often have the same shape.

But I am not getting why converting to a lazy stack would do the trick; will it split the stack dimension of the full tensordict? And even two stacked tensordicts could have the same shape but not be stacked similarly, right?

I think we should keep tuples, since the issue was only with the inputs, e.g. keeping this:

```python
        tup_grad_outputs = tuple(grad_outputs[k] for k in outputs.keys(True, True))
    else:
        tup_grad_outputs = None

    tup_outputs = tuple(outputs[k] for k in outputs.keys(True, True))
```

@Xmaster6y
Contributor Author

It also ensures the same ordering of outputs and grad_outputs.
