
Conversation

@Xmaster6y
Contributor

Description

Implement torch.autograd.grad support for TensorDict.

It is still in progress, but comments are welcome. Especially on:

  • How to handle it for lazy TensorDicts, if at all?
  • Should I add some tests / docs?
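
A rough sketch of the intended usage, based on the examples later in this thread (the exact call signature and return type are assumptions until the API is settled):

```python
import torch
from tensordict import TensorDict

td = TensorDict({"a": torch.randn(3), "b": torch.randn(3)}, batch_size=[3])
td.requires_grad_()
out = td + 1

# grad_outputs as a TensorDict of ones with the same structure as `out`
grads = torch.autograd.grad(out, td, grad_outputs=out.apply(torch.ones_like))
# `grads` is expected to mirror the structure of `td`
```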

Motivation and Context

Closes #1416

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • New feature (non-breaking change which adds core functionality)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@meta-cla meta-cla bot added the CLA Signed label Aug 16, 2025
@vmoens
Collaborator

vmoens commented Aug 17, 2025

Great feature!
We do need tests and docs.
Tests should cover regular TDs, lazy stacks, and tensorclasses (at least). We should check that things like persistent TDs and memory-mapped ones raise the appropriate error.

Why would lazy TDs not work? I think they should (tensors in the lazy stack can be leaves of the graph).
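
A minimal sketch of what such tests could look like (test names, the grad call itself, its return type, and the error expected for memory-mapped TDs are all assumptions, not taken from this PR):

```python
import pytest
import torch
from tensordict import TensorDict, LazyStackedTensorDict


def test_autograd_grad_td():
    td = TensorDict({"a": torch.randn(3)}, batch_size=[3])
    td.requires_grad_()
    out = td + 1
    grads = torch.autograd.grad(out, td, grad_outputs=out.apply(torch.ones_like))
    assert (grads["a"] == 1).all()


def test_autograd_grad_lazy_stack():
    td = LazyStackedTensorDict(
        TensorDict({"a": torch.randn(1)}, batch_size=[]),
        TensorDict({"a": torch.randn(1)}, batch_size=[]),
        stack_dim=0,
    )
    td.requires_grad_()
    out = td + 1
    grads = torch.autograd.grad(out, td, grad_outputs=out.apply(torch.ones_like))
    assert (grads["a"] == 1).all()


def test_autograd_grad_memmap_raises():
    td = TensorDict({"a": torch.randn(3)}, batch_size=[3]).memmap_()
    with pytest.raises(RuntimeError):  # exact error type is an assumption
        torch.autograd.grad(td + 1, td, grad_outputs=td.apply(torch.ones_like))
```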

@Xmaster6y
Contributor Author

Xmaster6y commented Aug 17, 2025

Okay, thanks, I will work on that next week! And for lazy TDs, I just don't know how they are handled, but I'll dig into it.

@vmoens vmoens added the enhancement label Aug 19, 2025
@vmoens
Collaborator

vmoens commented Sep 2, 2025

Hello,
I'm back from holiday, let me know if you need help with this.

@Xmaster6y
Contributor Author

I also took some time off, but I'll work on that on Friday. And thanks for the offer, I might need help ^^

@Xmaster6y
Contributor Author

Xmaster6y commented Sep 5, 2025

@vmoens I have an issue with LazyStackedTensorDict; what is the expected behavior with an example like this:

```python
import torch
from tensordict import TensorDict
from tensordict import LazyStackedTensorDict

td = LazyStackedTensorDict(TensorDict(), TensorDict(), stack_dim=0)
td["a"] = [torch.ones(1), torch.zeros(1)]
td.requires_grad_()
out = td + 1
torch.autograd.grad(out["a"], td["a"], torch.ones_like(out["a"]))
```

Since requires_grad is not applied to the underlying tensors, it raises. Is that expected?

@Xmaster6y
Contributor Author

I also put a small bit of docs in td.rst, but I didn't know if it was the right place.

Collaborator

@vmoens vmoens left a comment

Good progress!

Can you investigate these two errors?

FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[nested_stacked_td-device3] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[stacked_td-device4] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Also a similar test with tensorclass would be nice!
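
A minimal sketch of a tensorclass variant of the test (the class, its fields, and the assumption that the new grad override accepts tensorclass instances and returns one are all hypothetical):

```python
import torch
from tensordict import tensorclass


@tensorclass
class MyData:
    a: torch.Tensor
    b: torch.Tensor


def test_autograd_grad_tensorclass():
    data = MyData(a=torch.randn(3), b=torch.randn(3), batch_size=[3])
    data.requires_grad_()
    out = MyData(a=data.a * 2, b=data.b + 1, batch_size=[3])
    ones = MyData(a=torch.ones(3), b=torch.ones(3), batch_size=[3])
    grads = torch.autograd.grad(out, data, grad_outputs=ones)
    # assuming the result mirrors the tensorclass structure
    assert (grads.a == 2).all() and (grads.b == 1).all()
```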

@Xmaster6y
Contributor Author

Xmaster6y commented Sep 5, 2025

> Good progress!
>
> Can you investigate these two errors?
>
> FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[nested_stacked_td-device3] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
> FAILED test/test_tensordict.py::TestTensorDicts::test_autograd_grad[stacked_td-device4] - RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
>
> Also a similar test with tensorclass would be nice!

This is due to what I mentioned in my previous comment: requires_grad is not propagated to the inner (pre-stack) tensors of the LazyStackedTensorDict.

More precisely, it comes from the mixed computation graph introduced by LazyStackedTensorDict. For:

```python
td = LazyStackedTensorDict(TensorDict(), TensorDict(), stack_dim=0)
td["a"] = [torch.ones(1), torch.zeros(1)]
td.requires_grad_()
out = td + 1
torch.autograd.grad(out["a"], td["a"], torch.ones_like(out["a"]))
```

(td+1)["a"]: td[0]["a"] -> td[0]["a"]+1 ; td[1]["a"] -> td[2]["a"]+1 ; (td[0]["a"]+1, td[2]["a"]+1) ->stack(...)

But:

td["a"]: (td[0]["a"], td[2]["a"]) ->stack(...)

never appears in the previous computation graph.
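
A small self-contained sketch of that point: each access to td["a"] builds a fresh stack, so the tensor passed as inputs never took part in the graph of out["a"], whereas the per-tensordict leaves did (the .tensordicts attribute used below is an assumption):

```python
import torch
from tensordict import TensorDict, LazyStackedTensorDict

td = LazyStackedTensorDict(TensorDict(), TensorDict(), stack_dim=0)
td["a"] = [torch.ones(1), torch.zeros(1)]
td.requires_grad_()
out = td + 1

# Each access re-stacks, so the tensor handed to grad() as `inputs`
# is not the one that was used to compute out["a"]:
assert td["a"] is not td["a"]

# Differentiating w.r.t. the actual leaves (the pre-stack tensors) works:
leaves = [sub["a"] for sub in td.tensordicts]  # .tensordicts is assumed here
grads = torch.autograd.grad(out["a"], leaves, torch.ones_like(out["a"]))
```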

@vmoens
Collaborator

vmoens commented Sep 5, 2025

```python
torch.autograd.grad(out["a"], td["a"], torch.ones_like(out["a"]))
```

You could do

```python
out.get("a", as_list=True)
```
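
For instance, continuing the example above and extending the same idea to the inputs side (the as_list keyword is taken from this suggestion; its exact behavior here is an assumption):

```python
# td and out as in the example above
outs = out.get("a", as_list=True)  # per-tensordict outputs, no re-stacking
ins = td.get("a", as_list=True)    # the underlying leaf tensors
grads = torch.autograd.grad(outs, ins, [torch.ones_like(o) for o in outs])
```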

@Xmaster6y
Contributor Author

I see, thanks. I'll split all the lazy stacks then and merge them back afterwards.

@Xmaster6y
Contributor Author

@vmoens I made it work, but it feels hacky; let me know what you think.

@vmoens
Collaborator

vmoens commented Sep 5, 2025

I used a simpler approach, can you check that it makes sense?
The only problem is that if one of the inputs is a lazy stack and the others are not, you will end up in a situation where the inputs and outputs tuples will not be the same.
We can either cast everything to a lazy stack in that case (td.to_lazystack()) or just throw a more informative error. This can be in a follow-up PR if you want (if you put it in a separate PR, then make sure that in this one the docs mention that all inputs to grad must have the same TD type!).

@Xmaster6y
Contributor Author

It definitely makes sense and is much simpler! But I am not following:

> if one of the inputs is a lazy stack and the others are not

Do you want to add support for passing multiple tensordicts as inputs/outputs? Or do you mean when the fields are of different types? I think the former could be supported using nested TensorDicts (but I wanted to get the single-grad case to work first); for the latter, I don't see the problem. Could you share a quick snippet?

@vmoens
Collaborator

vmoens commented Sep 5, 2025

I mean that if you have grad_outputs=TensorDict(...) and outputs=LazyStackedTensorDict(...), then calling values or items like we do now will return different tuple lengths. In this case we should cast grad_outputs to a lazy stack.
I think it's really an edge case though! In most cases you will do torch.ones_like(outputs) or something of that kind, no?

@vmoens vmoens merged commit 2c0794c into pytorch:main Sep 5, 2025
72 of 79 checks passed
@Xmaster6y
Contributor Author

Gotcha, yes, it is an edge case, I think. I might have an example where it could be useful, though, and I'd say outputs and grad_outputs do often have the same shape.

But I am not getting why converting to a lazy stack would do the trick; will it split the stack dimension of the full tensordict? And even two stacked tensordicts could have the same shape but not be stacked similarly, right?

I think we should keep tuples, since the issue was only with the inputs, e.g. keeping this:

```python
        tup_grad_outputs = tuple(grad_outputs[k] for k in outputs.keys(True, True))
    else:
        tup_grad_outputs = None

    tup_outputs = tuple(outputs[k] for k in outputs.keys(True, True))
```

@Xmaster6y
Contributor Author

It also ensures the same ordering of outputs and grad_outputs.
