[Feature Request] How to create a compound actor? #1473

Closed
1 task done
hersh opened this issue Aug 27, 2023 · 4 comments · Fixed by #1673
Labels: enhancement (New feature or request)

hersh commented Aug 27, 2023

Motivation

I created an environment with a compound action space: a list of continuous values (robot joint angles) and a boolean value (suction gripper on or off).

In the PPO tutorial, the policy_module is a ProbabilisticActor that takes "loc" and "scale" inputs. I want to build an actor that combines this (for the joint angles) with something that uses a Bernoulli distribution to generate the boolean gripper action.

It looks like this may already be supported via TensorDictSequential, but it's not clear how that would work.

Solution

I would like to see an example in the docs that covers a compound action space like this.

Alternatives

Maybe there's another way, where one actor is created for each part of the action space? If so, how would they be combined for use with a DataCollector?
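For concreteness, here is a rough, untested sketch of what I imagine the "one actor per sub-action" version could look like; the key names ("observation", "action_joints", "action_gripper"), the layer sizes, and the plain Normal/Bernoulli choices are just placeholders:

import torch
from torch import nn, distributions as d
from tensordict import TensorDict
from tensordict.nn import TensorDictModule, TensorDictSequential
from tensordict.nn.distributions import NormalParamExtractor
from torchrl.modules import ProbabilisticActor

n_obs, n_joints = 8, 6

# Continuous head: loc/scale for the joint angles.
joint_params = TensorDictModule(
    nn.Sequential(nn.Linear(n_obs, 2 * n_joints), NormalParamExtractor()),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)
joint_actor = ProbabilisticActor(
    joint_params,
    in_keys=["loc", "scale"],
    out_keys=["action_joints"],
    distribution_class=d.Normal,
)

# Boolean head: a single logit for the gripper.
gripper_params = TensorDictModule(
    nn.Linear(n_obs, 1),
    in_keys=["observation"],
    out_keys=["logits"],
)
gripper_actor = ProbabilisticActor(
    gripper_params,
    in_keys=["logits"],
    out_keys=["action_gripper"],
    distribution_class=d.Bernoulli,
)

# Chain the two heads so a single module writes both sub-actions.
actor = TensorDictSequential(joint_actor, gripper_actor)
data = TensorDict({"observation": torch.rand(n_obs)}, [])
actor(data)  # fills "action_joints" and "action_gripper"

One caveat: each head keeps its own distribution and log-prob, so a loss that expects a single distribution over the whole action (e.g. PPO) would presumably need extra plumbing, which is why a built-in way to compound distributions would be nice.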

Additional context

The environment is a robot arm manipulation scenario using box2d.

Checklist

  • I have checked that there is no similar issue in the repo (required)
hersh added the enhancement label on Aug 27, 2023
vmoens pinned this issue on Aug 30, 2023

vmoens (Contributor) commented Aug 31, 2023

As of now, compound distributions aren't supported, but it's something we can work on (it has come up before).
Let me come up with something!

vmoens (Contributor) commented Sep 7, 2023

Note to self: we now need to write an example of this in the docs; then we can close this issue.

vmoens (Contributor) commented Oct 11, 2023

@hersh here's the minimal example I will put in the docs:

from tensordict import TensorDict
from tensordict.nn import CompositeDistribution, TensorDictModule
from torchrl.modules import ProbabilisticActor
from torch import nn, distributions as d
import torch

class Module(nn.Module):
    # Split the input into the parameters of each sub-distribution:
    # loc and scale for the Normal head, logits for the Categorical head.
    def forward(self, x):
        return x[..., :3], x[..., 3:6], x[..., 6:]

# Write the parameters under a nested "params" entry, one sub-tensordict
# per distribution name.
module = TensorDictModule(
    Module(),
    in_keys=["x"],
    out_keys=[
        ("params", "normal", "loc"),
        ("params", "normal", "scale"),
        ("params", "categ", "logits"),
    ],
)
# CompositeDistribution maps each name under "params" to a distribution class.
actor = ProbabilisticActor(
    module,
    in_keys=["params"],
    distribution_class=CompositeDistribution,
    distribution_kwargs={
        "distribution_map": {"normal": d.Normal, "categ": d.Categorical}
    },
)
data = TensorDict({"x": torch.rand(10)}, [])
module(data)
actor(data)

Which prints

TensorDict(
    fields={
        categ: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int64, is_shared=False),
        normal: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
        params: TensorDict(
            fields={
                categ: TensorDict(
                    fields={
                        logits: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False),
                normal: TensorDict(
                    fields={
                        loc: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                        scale: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        x: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
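The same example should also work unchanged with a leading batch dimension (a small, untested variation):

batched = TensorDict({"x": torch.rand(4, 10)}, [4])
actor(batched)  # "normal" should come out with shape [4, 3] and "categ" with shape [4]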

I can make some variations of this if needed.

Would that serve the purpose?

hersh (Author) commented Oct 11, 2023

Yes, that looks great! Thanks so much for your work.
