[Feature Request] How to create a compound actor? #1473
Comments
As of now, compounding distributions isn't supported, but it's something we can work on (it has popped up in the past).
Note to self: we now need to write an example of this in the doc; then we can close this issue.
@hersh here's the minimal example I will put in the doc:

```python
import torch
from tensordict import TensorDict
from tensordict.nn import CompositeDistribution, TensorDictModule
from torch import distributions as d, nn
from torchrl.modules import ProbabilisticActor

class Module(nn.Module):
    def forward(self, x):
        # split the features into Normal loc, Normal scale, Categorical logits
        return x[..., :3], x[..., 3:6], x[..., 6:]

module = TensorDictModule(
    Module(),
    in_keys=["x"],
    out_keys=[
        ("params", "normal", "loc"),
        ("params", "normal", "scale"),
        ("params", "categ", "logits"),
    ],
)
actor = ProbabilisticActor(
    module,
    in_keys=["params"],
    distribution_class=CompositeDistribution,
    distribution_kwargs={
        "distribution_map": {"normal": d.Normal, "categ": d.Categorical}
    },
)
data = TensorDict({"x": torch.rand(10)}, [])
module(data)
actor(data)
```

Which prints
I can make some variations of this if needed. Would that serve the purpose?
Yes, that looks great! Thanks so much for your work.
Motivation
I created an environment with a compound action space: a list of continuous values (robot joint angles) and a boolean value (suction gripper on or off).
In the PPO tutorial the policy_module is a ProbabilisticActor which takes "loc" and "scale" inputs. I want to make an actor which is a combination of this (for the joint angles) and something else that uses a Bernoulli distribution to generate boolean action values for the gripper.
It kind of looks like this may already be supported by using a TensorDictSequential, but it's not clear how that would work.
Solution
I would like to see an example in the docs of a compound action space like this.
Alternatives
Maybe there's another way, where one actor is created for each type of action space? If so, how would they be combined for use with a DataCollector?
Additional context
The environment is a robot arm manipulation scenario using box2d.