
[Feature Request] Transform that stacks data for agents with identical specs #2566

kurtamohler opened this issue Nov 14, 2024 · 8 comments · May be fixed by #2567
@kurtamohler
Collaborator

Motivation

Some multi-agent environments, like VmasEnv, stack the tensors for observations, rewards, etc. across agents that have identical specs. For instance, in one of these stacked environments, if there are 2 agents that each have 8 observations, the observation spec might look like this:

Composite(
    group_0: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([2, 8]),
            ...),
        shape=torch.Size([2])),
    shape=torch.Size([]))

In contrast, other environments, like UnityMLAgentsEnv, have separate keys for each agent, even if the agents' specs are identical. For instance, with 2 agents that each have 8 observations, the observation spec might look like this:

Composite(
    group_0: Composite(
        agent_0: Composite(
            observation: UnboundedContinuous(
                shape=torch.Size([8]),
                ...),
            shape=torch.Size([])),
        agent_1: Composite(
            observation: UnboundedContinuous(
                shape=torch.Size([8]),
                ...),
            shape=torch.Size([])),
        shape=torch.Size([])),
    shape=torch.Size([]))

It is not easy to apply the same training script to two environments that use these two different formats. For instance, applying the multi-agent PPO tutorial to a Unity env is not straightforward.

Solution

If we had an environment transform that could stack all the data from different keys, we could convert an environment that uses the unstacked format into an environment that uses the stacked format. Then it should be straightforward to use the same (or almost the same) training script on the two different environments.
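A minimal sketch of the rearrangement such a transform would perform, using plain Python dicts and lists as stand-ins for tensordicts and torch.stack (the function name and structure are illustrative, not an actual TorchRL API):

```python
def stack_group(group):
    """Illustrative only: collapse per-agent sub-dicts that share the same
    keys into a single dict of stacked values (lists stand in for tensors)."""
    agent_keys = sorted(group)
    inner_keys = group[agent_keys[0]].keys()
    # Stacking is only well-defined when every agent has an identical spec.
    assert all(group[a].keys() == inner_keys for a in agent_keys)
    return {k: [group[a][k] for a in agent_keys] for k in inner_keys}

# Unstacked format: one sub-key per agent, each with shape [8].
unstacked = {
    "agent_0": {"observation": [0.0] * 8},
    "agent_1": {"observation": [1.0] * 8},
}
# Stacked format: a single "observation" entry with shape [2, 8].
stacked = stack_group(unstacked)
```

The real transform would additionally need to rewrite the specs and handle reward/done keys, but the key-level rearrangement is the same.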

Alternatives

Additional context

Checklist

  • I have checked that there is no similar issue in the repo (required)
@kurtamohler kurtamohler added the enhancement New feature or request label Nov 14, 2024
@kurtamohler kurtamohler self-assigned this Nov 14, 2024
@kurtamohler kurtamohler linked a pull request Nov 14, 2024 that will close this issue
@thomasbbrunner
Contributor

thomasbbrunner commented Nov 15, 2024

Have you taken a look at the group_map argument? When set to MarlGroupMapType.ALL_IN_ONE_GROUP the environment should return all agents in a single group (when possible, otherwise in more than one group).

If you are using this setting, then imo there's an issue in the implementation of the grouping of agents in UnityMLAgentsEnv.

@kurtamohler
Collaborator Author

kurtamohler commented Nov 18, 2024

@thomasbbrunner, the group_map=ALL_IN_ONE_GROUP argument does combine all the agents into one group key, but just to be clear, the specific issue is that the data for the agents does not all get stacked into the same set of tensors--each agent still has a separate key containing separate tensors, like the example I showed in the issue description.

One of the reasons why UnityMLAgentsEnv does not stack the tensors for all the agents is that each agent in a Unity env is free to have different specs--for instance, the strikers and goalies have different specs in the "StrikersVsGoalies" example environment. In cases like that, it's not possible to stack all of the agents' data together. So when I was writing the wrapper, it seemed natural to keep the agents under separate agent keys, rather than try to automatically figure out which agents' specs match each other and can be stacked together. Furthermore, it seemed best to organize the data in a way that is consistent with how the ml_agents library itself does it.

But I'm happy to change the behavior if there is good reason to do so.

@vmoens, wdyt?

@thomasbbrunner
Contributor

Ah, I see! Thanks for the explanation. It seems that the UnityMLAgentsEnv has a different behavior than the other multi-agent environments (my experience is mostly based on PettingZoo here).

PettingZooEnv has three possible grouping types:

  1. ONE_GROUP_PER_AGENT (self explanatory)
  2. None (default): will group as many agents as possible, while keeping the groups homogeneous (i.e., all agents in the group have the same obs and action spec).
  3. ALL_IN_ONE_GROUP: (self explanatory) + if the agents cannot be grouped together (due to mismatched specs), it returns an inconsistent spec and any subsequent environment operation leads to an error.
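As a concrete illustration of what these modes produce (the agent names and the "agents" group key below are hypothetical), the resulting group maps look roughly like:

```python
# Illustrative group maps for the three modes described above, for a
# hypothetical roster of three agents.
agents = ["striker_0", "striker_1", "goalie_0"]

# 1. ONE_GROUP_PER_AGENT: every agent gets its own group.
one_group_per_agent = {a: [a] for a in agents}

# 2. None (default): group agents with matching specs; here the two
#    strikers share a spec, the goalie does not.
default_grouping = {"striker": ["striker_0", "striker_1"],
                    "goalie": ["goalie_0"]}

# 3. ALL_IN_ONE_GROUP: one group holding everyone; only usable when all
#    agents' specs match, otherwise environment operations error out.
all_in_one_group = {"agents": agents}
```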

Personally, I don't see the benefit of grouping heterogeneous agents. Maybe that makes sense in your use-case, but I'd argue that there is no difference between having one group for each agent and a group containing sub-groups of heterogeneous agents.

I quite like the default behavior of PettingZooEnv (grouping where possible) and I don't understand why there isn't a GROUP_HOMOGENEOUS (or similar) in MarlGroupMapType.

Would be interested in hearing more about your use-case!

@thomasbbrunner
Contributor

thomasbbrunner commented Nov 18, 2024

It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that!

cc @matteobettini

@kurtamohler
Collaborator Author

I agree that we should make environments consistent with each other when possible. At the time, I didn't think that putting the Unity env agents under separate keys would be a significant inconsistency with other TorchRL envs (at least compared to VMAS, PettingZoo, and OpenSpiel)--it just seemed like the right choice because it's more consistent with the underlying ml_agents library. But I see that it probably would be better to stack the data of agents that share the same group, by default.

@vmoens, do you agree with this?

I don't understand why there isn't a GROUP_HOMOGENEOUS

That sounds like a good idea to me. Feel free to submit an issue

Personally, I don't see the benefit of grouping heterogeneous agents.

I don't either, but then again, I'm fairly new here. Maybe I took ALL_IN_ONE_GROUP too literally

@vmoens
Contributor

vmoens commented Nov 19, 2024

It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that!

I'm sorry this is the impression you have. @matteobettini put a great deal of effort into unifying MARL APIs, but if there are inconsistencies we should address them!

One of the reasons why UnityMLAgentsEnv does not stack the tensors for all the agents is that each agent in a Unity env is free to have different specs--for instance, the strikers and goalies have different specs in the "StrikersVsGoalies" example environment. In cases like that, it's not possible to stack all of the agents' data together. So when I was writing the wrapper, it seemed natural to keep the agents under separate agent keys, rather than try to automatically figure out which agents' specs match each other and can be stacked together. Furthermore, it seemed best to organize the data in a way that is consistent with how the ml_agents library itself does it.

I 100% agree on this.
My take when writing wrappers is usually that I want the wrapper to interfere minimally with the underlying API – the only thing we want is to present the data in a tensordict-friendly manner: i.e., convert to tensors, make sure that the data is on the desired device, minimize the cost of operations, etc. Once that is done, we leave it to the user to deal with the data.
The reason we want to do this is that any change we make in the way the data is presented to the user will create edge cases that we may not know of.

Now RE the "consistency" problem highlighted by @thomasbbrunner, as I said earlier, we should make sure that things can be made uniform at a low cost.
My vision here would be for TorchRL to deal itself with the various grouping strategies with its own API, possibly inspired by pettingzoo if that is a convention in the community.
The API would consist of appending a transform that re-arranges the tensordict given some flag. For instance, you could create your Unity env in its wrapper and append a transform:

env = UnityMLAgentsEnv(...)
transform = GroupMARLAgents(MARLGroups.ONE_GROUP_PER_AGENT)
env = env.append_transform(transform)

and then, internally, we make sure that the transform does its job for every MARL library we support.

Would that make sense?

NB: using metaclasses, we could even pass a group argument directly to each wrapper that automatically appends the transform if required.

@matteobettini
Contributor

matteobettini commented Nov 19, 2024

Hey guys so I think there might have been some general confusion about the multi-agent environment API here.

If you can, watch this section of the video to understand how it works: https://youtu.be/1tOIMgJf_VQ?si=1RJ7PGD3s5--hI2o&t=1235

Here is a recap:

The choice of which agents should be stacked together and which kept separate (what we call grouping) has to be 100% up to the user.

That is why multi-agent envs should take a group map that maps groups to agents in that group.
This is a dictionary passed by the user.

The MarlGroupMapType options are just sensible defaults for this value. As you know, these right now are either all in one group or one group per agent.

Then environments have a default behaviour for building the map when nothing is passed.
The default behavior that makes the most sense to me (and it is the one used in vmas and pettingzoo) is to group together agents based on their name. Basically if you have "evader_1", "evader_2" "attacker_1" it would make 2 groups named "attacker" and "evader".
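The name-based default described above can be sketched as follows (a hypothetical helper, not the actual VMAS/PettingZoo implementation: it strips a trailing numeric suffix and groups agents sharing the prefix):

```python
import re
from collections import defaultdict

def group_map_from_names(agent_names):
    """Hypothetical sketch of the default grouping: agents whose names
    differ only by a trailing _<number> suffix end up in the same group."""
    groups = defaultdict(list)
    for name in agent_names:
        prefix = re.sub(r"_\d+$", "", name)
        groups[prefix].append(name)
    return dict(groups)

print(group_map_from_names(["evader_1", "evader_2", "attacker_1"]))
# → {'evader': ['evader_1', 'evader_2'], 'attacker': ['attacker_1']}
```

The resulting dict has exactly the shape of the user-supplied group map, so the default and the explicit path can share the same downstream code.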

This to me was better than GROUP_HOMOGENEOUS because it is difficult to define a sensible and consistent behavior for that. But I am open to discussing this.

The choice of the grouping needs to be passed at environment construction. If MLAgents is overfitted to one specific choice, I believe we should extend it. Modifying the grouping in a transform later is very inefficient and I think should be avoided.

MLAgents also has a concept of an agent type or something similar, which I remember would be great for the default grouping strategy.

@matteobettini
Contributor

Maybe I am missing what the group_map input to the MLAgents constructor does, because that is what I would use to change the data grouping and address this issue.
