[docs] Documentation for POCA and cooperative behaviors #5056
Conversation
Integrate into CC
* Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning
they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire
Suggested change:
* Agents within groups should always set the `Max Steps` parameter in the Agent script to 0. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire
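To make the suggested wording concrete, here is a minimal sketch of handling the step limit at the group level rather than per Agent. `m_AgentGroup`, `m_StepCount`, `MaxEnvironmentSteps`, and `ResetScene()` are hypothetical names for fields and methods on an environment controller; `GroupEpisodeInterrupted()` is the `SimpleMultiAgentGroup` call the docs refer to.

```csharp
// Sketch of an environment controller method. Each Agent's own Max Step is 0,
// so the step budget is enforced once here for the whole group.
void FixedUpdate()
{
    m_StepCount += 1;
    if (m_StepCount >= MaxEnvironmentSteps)    // MaxEnvironmentSteps: assumed field
    {
        // Interrupt rather than end the episode so the time-out is treated
        // as a truncation, not a terminal outcome for the group.
        m_AgentGroup.GroupEpisodeInterrupted();
        ResetScene();                          // environment-specific reset
        m_StepCount = 0;
    }
}
```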
makes learning what to do as an individual difficult - you may get a win
for doing nothing, and a loss for doing your best.

In ML-Agents, we provide MA-POCA (MultiAgent POsthumous Credit Assignment), which
Should we say "paper coming soon" or something?
I think it is fine to not say anything. Although I am worried someone will coin the name.
}

// if the team scores a goal
m_AgentGroup.AddGroupReward(score);
Suggested change:
m_AgentGroup.AddGroupReward(rewardForGoal);
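For context on the rename above, the goal-scoring branch in the docs snippet could plausibly read as follows; `OnGoalScored`, `rewardForGoal`, and `ResetScene()` are placeholder names, while `AddGroupReward()` and `EndGroupEpisode()` are actual `SimpleMultiAgentGroup` methods.

```csharp
// Hypothetical goal handler: credit the whole group, then end its episode.
void OnGoalScored(float rewardForGoal)
{
    // Group-level reward: the team is rewarded collectively for the goal.
    m_AgentGroup.AddGroupReward(rewardForGoal);

    // End the episode for every agent registered in the group at once.
    m_AgentGroup.EndGroupEpisode();
    ResetScene();
}
```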
ResetScene();
```

Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train
Suggested change:
Multi Agent Groups can only be trained with the MA-POCA trainer, which is explicitly designed to train
Hmm, this isn't exactly true - Multi Agent Groups will run and try to train with PPO but their behaviors won't be very collaborative. I changed it to the stronger-but-not-as-hard "should be trained with".
```

Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train
cooperative environments. This can be enabled by using the `coma` trainer - see the
Suggested change:
cooperative environments. This can be enabled by using the `poca` trainer - see the
Team Id. If this playing field is duplicated many times in the Scene (e.g. for training
speedup), there should be two Groups _per playing field_, and two unique Team Ids
_for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and
self-play can be used together for training.
Maybe a little image will help?
Added a small diagram of the difference
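Alongside the diagram, a rough code-level sketch of the same split might help: one pair of `SimpleMultiAgentGroup` instances per playing field, but only two Team Ids shared across the whole Scene. The controller class, the list fields, and the runtime Team Id assignment are illustrative only; in practice the Team Id is usually set on the agent's Behavior Parameters in the Inspector.

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Policies;
using UnityEngine;

// Illustrative controller attached to a single duplicated playing field.
public class PlayingFieldController : MonoBehaviour
{
    public List<Agent> BlueAgents;     // this field's blue-team agents
    public List<Agent> PurpleAgents;   // this field's purple-team agents

    SimpleMultiAgentGroup m_BlueGroup;
    SimpleMultiAgentGroup m_PurpleGroup;

    void Start()
    {
        // Two Groups *per playing field* ...
        m_BlueGroup = new SimpleMultiAgentGroup();
        m_PurpleGroup = new SimpleMultiAgentGroup();

        foreach (var agent in BlueAgents)
        {
            // ... but only two Team Ids *for the entire Scene*, shared by
            // every duplicate of this field (normally set in the Inspector).
            agent.GetComponent<BehaviorParameters>().TeamId = 0;
            m_BlueGroup.RegisterAgent(agent);
        }
        foreach (var agent in PurpleAgents)
        {
            agent.GetComponent<BehaviorParameters>().TeamId = 1;
            m_PurpleGroup.RegisterAgent(agent);
        }
    }
}
```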
For an example of how to set up cooperative environments, see the
[Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and
[Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments.
Remove until the environments are actually merged.
Removed
* If an Agent finished earlier, e.g. completed tasks/be removed/be killed in the game, do not call
`EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts,
or destroy the agent entirely.
Give an explanation:
"This is because calling EndEpisode will call OnEpisodeBegin, hence resetting the Agent immediately. This is usually not the desired behavior when training a group of Agents."
It is possible to call EndEpisode; it just will most likely not be what the user expects.
Added this explanation 👍
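As a sketch of what the added explanation implies in code (the method name is hypothetical): when an agent is eliminated mid-episode, it is deactivated rather than reset via `EndEpisode()`, since `EndEpisode()` would immediately call `OnEpisodeBegin()` while the rest of the group keeps playing.

```csharp
// Hypothetical handler for an agent removed or "killed" during the episode.
void OnAgentEliminated(Agent agent)
{
    // Do NOT call agent.EndEpisode() here: that would trigger OnEpisodeBegin()
    // and reset this agent immediately, which is usually not what you want
    // while the rest of the group is still acting.
    agent.gameObject.SetActive(false);
}
```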
`EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts,
or destroy the agent entirely.

* If an Agent is disabled in a scene, it must be re-registered to the MultiAgentGroup.
disabled or destroyed, right?
Destroyed = gone. No way it can be re-registered, right?
Then I would say: if a previously disabled agent is re-enabled, it must be re-registered.
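Following that suggested wording, a minimal sketch of the re-enable path (assuming the controller keeps its own list of agents, `m_AllAgents`, since a disabled agent drops out of the group per the bullet under discussion):

```csharp
// Hypothetical reset called when a new group episode starts.
void ResetScene()
{
    foreach (var agent in m_AllAgents)   // m_AllAgents: full list kept by the controller
    {
        if (!agent.gameObject.activeSelf)
        {
            agent.gameObject.SetActive(true);
            // A previously disabled agent must be registered with the group again.
            m_AgentGroup.RegisterAgent(agent);
        }
        // reposition, clear per-agent state, etc. (environment-specific)
    }
}
```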
* Group rewards are meant to reinforce agents to act in the group's best interest instead of
individual ones, and are treated differently than individual agent rewards during
training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent
Suggested change:
training. So calling `AddGroupReward()` is not equivalent to calling `agent.AddReward()` on each agent
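A two-line sketch of the distinction the suggestion formats (the agent name and reward values are made up): the group reward credits the team as a whole, while `AddReward()` stays attached to a single agent, and the two are handled separately during training.

```csharp
// Group-level credit: the team accomplished the shared objective together.
m_AgentGroup.AddGroupReward(1.0f);

// Individual credit: one specific agent did something on its own (e.g. touched
// the ball). Not interchangeable with the group reward above.
strikerAgent.AddReward(0.1f);
```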
@@ -456,3 +456,43 @@ drop down. New pieces are spawned randomly at the top, with a chance of being
- Recommended Minimum: 1
- Recommended Maximum: 20
- Benchmark Mean Reward: Depends on the number of tiles.
Add these alongside the environment addition PRs.
Removed and moved to the environment PR
@@ -933,7 +933,8 @@ hyperparameter hierarchy in both.

Cooperative behavior in ML-Agents can be enabled by instantiating a `SimpleMultiAgentGroup`,
typically in an environment controller or similar script, and adding agents to it
using the `RegisterAgent(Agent agent)` method. Using `MultiAgentGroup` enables the
using the `RegisterAgent(Agent agent)` method. Note that all agents added to the same `MultiAgentGroup`
I think we only have SimpleMultiAgentGroup and IMultiAgentGroup, no MultiAgentGroup. (To verify)
Confirmed and fixed
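For reference against the corrected naming, a minimal registration sketch using only `SimpleMultiAgentGroup` (the controller class and list field are hypothetical):

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using UnityEngine;

// Hypothetical environment controller that owns the group.
public class CooperativeEnvController : MonoBehaviour
{
    public List<Agent> AgentsInArea;   // assigned in the Inspector
    SimpleMultiAgentGroup m_AgentGroup;

    void Start()
    {
        // Only SimpleMultiAgentGroup (and the IMultiAgentGroup interface) exist;
        // there is no concrete MultiAgentGroup class.
        m_AgentGroup = new SimpleMultiAgentGroup();
        foreach (var agent in AgentsInArea)
        {
            m_AgentGroup.RegisterAgent(agent);
        }
    }
}
```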
Proposed change(s)
Documentation for COMA2 and MultiAgentGroup.
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments