
[docs] Documentation for POCA and cooperative behaviors #5056

Merged · 300 commits · Mar 12, 2021

Conversation

@ervteng (Contributor) commented Mar 8, 2021

Proposed change(s)

Documentation for COMA2 and MultiAgentGroup.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

Comment on lines 991 to 992
* Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning
they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire
@andrewcoh (Contributor) commented Mar 11, 2021:

Suggested change
* Agents within groups should always set the `Max Steps` parameter the Agent script to 0, meaning
they will never reach a max step. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire
* Agents within groups should always set the `Max Steps` parameter in the Agent script to 0. Instead, handle Max Steps with MultiAgentGroup by ending the episode for the entire
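For illustration, here is a minimal controller sketch of that pattern; `MaxEnvironmentSteps`, the step counter, and `ResetScene()` are hypothetical names, and the agents are assumed to be registered elsewhere with their `Max Step` already set to 0:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class GroupTimeoutController : MonoBehaviour
{
    // Hypothetical group-level step budget, replacing each agent's Max Step.
    public int MaxEnvironmentSteps = 5000;
    int m_StepCount;

    // Agents are assumed to be registered elsewhere via RegisterAgent().
    SimpleMultiAgentGroup m_AgentGroup = new SimpleMultiAgentGroup();

    void FixedUpdate()
    {
        m_StepCount += 1;
        if (m_StepCount >= MaxEnvironmentSteps)
        {
            // End the episode for the entire group at once, as an
            // interruption (timeout) rather than a terminal success/failure.
            m_AgentGroup.GroupEpisodeInterrupted();
            m_StepCount = 0;
            ResetScene(); // hypothetical scene-reset helper
        }
    }

    void ResetScene()
    {
        // Reposition agents, goals, obstacles, etc.
    }
}
```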

makes learning what to do as an individual difficult - you may get a win
for doing nothing, and a loss for doing your best.

In ML-Agents, we provide MA-POCA (MultiAgent POsthumous Credit Assignment), which
Contributor:

Should we say "paper coming soon" or something?

Contributor:

I think it is fine to not say anything. Although I am worried someone will coin the name.

}

// if the team scores a goal
m_AgentGroup.AddGroupReward(score);
Contributor:

Suggested change
m_AgentGroup.AddGroupReward(score);
m_AgentGroup.AddGroupReward(rewardForGoal);

ResetScene();
```
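In context, the renamed reward might read like the following sketch (the `ScoredGoal` hook and the `rewardForGoal` parameter are illustrative, not the example's actual code):

```csharp
// Illustrative hook called by the environment when the team scores.
public void ScoredGoal(float rewardForGoal)
{
    // Credit the team outcome to the whole group, then end the group episode.
    m_AgentGroup.AddGroupReward(rewardForGoal);
    m_AgentGroup.EndGroupEpisode();
    ResetScene();
}
```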

Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train
Contributor:

Suggested change
Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train
Multi Agent Groups can only be trained with the MA-POCA trainer, which is explicitly designed to train

Contributor Author (@ervteng):

Hmm, this isn't exactly true - Multi Agent Groups will run and try to train with PPO but their behaviors won't be very collaborative. I changed it to the stronger-but-not-as-hard "should be trained with".

```

Multi Agent Groups are best used with the MA-POCA trainer, which is explicitly designed to train
cooperative environments. This can be enabled by using the `coma` trainer - see the
Contributor:

Suggested change
cooperative environments. This can be enabled by using the `coma` trainer - see the
cooperative environments. This can be enabled by using the `poca` trainer - see the
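For reference, selecting the renamed trainer is a one-line switch in the run configuration; a minimal hedged sketch (the behavior name and hyperparameter values are placeholders):

```yaml
behaviors:
  MyCooperativeBehavior:    # placeholder behavior name
    trainer_type: poca      # selects the MA-POCA trainer
    hyperparameters:
      batch_size: 1024      # placeholder values; tune per environment
      buffer_size: 10240
    max_steps: 5.0e6
```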

Team Id. If this playing field is duplicated many times in the Scene (e.g. for training
speedup), there should be two Groups _per playing field_, and two unique Team Ids
_for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and
self-play can be used together for training.
Contributor:

Maybe a little image will help?

Contributor Author (@ervteng):

Added a small diagram of the difference
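To make the Groups-versus-Team-Ids split concrete, a hedged sketch for one playing field (the controller and list names are illustrative; `TeamId` lives on `BehaviorParameters`):

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Policies;
using UnityEngine;

public class FieldController : MonoBehaviour
{
    public List<Agent> BlueAgents;  // illustrative, assigned in the Inspector
    public List<Agent> RedAgents;

    SimpleMultiAgentGroup m_BlueGroup;
    SimpleMultiAgentGroup m_RedGroup;

    void Start()
    {
        // Per playing field: one fresh group per team.
        m_BlueGroup = new SimpleMultiAgentGroup();
        m_RedGroup = new SimpleMultiAgentGroup();

        // Per scene: only two Team Ids total, shared by every duplicated field.
        foreach (var agent in BlueAgents)
        {
            agent.GetComponent<BehaviorParameters>().TeamId = 0;
            m_BlueGroup.RegisterAgent(agent);
        }
        foreach (var agent in RedAgents)
        {
            agent.GetComponent<BehaviorParameters>().TeamId = 1;
            m_RedGroup.RegisterAgent(agent);
        }
    }
}
```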

Comment on lines 981 to 983
For an example of how to set up cooperative environments, see the
[Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and
[Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments.
Contributor:

Suggested change
For an example of how to set up cooperative environments, see the
[Cooperative PushBlock](Learning-Environment-Examples.md#cooperative-push-block) and
[Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) example environments.

Remove until the environments are actually merged.

Contributor Author (@ervteng):

Removed

Comment on lines 1000 to 1002
* If an Agent finished earlier, e.g. completed tasks/be removed/be killed in the game, do not call
`EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts,
or destroy the agent entirely.
Contributor:

Give an explanation:
"This is because calling EndEpisode will call OnEpisodeBegin, hence resetting the Agent immediately. This is usually not the desired behavior when training a group of Agents."
It is possible to call EndEpisode; it just will most likely not be what the user expects.

Contributor Author (@ervteng):

Added this explanation 👍

`EndEpisode()` on the Agent. Instead, disable the Agent and re-enable it when the next episode starts,
or destroy the agent entirely.

* If an Agent is disabled in a scene, it must be re-registered to the MultiAgentGroup.
Contributor:

Disabled or destroyed, right?

Contributor:

Destroyed = gone; no way it can be re-registered, right?

Contributor:

Then I would say: if a previously disabled agent is re-enabled, it must be re-registered.
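A hedged sketch of the disable-and-re-register pattern from both threads above (`KillAgent` and `ReviveAgent` are hypothetical helpers; only `RegisterAgent` is the real API):

```csharp
// Mid-episode "death": deactivate instead of calling EndEpisode(), so
// OnEpisodeBegin() does not fire and reset the agent immediately.
public void KillAgent(Agent agent)
{
    agent.gameObject.SetActive(false);
}

// Next episode: a previously disabled agent must be re-registered
// to the group after it is re-enabled.
public void ReviveAgent(Agent agent)
{
    agent.gameObject.SetActive(true);
    m_AgentGroup.RegisterAgent(agent);
}
```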


* Group rewards are meant to reinforce agents to act in the group's best interest instead of
individual ones, and are treated differently than individual agent rewards during
training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent
Contributor:

Suggested change
training. So calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent
training. So calling `AddGroupReward()` is not equivalent to calling `agent.AddReward()` on each agent
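A short sketch of the distinction (the reward values and trigger points are illustrative):

```csharp
// Illustrative per-agent event, e.g. picking up a key:
void OnKeyPickedUp(Agent agent)
{
    // Individual reward: credited to this agent's own reward signal.
    agent.AddReward(0.1f);
}

// Illustrative team-level event, e.g. the group escaping:
void OnTeamEscaped()
{
    // Group reward: a shared signal the trainer treats separately;
    // NOT shorthand for calling AddReward(1.0f) on every registered agent.
    m_AgentGroup.AddGroupReward(1.0f);
}
```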

@@ -456,3 +456,43 @@ drop down. New pieces are spawned randomly at the top, with a chance of being
- Recommended Minimum: 1
- Recommended Maximum: 20
- Benchmark Mean Reward: Depends on the number of tiles.

Contributor:

Add these alongside the environment addition PRs.

Contributor Author (@ervteng):

Removed and moved to the environment PR


@@ -933,7 +933,8 @@ hyperparameter hierarchy in both.

Cooperative behavior in ML-Agents can be enabled by instantiating a `SimpleMultiAgentGroup`,
typically in an environment controller or similar script, and adding agents to it
using the `RegisterAgent(Agent agent)` method. Using `MultiAgentGroup` enables the
using the `RegisterAgent(Agent agent)` method. Note that all agents added to the same `MultiAgentGroup`
Contributor:

I think we only have SimpleMultiAgentGroup and IMultiAgentGroup, no MultiAgentGroup. (To verify)

Contributor Author (@ervteng):

Confirmed and fixed
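For completeness, the confirmed class name in a minimal sketch (the controller and field names are illustrative):

```csharp
using Unity.MLAgents;
using UnityEngine;

public class EnvironmentController : MonoBehaviour
{
    public Agent[] TeamAgents;  // illustrative, assigned in the Inspector

    // Note: the concrete class is SimpleMultiAgentGroup (implementing
    // IMultiAgentGroup); there is no plain MultiAgentGroup class.
    SimpleMultiAgentGroup m_AgentGroup;

    void Start()
    {
        m_AgentGroup = new SimpleMultiAgentGroup();
        foreach (var agent in TeamAgents)
        {
            m_AgentGroup.RegisterAgent(agent);  // the RegisterAgent(Agent) method from the docs
        }
    }
}
```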

delete-merged-branch bot deleted the branch main · March 12, 2021 01:48
@ervteng changed the base branch from develop-coma2-trainer to main · March 12, 2021 02:04
@ervteng merged commit 847d723 into main · Mar 12, 2021
delete-merged-branch bot deleted the develop-coma2-docs branch · March 12, 2021 02:05
github-actions bot locked as resolved and limited conversation to collaborators · Mar 12, 2022