Goal conditioning grid world : Example of goal conditioning #5193
Changes from all commits
639c617
9db76ab
c3dba90
9ed6aa1
8a1737a
a3b1f61
ffe56d0
abd2bf0
79fff65
fb869ac
@@ -82,16 +82,16 @@ you would like to contribute environments, please see our

Possible to link to this environment in the goal signal docs and the Changelog? Just in case a user wants an example of how to use these features.

-- Set-up: A version of the classic grid-world task. Scene contains agent, goal,
+- Set-up: A multi-goal version of the grid-world task. Scene contains agent, goal,
 and obstacles.
-- Goal: The agent must navigate the grid to the goal while avoiding the
-obstacles.
+- Goal: The agent must navigate the grid to the appropriate goal while
+avoiding the obstacles.
 - Agents: The environment contains nine agents with the same Behavior
 Parameters.
 - Agent Reward Function:
 - -0.01 for every step.
-- +1.0 if the agent navigates to the goal position of the grid (episode ends).
-- -1.0 if the agent navigates to an obstacle (episode ends).
+- +1.0 if the agent navigates to the correct goal (episode ends).
+- -1.0 if the agent navigates to an incorrect goal (episode ends).
 - Behavior Parameters:
 - Vector Observation space: None
 - Actions: 1 discrete action branch with 5 actions, corresponding to movement in
@@ -101,8 +101,10 @@ you would like to contribute environments, please see our
 checkbox within the `trueAgent` GameObject). The trained model file provided
 was generated with action masking turned on.
 - Visual Observations: One corresponding to top-down view of GridWorld.
-- Float Properties: Three, corresponding to grid size, number of obstacles, and
-number of goals.
+- Goal Signal : A one hot vector corresponding to which color is the correct goal
+for the Agent
+- Float Properties: Three, corresponding to grid size, number of green goals, and
+number of red goals.
 - Benchmark Mean Reward: 0.8

 ## Push Block
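The reward scheme and goal signal described in the documentation change above can be illustrated with a small Python sketch. This is not the environment's implementation (which lives in the Unity project). The names `GoalColor`, `goal_signal`, and `step_reward` are hypothetical, the two colors are assumed from the "green goals" and "red goals" Float Properties, and adding the step penalty to the terminal reward is an assumption the docs do not spell out.

```python
from enum import IntEnum


class GoalColor(IntEnum):
    """Hypothetical goal colors, assumed from the green/red Float Properties."""
    GREEN = 0
    RED = 1


STEP_PENALTY = -0.01           # "-0.01 for every step"
CORRECT_GOAL_REWARD = 1.0      # "+1.0 if the agent navigates to the correct goal"
INCORRECT_GOAL_REWARD = -1.0   # "-1.0 if the agent navigates to an incorrect goal"


def goal_signal(target):
    """Return a one-hot list marking which color is the correct goal."""
    return [1.0 if color == target else 0.0 for color in GoalColor]


def step_reward(target, reached=None):
    """Return (reward, episode_done) for a single step.

    `reached` is the color of the goal the agent stepped onto, or None if it
    did not reach any goal. Assumption: the step penalty is added on top of
    the terminal reward rather than being replaced by it.
    """
    reward = STEP_PENALTY
    if reached is None:
        return reward, False
    if reached == target:
        reward += CORRECT_GOAL_REWARD
    else:
        reward += INCORRECT_GOAL_REWARD
    return reward, True
```

With these definitions, `goal_signal(GoalColor.RED)` yields `[0.0, 1.0]`, and the same physical goal produces a positive or negative terminal reward depending on which color the signal marks as correct, which is the point of goal conditioning.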
Can this happen somewhere else? It feels like abuse of `CollectObservations()`, since it's not touching the input `VectorSensor`.

`VectorSensor` is null here, so I do not see an issue with this. Goal Signal is an observation, so it makes sense to me that it is called in `CollectObservations`. Would it be better if I put this logic into a `CollectGoal` method with no arguments that I call in `CollectObservations`?

`CollectGoal` is maybe fine for the example (but let's not add it to `Agent`). Let me think about a better way.
One problem (which I didn't realize until now) is that we don't check for null CollectObservationsSensor during the normal update step (ml-agents/com.unity.ml-agents/Runtime/Agent.cs, line 1062 in e4e9c51), but we do check for null when the agent is done (ml-agents/com.unity.ml-agents/Runtime/Agent.cs, lines 563 to 571 in e4e9c51).
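
The inconsistency described in this comment (one code path guards against a missing sensor, the other does not) can be sketched in Python. This is only an illustration of the pattern, not the `Agent.cs` code: the classes and method names below are hypothetical, and routing both paths through one guarded helper is just one possible way to make them agree. Note that with such a guard, logic placed inside the observation callback would not run when the sensor is absent, which is exactly the trade-off under discussion in this thread.

```python
class VectorSensor:
    """Hypothetical stand-in for a sensor that accumulates float observations."""

    def __init__(self):
        self.observations = []

    def add(self, value):
        self.observations.append(float(value))


class Agent:
    """Hypothetical agent whose vector sensor may be absent (None)."""

    def __init__(self, vector_sensor=None):
        self.vector_sensor = vector_sensor
        self.collect_calls = 0

    def collect_observations(self, sensor):
        # User-overridable callback; here it only counts invocations.
        self.collect_calls += 1

    def _collect_if_sensor_present(self):
        # Single guarded entry point shared by both paths below, so the
        # "normal step" and "agent done" paths cannot drift apart.
        if self.vector_sensor is None:
            return False
        self.collect_observations(self.vector_sensor)
        return True

    def on_step(self):
        """Normal update step."""
        return self._collect_if_sensor_present()

    def on_done(self):
        """Final observation collection when the episode ends."""
        return self._collect_if_sensor_present()
```

An agent built without a sensor skips the callback on both paths, while an agent built with `VectorSensor()` invokes it on both, so the two paths behave the same either way.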