handle multiple dones in a single step #3700

chriselion · 2020-03-26T22:00:55Z

Proposed change(s)

Fix two bugs in the gym interface that could happen if an Agent called EndEpisode() multiple times in the same step:

The bookkeeping would try to remove the "transient" agent before it was added, which would fail. We also need to avoid adding it since it's already done.
For a single Agent, the checks in _sanitize_info() was too strict - we could have the case where step_result.n_agents()=3 self._n_agents=1 n_extra_agents=2 because there we 2 "done" agents in the step.

This would show up with stack traces:

Traceback (most recent call last):
  File "test_gym.py", line 38, in <module>
    observation, reward, done, info = multi_env.step(multi_env.action_space.sample())
  File "/Users/chris.elion/code/ml-agents/gym-unity/gym_unity/envs/__init__.py", line 222, in step
    step_result = self._step()
  File "/Users/chris.elion/code/ml-agents/gym-unity/gym_unity/envs/__init__.py", line 449, in _step
    return self._sanitize_info(info)
  File "/Users/chris.elion/code/ml-agents/gym-unity/gym_unity/envs/__init__.py", line 371, in _sanitize_info
    "The number of agents in the scene does not match the expected number."
gym_unity.envs.UnityGymException: The number of agents in the scene does not match the expected number.

with multiagent=False, or

Traceback (most recent call last):
  File "test_gym.py", line 32, in <module>
    observations, rewards, dones, info = multi_env.step(actions)
  File "/Users/chris.elion/code/ml-agents/gym-unity/gym_unity/envs/__init__.py", line 222, in step
    step_result = self._step()
  File "/Users/chris.elion/code/ml-agents/gym-unity/gym_unity/envs/__init__.py", line 445, in _step
    self.agent_mapper.mark_agent_done(agent_id, reward)
  File "/Users/chris.elion/code/ml-agents/gym-unity/gym_unity/envs/__init__.py", line 531, in mark_agent_done
    gym_index = self._agent_id_to_gym_index.pop(agent_id)
KeyError: 2

with multiagent=True

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

https://forum.unity.com/threads/errors-when-using-gym-unity-0-14-1-and-0-15-0.852586/

#3531 (comment)

https://jira.unity3d.com/browse/MLA-805

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

chriselion · 2020-03-26T22:02:33Z

gym-unity/gym_unity/envs/__init__.py

-        gym_index = self._agent_id_to_gym_index.pop(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
+        if agent_id in self._agent_id_to_gym_index:
+            gym_index = self._agent_id_to_gym_index.pop(agent_id)


This is where the reported exceptions are. We could also make a check before the call to mark_agent_done, in _sanitize_info() but it seemed a bit cleaner to do it here.

chriselion · 2020-03-26T22:03:26Z

gym-unity/gym_unity/envs/__init__.py

@@ -386,6 +385,10 @@ def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
        # only cares about the ordering.
        for index, agent_id in enumerate(step_result.agent_id):
            if not self._previous_step_result.contains_agent(agent_id):
+                if step_result.done[index]:


Just making the mark_agent_done() change without this means that we'll register the agent and run out of "slots" later.

chriselion · 2020-03-26T22:04:11Z

gym-unity/gym_unity/envs/__init__.py

@@ -364,9 +364,8 @@ def _check_agents(self, n_agents: int) -> None:

    def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
        n_extra_agents = step_result.n_agents() - self._n_agents
-        if n_extra_agents < 0 or n_extra_agents > self._n_agents:
+        if n_extra_agents < 0:


I'm not 100% it's safe to remove this, but it's what was triggering in the single agent case (with 2 "done" agents)

ervteng

Looks good - though we really need to simplify the batched step result output, the gym wrapper is getting pretty fat

…iple-episode-dones

chriselion · 2020-03-27T00:22:00Z

Bypassing C# yamato tests.

* handle multiple dones in a single step

* [bug-fix] Increase height of wall in CrawlerStatic (#3650) * [bug-fix] Improve performance for PPO with continuous actions (#3662) * Corrected a typo in a name of a function (#3670) OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document * Add Academy.AutomaticSteppingEnabled to migration (#3666) * Fix editor port in Dockerfile (#3674) * Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com> * Remove vis_encode_type from list of required (#3677) * Update changelog (#3678) * Shorten timeout duration for environment close (#3679) The timeout duration for closing an environment was set to the same duration as the timeout when waiting for a response from the still-running environment. This led to long waits for the error response when communication version wasn't matching. This change forces a timeout duration of 0 when handling errors. * Bumping the versions * handle multiple dones in a single step (#3700) * handle multiple dones in a single step * [tests] Make end-to-end tests more stable (#3697) * [bug-fix] Fix entropy computation for GaussianDistribution (#3684) * Fix how we set logging levels (#3703) * cleanup logging * comments and cleanup * pylint, gym * [skip-ci] Update changelog for logging fix. (#3707) * [skip ci] Update README * [skip ci] Fixed a typo Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Adam Streck <adam.streck@gmail.com> Co-authored-by: Chris Elion <chris.elion@unity3d.com> Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>

* Bumping version on the release (#3615) * Update examples project to 2018.4.18f1 (#3618) From 2018.4.14f1. An internal package dependency was updated as a side effect. * Remove dead components from the examples scenes (#3619) (#3624) * Improve warnings and exception if using unsupported combo * add meta file * fix unit test * enforce onnx conversion (expect tf2 CI to fail) (#3600) * Update error message * Updated the release branch docs (#3621) * Updated the release branch docs * Edited the README * make sure top-level timer is closed before writing * Remove space from Product Name for examples In #2588 it was suggested that the space in the Product Name for our example environments causes confusion when using a default build because of the need to escape the space in the build filename. This change removes the space from the Product Name in the project's player settings. * [bug-fix] Increase 3dballhard and GAIL default steps (#3636) * Updating the NN models (#3632) * Updating the NN models * Update gridworld * [skip ci] Update BallHard * Update hallway * Hotfixes for Release 0.15.1 (#3698) * [bug-fix] Increase height of wall in CrawlerStatic (#3650) * [bug-fix] Improve performance for PPO with continuous actions (#3662) * Corrected a typo in a name of a function (#3670) OnEpsiodeBegin was corrected to OnEpisodeBegin in Migrating.md document * Add Academy.AutomaticSteppingEnabled to migration (#3666) * Fix editor port in Dockerfile (#3674) * Hotfix memory leak on Python (#3664) * Hotfix memory leak on Python * Fixing * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done * [bug-fix] Make Python able to deal with 0-step episodes (#3671) * adding some comments Co-authored-by: Ervin T <ervin@unity3d.com> * Remove vis_encode_type from list of required (#3677) * Update changelog (#3678) * Shorten timeout duration for environment close (#3679) The timeout duration for closing an environment was set to the same duration as the timeout when waiting for a response from the still-running environment. This led to long waits for the error response when communication version wasn't matching. This change forces a timeout duration of 0 when handling errors. * Bumping the versions * handle multiple dones in a single step (#3700) * handle multiple dones in a single step * [tests] Make end-to-end tests more stable (#3697) * [bug-fix] Fix entropy computation for GaussianDistribution (#3684) * Fix how we set logging levels (#3703) * cleanup logging * comments and cleanup * pylint, gym * [skip-ci] Update changelog for logging fix. (#3707) * [skip ci] Update README * [skip ci] Fixed a typo Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Adam Streck <adam.streck@gmail.com> Co-authored-by: Chris Elion <chris.elion@unity3d.com> Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com> * fix changelog * keep master gridworld Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com> Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com> Co-authored-by: Ervin T <ervin@unity3d.com> Co-authored-by: Adam Streck <adam.streck@gmail.com>

handle multiple dones in a single step

812ccfc

chriselion commented Mar 26, 2020

View reviewed changes

chriselion requested review from vincentpierre and ervteng March 26, 2020 22:04

vincentpierre approved these changes Mar 26, 2020

View reviewed changes

ervteng approved these changes Mar 26, 2020

View reviewed changes

Chris Elion added 4 commits March 26, 2020 16:07

test new agents done in same step

eb7a460

handle in reference impl too

6415d11

update changelog

6e1e82f

Merge remote-tracking branch 'origin/master' into MLA-805-handle-mult…

b61fa76

…iple-episode-dones

chriselion mentioned this pull request Mar 26, 2020

Multiagent Bug in Gym-Unity 14.1 #3531

Closed

chriselion merged commit 1161d33 into master Mar 27, 2020

delete-merged-branch bot deleted the MLA-805-handle-multiple-episode-dones branch March 27, 2020 00:22

vincentpierre pushed a commit that referenced this pull request Mar 27, 2020

handle multiple dones in a single step (#3700)

cd01450

* handle multiple dones in a single step

github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

handle multiple dones in a single step #3700

handle multiple dones in a single step #3700

chriselion commented Mar 26, 2020 •

edited

Loading

chriselion Mar 26, 2020 •

edited

Loading

chriselion Mar 26, 2020

chriselion Mar 26, 2020

ervteng left a comment

chriselion commented Mar 27, 2020

handle multiple dones in a single step #3700

handle multiple dones in a single step #3700

Conversation

chriselion commented Mar 26, 2020 • edited Loading

Proposed change(s)

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Other comments

chriselion Mar 26, 2020 • edited Loading

Choose a reason for hiding this comment

chriselion Mar 26, 2020

Choose a reason for hiding this comment

chriselion Mar 26, 2020

Choose a reason for hiding this comment

ervteng left a comment

Choose a reason for hiding this comment

chriselion commented Mar 27, 2020

chriselion commented Mar 26, 2020 •

edited

Loading

chriselion Mar 26, 2020 •

edited

Loading