Hotfixes for Release 0.15.1 (#3698)
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)

* [bug-fix] Improve performance for PPO with continuous actions (#3662)

* Corrected a typo in the name of a function (#3670)

`OnEpsiodeBegin` was corrected to `OnEpisodeBegin` in the Migrating.md document.

* Add Academy.AutomaticSteppingEnabled to migration (#3666)

* Fix editor port in Dockerfile (#3674)

* Hotfix memory leak on Python (#3664)

* Fixing

* Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done.

* [bug-fix] Make Python able to deal with 0-step episodes (#3671)

* adding some comments

Co-authored-by: Ervin T <ervin@unity3d.com>

* Remove `vis_encode_type` from the list of required settings (#3677)

* Update changelog (#3678)

* Shorten timeout duration for environment close (#3679)

The timeout duration for closing an environment was set to the
same duration as the timeout used when waiting for a response from the
still-running environment. This led to long waits for the error
response when the communication version didn't match.

This change forces a timeout duration of 0 when handling errors.
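
Conceptually, the error path now polls with a zero timeout. A minimal sketch with hypothetical names (`close_env`, `had_error`); the real change lives in the Python-side environment manager:

```python
from multiprocessing import Process
from multiprocessing.connection import Connection

def close_env(conn: Connection, proc: Process, had_error: bool) -> None:
    # On the error path (e.g. a communication-version mismatch), poll with a
    # zero timeout instead of the normal response timeout: no reply is coming.
    timeout = 0 if had_error else 30  # 30 s is illustrative, not the real default
    if conn.poll(timeout):
        conn.recv()  # drain any final message before shutting down
    conn.close()
    proc.join()
```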

* Bumping the versions

* handle multiple dones in a single step (#3700)

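In the gym wrapper, this fix makes the agent-id mapper tolerate a done notification for an id it never registered. A simplified sketch of the list-based mapper's behavior, where `gym_id_order` and `done_rewards` are stand-ins for the wrapper's internal state rather than its exact code:

```python
from typing import Dict, List

def mark_agent_done(
    gym_id_order: List[int],
    done_rewards: Dict[int, float],
    agent_id: int,
    reward: float,
) -> None:
    # gym_id_order maps gym indices to agent ids; -1 marks a vacated slot.
    try:
        gym_index = gym_id_order.index(agent_id)
    except ValueError:
        # The agent was never registered (e.g. it ended its episode twice
        # in one step), so there is nothing to clean up.
        return
    done_rewards[gym_index] = reward
    gym_id_order[gym_index] = -1
```
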
* [tests] Make end-to-end tests more stable (#3697)

* [bug-fix] Fix entropy computation for GaussianDistribution (#3684)
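
For reference, the entropy of a diagonal Gaussian with standard deviations sigma_i is H = sum_i 0.5 * ln(2 * pi * e * sigma_i^2). A minimal NumPy check of that formula, illustrative only and not the trainer's exact code:

```python
import numpy as np

def diag_gaussian_entropy(log_std: np.ndarray) -> float:
    # H = sum_i 0.5 * ln(2 * pi * e * sigma_i^2)
    #   = sum_i (0.5 * ln(2 * pi * e) + log_std_i)
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e) + log_std))

# Two action dimensions with sigma = 1 give H = 2 * 1.4189 ~ 2.8379.
print(diag_gaussian_entropy(np.zeros(2)))
```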

* Fix how we set logging levels (#3703)

* cleanup logging

* comments and cleanup

* pylint, gym
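
The resulting pattern, visible in the gym_unity diff below, is to obtain loggers from `mlagents_envs.logging_util` rather than calling `logging.basicConfig()` at import time, so the log level is set in one place. A minimal usage sketch:

```python
from mlagents_envs import logging_util

# One logger per module; set_log_level configures the level centrally
# instead of each module reconfiguring the root logger.
logger = logging_util.get_logger(__name__)
logging_util.set_log_level(logging_util.INFO)
logger.info("logging configured")
```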

* [skip-ci] Update changelog for logging fix. (#3707)

* [skip ci] Update README

* [skip ci] Fixed a typo

Co-authored-by: Ervin T <ervin@unity3d.com>
Co-authored-by: Adam Streck <adam.streck@gmail.com>
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
5 people authored Mar 30, 2020
1 parent 7507a5d commit 377a25c
Showing 45 changed files with 320 additions and 147 deletions.
2 changes: 2 additions & 0 deletions .pylintrc
@@ -44,3 +44,5 @@ disable =
 # Appears to be https://github.com/PyCQA/pylint/issues/2981
 W0201,
 
+# Using the global statement
+W0603,
6 changes: 4 additions & 2 deletions Dockerfile
@@ -132,7 +132,9 @@ COPY ml-agents /ml-agents
 WORKDIR /ml-agents
 RUN pip install -e .
 
-# port 5005 is the port used in in Editor training.
-EXPOSE 5005
+# Port 5004 is the port used in Editor training.
+# Environments will start from port 5005,
+# so allow enough ports for several environments.
+EXPOSE 5004-5050
 
 ENTRYPOINT ["mlagents-learn"]
@@ -1690,8 +1690,8 @@ MonoBehaviour:
   m_InferenceDevice: 0
   m_BehaviorType: 0
   m_BehaviorName: CrawlerStatic
-  m_TeamID: 0
-  m_useChildSensors: 1
+  TeamId: 0
+  m_UseChildSensors: 1
 --- !u!114 &114230237520033992
 MonoBehaviour:
   m_ObjectHideFlags: 0
@@ -1704,6 +1704,9 @@ MonoBehaviour:
   m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
   m_Name:
   m_EditorClassIdentifier:
+  agentParameters:
+    maxStep: 0
+  hasUpgradedFromAgentParameters: 1
   maxStep: 5000
   target: {fileID: 4749909135913778}
   ground: {fileID: 4856650706546504}
@@ -1759,7 +1762,7 @@ MonoBehaviour:
   m_Name:
   m_EditorClassIdentifier:
   DecisionPeriod: 5
-  RepeatAction: 0
+  TakeActionsBetweenDecisions: 0
   offsetStep: 0
 --- !u!1 &1492926997393242
 GameObject:
@@ -2959,8 +2962,8 @@ Transform:
   m_PrefabAsset: {fileID: 0}
   m_GameObject: {fileID: 1995322274649904}
   m_LocalRotation: {x: 0, y: -0, z: -0, w: 1}
-  m_LocalPosition: {x: -0, y: 0.5, z: 0}
-  m_LocalScale: {x: 0.01, y: 0.01, z: 0.01}
+  m_LocalPosition: {x: -0, y: 1.5, z: 0}
+  m_LocalScale: {x: 0.01, y: 0.03, z: 0.01}
   m_Children: []
   m_Father: {fileID: 4924174722017668}
   m_RootOrder: 1
3 changes: 2 additions & 1 deletion README.md
@@ -44,7 +44,7 @@ developer communities.
 * Train using concurrent Unity environment instances
 
 ## Releases & Documentation
-**Our latest, stable release is 0.15.0. Click
+**Our latest, stable release is 0.15.1. Click
 [here](docs/Readme.md) to
 get started with the latest release of ML-Agents.**
 
@@ -61,6 +61,7 @@ details of the changes between versions.
 
 | **Version** | **Release Date** | **Source** | **Documentation** | **Download** |
 |:-------:|:------:|:-------------:|:-------:|:------------:|
+| **0.15.0** | March 18, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.15.0.zip) |
 | **0.14.1** | February 26, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.1.zip) |
 | **0.14.0** | February 13, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.0.zip) |
 | **0.13.1** | January 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.1.zip) |
11 changes: 11 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -5,6 +5,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 
+## [0.15.1-preview] - 2020-03-30
+### Bug Fixes
+- Raised the height of the wall in the CrawlerStatic scene to prevent the Agent from falling off. (#3650)
+- Fixed an issue where specifying `vis_encode_type` was required only for SAC. (#3677)
+- Fixed the reported entropy values for continuous actions. (#3684)
+- Fixed an issue where switching models using `SetModel()` during training would use an excessive amount of memory. (#3664)
+- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
+- Fixed an issue in the gym wrapper that would raise an exception if an Agent called `EndEpisode` multiple times in the same step. (#3700)
+- Fixed an issue where logging output was not visible; logging levels are now set consistently. (#3703)
+
+
 ## [0.15.0-preview] - 2020-03-18
 ### Major Changes
 - `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
2 changes: 1 addition & 1 deletion com.unity.ml-agents/Runtime/Academy.cs
@@ -64,7 +64,7 @@ public class Academy : IDisposable
         /// Unity package version of com.unity.ml-agents.
         /// This must match the version string in package.json and is checked in a unit test.
         /// </summary>
-        internal const string k_PackageVersion = "0.15.0-preview";
+        internal const string k_PackageVersion = "0.15.1-preview";
 
         const int k_EditorTrainingPort = 5004;
 
3 changes: 2 additions & 1 deletion com.unity.ml-agents/Runtime/Agent.cs
@@ -315,6 +315,7 @@ protected virtual void OnDisable()
 
         void NotifyAgentDone(DoneReason doneReason)
         {
+            m_Info.episodeId = m_EpisodeId;
             m_Info.reward = m_Reward;
             m_Info.done = true;
             m_Info.maxStepReached = doneReason == DoneReason.MaxStepReached;
@@ -376,7 +377,7 @@ public void SetModel(
                 // If everything is the same, don't make any changes.
                 return;
             }
-
+            NotifyAgentDone(DoneReason.Disabled);
             m_PolicyFactory.model = model;
             m_PolicyFactory.inferenceDevice = inferenceDevice;
             m_PolicyFactory.behaviorName = behaviorName;
17 changes: 12 additions & 5 deletions com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs
@@ -458,13 +458,20 @@ UnityRLInitializationOutputProto GetTempUnityRlInitializationOutput()
         {
             if (m_CurrentUnityRlOutput.AgentInfos.ContainsKey(behaviorName))
             {
-                if (output == null)
+                if (m_CurrentUnityRlOutput.AgentInfos[behaviorName].CalculateSize() > 0)
                 {
-                    output = new UnityRLInitializationOutputProto();
-                }
+                    // Only send the BrainParameters if there is a non-empty list of
+                    // AgentInfos ready to be sent.
+                    // This is to ensure that the Python side will always have a first
+                    // observation when receiving the BrainParameters.
+                    if (output == null)
+                    {
+                        output = new UnityRLInitializationOutputProto();
+                    }
 
-                var brainParameters = m_UnsentBrainKeys[behaviorName];
-                output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+                    var brainParameters = m_UnsentBrainKeys[behaviorName];
+                    output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+                }
             }
         }

5 changes: 4 additions & 1 deletion com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs
@@ -29,7 +29,10 @@ public HeuristicPolicy(Func<float[]> heuristic)
         public void RequestDecision(AgentInfo info, List<ISensor> sensors)
         {
             StepSensors(sensors);
-            m_LastDecision = m_Heuristic.Invoke();
+            if (!info.done)
+            {
+                m_LastDecision = m_Heuristic.Invoke();
+            }
         }
 
         /// <inheritdoc />
2 changes: 1 addition & 1 deletion com.unity.ml-agents/package.json
@@ -1,7 +1,7 @@
 {
   "name": "com.unity.ml-agents",
   "displayName": "ML Agents",
-  "version": "0.15.0-preview",
+  "version": "0.15.1-preview",
   "unity": "2018.4",
   "description": "Add interactivity to your game with Machine Learning Agents trained using Deep Reinforcement Learning.",
   "dependencies": {
4 changes: 3 additions & 1 deletion docs/Migrating.md
@@ -34,6 +34,7 @@ The versions can be found in
 * The interface for SideChannels was changed:
   * In C#, `OnMessageReceived` now takes an `IncomingMessage` argument, and `QueueMessageToSend` takes an `OutgoingMessage` argument.
   * In Python, `on_message_received` now takes an `IncomingMessage` argument, and `queue_message_to_send` takes an `OutgoingMessage` argument.
+* Automatic stepping for the Academy is now controlled from the `AutomaticSteppingEnabled` property.
 
 ### Steps to Migrate
 * Add `using MLAgents.Sensors;` in addition to `using MLAgents;` at the top of your Agent's script.
@@ -45,11 +46,12 @@ The versions can be found in
 * We strongly recommend replacing the following methods with their new equivalents, as they will be removed in a later release:
   * `InitializeAgent()` to `Initialize()`
   * `AgentAction()` to `OnActionReceived()`
-  * `AgentReset()` to `OnEpsiodeBegin()`
+  * `AgentReset()` to `OnEpisodeBegin()`
   * `Done()` to `EndEpisode()`
   * `GiveModel()` to `SetModel()`
 * Replace `IFloatProperties` variables with `FloatPropertiesChannel` variables.
 * If you implemented custom `SideChannels`, update the signatures of your methods, and add your data to the `OutgoingMessage` or read it from the `IncomingMessage`.
+* Replace calls to `Academy.EnableAutomaticStepping()`/`Academy.DisableAutomaticStepping()` with `Academy.AutomaticSteppingEnabled = true/false`.
 
 ## Migrating from 0.13 to 0.14
 
2 changes: 1 addition & 1 deletion gym-unity/gym_unity/__init__.py
@@ -1 +1 @@
__version__ = "0.15.0"
__version__ = "0.15.1"
32 changes: 21 additions & 11 deletions gym-unity/gym_unity/envs/__init__.py
@@ -1,4 +1,3 @@
-import logging
 import itertools
 import numpy as np
 from typing import Any, Dict, List, Optional, Tuple, Union
@@ -8,6 +7,7 @@
 
 from mlagents_envs.environment import UnityEnvironment
 from mlagents_envs.base_env import BatchedStepResult
+from mlagents_envs import logging_util
 
 
 class UnityGymException(error.Error):
@@ -18,9 +18,8 @@ class UnityGymException(error.Error):
     pass
 
 
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger("gym_unity")
-
+logger = logging_util.get_logger(__name__)
+logging_util.set_log_level(logging_util.INFO)
 
 GymSingleStepResult = Tuple[np.ndarray, float, bool, Dict]
 GymMultiStepResult = Tuple[List[np.ndarray], List[float], List[bool], Dict]
@@ -364,9 +363,8 @@ def _check_agents(self, n_agents: int) -> None:
 
     def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         n_extra_agents = step_result.n_agents() - self._n_agents
-        if n_extra_agents < 0 or n_extra_agents > self._n_agents:
+        if n_extra_agents < 0:
             # In this case, some Agents did not request a decision when expected
-            # or too many requested a decision
             raise UnityGymException(
                 "The number of agents in the scene does not match the expected number."
             )
@@ -386,6 +384,10 @@ def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         # only cares about the ordering.
         for index, agent_id in enumerate(step_result.agent_id):
             if not self._previous_step_result.contains_agent(agent_id):
+                if step_result.done[index]:
+                    # If the Agent is already done (e.g. it ended its episode
+                    # twice in one step), don't try to register it here.
+                    continue
                 # Register this agent, and get the reward of the previous agent that
                 # was in its index, so that we can return it to the gym.
                 last_reward = self.agent_mapper.register_new_agent_id(agent_id)
@@ -528,8 +530,12 @@ def mark_agent_done(self, agent_id: int, reward: float) -> None:
         """
         Declare the agent done with the corresponding final reward.
         """
-        gym_index = self._agent_id_to_gym_index.pop(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
+        if agent_id in self._agent_id_to_gym_index:
+            gym_index = self._agent_id_to_gym_index.pop(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+        else:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         """
@@ -581,9 +587,13 @@ def set_initial_agents(self, agent_ids: List[int]) -> None:
         self._gym_id_order = list(agent_ids)
 
     def mark_agent_done(self, agent_id: int, reward: float) -> None:
-        gym_index = self._gym_id_order.index(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
-        self._gym_id_order[gym_index] = -1
+        try:
+            gym_index = self._gym_id_order.index(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+            self._gym_id_order[gym_index] = -1
+        except ValueError:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         original_index = self._gym_id_order.index(-1)
48 changes: 48 additions & 0 deletions gym-unity/gym_unity/tests/test_gym.py
@@ -129,6 +129,50 @@ def test_sanitize_action_one_agent_done(mock_env):
     assert expected_agent_id == agent_id
 
 
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_new_agent_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=3)
+    mock_step.agent_id = np.array(range(5))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=True)
+
+    received_step_result = create_mock_vector_step_result(num_agents=7)
+    received_step_result.agent_id = np.array(range(7))
+    # Agent #3 (id = 2) is done,
+    # and so is the "new" agent (id = 5).
+    done = [False] * 7
+    done[2] = True
+    done[5] = True
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([0, 1, 6, 3, 4], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_single_agent_multiple_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=1)
+    mock_step.agent_id = np.array(range(1))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=False)
+
+    received_step_result = create_mock_vector_step_result(num_agents=3)
+    received_step_result.agent_id = np.array(range(3))
+    # The original agent (id = 0) is done,
+    # and so is the "new" agent (id = 1).
+    done = [True, True, False]
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([2], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
 # Helper methods
 
 
@@ -200,6 +244,10 @@ def test_agent_id_index_mapper(mapper_cls):
     mapper.mark_agent_done(1001, 42.0)
     mapper.mark_agent_done(1004, 1337.0)
 
+    # Make sure we can handle an unknown agent id being marked done.
+    # This can happen when an agent ends an episode on the same step it starts.
+    mapper.mark_agent_done(9999, -1.0)
+
     # Now add new agents, and get the rewards of the agents they replaced.
    old_reward1 = mapper.register_new_agent_id(2001)
    old_reward2 = mapper.register_new_agent_id(2002)
2 changes: 1 addition & 1 deletion ml-agents-envs/mlagents_envs/__init__.py
@@ -1 +1 @@
__version__ = "0.15.0"
__version__ = "0.15.1"