Hotfixes for Release 0.15.1 (#3698)
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)

* [bug-fix] Improve performance for PPO with continuous actions (#3662)

* Corrected a typo in the name of a function (#3670)

`OnEpsiodeBegin` was corrected to `OnEpisodeBegin` in the Migrating.md document.

* Add Academy.AutomaticSteppingEnabled to migration (#3666)

* Fix editor port in Dockerfile (#3674)

* Hotfix memory leak on Python (#3664)

* Fixing

* Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done.

* [bug-fix] Make Python able to deal with 0-step episodes (#3671)

* adding some comments

Co-authored-by: Ervin T <ervin@unity3d.com>

* Remove `vis_encode_type` from the list of required settings (#3677)

* Update changelog (#3678)

* Shorten timeout duration for environment close (#3679)

The timeout duration for closing an environment was set to the
same duration as the timeout used when waiting for a response from the
still-running environment. This led to long waits for the error
response when the communication version didn't match.

This change forces a timeout duration of 0 when handling errors.
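
Conceptually, the error path now polls with a zero timeout. A minimal sketch with hypothetical names (`close_env`, `had_error`); the real change lives in the Python-side environment manager:

```python
from multiprocessing import Process
from multiprocessing.connection import Connection

def close_env(conn: Connection, proc: Process, had_error: bool) -> None:
    # On the error path (e.g. a communication-version mismatch), poll with a
    # zero timeout instead of the normal response timeout: no reply is coming.
    timeout = 0 if had_error else 30  # 30 s is illustrative, not the real default
    if conn.poll(timeout):
        conn.recv()  # drain any final message before shutting down
    conn.close()
    proc.join()
```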

* Bumping the versions

* handle multiple dones in a single step (#3700)

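In the gym wrapper, this fix makes the agent-id mapper tolerate a done notification for an id it never registered. A simplified sketch of the list-based mapper's behavior, where `gym_id_order` and `done_rewards` are stand-ins for the wrapper's internal state rather than its exact code:

```python
from typing import Dict, List

def mark_agent_done(
    gym_id_order: List[int],
    done_rewards: Dict[int, float],
    agent_id: int,
    reward: float,
) -> None:
    # gym_id_order maps gym indices to agent ids; -1 marks a vacated slot.
    try:
        gym_index = gym_id_order.index(agent_id)
    except ValueError:
        # The agent was never registered (e.g. it ended its episode twice
        # in one step), so there is nothing to clean up.
        return
    done_rewards[gym_index] = reward
    gym_id_order[gym_index] = -1
```
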
* [tests] Make end-to-end tests more stable (#3697)

* [bug-fix] Fix entropy computation for GaussianDistribution (#3684)
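
For reference, the entropy of a diagonal Gaussian with standard deviations sigma_i is H = sum_i 0.5 * ln(2 * pi * e * sigma_i^2). A minimal NumPy check of that formula, illustrative only and not the trainer's exact code:

```python
import numpy as np

def diag_gaussian_entropy(log_std: np.ndarray) -> float:
    # H = sum_i 0.5 * ln(2 * pi * e * sigma_i^2)
    #   = sum_i (0.5 * ln(2 * pi * e) + log_std_i)
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e) + log_std))

# Two action dimensions with sigma = 1 give H = 2 * 1.4189 ~ 2.8379.
print(diag_gaussian_entropy(np.zeros(2)))
```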

* Fix how we set logging levels (#3703)

* cleanup logging

* comments and cleanup

* pylint, gym
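
The resulting pattern, visible in the gym_unity diff below, is to obtain loggers from `mlagents_envs.logging_util` rather than calling `logging.basicConfig()` at import time, so the log level is set in one place. A minimal usage sketch:

```python
from mlagents_envs import logging_util

# One logger per module; set_log_level configures the level centrally
# instead of each module reconfiguring the root logger.
logger = logging_util.get_logger(__name__)
logging_util.set_log_level(logging_util.INFO)
logger.info("logging configured")
```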

* [skip-ci] Update changelog for logging fix. (#3707)

* [skip ci] Update README

* [skip ci] Fixed a typo

Co-authored-by: Ervin T <ervin@unity3d.com>
Co-authored-by: Adam Streck <adam.streck@gmail.com>
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
Co-authored-by: Jonathan Harper <jharper+moar@unity3d.com>
5 people authored Mar 30, 2020
1 parent 7507a5d commit 377a25c
Showing 45 changed files with 320 additions and 147 deletions.
2 changes: 2 additions & 0 deletions .pylintrc
@@ -44,3 +44,5 @@ disable =
 # Appears to be https://github.com/PyCQA/pylint/issues/2981
 W0201,
 
+# Using the global statement
+W0603,
6 changes: 4 additions & 2 deletions Dockerfile
@@ -132,7 +132,9 @@ COPY ml-agents /ml-agents
 WORKDIR /ml-agents
 RUN pip install -e .
 
-# port 5005 is the port used in in Editor training.
-EXPOSE 5005
+# Port 5004 is the port used in Editor training.
+# Environments will start from port 5005,
+# so allow enough ports for several environments.
+EXPOSE 5004-5050
 
 ENTRYPOINT ["mlagents-learn"]
@@ -1690,8 +1690,8 @@ MonoBehaviour:
   m_InferenceDevice: 0
   m_BehaviorType: 0
   m_BehaviorName: CrawlerStatic
-  m_TeamID: 0
-  m_useChildSensors: 1
+  TeamId: 0
+  m_UseChildSensors: 1
 --- !u!114 &114230237520033992
 MonoBehaviour:
   m_ObjectHideFlags: 0
@@ -1704,6 +1704,9 @@ MonoBehaviour:
   m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
   m_Name:
   m_EditorClassIdentifier:
+  agentParameters:
+    maxStep: 0
+  hasUpgradedFromAgentParameters: 1
   maxStep: 5000
   target: {fileID: 4749909135913778}
   ground: {fileID: 4856650706546504}
@@ -1759,7 +1762,7 @@ MonoBehaviour:
   m_Name:
   m_EditorClassIdentifier:
   DecisionPeriod: 5
-  RepeatAction: 0
+  TakeActionsBetweenDecisions: 0
   offsetStep: 0
 --- !u!1 &1492926997393242
 GameObject:
@@ -2959,8 +2962,8 @@ Transform:
   m_PrefabAsset: {fileID: 0}
   m_GameObject: {fileID: 1995322274649904}
   m_LocalRotation: {x: 0, y: -0, z: -0, w: 1}
-  m_LocalPosition: {x: -0, y: 0.5, z: 0}
-  m_LocalScale: {x: 0.01, y: 0.01, z: 0.01}
+  m_LocalPosition: {x: -0, y: 1.5, z: 0}
+  m_LocalScale: {x: 0.01, y: 0.03, z: 0.01}
   m_Children: []
   m_Father: {fileID: 4924174722017668}
   m_RootOrder: 1
3 changes: 2 additions & 1 deletion README.md
@@ -44,7 +44,7 @@ developer communities.
 * Train using concurrent Unity environment instances
 
 ## Releases & Documentation
-**Our latest, stable release is 0.15.0. Click
+**Our latest, stable release is 0.15.1. Click
 [here](docs/Readme.md) to
 get started with the latest release of ML-Agents.**
 
@@ -61,6 +61,7 @@ details of the changes between versions.
 
 | **Version** | **Release Date** | **Source** | **Documentation** | **Download** |
 |:-------:|:------:|:-------------:|:-------:|:------------:|
+| **0.15.0** | March 18, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.15.0.zip) |
 | **0.14.1** | February 26, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.1.zip) |
 | **0.14.0** | February 13, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.0.zip) |
 | **0.13.1** | January 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.1.zip) |
11 changes: 11 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -5,6 +5,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 
+## [0.15.1-preview] - 2020-03-30
+### Bug Fixes
+- Raised the height of the wall in the CrawlerStatic scene to prevent the Agent from falling off. (#3650)
+- Fixed an issue where specifying `vis_encode_type` was required only for SAC. (#3677)
+- Fixed the reported entropy values for continuous actions. (#3684)
+- Fixed an issue where switching models using `SetModel()` during training would use an excessive amount of memory. (#3664)
+- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
+- Fixed an issue in the gym wrapper that would raise an exception if an Agent called `EndEpisode` multiple times in the same step. (#3700)
+- Fixed an issue where logging output was not visible; logging levels are now set consistently. (#3703)
+
+
 ## [0.15.0-preview] - 2020-03-18
 ### Major Changes
 - `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
2 changes: 1 addition & 1 deletion com.unity.ml-agents/Runtime/Academy.cs
@@ -64,7 +64,7 @@ public class Academy : IDisposable
         /// Unity package version of com.unity.ml-agents.
         /// This must match the version string in package.json and is checked in a unit test.
         /// </summary>
-        internal const string k_PackageVersion = "0.15.0-preview";
+        internal const string k_PackageVersion = "0.15.1-preview";
 
         const int k_EditorTrainingPort = 5004;
 
3 changes: 2 additions & 1 deletion com.unity.ml-agents/Runtime/Agent.cs
@@ -315,6 +315,7 @@ protected virtual void OnDisable()
 
         void NotifyAgentDone(DoneReason doneReason)
         {
+            m_Info.episodeId = m_EpisodeId;
             m_Info.reward = m_Reward;
             m_Info.done = true;
             m_Info.maxStepReached = doneReason == DoneReason.MaxStepReached;
@@ -376,7 +377,7 @@ public void SetModel(
                 // If everything is the same, don't make any changes.
                 return;
             }
-
+            NotifyAgentDone(DoneReason.Disabled);
             m_PolicyFactory.model = model;
             m_PolicyFactory.inferenceDevice = inferenceDevice;
             m_PolicyFactory.behaviorName = behaviorName;
17 changes: 12 additions & 5 deletions com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs
@@ -458,13 +458,20 @@ UnityRLInitializationOutputProto GetTempUnityRlInitializationOutput()
         {
             if (m_CurrentUnityRlOutput.AgentInfos.ContainsKey(behaviorName))
             {
-                if (output == null)
+                if (m_CurrentUnityRlOutput.AgentInfos[behaviorName].CalculateSize() > 0)
                 {
-                    output = new UnityRLInitializationOutputProto();
-                }
+                    // Only send the BrainParameters if there is a non-empty list of
+                    // AgentInfos ready to be sent.
+                    // This is to ensure that the Python side will always have a first
+                    // observation when receiving the BrainParameters.
+                    if (output == null)
+                    {
+                        output = new UnityRLInitializationOutputProto();
+                    }
 
-                var brainParameters = m_UnsentBrainKeys[behaviorName];
-                output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+                    var brainParameters = m_UnsentBrainKeys[behaviorName];
+                    output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+                }
             }
         }

5 changes: 4 additions & 1 deletion com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs
@@ -29,7 +29,10 @@ public HeuristicPolicy(Func<float[]> heuristic)
         public void RequestDecision(AgentInfo info, List<ISensor> sensors)
         {
             StepSensors(sensors);
-            m_LastDecision = m_Heuristic.Invoke();
+            if (!info.done)
+            {
+                m_LastDecision = m_Heuristic.Invoke();
+            }
         }
 
         /// <inheritdoc />
2 changes: 1 addition & 1 deletion com.unity.ml-agents/package.json
@@ -1,7 +1,7 @@
 {
   "name": "com.unity.ml-agents",
   "displayName": "ML Agents",
-  "version": "0.15.0-preview",
+  "version": "0.15.1-preview",
   "unity": "2018.4",
   "description": "Add interactivity to your game with Machine Learning Agents trained using Deep Reinforcement Learning.",
   "dependencies": {
4 changes: 3 additions & 1 deletion docs/Migrating.md
@@ -34,6 +34,7 @@ The versions can be found in
 * The interface for SideChannels was changed:
   * In C#, `OnMessageReceived` now takes an `IncomingMessage` argument, and `QueueMessageToSend` takes an `OutgoingMessage` argument.
   * In Python, `on_message_received` now takes an `IncomingMessage` argument, and `queue_message_to_send` takes an `OutgoingMessage` argument.
+* Automatic stepping for the Academy is now controlled from the `AutomaticSteppingEnabled` property.
 
 ### Steps to Migrate
 * Add `using MLAgents.Sensors;` in addition to `using MLAgents;` at the top of your Agent's script.
@@ -45,11 +46,12 @@ The versions can be found in
 * We strongly recommend replacing the following methods with their new equivalents, as they will be removed in a later release:
   * `InitializeAgent()` to `Initialize()`
   * `AgentAction()` to `OnActionReceived()`
-  * `AgentReset()` to `OnEpsiodeBegin()`
+  * `AgentReset()` to `OnEpisodeBegin()`
   * `Done()` to `EndEpisode()`
   * `GiveModel()` to `SetModel()`
 * Replace `IFloatProperties` variables with `FloatPropertiesChannel` variables.
 * If you implemented custom `SideChannels`, update the signatures of your methods, and add your data to the `OutgoingMessage` or read it from the `IncomingMessage`.
+* Replace calls to `Academy.EnableAutomaticStepping()`/`Academy.DisableAutomaticStepping()` with `Academy.AutomaticSteppingEnabled = true/false`.
 
 ## Migrating from 0.13 to 0.14
 
2 changes: 1 addition & 1 deletion gym-unity/gym_unity/__init__.py
@@ -1 +1 @@
__version__ = "0.15.0"
__version__ = "0.15.1"
32 changes: 21 additions & 11 deletions gym-unity/gym_unity/envs/__init__.py
@@ -1,4 +1,3 @@
-import logging
 import itertools
 import numpy as np
 from typing import Any, Dict, List, Optional, Tuple, Union
@@ -8,6 +7,7 @@
 
 from mlagents_envs.environment import UnityEnvironment
 from mlagents_envs.base_env import BatchedStepResult
+from mlagents_envs import logging_util
 
 
 class UnityGymException(error.Error):
@@ -18,9 +18,8 @@ class UnityGymException(error.Error):
     pass
 
 
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger("gym_unity")
-
+logger = logging_util.get_logger(__name__)
+logging_util.set_log_level(logging_util.INFO)
 
 GymSingleStepResult = Tuple[np.ndarray, float, bool, Dict]
 GymMultiStepResult = Tuple[List[np.ndarray], List[float], List[bool], Dict]
@@ -364,9 +363,8 @@ def _check_agents(self, n_agents: int) -> None:
 
     def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         n_extra_agents = step_result.n_agents() - self._n_agents
-        if n_extra_agents < 0 or n_extra_agents > self._n_agents:
+        if n_extra_agents < 0:
             # In this case, some Agents did not request a decision when expected
-            # or too many requested a decision
             raise UnityGymException(
                 "The number of agents in the scene does not match the expected number."
             )
@@ -386,6 +384,10 @@ def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         # only cares about the ordering.
         for index, agent_id in enumerate(step_result.agent_id):
             if not self._previous_step_result.contains_agent(agent_id):
+                if step_result.done[index]:
+                    # If the Agent is already done (e.g. it ended its episode
+                    # twice in one step), don't try to register it here.
+                    continue
                 # Register this agent, and get the reward of the previous agent that
                 # was in its index, so that we can return it to the gym.
                 last_reward = self.agent_mapper.register_new_agent_id(agent_id)
@@ -528,8 +530,12 @@ def mark_agent_done(self, agent_id: int, reward: float) -> None:
         """
         Declare the agent done with the corresponding final reward.
         """
-        gym_index = self._agent_id_to_gym_index.pop(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
+        if agent_id in self._agent_id_to_gym_index:
+            gym_index = self._agent_id_to_gym_index.pop(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+        else:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         """
@@ -581,9 +587,13 @@ def set_initial_agents(self, agent_ids: List[int]) -> None:
         self._gym_id_order = list(agent_ids)
 
     def mark_agent_done(self, agent_id: int, reward: float) -> None:
-        gym_index = self._gym_id_order.index(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
-        self._gym_id_order[gym_index] = -1
+        try:
+            gym_index = self._gym_id_order.index(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+            self._gym_id_order[gym_index] = -1
+        except ValueError:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         original_index = self._gym_id_order.index(-1)
48 changes: 48 additions & 0 deletions gym-unity/gym_unity/tests/test_gym.py
@@ -129,6 +129,50 @@ def test_sanitize_action_one_agent_done(mock_env):
     assert expected_agent_id == agent_id
 
 
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_new_agent_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=3)
+    mock_step.agent_id = np.array(range(5))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=True)
+
+    received_step_result = create_mock_vector_step_result(num_agents=7)
+    received_step_result.agent_id = np.array(range(7))
+    # Agent #3 (id = 2) is done,
+    # and so is the "new" agent (id = 5).
+    done = [False] * 7
+    done[2] = True
+    done[5] = True
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([0, 1, 6, 3, 4], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_single_agent_multiple_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=1)
+    mock_step.agent_id = np.array(range(1))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=False)
+
+    received_step_result = create_mock_vector_step_result(num_agents=3)
+    received_step_result.agent_id = np.array(range(3))
+    # The original agent (id = 0) is done,
+    # and so is the "new" agent (id = 1).
+    done = [True, True, False]
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([2], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
 # Helper methods
 
 
@@ -200,6 +244,10 @@ def test_agent_id_index_mapper(mapper_cls):
     mapper.mark_agent_done(1001, 42.0)
     mapper.mark_agent_done(1004, 1337.0)
 
+    # Make sure we can handle an unknown agent id being marked done.
+    # This can happen when an agent ends an episode on the same step it starts.
+    mapper.mark_agent_done(9999, -1.0)
+
     # Now add new agents, and get the rewards of the agents they replaced.
    old_reward1 = mapper.register_new_agent_id(2001)
    old_reward2 = mapper.register_new_agent_id(2002)
2 changes: 1 addition & 1 deletion ml-agents-envs/mlagents_envs/__init__.py
@@ -1 +1 @@
__version__ = "0.15.0"
__version__ = "0.15.1"