Releases: Unity-Technologies/ml-agents
ML-Agents Beta 0.3.1
Features
- We have upgraded our Docker container, which now supports Brains that use camera-based Visual Observations.
Documentation
- We have added a partial Chinese translation of our documentation. It is available here.
Fixes & Performance Improvements
- Missing component reference in BananaRL environment.
- Neural Network for multiple visual observations was not properly generated.
- Episode time-out value estimate bootstrapping used incorrect observation as input.
Acknowledgements
Thanks to everyone at Unity who contributed to v0.3.1, as well as to the following community contributors:
@sterlingcrispin, @andersonaddo, @palomagr, @imankgoyal, @luchris429.
ML-Agents Beta 0.3.0b
Fixes
- Fixes internal brain for Banana Imitation.
- Fixes Discrete Control training for Imitation Learning.
- Fixes Visual Observations in internal brain with non-square inputs.
ML-Agents Beta 0.3.0a
Fixes
Added the missing Ray Perception components to the agents in the BananaImitation scene.
ML-Agents Beta 0.3.0
Environments
To learn more about new and improved environments, see our Example Environments page.
New
- Soccer Twos - Multi-agent competitive and cooperative environment where behavior comes about because of the reward function. Used to demonstrate multi-brain training.
- Banana Collectors - Multi-agent resource collection environment where competitive or cooperative behavior comes about dynamically based on available resources. Used to demonstrate Imitation Learning.
- Hallway - Single-agent environment in which an agent must explore a room, remember the object within the room, and use that information to navigate to the correct goal. Used to demonstrate LSTM models.
- Bouncer - Single-agent environment provided as an example of our new On-Demand Decision-Making feature. In this environment, an agent can apply force to itself in order to bounce around a platform and attempt to collide with floating bananas.
Improved
- All environments have been visually refreshed with a consistent color palette and design language.
- Revamped GridWorld to only use visual observations and a 5x5 grid by default.
- Revamped Tennis to use continuous actions.
- Revamped Push Block to use local perception.
- Revamped Wall Jump to use local perception.
- Added Hard version of 3DBall which doesn’t contain velocity information in observations.
New Features
- [Unity] On Demand Decision Making - It is now possible to have agents only request decisions from their brains when necessary, using RequestDecision() and RequestAction(). For more information, see here.
- [Unity] Added vector-observation stacking - The past n vector observations for each agent can now be stored and used as input to a Brain for decision making.
- [Python] Added Behavioral Cloning (Imitation Learning) algorithm - Train a neural network to imitate either player behavior or a hand-coded game bot using behavioral cloning. For more info, see here.
- [Python] Support for training multiple brains simultaneously - Two or more different brains can now be trained simultaneously using the provided PPO algorithm.
- [Python] Added LSTM models - We now support training and embedding recurrent neural networks using the PPO algorithm. This allows for learning temporal dependencies between observations.
- [Unity] [Python] Added Docker Image for RL-training - We now provide a Docker image which allows users to train their brains in an isolated environment without the need to install Python, TensorFlow, and other dependencies. For more information, see here.
- [Python] Ability to provide random seed to training process and environment - Allows for reproducible experimentation. For more information, see here. (Note: Unity Physics is non-deterministic; as such, fully reproducible experiments are currently not possible when using physics-based interactions.)
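As an illustration of the vector-observation stacking feature listed above, the following is a minimal Python sketch of the idea, not the ML-Agents implementation. The class name `ObservationStacker`, its parameters, and the choice to pad with zero vectors at the start of an episode are all assumptions made for this example.

```python
from collections import deque
from typing import Deque, List


class ObservationStacker:
    """Hypothetical helper: keeps the last n vector observations, oldest first."""

    def __init__(self, obs_size: int, num_stacked: int) -> None:
        self.obs_size = obs_size
        self.num_stacked = num_stacked
        # A bounded deque drops the oldest observation automatically.
        self._history: Deque[List[float]] = deque(maxlen=num_stacked)
        self.reset()

    def reset(self) -> None:
        """Fill the history with zero vectors (an assumed padding choice)."""
        self._history.clear()
        for _ in range(self.num_stacked):
            self._history.append([0.0] * self.obs_size)

    def push(self, observation: List[float]) -> List[float]:
        """Store the newest observation and return the flattened stack."""
        if len(observation) != self.obs_size:
            raise ValueError("unexpected observation size")
        self._history.append(list(observation))
        return [x for obs in self._history for x in obs]


stacker = ObservationStacker(obs_size=2, num_stacked=3)
first = stacker.push([1.0, 2.0])
second = stacker.push([3.0, 4.0])
```

With three stacked observations of size two, the decision-making input has six values; early in an episode the oldest slots simply hold the padding.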
Changes
- [Unity] Memory size has been removed as a user-facing brain parameter. It is now defined when creating models from unitytrainers.
- [Unity] [Python] The API as well as the general semantics used throughout ML-Agents have changed. See here for information on these changes, and how to easily adjust current projects to be compatible with these changes.
- [Python] Training hyperparameters are now defined in a .yaml file instead of via command line arguments.
- [Python] Training now takes place via learn.py, which launches trainers for multiple brains.
- [Python] Python 2 is no longer supported.
Documentation
Documentation has been significantly re-written to include many new sections, in addition to updated tutorials and guides. Check it out here.
Fixes & Performance Improvements
- [Unity] Improved memory management - Reduced garbage collection memory usage by up to 5x when using External Brain.
- [Unity] Time.captureFramerate is now set by default to help sync Update and FixedUpdate frequencies.
- [Unity] Added tooltips to relevant inspector objects.
- [Unity] It is now possible to instantiate and destroy GameObjects which are Agents.
- [Unity] Improved visual observation inference time by 3x.
- [Unity] Tooltips added to Unity Inspector for ML-Agents variables and functions.
- [Unity] [Python] Epsilon is now a built-in part of PPO graph. It is no longer necessary to specify it additionally in “Graph Placeholders” from Unity.
- [Python] Changed value bootstrapping in PPO algorithm to properly calculate returns on episode time-out.
- [Python] The neural network graph is now automatically saved as a .bytes file when training is interrupted.
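To make the value-bootstrapping fix above more concrete, here is a small Python sketch of the general technique: when an episode ends because of a time-out rather than a true terminal state, the discounted return is seeded with a value estimate for the final observation instead of zero. The function name `discounted_returns` and its parameters are hypothetical, and this is not the code used by the PPO trainer.

```python
from typing import List


def discounted_returns(
    rewards: List[float], gamma: float, bootstrap_value: float = 0.0
) -> List[float]:
    """Compute discounted returns, working backwards from the last step.

    bootstrap_value should be 0.0 for a true terminal state and the critic's
    value estimate of the last observation when the episode timed out.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Episode that really ended: nothing is expected after the last reward.
terminal = discounted_returns([1.0, 1.0], gamma=0.5)

# Episode cut off by a time-out: future reward is approximated by the
# value estimate (4.0 here is an arbitrary illustrative number).
timed_out = discounted_returns([1.0, 1.0], gamma=0.5, bootstrap_value=4.0)
```

Treating a time-out as a terminal state would underestimate the returns of the final steps, which is why the bootstrap term matters.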
Acknowledgements
Thanks to everyone at Unity who contributed to v0.3, as well as:
@asolano, @LionH, @MarcoMeter, @srcnalt, @wouterhardeman, @60days, @floAr, @Coac, @Zamaroht, @slightperturbation
ML-Agents Beta 0.2.1d
Fixes
- Fixes bug where visual observations could not be used with PPO.
ML-Agents Beta 0.2.1c
Fixes
- Require TensorFlow 1.4 to prevent incompatibilities between models built using TensorFlow 1.5 and current TensorFlowSharp bindings.
ML-Agents Beta 0.2.1b
Fixes & Performance Improvements
- [Python] Fixes a bug that prevented the creation of network graphs which did not contain visual observations.
ML-Agents Beta 0.2.1a
Features
- [Python] Adds support for training brains with multiple visual observations using PPO. Thanks to @asolano for contributing this!
ML-Agents Beta 0.2
Environments
- Four new example environments added (learn more):
  - Crawler
  - Reacher
  - Wall Area
  - Push Area
- Environments no longer use normalized state values due to optional auto-normalizing done in PPO.
Features
Communication API updated. Make sure both the Unity project files and the Python API are on the most current version.
Python
- PPO now optionally auto-normalizes states using running-average and running-variance (with --normalize flag).
- unityagents package now includes Curriculum Learning support (learn more).
- Absolute path to training environments can now be used when running UnityEnvironment().
- The Environment now logs errors and exceptions on the Unity side into the unity-environment.log file.
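The running-average and running-variance normalization mentioned in the list above can be sketched roughly as follows. This is an illustrative Python example using Welford's online update, not the trainer's actual implementation; the class `RunningNormalizer`, its `epsilon` parameter, and the use of unit variance before two samples have been seen are assumptions.

```python
import math
from typing import List


class RunningNormalizer:
    """Hypothetical helper that tracks per-dimension mean and variance online."""

    def __init__(self, size: int, epsilon: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * size
        self._m2 = [0.0] * size  # running sum of squared deviations
        self.epsilon = epsilon   # avoids division by zero

    def update(self, state: List[float]) -> None:
        """Fold one state vector into the running statistics (Welford's method)."""
        self.count += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self._m2[i] += delta * (x - self.mean[i])

    def variance(self) -> List[float]:
        """Population variance so far; 1.0 until at least two samples exist."""
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [m2 / self.count for m2 in self._m2]

    def normalize(self, state: List[float]) -> List[float]:
        """Shift and scale a state using the current running statistics."""
        return [
            (x - m) / math.sqrt(v + self.epsilon)
            for x, m, v in zip(state, self.mean, self.variance())
        ]


normalizer = RunningNormalizer(size=1)
normalizer.update([0.0])
normalizer.update([2.0])
scaled = normalizer.normalize([2.0])
```

Because the statistics are updated as states arrive, environments can emit raw state values and leave the scaling to the trainer.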
Unity
- New, more flexible Monitor which allows for displaying arbitrary information (learn more).
- Broadcast support for internal, heuristic, and player brains which allows all relevant agent information to be sent to python-side for supervised/imitation learning (learn more).
Bug Fixes & Performance Improvements
Python
- Communication code now supports arbitrarily large observation cameras and states.
Unity
- Cumulative reward now accurately tracks reward.
- AcademyReset() now called before agent reset.
- isInference is now correctly set when running in Editor.
- Frame-rate is unlocked by default when isInference is false.
ML-Agents Beta 0.1.2
Features & Additions
Unity
- Added Basic Environment for testing discrete state environments
Python
- Reconfigured PPO model generation to support:
- Discrete control w/ discrete-state input
- Continuous Control w/ visual and discrete-state input
- Combined visual/state inputs for CC and DC
- Color (3-channel) observations
General
- Added pre-configured AWS AMI for cloud-training
- Moved wiki to docs directory for better community collaboration
Bug Fixes
Unity
- Provides message for state size mismatch
- Defaults to continuous state space for new brains