Develop generalization training #2232
Conversation
…and min_lesson_length
def __init__(self, intervals, **kwargs):
    self.intervals = intervals
    # Measure the length of the intervals
    self.interval_lengths = list(map(lambda x: abs(x[1] - x[0]), self.intervals))
I find list comprehensions more readable than maps and reduces (and `reduce` was moved out of the builtins into `functools` in Python 3). I think this works:
self.interval_lengths = [abs(x[1] - x[0]) for x in self.intervals]
self.cum_interval_length = sum(self.interval_lengths)
self.interval_weights = [x / self.cum_interval_length for x in self.interval_lengths]
and I also think you can keep interval_lengths and cum_interval_length as local variables.
Converted them to local variables
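For illustration, the refactor the reply describes might look like this (the class name and `**kwargs` are assumptions based on the quoted snippet, not the actual ml-agents code): the lengths and their sum become locals, and only the normalized weights stay on the instance.

```python
class MultiRangeUniformSampler:
    """Sketch: only the normalized interval weights persist on the instance."""

    def __init__(self, intervals, **kwargs):
        self.intervals = intervals
        # Lengths and their sum are only needed to compute the weights,
        # so they stay local to __init__.
        interval_lengths = [abs(high - low) for low, high in self.intervals]
        cum_interval_length = sum(interval_lengths)
        self.interval_weights = [
            length / cum_interval_length for length in interval_lengths
        ]
```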
If self.samplers is empty, then bool of it returns false, indicating
there is no sampler manager.
"""
return not bool(self.sampler_manager.samplers)
How about making this a method or property on SamplerManager instead?
+1 on making check_empty_sampler_manager part of SamplerManager
Made the check part of the SamplerManager class
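A minimal sketch of what moving the check onto the class might look like (the constructor signature is an assumption for illustration; only the emptiness check comes from the quoted code):

```python
class SamplerManager:
    def __init__(self, samplers):
        # Treat a missing mapping the same as an empty one.
        self.samplers = samplers if samplers is not None else {}

    def is_empty(self) -> bool:
        """Return True if no samplers are registered."""
        return not self.samplers
```

Callers can then write `sampler_manager.is_empty()` instead of reaching into `sampler_manager.samplers` from outside the class.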
if changed:
    self.trainers[brain_name].reward_buffer.clear()

if ( ((self.meta_curriculum) and any(lessons_incremented.values()))
This is way too complicated. If you're actually checking what the comment says, then split it up into something like
lessons_were_incremented = ...
ready_for_reset = ...
if lessons_were_incremented or ready_for_reset:
# do stuff
(also watch the modulo by zero)
Added a check to ensure that global_step isn't 0 to ensure modulo safety
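Pulled out as a helper, the reviewer's suggested split might look like this (the function name and argument list are assumptions; note that guarding the interval itself, not just `global_step`, is what actually prevents a modulo by zero):

```python
def should_reset_env(meta_curriculum, lessons_incremented,
                     global_step, resampling_interval):
    # First condition from the reviewer's sketch: a curriculum lesson advanced.
    lessons_were_incremented = bool(meta_curriculum) and any(
        lessons_incremented.values()
    )
    # Second condition: we hit a resampling boundary. Checking the interval
    # for truthiness avoids modulo by zero; checking global_step != 0 avoids
    # triggering a reset on the very first step.
    ready_for_reset = (
        bool(resampling_interval)
        and global_step != 0
        and global_step % resampling_interval == 0
    )
    return lessons_were_incremented or ready_for_reset
```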
@@ -37,6 +39,8 @@ def __init__(
    external_brains: Dict[str, BrainParameters],
    training_seed: int,
    fast_simulation: bool,
    sampler_manager,
type annotation here: sampler_manager: SamplerManager,
Added type annotation
ml-agents/mlagents/trainers/learn.py
Outdated
@@ -73,6 +76,21 @@ def run_training(sub_id: int, run_seed: int, run_options, process_queue):
        docker_target_name=docker_target_name
    )

    sampler = None
nit: I think `sampler_config` is a better name for this. `sampler` implies it's an instance of a Sampler.
def sample_all(self):
    res = {}
    if self.samplers == {}:
I'm not sure this is what you want here; it's definitely more pythonic to do `if not self.samplers:`. But even then, you don't need this `if`, since doing a `for` loop will iterate over 0 items.
if self.samplers == {}:
    pass
else:
    for param_name, param_sampler in list(self.samplers.items()):
Don't need the `list()` here (only if we were trying for some sort of python2+3 compatibility).
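Applying both comments, the simplified method might look like this sketch (the `sample_parameter` method name on each sampler is an assumption based on the surrounding review, and the constructor is stubbed for illustration):

```python
class SamplerManager:
    def __init__(self, samplers):
        self.samplers = samplers if samplers is not None else {}

    def sample_all(self):
        res = {}
        # An empty dict iterates zero times, so no emptiness check is needed,
        # and .items() needs no list() wrapper on Python 3.
        for param_name, param_sampler in self.samplers.items():
            res[param_name] = param_sampler.sample_parameter()
        return res
```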
@@ -203,20 +215,30 @@ def _create_model_path(model_path):
        "permissions are set correctly.".format(model_path)
    )

@staticmethod
def _check_reset_params(reset_params, new_config):
Just checking, where is this method called?
Trailing code from the use of lesson_controller; it's not needed anymore, so it's removed in the next update.
from .exception import SamplerException

class SamplerException(Exception):
BTW this should inherit from `UnityException`, as with all of the other exception classes.
Instance of dead code I was using for testing; removed in the next update
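For reference, the convention the reviewer points to might look like this (the `UnityException` base is stubbed here for illustration; in the real codebase it lives in the ml-agents exception module):

```python
class UnityException(Exception):
    """Stub of the shared ml-agents base exception, for illustration only."""

class SamplerException(UnityException):
    """Raised when a sampler or sampler file is misconfigured."""
```

Inheriting from the shared base lets callers catch all ml-agents errors with one `except UnityException` clause.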
Reinforcement learning has a rather unique setup as opposed to supervised and
unsupervised learning. Agents here are trained and tested on the same exact environment,
which is analogous to a model being trained and tested on an identical dataset in supervised learning! This setting results in overfitting; the inability of the agent to generalize to slight tweaks or variations in the environment. This is problematic in instances when environments are randomly instantiated with varying properties. To make agents more robust, we train an agent over multiple variations of the environment. The agent is trained with the intent that it learns to maintain a minimum performance regardless of the environment variant and that it generalizes to maintain this in unseen future variants of the environment.
Lines 3 and 4 are cut but line 5 is not. You need to make sure that the number of characters per line is consistent throughout the document.
_Variations of the 3D Ball environment._

To vary the environments, we first decided what the reset parameters for the environment. These parameters are known as `Reset Parameters`, introduced in [Curriculum Learning](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Curriculum-Learning.md). In the 3D ball environment example displayed in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
You say "we first decided what the reset parameters are" and then introduce the Reset Parameters in the following sentence. This is confusing. Also, reset parameters were not introduced in Curriculum Learning, so I would say Curriculum Learning also uses reset parameters instead of introducing them.
To vary the environments, we first decided what the reset parameters for the environment.
This is grammatically incorrect I think.
## In-practice

To test the effectiveness of this training procedure, we train 3 models over 50000 steps:
1. Model Trained on Default reset parameter values (Default model).
This numbering does not look good in the markdown. I think you need to skip a line somewhere for it to be a numbered list.
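The fix the reviewer is asking for is a blank line between the introductory sentence and the list, so Markdown renders it as a numbered list, e.g.:

```markdown
To test the effectiveness of this training procedure, we train 3 models over 50000 steps:

1. Model Trained on Default reset parameter values (Default model).
2. Model Trained on a range of reset parameter values (Random model).
```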
To test the effectiveness of this training procedure, we train 3 models over 50000 steps:
1. Model Trained on Default reset parameter values (Default model).
2. Model Trained on a range of reset parameter values (Random model). Reset parameter values are picked uniformly over the range and the model is trained on each configuration for 5000 steps before they are randomly sampled again. The range consists of the default values.
"The range consists of the default values" --> What does this mean?
For generalization training, we need to provide a way to modify the environment by supplying a set of reset parameters. This provision can be either deterministic or randomized. Each reset parameter is assigned a sampler. If a sampler isn't provided for a reset parameter, the parameter maintains the default value throughout the training, remaining unchanged. The samplers for all the reset parameters are handled by a **Sampler Manager**, which is also responsible for generating a new set of values for the reset parameters when needed.

To setup the Sampler Manager, we setup a YAML file that specifies how we wish to generate new samples. In this file, we specify the samplers and the `resampling-duration` (number of training steps after which reset parameters are resampled). Below is an example of a sampler file for the 3D ball environment.
`resampling-duration` does not exist in the YAML provided.
"number of training steps after which reset parameters are resampled" --> these are not necessarily training steps, they are just simulation steps.
```yaml
mass:
  sampler-type: "custom-sampler"
  argA: 1
```
Is the ordering of the args relevant, or only their names? This should be specified.
We first begin with setting up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specify our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `generalization-test.yaml` sampling setup, we can run

```sh
mlagents-learn config/trainer_config.yaml --sampler=config/generalize_test.yaml --run-id=3D-Ball-generalization --train
```
`generalization-test.yaml` =/= `config/generalize_test.yaml`
In the sample code you use config/generalize_test.yaml but in the text you use config/generalization-test.yaml
episode-length: 5000

mass:
  sampler-type: "uniform"
You provide multi-range, normal and uniform sampling, you need to explain what they are and what their arguments correspond to.
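An explanation along the lines the reviewer asks for could be sketched as a YAML fragment like the one below. The sampler-type names and argument names here are assumptions modeled on the quoted snippets, and should be checked against the shipped sampler code before documenting them:

```yaml
mass:
  sampler-type: "uniform"            # one draw from [min_value, max_value)
  min_value: 0.5
  max_value: 10

gravity:
  # pick one interval (weighted by its length), then sample uniformly within it
  sampler-type: "multirange_uniform"
  intervals: [[7, 10], [15, 20]]

scale:
  sampler-type: "gaussian"           # normal distribution
  mean: 2
  st_dev: 0.3
```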
identical dataset in supervised learning! This setting results in overfitting;
the inability of the agent to generalize to slight tweaks or variations in the
environment. This is problematic in instances when environments are randomly
instantiated with varying properties. To make agents more robust, we train an
Would rephrase: "To make agents more robust, one approach is to train over multiple variations of the environment."
@@ -0,0 +1,123 @@
# Training Generalized Reinforcement Learning agents
nit: capitalize agents
instantiated with varying properties. To make agents more robust, we train an
agent over multiple variations of the environment. The agent is trained with
the intent that it learns to maintain a minimum performance regardless of the
environment variant and that it generalizes to maintain this in unseen future
Would rephrase: "... and that it generalizes to be robust to future unseen variants of the environment."
## How-to

For generalization training, we need to provide a way to modify the environment
by supplying a set of reset parameters. This provision can be either
To make this clearer, I'd break this into two paragraphs. First: "For generalization training, we need to provide a way to ... reset parameters, and vary them over time. The parameters could be chosen either deterministically or randomly."
2nd paragraph: "This is done by assigning each reset parameter a sampler, which (insert description of what a sampler does). ..."
resampled). Below is an example of a sampler file for the 3D ball environment.

```yaml
episode-length: 5000
```
Which one is it? `resampling-duration` or `episode-length`?
@@ -197,6 +200,7 @@ are conducting, see:
* [Training with PPO](Training-PPO.md)
* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Generalization](Training-Generalization-Learning.md)
Training with Generalization sounds weird to me. I would say Training with Environment Parameters Sampling
Uniformly draws a single sample in the range [min_value, max_value).
"""

def __init__(
You need a doc string on the init methods so we know what the arguments correspond to
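A docstring along the requested lines might look like this sketch. The parameter names mirror the uniform sampler quoted above; the `sample_parameter` method and the use of `random.uniform` are illustrative assumptions, not the actual ml-agents implementation:

```python
import random


class UniformSampler:
    """Uniformly draws a single sample in the range [min_value, max_value)."""

    def __init__(self, min_value: float, max_value: float, **kwargs) -> None:
        """
        :param min_value: Inclusive lower bound of the sampling range.
        :param max_value: Exclusive upper bound of the sampling range.
        """
        self.min_value = min_value
        self.max_value = max_value

    def sample_parameter(self) -> float:
        # Draw one value from the configured range.
        return random.uniform(self.min_value, self.max_value)
```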
@@ -23,3 +23,5 @@ class MetaCurriculumError(TrainerError):
    """
    Any error related to the configuration of a metacurriculum.
    """

Extra line ?
@@ -99,6 +107,8 @@ def run_training(
    lesson,
    run_seed,
    fast_simulation,
    sampler_manager,
    resampling_interval,
Is resampling duration the same as resampling interval? The naming needs to be consistent if this is the case.
@@ -108,6 +118,28 @@ def run_training(
    tc.start_learning(env, trainer_config)


def create_sampler_manager(sampler_file_path, env_reset_params):
Docstring ?
Approved - docs need a couple of fixes; we'll submit a separate PR for that. Remaining changes (they won't affect functionality):
- Fix doc references to `resampling_interval`
- Rename `generalize_test.yaml` to `3dball_generalize.yaml`
- Fix references to `yaml` in docs
Reopening a new PR since I accidentally closed the other one; this PR consists of all the edits from the previous PR and omits the lesson_controller class.