Showing 12 changed files with 102 additions and 446 deletions.
@@ -1,4 +1,7 @@
 Using the ARLBench States
 ==========================

-In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.
+In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. This is done using so-called `StateFeatures`.
+As of now, we implement the `GradInfo` state feature, which returns the norm of the gradients observed during training.
+
+The state features to use can be defined via the `state_features` key in the config passed to the AutoRL Environment. Include `grad_info` in this list if you want to use this state feature for your approach.
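A minimal sketch of the `state_features` option described in the hunk above. The import path, the `AutoRLEnv` class name, the constructor signature, and the `algorithm` key are assumptions for illustration, not the verified ARLBench API; only `state_features` and `grad_info` come from the documentation itself.

.. code-block:: python

    from arlbench import AutoRLEnv  # assumed import path and class name

    config = {
        "algorithm": "dqn",               # hypothetical algorithm choice
        "state_features": ["grad_info"],  # request gradient-norm information during training
    }

    env = AutoRLEnv(config)  # assumed constructor signature
    obs, info = env.reset()
    # After each env.step(), the returned state should expose the gradient norms
    # collected during training, as described in the documentation above.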
@@ -1,4 +1,22 @@
 ARLBench and Different AutoRL Paradigms
 =======================================

-TODO: relationship to other AutoRL paradigms
+In this chapter, we elaborate on the relationship between ARLBench and various AutoRL paradigms.
+
+Hyperparameter Optimization (HPO)
+---------------------------------
+(Static) hyperparameter optimization is one of the core use cases of ARLBench. As shown in our examples, ARLBench supports all kinds of black-box optimizers for hyperparameter optimization in RL.
+
+Dynamic Algorithm Configuration (DAC)
+-------------------------------------
+When it comes to dynamic approaches, ARLBench supports different kinds of optimization techniques that adapt the current hyperparameter configuration during training. As shown in the examples,
+this can be done using the CLI or the AutoRL Environment. Using checkpointing, training runs can be continued seamlessly, which allows for flexible dynamic approaches.
+
+Neural Architecture Search (NAS)
+--------------------------------
+In addition to HPO, ARLBench supports NAS approaches that set the size of hidden layers and the activation functions. However, as of now, this is limited to these two architecture hyperparameters.
+In the future, ARLBench could be extended with more powerful search space interfaces for NAS.
+
+Meta-Gradients
+--------------
+As of now, ARLBench does not include meta-gradient-based approaches for AutoRL. However, we allow for reactive dynamic approaches that use the gradient information during training to select the next hyperparameter configuration, as shown in our examples.
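To make the HPO use case from this hunk concrete, below is a minimal random-search sketch that treats one ARLBench training run as a black-box objective. The import path, class name, config keys, gymnasium-style return tuple of `step()`, and the scalar objective are assumptions for illustration; any real black-box optimizer could replace the random sampler.

.. code-block:: python

    import random

    from arlbench import AutoRLEnv  # assumed import path and class name

    best_objective, best_lr = float("-inf"), None
    for _ in range(10):  # ten random trials as a stand-in for any black-box optimizer
        lr = random.uniform(1e-5, 1e-2)  # hypothetical search space
        env = AutoRLEnv({"algorithm": "dqn", "n_steps": 1})  # assumed config keys
        env.reset()
        # With a single configuration step, one env.step() runs the whole training
        # for the sampled configuration (assumed semantics) and returns an objective.
        _obs, objective, _terminated, _truncated, _info = env.step({"learning_rate": lr})
        if objective > best_objective:
            best_objective, best_lr = objective, lr

    print("best learning rate:", best_lr, "objective:", best_objective)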
@@ -1,4 +1,11 @@
 Dynamic Configuration in ARLBench
 ==================================

-How to dynamic?
+In addition to static approaches, which run the whole training with a fixed configuration, ARLBench supports dynamic configuration approaches.
+These methods, in contrast, can adapt the current hyperparameter configuration during training.
+To do this, you can use the CLI or the AutoRL Environment, as shown in our examples.
+
+When using the CLI, you have to pass a checkpoint path for the current training state. The training is then resumed with the given configuration.
+
+For the AutoRL Environment, set `n_steps` to the number of configuration updates you want to perform during training.
+Adjust the number of training steps (`n_total_timesteps`) accordingly and call the `step()` function multiple times to perform dynamic configuration.
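A minimal sketch of the dynamic schedule described above: `n_steps` configuration updates, with the total timesteps spread across them and `step()` called once per segment. The import path, class name, `algorithm` key, and the hand-crafted learning-rate decay are assumptions for illustration; `n_steps`, `n_total_timesteps`, and `step()` come from the documentation.

.. code-block:: python

    from arlbench import AutoRLEnv  # assumed import path and class name

    n_steps = 5  # number of configuration updates during one training run
    config = {
        "algorithm": "ppo",              # hypothetical algorithm choice
        "n_steps": n_steps,
        "n_total_timesteps": 1_000_000,  # spread across the n_steps segments (assumed semantics)
    }

    env = AutoRLEnv(config)
    obs, info = env.reset()
    for i in range(n_steps):
        # A dynamic tuner would choose the next hyperparameters from obs/info here;
        # a simple hand-crafted learning-rate decay stands in for it.
        action = {"learning_rate": 3e-4 * (0.5 ** i)}
        obs, objective, terminated, truncated, info = env.step(action)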
@@ -1,4 +1,8 @@
 Considerations for Seeding
 ============================

-Seeding is important both on the level of RL algorithms as well as the AutoRL level.
+Seeding is important both on the level of the RL algorithms and on the AutoRL level. In general, we propose to use three different random seeds for training, validation, and testing.
+For training and validation, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for validation during training.
+We recommend using seeds `0` - `9` for training and validation, i.e., passing them to the AutoRL Environment for the tuning process.
+
+When it comes to testing HPO methods, we provide an evaluation script in our examples. We propose to use seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
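A short sketch of the seed split proposed above: seeds 0-9 drive tuning (with ARLBench deriving the validation seed as `seed + 1` internally), and seeds from 100 upward are held out for the final test evaluation. The import path, class name, and `seed`/`algorithm` config keys are assumptions for illustration.

.. code-block:: python

    from arlbench import AutoRLEnv  # assumed import path and class name

    TUNING_SEEDS = range(10)      # seeds 0-9 for training (validation uses seed + 1 internally)
    TEST_SEEDS = range(100, 110)  # disjoint seeds reserved for the final test evaluation

    for seed in TUNING_SEEDS:
        env = AutoRLEnv({"algorithm": "dqn", "seed": seed})  # assumed config key for the seed
        # ... run the HPO or DAC method under test on this environment ...

    for seed in TEST_SEEDS:
        env = AutoRLEnv({"algorithm": "dqn", "seed": seed})
        # ... train the incumbent configuration and record its test performance ...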
@@ -12,7 +12,7 @@ hydra:
   job:
     chdir: true

-jax_enable_x64: true
+jax_enable_x64: false
 load_checkpoint: ""

 autorl:
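For context on the flag changed in this hunk: JAX disables 64-bit floating point by default, and the snippet below shows the standard way to toggle it in plain JAX. That this is what ARLBench does internally with the `jax_enable_x64` config value is an assumption.

.. code-block:: python

    import jax
    import jax.numpy as jnp

    # By default JAX uses 32-bit floats; x64 support must be enabled explicitly.
    jax.config.update("jax_enable_x64", True)

    print(jnp.ones(1).dtype)  # float64 once x64 is enabled, float32 otherwise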
@@ -16,3 +16,5 @@ hp_config:
   target_update_interval: 10
   tau: 0.52
   reward_scale: 2.32
+
+jax_enable_x64: true
@@ -16,3 +16,5 @@ hp_config:
   target_update_interval: 10
   tau: 0.52
   reward_scale: 2.32
+
+jax_enable_x64: true