feat: Updated docs and examples
becktepe committed Jun 4, 2024
1 parent 2502e04 commit 8bfd15b
Showing 12 changed files with 102 additions and 446 deletions.
18 changes: 13 additions & 5 deletions README.md
@@ -36,7 +36,7 @@ The ARLBench is a benchmark for HPO in RL - evaluate your HPO methods fast and o

- **Lightning-fast JAX-Based implementations of DQN, PPO, and SAC**
- **Compatible with many different environment domains via Gymnax, XLand and EnvPool**
- **Representative benchmark set of HPO settings**

<p align="center">
<a href="./docs/images/subsets.png">
@@ -46,15 +46,15 @@ The ARLBench is a benchmark for HPO in RL - evaluate your HPO methods fast and o

## Installation

There are currently two different ways to install ARLBench.
Whichever you choose, we recommend creating a virtual environment for the installation:

```bash
conda create -n arlbench python=3.10
conda activate arlbench
```

The instructions below will help you install the default version of ARLBench with the CPU version of JAX.
If you want to run the ARLBench on GPU, we recommend you check out the [JAX installation guide](https://jax.readthedocs.io/en/latest/installation.html) to see how you can install the correct version for your GPU setup before proceeding.

<details>
@@ -66,6 +66,7 @@ pip install arlbench
```

If you want to use envpool environments (not currently supported for Mac!), instead choose:

```bash
pip install arlbench[envpool]
```
@@ -82,41 +83,48 @@ cd arlbench
```

Then you can install the benchmark. For the base version, use:

```bash
make install
```

For the envpool functionality (not available on Mac!), instead use:

```bash
make install-envpool
```

</details>

> [!CAUTION]
> Windows is currently not supported and not tested. We recommend using the [Linux subsystem](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux) if you're on a Windows machine.

## Quickstart

Here are the two ways you can use ARLBench: via the command line or as an environment. To see them in action, take a look at our [examples](https://github.com/automl/arlbench/tree/main/examples).

### Use the CLI

We provide a command line script for black-box configuration in ARLBench which will also save the results in a 'results' directory. To execute one run of DQN on CartPole, simply run:

```bash
python run_arlbench.py
```

You can use the [hydra](https://hydra.cc/) command line syntax to override parts of the configuration, for example to switch to PPO:

```bash
python run_arlbench.py algorithm=ppo
```

Or run multiple different seeds after one another:

```bash
python run_arlbench.py -m autorl.seed=0,1,2,3,4
```

All hyperparameters to adapt are in the 'hp_config' and architecture settings in the 'nas_config', so to run a grid of different configurations for 5 seeds each, you can do this:

```bash
python run_arlbench.py -m autorl.seed=0,1,2,3,4 nas_config.hidden_size=8,16,32 hp_config.learning_rate=0.001,0.01
```
@@ -148,7 +156,7 @@ If you use ARLBench in your work, please cite us:

```bibtex
@misc{beckdierkes24,
author = {J. Becktepe and J. Dierkes and C. Benjamins and D. Salinas and A. Mohan and R. Rajan and T. Eimer and F. Hutter and H. Hoos and M. Lindauer},
author = {J. Becktepe and J. Dierkes and C. Benjamins and D. Salinas and A. Mohan and R. Rajan and F. Hutter and H. Hoos and M. Lindauer and T. Eimer},
title = {ARLBench},
year = {2024},
url = {https://github.com/automl/arlbench},
5 changes: 4 additions & 1 deletion docs/advanced_usage/algorithm_states.rst
@@ -1,4 +1,7 @@
Using the ARLBench States
==========================

In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.
In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. This is done using so-called `StateFeatures`.
As of now, we implement the `GradInfo` state feature, which returns the norm of the gradients observed during training.

The state features to use can be defined via the `state_features` key in the config passed to the AutoRL Environment. Please include `grad_info` in this list if you want to use this state feature for your approach.
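
As a minimal sketch of enabling this via the CLI (assuming the `state_features` list sits under the `autorl` section of the config, analogous to `autorl.seed`; the exact key path is an assumption):

.. code-block:: bash

    # Hypothetical override: the exact config path for state_features is an assumption
    python run_arlbench.py "autorl.state_features=[grad_info]"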
20 changes: 19 additions & 1 deletion docs/advanced_usage/autorl_paradigms.rst
@@ -1,4 +1,22 @@
ARLBench and Different AutoRL Paradigms
=======================================

TODO: relationship to other AutoRL paradigms
In this chapter, we elaborate on the relationship between ARLBench and various AutoRL paradigms.

Hyperparameter Optimization (HPO)
---------------------------------
(Static) Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL.

Dynamic Algorithm Configuration (DAC)
-------------------------------------
When it comes to dynamic approaches, ARLBench supports different kinds of optimization techniques that adapt the current hyperparameter configuration during training. As shown in the examples,
this can be done using the CLI or the AutoRL Environment. Using checkpointing, training runs can be continued seamlessly, which allows for flexible dynamic approaches.

Neural Architecture Search (NAS)
--------------------------------
In addition to HPO, ARLBench supports NAS approaches that set the size of the hidden layers and the choice of activation functions. However, as of now, this is limited to these two architecture hyperparameters.
In the future, ARLBench could be extended with more powerful search space interfaces for NAS.
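
For example, a small sweep over both architecture hyperparameters via the CLI could look like the following sketch; `nas_config.hidden_size` appears in the quickstart, while the `nas_config.activation` key name is an assumption:

.. code-block:: bash

    # Sketch of a NAS grid over the two architecture hyperparameters
    # (nas_config.activation is an assumed key name)
    python run_arlbench.py -m nas_config.hidden_size=16,32,64 nas_config.activation=relu,tanh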

Meta-Gradients
--------------
As of now, ARLBench does not include meta-gradient based approaches for AutoRL. However, we allow for reactive dynamic approaches that use the gradient information during training to select the next hyperparameter configuration, as shown in our examples.
9 changes: 8 additions & 1 deletion docs/advanced_usage/dynamic_configuration.rst
@@ -1,4 +1,11 @@
Dynamic Configuration in ARLBench
==================================

How to dynamic?
In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports dynamic configuration approaches.
These methods, in contrast, can adapt the current hyperparameter configuration during training.
To do this, you can use the CLI or the AutoRL Environment as shown in our examples.

When using the CLI, you have to pass a checkpoint path for the current training state. The training is then continued using the given configuration.

For the AutoRL Environment, you can set `n_steps` to the number of configuration updates you want to perform during training.
Adjust the number of training steps (`n_total_timesteps`) accordingly and call the `step()` function multiple times to perform dynamic configuration.
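
As a rough sketch of this CLI workflow, a dynamic schedule with two configurations could look as follows (the checkpoint path is illustrative, and passing it via the `load_checkpoint` key from the example configs is an assumption):

.. code-block:: bash

    # First training segment with an initial configuration
    python run_arlbench.py autorl.seed=0 hp_config.learning_rate=0.01

    # Continue from the resulting checkpoint with an updated configuration
    # (checkpoint path and load_checkpoint usage are illustrative assumptions)
    python run_arlbench.py autorl.seed=0 hp_config.learning_rate=0.001 "load_checkpoint=path/to/checkpoint"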
4 changes: 3 additions & 1 deletion docs/basic_usage/env_subsets.rst
@@ -9,4 +9,6 @@ We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments

We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the total landscape of RL behaviors well.
The data generated for selecting these environments is available on `HuggingFace <https://huggingface.co/datasets/autorl-org/arlbench>`_ for you to use in your experiments.
For more information on how the subset selection was done, please refer to our paper.

For more information on how to evaluate your method on these subsets, please refer to the examples in our GitHub repository.
6 changes: 5 additions & 1 deletion docs/basic_usage/seeding.rst
@@ -1,4 +1,8 @@
Considerations for Seeding
============================

Seeding is important both on the level of RL algorithms as well as the AutoRL level.
Seeding is important both on the level of the RL algorithms and on the AutoRL level. In general, we propose using separate random seeds for training, validation, and testing.
For training and validation, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training and `seed + 1` for validation during training.
We recommend using seeds `0`-`9` for training and validation, i.e., passing them to the AutoRL Environment for the tuning process.

When it comes to testing HPO methods, we provide an evaluation script in our examples. We propose using seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
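
Concretely, tuning and testing then use disjoint seed ranges, e.g. via the CLI (the evaluation command assumes incumbents have been stored as described in the examples):

.. code-block:: bash

    # Tuning: training/validation seeds 0-9
    python run_arlbench.py -m autorl.seed=0,1,2,3,4,5,6,7,8,9

    # Testing: evaluate the stored incumbents on held-out seeds
    python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(*)"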
47 changes: 45 additions & 2 deletions examples/Readme.md
@@ -186,9 +186,52 @@ Now we can build a schedule that takes the gradient information into account.

## 4. Evaluation

### PPO
### Evaluation of Static Approaches

You can use ARLBench to evaluate your HPO method. We recommend running your method on the proposed subset of environments for each algorithm. After that, you need to store the final hyperparameter configurations for each environment and algorithm. This is what the configuration for DQN on Acrobot-v1 might look like:

```yaml
# @package _global_
defaults:
  - override /environment: cc_acrobot
  - override /algorithm: dqn

hpo_method: my_optimizer

hp_config:
  buffer_batch_size: 64
  buffer_size: 100000
  buffer_prio_sampling: false
  initial_epsilon: 0.64
  target_epsilon: 0.112
  gamma: 0.99
  gradient_steps: 1
  learning_rate: 0.0023
  learning_starts: 1032
  use_target_network: true
  target_update_interval: 10
```
You should replace `my_optimizer` with the name of your method to make sure the results are stored in the right directory. You can then set your incumbent configuration for the algorithm/environment accordingly.

As soon as you have stored all your incumbents (in this example in the `incumbent` directory in `configs`), you can run the evaluation script:

```bash
python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "+incumbent=glob(*)"
python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(*)"
```

The command will evaluate all configurations on the three test seeds `100,101,102`. Make sure not to use these during the design or tuning of your methods as this will invalidate the evaluation results.

The final evaluation results are stored in the `evaluation` directory for each algorithm and environment.

To run the evaluation only for a single algorithm, e.g. PPO, you can adapt the `incumbent` argument:

```bash
python run_arlbench.py --config-name=evaluate -m "autorl.seed=100,101,102" "incumbent=glob(ppo*)"
```

The same can be done for single combinations of environments and algorithms.

### Evaluation of Dynamic Approaches

When it comes to dynamic HPO methods, you cannot simply return the incumbent but have to evaluate the whole method. For this case, we recommend using the Hypersweeper or the AutoRL Environment as shown in the examples above. Make sure to set the seed of the AutoRL Environment accordingly (`100, 101, 102, ...`).
2 changes: 1 addition & 1 deletion examples/configs/base.yaml
@@ -12,7 +12,7 @@ hydra:
  job:
    chdir: true

jax_enable_x64: true
jax_enable_x64: false
load_checkpoint: ""

autorl:
@@ -16,3 +16,5 @@ hp_config:
  target_update_interval: 10
  tau: 0.52
  reward_scale: 2.32

jax_enable_x64: true
2 changes: 2 additions & 0 deletions examples/configs/incumbent/sac_pendulum_my_optimizer.yaml
@@ -16,3 +16,5 @@ hp_config:
  target_update_interval: 10
  tau: 0.52
  reward_scale: 2.32

jax_enable_x64: true