Fix/dmc wrapper #222

Merged 5 commits on Feb 29, 2024

4 changes: 2 additions & 2 deletions howto/learn_in_dmc.md
@@ -29,10 +29,10 @@ python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.enc
```

## DeepMind Control
In order to train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to select the *DMC* environment (`env=dmc`) and set the id of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set the `env.id` to `"walker_walk"`.
In order to train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to select the *DMC* environment (`env=dmc`) and set the `domain` and the `task` of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set the `env.wrapper.domain_name` to `"walker"` and the `env.wrapper.task_name` to `"walk"`.

```bash
python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk algo.cnn_keys.encoder=[rgb]
python sheeprl.py exp=dreamer_v3 env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk algo.cnn_keys.encoder=[rgb]
```
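
Under the hood, `env=dmc` with a `domain_name`/`task_name` pair boils down to loading the corresponding dm_control task. The snippet below is only an illustrative sketch of the underlying `dm_control.suite` call (it is not SheepRL's wrapper code, and the seed value is an arbitrary example):

```python
from dm_control import suite

# Each DMC environment is identified by a (domain, task) pair,
# e.g. the "walker" domain with the "walk" task.
dmc_env = suite.load(domain_name="walker", task_name="walk", task_kwargs={"random": 42})

# All available (domain, task) pairs can be listed programmatically.
for domain, task in sorted(suite.ALL_TASKS):
    print(f"{domain}_{task}")
```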

> [!NOTE]
2 changes: 1 addition & 1 deletion howto/run_experiments.md
@@ -9,6 +9,6 @@ In this document, we give the user some advice to execute its experiments.
Now that you are familiar with [hydra](https://hydra.cc/docs/intro/) and the organization of the configs of this repository, we can introduce a few constraints to launch experiments:

1. When you launch an experiment you **must** specify the experiment config of the agent you want to train: `python sheeprl.py exp=...`. The list of the available experiment configs can be retrieved with the following command: `python sheeprl.py --help`
2. Then you have to specify the hyper-parameters of your experiment: you can override the hyper-parameters by specifying them as cli arguments (e.g., `exp=dreamer_v3 algo=dreamer_v3 env=dmc env.id=walker_walk env.action_repeat=2 ...`) or you can write your custom experiment file (you must put it in the `./sheeprl/configs/exp` folder) and call your script with the command `python sheeprl.py exp=custom_experiment` (the last option is recommended). There are some available examples, just check the [exp folder](../sheeprl/configs/exp/).
2. Then you have to specify the hyper-parameters of your experiment: you can override the hyper-parameters by specifying them as cli arguments (e.g., `exp=dreamer_v3 algo=dreamer_v3 env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 ...`) or you can write your custom experiment file (you must put it in the `./sheeprl/configs/exp` folder) and call your script with the command `python sheeprl.py exp=custom_experiment` (the last option is recommended). There are some available examples, just check the [exp folder](../sheeprl/configs/exp/).
3. You **cannot mix the agent command with the configs of another algorithm**: this might raise an error or cause anomalous behaviors. So if you want to train the `dreamer_v3` agent, be sure to select the correct algorithm configuration (in our case `algo=dreamer_v3`)
4. To change the optimizer of an algorithm through the CLI you must do the following: suppose that you want to run an experiment with Dreamer-V3 and want to change the world model optimizer from Adam (the default in the `sheeprl/configs/algo/dreamer_v3.yaml` config) to SGD; then in the CLI you must type `python sheeprl.py algo=dreamer_v3 ... optim@algo.world_model.optimizer=sgd`, where `optim@algo.world_model.optimizer=sgd` means that the `optimizer` field of the `world_model` of the chosen `algo` config (the dreamer_v3.yaml one) will be equal to the config `sgd.yaml` found under the `sheeprl/configs/optim` folder (see the sketch below)
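
To sanity-check how such overrides compose before launching a full training run, the configuration can also be resolved programmatically with Hydra's compose API. The snippet below is only a sketch: the `config_path` and `config_name` values are assumptions and may not match the repository's actual entry point.

```python
from hydra import compose, initialize

# Assumed config location and entry-point name; adjust to the repository layout.
with initialize(version_base=None, config_path="../sheeprl/configs"):
    cfg = compose(
        config_name="config",
        overrides=[
            "exp=dreamer_v3",
            "env=dmc",
            "env.wrapper.domain_name=walker",
            "env.wrapper.task_name=walk",
            "optim@algo.world_model.optimizer=sgd",  # swap the world model optimizer
        ],
    )
    # Inspect the composed config, e.g. the optimizer that will actually be used.
    print(cfg.algo.world_model.optimizer)
```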
4 changes: 2 additions & 2 deletions sheeprl/algos/dreamer_v1/README.md
@@ -80,9 +80,9 @@ There are two versions for most Atari environments: one version uses the *frame
For more information see the official documentation of [Gymnasium Atari environments](https://gymnasium.farama.org/environments/atari/).

## DMC environments
It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc" and the name of the environment in the `env` and `env.id` hyper-parameters respectively, e.g., `env=dmc env.id=walker_walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc", the domain and the task of the environment in the `env`, `env.wrapper.domain_name` and `env.wrapper.task_name` hyper-parameters respectively, e.g., `env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).

When running DreamerV1 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa <command>`. For instance, to run walker walk with DreamerV1 on two gpus (0 and 1) it is necessary to runthe following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=dreamer_v1 fabric.devices=2 fabric.accelerator=gpu env=dmc env.id=walker_walk env.action_repeat=2 env.capture_video=True checkpoint.every=100000 cnn_keys.encoder=[rgb]`.
When running DreamerV1 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa <command>`. For instance, to run walker walk with DreamerV1 on two GPUs (0 and 1) it is necessary to run the following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=dreamer_v1 fabric.devices=2 fabric.accelerator=gpu env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 env.capture_video=True checkpoint.every=100000 algo.cnn_keys.encoder=[rgb]`.
Other possibilities for the variable `MUJOCO_GL` are `GLFW` for rendering to an X11 window or `EGL` for hardware-accelerated headless rendering. (For more information, click [here](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)).
Moreover, it could be necessary to uncomment two lines in the `sheeprl.algos.dreamer_v1.dreamer_v1.py` file.
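
If setting those variables on the command line is inconvenient (e.g., inside a notebook or a job script), an equivalent approach is to export them from Python before anything MuJoCo-related is imported. This is a generic sketch, not code taken from SheepRL:

```python
import os

# Must run before dm_control / MuJoCo (or anything importing them) is loaded,
# otherwise the rendering backend has already been selected.
os.environ["PYOPENGL_PLATFORM"] = ""
os.environ.setdefault("MUJOCO_GL", "osmesa")  # or "egl" for hardware-accelerated headless rendering

from dm_control import suite  # noqa: E402  (imported after setting the variables on purpose)

env = suite.load(domain_name="walker", task_name="walk")
frame = env.physics.render(height=84, width=84, camera_id=0)
print(frame.shape)  # (84, 84, 3) if headless rendering works
```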

7 changes: 4 additions & 3 deletions sheeprl/algos/dreamer_v2/README.md
@@ -147,9 +147,9 @@ checkpoint.every=100000 \
```

## DMC environments
It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc" and the name of the environment in the `env` and `env.id` hyper-parameters respectively, e.g., `env=dmc env.id=walker_walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc", the domain and the task of the environment in the `env`, `env.wrapper.domain_name` and `env.wrapper.task_name` hyper-parameters respectively, e.g., `env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).

When running DreamerV2 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa <command>`. For instance, to run walker walk with DreamerV2 on two gpus (0 and 1) it is necessary to runthe following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa CUDA_VISIBLE_DEVICES="2,3" python sheeprl.py exp=dreamer_v2 fabric.devices=2 fabric.accelerator=gpu env=dmc env.id=walker_walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 cnn_keys.encoder=[rgb]`.
When running DreamerV2 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa <command>`. For instance, to run walker walk with DreamerV2 on two GPUs (0 and 1) it is necessary to run the following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa CUDA_VISIBLE_DEVICES="2,3" python sheeprl.py exp=dreamer_v2 fabric.devices=2 fabric.accelerator=gpu env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 algo.cnn_keys.encoder=[rgb]`.
Other possibilities for the variable `MUJOCO_GL` are `GLFW` for rendering to an X11 window or `EGL` for hardware-accelerated headless rendering. (For more information, click [here](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)).
Moreover, it could be necessary to uncomment two lines in the `sheeprl.algos.dreamer_v1.dreamer_v1.py` file.

@@ -160,7 +160,8 @@ PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py \
exp=dreamer_v2 \
fabric.devices=1 \
env=dmc \
env.id=walker_walk \
env.wrapper.domain_name=walker \
env.wrapper.task_name=walk \
env.capture_video=True \
env.action_repeat=2 \
env.clip_rewards=False \
4 changes: 2 additions & 2 deletions sheeprl/algos/sac_ae/README.md
@@ -199,9 +199,9 @@ There are two versions for most Atari environments: one version uses the *frame
For more information see the official documentation of [Gymnasium Atari environments](https://gymnasium.farama.org/environments/atari/).

## DMC environments
It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc" and the name of the environment in the `env` and `env.id` hyper-parameters respectively, e.g., `env=dmc env.id=walker_walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc", the domain and the task of the environment in the `env`, `env.wrapper.domain_name` and `env.wrapper.task_name` hyper-parameters respectively, e.g., `env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).

When running DreamerV1 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa <command>`. For instance, to run walker walk with DreamerV1 on two gpus (0 and 1) it is necessary to runthe following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=sac_ae fabric.devices=2 fabric.accelerator=gpu env=dmc env.id=walker_walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 cnn_keys.encoder=[rgb]`.
When running SAC-AE in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa <command>`. For instance, to run walker walk with SAC-AE on two GPUs (0 and 1) it is necessary to run the following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=sac_ae fabric.devices=2 fabric.accelerator=gpu env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 algo.cnn_keys.encoder=[rgb]`.
Other possibilities for the variable `MUJOCO_GL` are `GLFW` for rendering to an X11 window or `EGL` for hardware-accelerated headless rendering. (For more information, click [here](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)).

## Recommendations
5 changes: 3 additions & 2 deletions sheeprl/configs/env/dmc.yaml
@@ -3,15 +3,16 @@ defaults:
  - _self_

# Override from `default` config
id: walker_walk
id: ${env.wrapper.domain_name}_${env.wrapper.task_name}
action_repeat: 1
max_episode_steps: 1000
sync_env: True

# Wrapper to be instantiated
wrapper:
  _target_: sheeprl.envs.dmc.DMCWrapper
  id: ${env.id}
  domain_name: walker
  task_name: walk
  width: ${env.screen_size}
  height: ${env.screen_size}
  seed: null
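
The `env.id` field is now derived from the wrapper's `domain_name` and `task_name` through OmegaConf interpolation, so overriding either of the two automatically keeps `env.id` consistent. Below is a minimal, self-contained sketch of how that interpolation resolves, using a hand-written dict in place of the real config tree:

```python
from omegaconf import OmegaConf

# Toy replica of the relevant part of the config tree.
cfg = OmegaConf.create(
    {
        "env": {
            "id": "${env.wrapper.domain_name}_${env.wrapper.task_name}",
            "wrapper": {"domain_name": "walker", "task_name": "walk"},
        }
    }
)
print(cfg.env.id)  # walker_walk

# Overriding the wrapper fields changes the derived id as well.
cfg.env.wrapper.domain_name = "cartpole"
cfg.env.wrapper.task_name = "swingup_sparse"
print(cfg.env.id)  # cartpole_swingup_sparse
```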
47 changes: 47 additions & 0 deletions sheeprl/configs/exp/dreamer_v3_dmc_cartpole_swingup_sparse.yaml
@@ -0,0 +1,47 @@
# @package _global_

defaults:
  - dreamer_v3
  - override /algo: dreamer_v3_S
  - override /env: dmc
  - _self_

# Experiment
seed: 5

# Environment
env:
  num_envs: 4
  action_repeat: 2
  max_episode_steps: -1
  wrapper:
    domain_name: cartpole
    task_name: swingup_sparse
    from_vectors: False
    from_pixels: True
    seed: ${seed}

# Checkpoint
checkpoint:
  every: 10000

# Buffer
buffer:
  size: 100000
  checkpoint: True
  memmap: True

# Algorithm
algo:
  total_steps: 1000000
  cnn_keys:
    encoder:
      - rgb
  mlp_keys:
    encoder: []
  learning_starts: 1024
  train_every: 2

# Metric
metric:
  log_every: 5000
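
With this file placed under `sheeprl/configs/exp`, the experiment should be selectable by file name, e.g. `python sheeprl.py exp=dreamer_v3_dmc_cartpole_swingup_sparse` (assuming, as for the other exp configs, that the experiment name matches the file name).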
3 changes: 2 additions & 1 deletion sheeprl/configs/exp/dreamer_v3_dmc_walker_walk.yaml
@@ -13,8 +13,9 @@ seed: 5
env:
  num_envs: 4
  max_episode_steps: -1
  id: walker_walk
  wrapper:
    domain_name: walker
    task_name: walk
    from_vectors: False
    from_pixels: True

7 changes: 4 additions & 3 deletions sheeprl/envs/dmc.py
@@ -49,7 +49,8 @@ def _flatten_obs(obs: Dict[Any, Any]) -> np.ndarray:
class DMCWrapper(gym.Wrapper):
    def __init__(
        self,
        id: str,
        domain_name: str,
        task_name: str,
        from_pixels: bool = False,
        from_vectors: bool = True,
        height: int = 84,
@@ -74,7 +75,8 @@ def __init__(
        by the camera specified through the `camera_id` parameter

        Args:
            id (str): the task id, e.g. 'walker_walk'. The id must be 'underscore' separated.
            domain_name (str): the domain of the environment, e.g., "walker".
            task_name (str): the task of the environment, e.g., "walk".
            from_pixels (bool, optional): whether to return the image observation.
                If both 'from_pixels' and 'from_vectors' are True, then the observation space
                will be a `gymnasium.spaces.Dict` with two keys: 'rgb' and 'state' for the
@@ -112,7 +114,6 @@ def __init__(
f"got {from_vectors} and {from_pixels} respectively."
)

domain_name, task_name = id.split("_")
self._from_pixels = from_pixels
self._from_vectors = from_vectors
self._height = height
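
A hypothetical usage sketch of the updated wrapper, based only on the signature and docstring shown in this diff; the `height`/`width` values and the Gymnasium-style `reset(seed=...)` call are assumptions:

```python
from sheeprl.envs.dmc import DMCWrapper

# Domain and task are now two separate arguments instead of a single id="walker_walk".
env = DMCWrapper(
    domain_name="walker",
    task_name="walk",
    from_pixels=True,   # adds the rendered "rgb" observation
    from_vectors=True,  # adds the flattened "state" observation
    height=64,
    width=64,
)
obs, info = env.reset(seed=5)
print(env.observation_space)  # per the docstring: a Dict space with "rgb" and "state" keys
```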