diff --git a/howto/learn_in_dmc.md b/howto/learn_in_dmc.md
index 5b604abe..c847f970 100644
--- a/howto/learn_in_dmc.md
+++ b/howto/learn_in_dmc.md
@@ -29,10 +29,10 @@ python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.enc
 ```
 
 ## DeepMind Control
-In order to train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to select the *DMC* environment (`env=dmc`) and set the id of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set the `env.id` to `"walker_walk"`.
+In order to train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to select the *DMC* environment (`env=dmc`) and set the `domain` and the `task` of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set `env.wrapper.domain_name` to `"walker"` and `env.wrapper.task_name` to `"walk"`.
 
 ```bash
-python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk algo.cnn_keys.encoder=[rgb]
+python sheeprl.py exp=dreamer_v3 env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk algo.cnn_keys.encoder=[rgb]
 ```
 
 > [!NOTE]
diff --git a/howto/run_experiments.md b/howto/run_experiments.md
index daf9dd62..3c0fa0ce 100644
--- a/howto/run_experiments.md
+++ b/howto/run_experiments.md
@@ -9,6 +9,6 @@ In this document, we give the user some advice to execute its experiments.
 Now that you are familiar with [hydra](https://hydra.cc/docs/intro/) and the organization of the configs of this repository, we can introduce a few constraints for launching experiments:
 1. When you launch an experiment you **must** specify the experiment config of the agent you want to train: `python sheeprl.py exp=...`. The list of the available experiment configs can be retrieved with the following command: `python sheeprl.py --help`
-2. Then you have to specify the hyper-parameters of your experiment: you can override the hyper-parameters by specifying them as cli arguments (e.g., `exp=dreamer_v3 algo=dreamer_v3 env=dmc env.id=walker_walk env.action_repeat=2 ...`) or you can write your custom experiment file (you must put it in the `./sheeprl/configs/exp` folder) and call your script with the command `python sheeprl.py exp=custom_experiment` (the last option is recommended). There are some available examples, just check the [exp folder](../sheeprl/configs/exp/).
+2. Then you have to specify the hyper-parameters of your experiment: you can override the hyper-parameters by specifying them as CLI arguments (e.g., `exp=dreamer_v3 algo=dreamer_v3 env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 ...`) or you can write your custom experiment file (you must put it in the `./sheeprl/configs/exp` folder) and call your script with the command `python sheeprl.py exp=custom_experiment` (the latter option is recommended). There are some available examples, just check the [exp folder](../sheeprl/configs/exp/).
 3. You **cannot mix the agent command with the configs of another algorithm**, as this might raise an error or create anomalous behaviors.
 So if you want to train the `dreamer_v3` agent, be sure to select the correct algorithm configuration (in our case `algo=dreamer_v3`).
 4. To change the optimizer of an algorithm through the CLI you must do the following: suppose that you want to run an experiment with Dreamer-V3 and want to change the world model optimizer from Adam (default in the `sheeprl/configs/algo/dreamer_v3.yaml` config) to SGD, then in the CLI you must type `python sheeprl.py algo=dreamer_v3 ... optim@algo.world_model.optimizer=sgd`, where `optim@algo.world_model.optimizer=sgd` means that the `optimizer` field of the `world_model` of the chosen `algo` config (the `dreamer_v3.yaml` one) will be set to the config `sgd.yaml` found under the `sheeprl/configs/optim` folder.
diff --git a/sheeprl/algos/dreamer_v1/README.md b/sheeprl/algos/dreamer_v1/README.md
index 21b81379..f33a17d1 100644
--- a/sheeprl/algos/dreamer_v1/README.md
+++ b/sheeprl/algos/dreamer_v1/README.md
@@ -80,9 +80,9 @@ There are two versions for most Atari environments: one version uses the *frame
 For more information see the official documentation of [Gymnasium Atari environments](https://gymnasium.farama.org/environments/atari/).
 
 ## DMC environments
-It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc" and the name of the environment in the `env` and `env.id` hyper-parameters respectively, e.g., `env=dmc env.id=walker_walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
+It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc", the domain, and the task of the environment in the `env`, `env.wrapper.domain_name`, and `env.wrapper.task_name` hyper-parameters, respectively, e.g., `env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
 
-When running DreamerV1 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa `. For instance, to run walker walk with DreamerV1 on two gpus (0 and 1) it is necessary to runthe following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=dreamer_v1 fabric.devices=2 fabric.accelerator=gpu env=dmc env.id=walker_walk env.action_repeat=2 env.capture_video=True checkpoint.every=100000 cnn_keys.encoder=[rgb]`.
+When running DreamerV1 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to prepend two environment variables to the command that launches the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa`. For instance, to run walker walk with DreamerV1 on two GPUs (0 and 1) it is necessary to run the following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=dreamer_v1 fabric.devices=2 fabric.accelerator=gpu env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 env.capture_video=True checkpoint.every=100000 algo.cnn_keys.encoder=[rgb]`.
 Other possibilities for the variable `MUJOCO_GL` are: `GLFW` for rendering to an X11 window or `EGL` for hardware-accelerated headless rendering (for more information, see [here](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)). Moreover, it could be necessary to uncomment two rows in the `sheeprl.algos.dreamer_v1.dreamer_v1.py` file.
diff --git a/sheeprl/algos/dreamer_v2/README.md b/sheeprl/algos/dreamer_v2/README.md
index 5aab1e86..ccf10a7b 100644
--- a/sheeprl/algos/dreamer_v2/README.md
+++ b/sheeprl/algos/dreamer_v2/README.md
@@ -147,9 +147,9 @@ checkpoint.every=100000 \
 ```
 
 ## DMC environments
-It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc" and the name of the environment in the `env` and `env.id` hyper-parameters respectively, e.g., `env=dmc env.id=walker_walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
+It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc", the domain, and the task of the environment in the `env`, `env.wrapper.domain_name`, and `env.wrapper.task_name` hyper-parameters, respectively, e.g., `env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
 
-When running DreamerV2 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa `. For instance, to run walker walk with DreamerV2 on two gpus (0 and 1) it is necessary to runthe following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa CUDA_VISIBLE_DEVICES="2,3" python sheeprl.py exp=dreamer_v2 fabric.devices=2 fabric.accelerator=gpu env=dmc env.id=walker_walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 cnn_keys.encoder=[rgb]`.
+When running DreamerV2 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to prepend two environment variables to the command that launches the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa`. For instance, to run walker walk with DreamerV2 on two GPUs (2 and 3) it is necessary to run the following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa CUDA_VISIBLE_DEVICES="2,3" python sheeprl.py exp=dreamer_v2 fabric.devices=2 fabric.accelerator=gpu env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 algo.cnn_keys.encoder=[rgb]`.
 
 Other possibilities for the variable `MUJOCO_GL` are: `GLFW` for rendering to an X11 window or `EGL` for hardware-accelerated headless rendering (for more information, see [here](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)). Moreover, it could be necessary to uncomment two rows in the `sheeprl.algos.dreamer_v2.dreamer_v2.py` file.
@@ -160,7 +160,8 @@ PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py \
 exp=dreamer_v2 \
 fabric.devices=1 \
 env=dmc \
-env.id=walker_walk \
+env.wrapper.domain_name=walker \
+env.wrapper.task_name=walk \
 env.capture_video=True \
 env.action_repeat=2 \
 env.clip_rewards=False \
diff --git a/sheeprl/algos/sac_ae/README.md b/sheeprl/algos/sac_ae/README.md
index 0cda4f0a..3960ebfc 100644
--- a/sheeprl/algos/sac_ae/README.md
+++ b/sheeprl/algos/sac_ae/README.md
@@ -199,9 +199,9 @@ There are two versions for most Atari environments: one version uses the *frame
 For more information see the official documentation of [Gymnasium Atari environments](https://gymnasium.farama.org/environments/atari/).
 
 ## DMC environments
-It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc" and the name of the environment in the `env` and `env.id` hyper-parameters respectively, e.g., `env=dmc env.id=walker_walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
+It is possible to use the environments provided by the [DeepMind Control suite](https://www.deepmind.com/open-source/deepmind-control-suite). To use such environments it is necessary to specify "dmc", the domain, and the task of the environment in the `env`, `env.wrapper.domain_name`, and `env.wrapper.task_name` hyper-parameters, respectively, e.g., `env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk` will create an instance of the walker walk environment. For more information about all the environments, check their [paper](https://arxiv.org/abs/1801.00690).
 
-When running DreamerV1 in a DMC environment on a server (or a PC without a video terminal) it could be necessary to add two variables to the command to launch the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa `. For instance, to run walker walk with DreamerV1 on two gpus (0 and 1) it is necessary to runthe following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=sac_ae fabric.devices=2 fabric.accelerator=gpu env=dmc env.id=walker_walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 cnn_keys.encoder=[rgb]`.
+When running SAC-AE in a DMC environment on a server (or a PC without a video terminal) it could be necessary to prepend two environment variables to the command that launches the script: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa`. For instance, to run walker walk with SAC-AE on two GPUs (0 and 1) it is necessary to run the following command: `PYOPENGL_PLATFORM="" MUJOCO_GL=osmesa python sheeprl.py exp=sac_ae fabric.devices=2 fabric.accelerator=gpu env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk env.action_repeat=2 env.capture_video=True checkpoint.every=80000 algo.cnn_keys.encoder=[rgb]`.
 
 Other possibilities for the variable `MUJOCO_GL` are: `GLFW` for rendering to an X11 window or `EGL` for hardware-accelerated headless rendering (for more information, see [here](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)).
 ## Recommendations
diff --git a/sheeprl/configs/env/dmc.yaml b/sheeprl/configs/env/dmc.yaml
index 9f184b92..84699ca9 100644
--- a/sheeprl/configs/env/dmc.yaml
+++ b/sheeprl/configs/env/dmc.yaml
@@ -3,7 +3,7 @@ defaults:
   - _self_
 
 # Override from `default` config
-id: walker_walk
+id: ${env.wrapper.domain_name}_${env.wrapper.task_name}
 action_repeat: 1
 max_episode_steps: 1000
 sync_env: True
@@ -11,7 +11,8 @@ sync_env: True
 
 # Wrapper to be instantiated
 wrapper:
   _target_: sheeprl.envs.dmc.DMCWrapper
-  id: ${env.id}
+  domain_name: walker
+  task_name: walk
   width: ${env.screen_size}
   height: ${env.screen_size}
   seed: null
diff --git a/sheeprl/configs/exp/dreamer_v3_dmc_cartpole_swingup_sparse.yaml b/sheeprl/configs/exp/dreamer_v3_dmc_cartpole_swingup_sparse.yaml
new file mode 100644
index 00000000..7003be3a
--- /dev/null
+++ b/sheeprl/configs/exp/dreamer_v3_dmc_cartpole_swingup_sparse.yaml
@@ -0,0 +1,47 @@
+# @package _global_
+
+defaults:
+  - dreamer_v3
+  - override /algo: dreamer_v3_S
+  - override /env: dmc
+  - _self_
+
+# Experiment
+seed: 5
+
+# Environment
+env:
+  num_envs: 4
+  action_repeat: 2
+  max_episode_steps: -1
+  wrapper:
+    domain_name: cartpole
+    task_name: swingup_sparse
+    from_vectors: False
+    from_pixels: True
+    seed: ${seed}
+
+# Checkpoint
+checkpoint:
+  every: 10000
+
+# Buffer
+buffer:
+  size: 100000
+  checkpoint: True
+  memmap: True
+
+# Algorithm
+algo:
+  total_steps: 1000000
+  cnn_keys:
+    encoder:
+      - rgb
+  mlp_keys:
+    encoder: []
+  learning_starts: 1024
+  train_every: 2
+
+# Metric
+metric:
+  log_every: 5000
diff --git a/sheeprl/configs/exp/dreamer_v3_dmc_walker_walk.yaml b/sheeprl/configs/exp/dreamer_v3_dmc_walker_walk.yaml
index f4c0db04..4c16b627 100644
--- a/sheeprl/configs/exp/dreamer_v3_dmc_walker_walk.yaml
+++ b/sheeprl/configs/exp/dreamer_v3_dmc_walker_walk.yaml
@@ -13,8 +13,9 @@ seed: 5
 # Environment
 env:
   num_envs: 4
   max_episode_steps: -1
-  id: walker_walk
   wrapper:
+    domain_name: walker
+    task_name: walk
     from_vectors: False
     from_pixels: True
diff --git a/sheeprl/envs/dmc.py b/sheeprl/envs/dmc.py
index bf0bc3cc..c2643993 100644
--- a/sheeprl/envs/dmc.py
+++ b/sheeprl/envs/dmc.py
@@ -49,7 +49,8 @@ def _flatten_obs(obs: Dict[Any, Any]) -> np.ndarray:
 class DMCWrapper(gym.Wrapper):
     def __init__(
         self,
-        id: str,
+        domain_name: str,
+        task_name: str,
         from_pixels: bool = False,
         from_vectors: bool = True,
         height: int = 84,
@@ -74,7 +75,8 @@ def __init__(
             by the camera specified through the `camera_id` parameter
 
         Args:
-            id (str): the task id, e.g. 'walker_walk'. The id must be 'underscore' separated.
+            domain_name (str): the domain of the environment, e.g., "walker".
+            task_name (str): the task of the environment, e.g., "walk".
             from_pixels (bool, optional): whether to return the image observation.
                 If both 'from_pixels' and 'from_vectors' are True, then the observation
                 space will be a `gymnasium.spaces.Dict` with two keys: 'rgb' and 'state' for the
@@ -112,7 +114,6 @@ def __init__(
                 f"got {from_vectors} and {from_pixels} respectively."
             )
 
-        domain_name, task_name = id.split("_")
         self._from_pixels = from_pixels
         self._from_vectors = from_vectors
         self._height = height
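For reviewers who want to see the new API end-to-end, below is a minimal usage sketch of `DMCWrapper` after this change. It is an illustration, not part of the patch: it uses only the constructor parameters visible in the diff (`domain_name`, `task_name`, `from_pixels`, `from_vectors`, `height`, `width`, `seed`) and assumes the wrapper exposes the standard Gymnasium `reset`/`step` interface.

```python
# Illustrative sketch (not part of the patch): constructing the wrapper with
# the new `domain_name`/`task_name` parameters instead of the old `id` string.
# Assumes the standard Gymnasium reset/step API.
from sheeprl.envs.dmc import DMCWrapper

# Equivalent Hydra overrides on the CLI:
#   env=dmc env.wrapper.domain_name=walker env.wrapper.task_name=walk
env = DMCWrapper(
    domain_name="walker",  # previously encoded as id="walker_walk"
    task_name="walk",
    from_pixels=True,      # image observations under the "rgb" key
    from_vectors=False,    # no low-dimensional "state" key
    height=64,
    width=64,
    seed=5,
)

obs, info = env.reset()
for _ in range(10):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

This also makes the motivation for the change concrete: the removed line `domain_name, task_name = id.split("_")` yields three parts for an id such as `cartpole_swingup_sparse` (the environment added in the new experiment config above), so unpacking it into two names would fail; passing the domain and the task separately removes the ambiguity.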