Docs/update (#115)
* docs: update

* fix: dependencies

* fix: version

* fix: added swig in pyproject.toml
michele-milesi authored Oct 4, 2023
1 parent 1c98046 commit a1736f8
Showing 9 changed files with 22 additions and 18 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -147,7 +147,7 @@ pip install "sheeprl[atari,mujoco,dev,test] @ git+https://github.com/Eclectic-Sh
>
> If you want to install the *minedojo* or *minerl* environment support, Java JDK 8 is required: you can install it by following the instructions at this [link](https://docs.minedojo.org/sections/getting_started/install.html#on-ubuntu-20-04).
>
> **MineRL**, **MineDojo**, and **DIAMBRA** environments have **conflicting requirements**, so **DO NOT install them together** with the `pip install -e .[minerl,minedojo,diambra]` command, but instead **install them individually** with either the command `pip install -e .[minerl]` or `pip install -e .[minedojo]` or `pip install -e .[diambra]` before running an experiment with the MineRL or MineDojo or DIAMBRA environment, respectively.
> **MineRL** and **MineDojo** environments have **conflicting requirements**, so **DO NOT install them together** with the `pip install -e .[minerl,minedojo]` command, but instead **install them individually** with either the command `pip install -e .[minerl]` or `pip install -e .[minedojo]` before running an experiment with the MineRL or MineDojo environment, respectively.
</details>
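
As a concrete sketch of the note above (the package extras are the ones defined in the project's `pyproject.toml`; the JDK command is one common route on Ubuntu, check the linked MineDojo instructions for your distribution):

```bash
# Java JDK 8, required by MineRL and MineDojo (one common route on Ubuntu)
sudo apt-get install openjdk-8-jdk

# Then install EITHER MineRL support...
pip install -e .[minerl]
# ...OR MineDojo support (in a separate virtual environment, not together)
pip install -e .[minedojo]
```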

10 changes: 5 additions & 5 deletions howto/learn_in_diambra.md
@@ -61,7 +61,7 @@ diambra run -s=8 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.n
The IDs of the DIAMBRA environments are specified [here](https://docs.diambra.ai/envs/games/). To train your agent on a DIAMBRA environment you have to select the diambra configs with the argument `env=diambra`, then set the `env.id` argument to the environment ID, e.g., to train your agent on the *Dead Or Alive ++* game, you have to set the `env.id` argument to `doapp` (i.e., `env.id=doapp`).

```bash
diambra run -s=4 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.num_envs=4
diambra run -s=4 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.num_envs=4 cnn_keys.encoder=[frame]
```

Another possibility is to create a new config file in the `sheeprl/configs/exp` folder, where you specify all the configs you want to use in your experiment. An example of a custom configuration file is available [here](../sheeprl/configs/exp/dreamer_v3_L_doapp.yaml).
@@ -72,7 +72,7 @@ To modify the default settings or add other wrappers, you have to add the settin
For instance, in the following example, we create the `custom_exp.yaml` file in the `sheeprl/configs/exp` folder, where we select the diambra environment; in addition, player one is selected and a step ratio of $5$ is chosen. Moreover, the rewards are normalized by a factor of $0.3$.


```diff
```yaml
# @package _global_

defaults:
@@ -81,15 +81,15 @@ defaults:
- _self_

env:
env:
id: doapp
wrapper:
diambra_settings:
characters: Kasumi
step_ratio: 5
role: diambra.arena.Roles.P1
diambra_wrappers:
reward_normalization: True
reward_normalization_factor: 0.3
normalize_reward: True
normalization_factor: 0.3
```
Now, to run your experiment, you have to execute the following command:
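
A sketch of such a command, assuming the file above is saved as `sheeprl/configs/exp/custom_exp.yaml` and following the `diambra run` pattern shown earlier (adjust the number of environments to your setup):

```bash
# -s matches env.num_envs, mirroring the earlier example
diambra run -s=4 python sheeprl.py exp=custom_exp env.num_envs=4
```
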
8 changes: 6 additions & 2 deletions howto/learn_in_dmc.md
@@ -8,11 +8,15 @@ First you should install the proper environments:

MuJoCo/DMC supports three different OpenGL rendering backends: EGL (headless), GLFW (windowed), OSMesa (headless).
For each of them, you need to install some packages:
- GLFW: `sudo apt-get install libglfw3 libglew2.0`
- EGL: `sudo apt-get install libglew2.0`
- GLFW: `sudo apt-get install libglfw3 libglew2.2`
- EGL: `sudo apt-get install libglew2.2`
- OSMesa: `sudo apt-get install libgl1-mesa-glx libosmesa6`
In order to use one of these rendering backends, you need to set the `MUJOCO_GL` environment variable to `"glfw"`, `"egl"`, or `"osmesa"`, respectively.
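
For example, to run headlessly with EGL (a sketch; the experiment and environment arguments are placeholders, substitute your own):

```bash
# Select the headless EGL backend for MuJoCo/DMC rendering
export MUJOCO_GL=egl
python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk
```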

> **Note**
>
> The `libglew2.2` package could have a different name depending on your OS (e.g., `libglew2.2` is the name on Ubuntu 22.04.2 LTS).
For more information, see [https://github.com/deepmind/dm_control](https://github.com/deepmind/dm_control) and [https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl).

## MuJoCo Gymnasium
2 changes: 1 addition & 1 deletion howto/learn_in_minedojo.md
@@ -29,7 +29,7 @@ It is possible to train your agents on all the tasks provided by MineDojo. You n
For instance, you can use the following command to select the MineDojo open-ended environment.

```bash
python sheeprl.py exp=p2e_dv2 env=minedojo env.id=open-ened algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor cnn_keys.encoder=[rgb]
python sheeprl.py exp=p2e_dv2 env=minedojo env.id=open-ended algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor cnn_keys.encoder=[rgb]
```

### Observation Space
2 changes: 1 addition & 1 deletion howto/register_new_algorithm.md
@@ -431,7 +431,7 @@ np.float = np.float32
np.int = np.int64
np.bool = bool
__version__ = "0.3.2"
__version__ = "0.4.3"
```

Then if you run `python sheeprl/available_agents.py` you should see that `sota` appears in the list of all the available agents:
6 changes: 3 additions & 3 deletions howto/select_observations.md
@@ -27,9 +27,9 @@ You just need to pass the `mlp_keys` and `cnn_keys` of the encoder and the decod
>
> We recommend reading [this](./work_with_multi-encoder_multi-decoder.md) to learn how the encoder and decoder work with multiple observations.
For instance, to train the ppo algorithm on the *doapp* task provided by *DIAMBRA* using image observations and only the `P1_oppHealth` and `P1_ownHealth` as vector observation, you have to run the following command:
For instance, to train the ppo algorithm on the *doapp* task provided by *DIAMBRA* using image observations and only the `opp_health` and `own_health` as vector observation, you have to run the following command:
```bash
python sheeprl.py exp=ppo env=diambra env.id=doapp cnn_keys.encoder=[frame] mlp_keys.encoder=[P1_oppHealth,P1_ownHealth]
diambra run python sheeprl.py exp=ppo env=diambra env.id=doapp env.num_envs=1 cnn_keys.encoder=[frame] mlp_keys.encoder=[opp_health,own_health]
```

> **Note**
@@ -40,7 +40,7 @@ It is important to know the observations the environment provides, for instance,
> **Note**
>
> For some environments provided by gymnasium, e.g. `LunarLander-v2` or `CartPole-v1`, only vector observations are returned, but it is possible to extract the image observation from the render. To do this, it is sufficient to specify the `rgb` key in the `cnn_keys` argument:
> `python sheeprl.py cnn_keys.encoder=[rgb]`
> `python sheeprl.py exp=... cnn_keys.encoder=[rgb]`
#### Frame Stack
For image observations it is possible to stack the last $n$ observations with the argument `frame_stack`. All the observations specified in the `cnn_keys` argument are stacked.
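
A hedged example: assuming `frame_stack` lives under the `env` config group, stacking the last 4 rendered frames could look like this (the experiment, environment id, and image key are placeholders):

```bash
# Stack the last 4 image observations for every key listed in cnn_keys.encoder
python sheeprl.py exp=dreamer_v3 env=atari env.id=PongNoFrameskip-v4 env.frame_stack=4 cnn_keys.encoder=[rgb]
```
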
4 changes: 2 additions & 2 deletions howto/work_with_steps.md
@@ -20,12 +20,12 @@ The hyper-parameters which refer to the *policy steps* are:

* `total_steps`: the total number of policy steps to perform in an experiment. Effectively, this number will be divided in each process by $n \cdot m$ to obtain the number of training steps to be performed by each of them.
* `exploration_steps`: the number of policy steps in which the agent explores the environment in the P2E algorithms.
* `max_episode_steps`: the maximum number of policy steps an episode can last ($\text{max\_steps}$); when this number is reached a `terminated=True` is returned by the environment. This means that if you decide to have an action repeat greater than one ($\text{action\_repeat} > 1$), then the environment performs a maximum number of steps equal to: $\text{env\_steps} = \text{max\_steps} \cdot \text{action\_repeat}$.
* `max_episode_steps`: the maximum number of policy steps an episode can last (`max_steps`); when this number is reached, a `terminated=True` is returned by the environment. This means that if you decide to have an action repeat greater than one (`action_repeat > 1`), then the environment performs a maximum number of steps equal to: `env_steps = max_steps * action_repeat` (see the sketch after this list).
* `learning_starts`: how many policy steps the agent has to perform before starting the training.
* `train_every`: how many policy steps the agent has to perform between one training and the next.
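
As a worked sketch of the relations above, assuming, as in the earlier definitions, that $n$ is the number of parallel processes and $m$ the number of environments per process:

```bash
# n=2 processes, m=4 environments per process, total_steps=65536
n=2; m=4; total_steps=65536
echo $(( total_steps / (n * m) ))              # training steps per process: 8192

# An episode capped at max_episode_steps=1000 policy steps with action_repeat=2
# lasts at most 1000 * 2 = 2000 environment steps
max_episode_steps=1000; action_repeat=2
echo $(( max_episode_steps * action_repeat ))  # 2000
```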

## Gradient steps
A *gradient step* consists of an update of the parameters of the agent, i.e., a call of the *train* function. The gradient step is proportional to the number of parallel processes, indeed, if there are $n$ parallel processes, $n \cdot \text{gradient\_steps}$ calls to the *train* method will be executed.
A *gradient step* consists of an update of the parameters of the agent, i.e., a call of the *train* function. The gradient step is proportional to the number of parallel processes, indeed, if there are $n$ parallel processes, `n * gradient_steps` calls to the *train* method will be executed.

The hyper-parameters which refer to the *gradient steps* are:
* `algo.per_rank_gradient_steps`: the number of gradient steps per rank to perform in a single iteration.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -2,7 +2,7 @@
create = true
in-project = true
[build-system]
requires = ["setuptools >= 61.0.0"]
requires = ["setuptools >= 61.0.0", "swig==4.*"]
build-backend = "setuptools.build_meta"

[project]
@@ -65,7 +65,7 @@ atari = [
"gymnasium[other]==0.29.*",
]
minedojo = ["minedojo==0.1", "importlib_resources==5.12.0"]
minerl = ["minerl==0.4.4"]
minerl = ["setuptools==66.0.0", "minerl==0.4.4"]
diambra = ["diambra==0.0.16", "diambra-arena==2.2.1"]
crafter = ["crafter==1.8.1"]

2 changes: 1 addition & 1 deletion sheeprl/__init__.py
@@ -31,4 +31,4 @@
np.int = np.int64
np.bool = bool

__version__ = "0.4.2"
__version__ = "0.4.3"
