From 8ce19b2c7338f66b8c750e4c6fc99f86dc664ddd Mon Sep 17 00:00:00 2001
From: niuyazhe
Date: Tue, 9 May 2023 20:37:31 +0800
Subject: [PATCH] polish(nyz): polish env doc details

---
 source/13_envs/d4rl.rst                       |  2 +-
 source/13_envs/d4rl_zh.rst                    |  2 +-
 .../13_envs/{Evogym_zh.rst => evogym_zh.rst}  |  0
 source/13_envs/index_zh.rst                   |  2 +-
 source/13_envs/metadrive.rst                  | 37 +++++++++----------
 source/13_envs/metadrive_zh.rst               |  2 +-
 source/13_envs/mujoco.rst                     |  2 +-
 source/13_envs/mujoco_zh.rst                  |  2 +-
 8 files changed, 24 insertions(+), 25 deletions(-)
 rename source/13_envs/{Evogym_zh.rst => evogym_zh.rst} (100%)

diff --git a/source/13_envs/d4rl.rst b/source/13_envs/d4rl.rst
index 5f33c479..c5163ca2 100644
--- a/source/13_envs/d4rl.rst
+++ b/source/13_envs/d4rl.rst
@@ -1,4 +1,4 @@
-D4RL (Mujoco)
+D4RL (MuJoCo)
 ~~~~~~~~~~~~~~~

 Abstract
diff --git a/source/13_envs/d4rl_zh.rst b/source/13_envs/d4rl_zh.rst
index 938d887e..6d05cbd7 100644
--- a/source/13_envs/d4rl_zh.rst
+++ b/source/13_envs/d4rl_zh.rst
@@ -1,4 +1,4 @@
-D4RL (Mujoco)
+D4RL (MuJoCo)
 ~~~~~~~~~~~~~

 概述
diff --git a/source/13_envs/Evogym_zh.rst b/source/13_envs/evogym_zh.rst
similarity index 100%
rename from source/13_envs/Evogym_zh.rst
rename to source/13_envs/evogym_zh.rst
diff --git a/source/13_envs/index_zh.rst b/source/13_envs/index_zh.rst
index 77fb677e..932b25a5 100644
--- a/source/13_envs/index_zh.rst
+++ b/source/13_envs/index_zh.rst
@@ -50,5 +50,5 @@

     image_cls_zh
     sokoban_zh
-    Evogym_zh.rst
+    evogym_zh.rst
     metadrive_zh
diff --git a/source/13_envs/metadrive.rst b/source/13_envs/metadrive.rst
index 6b3cffa5..92e904e4 100644
--- a/source/13_envs/metadrive.rst
+++ b/source/13_envs/metadrive.rst
@@ -2,7 +2,7 @@ Metadrive
 ~~~~~~~~~~~~~~~~~~

 Overview
-=======
+==========

 `MetaDrive Env `_ is an efficient compositional driving simulator. The goal of the environment is to control a car (or multiple vehicles) from the starting point to the finishing point safely and on time.
 It has the following properties:
@@ -14,10 +14,10 @@ Overview
    :align: center

 Install
-====
+=========

 Installation Method
---------
+-------------------

 Users can choose to install it with one click through pip, or install it from source.

@@ -35,7 +35,7 @@ Note: If the user does not have root privileges, please add \ ``--user`` \ after
     pip install -e .

 Verify Installation
---------
+-------------------

 After the installation is complete, you can verify that the installation was successful by running the following command on the Python command line:

@@ -47,19 +47,19 @@ After the installation is complete, you can verify that the installation was suc
     print(obs.shape) # output (259,)

 Image
-----
+------

 The image of DI-engine is equipped with the framework itself, which can be obtained by \ ``docker pull opendilab/ding:nightly`` \, or by visiting \ `docker hub `__ \ for more images. Currently, MetaDrive doesn't have a own image.


 Space before Transformation (Original Environment)
-========================
+===================================================


 For details, please refer to the code implementation of `MetaDrive `_ .

 Observation space
---------
+-----------------

 The observation space of the vehicle is a 259-dimensional numpy array, the data type is float32, and the obs shape is (259,). The observation space consists of the following three parts:

@@ -70,7 +70,7 @@ The observation space of the vehicle is a 259-dimensional numpy array, the data


 Action space
---------
+------------
 The action space of the MetaDrive environment is a 2-dimensional continuous action, and its valid range is [-1, 1]. Through this design, the action space of each agent is fixed as gym.spaces.Box(low=-1.0, high=1.0, shape=(2, )).

 - The first dimension represents the steering angle. When the action is 1 or -1, it means that the steering wheel is turned to the left or right to the maximum steering angle, and when it is 0, it means that the steering wheel is facing straight ahead.
@@ -78,7 +78,7 @@ The action space of the MetaDrive environment is a 2-dimensional continuous acti
 - At the same time, it also provides a configuration called extra_action_dim (int). For example, if we set config["extra_action_dim"] = 1, then the action space of each agent will become Box(-1.0, 1.0, shape =(3, )). This allows users to write environment wrappers that introduce more dimensions of input operations.

 Reward space
---------
+--------------

 The default reward function in MetaDrive consists of a dense (obtained during driving) reward and a sparse final reward.

@@ -116,11 +116,11 @@ Randomness:


 Transformed space (RL environment)
-=========================
+===================================


 Observation space
---------
+-----------------

 Different from the original version, the observation space is described as a 259-dimensional vector. In DI-engine, the observation space of the car is defined as a top view with a size of 5x84x84, where 5 represents the number of channels, and the last two dimensions (84x84) represent the size of the image for each channel. The semantics of the five channels are:

@@ -141,11 +141,11 @@ In the current scenario, the observation of the vehicle can be represented by th


 Action Space
---------
+--------------
 - no change

 Reward Space
---------
+-------------
 - no change

 Other
@@ -158,18 +158,18 @@ Other
 ========

 Lazy Initialization
-----------
+--------------------

 In order to support parallel operations such as environment vectorization, the specific environment instance generally adopts the lazy initialization method, that is, the \ ``__init__`` \ method of the environment does not initialize the real original environment instance, but only sets relevant parameters and configuration values. Instead, the concrete original environment instance is initialized when the \ ``reset`` \ method is called for the first time.

 Random Seed
---------
+-------------

 - You can use the _reset_global_seed method to set the random seed of the environment. If you do not set it manually, the environment will randomly sample the random seed setting environment.


-The difference between training and testing environments
---------------------
+The difference between training and evaluation environments
+------------------------------------------------------------

 - The training environment uses a dynamic random seed, that is, the random seed of each episode is different and is generated by a random number generator, but the seed of this random number generator is fixed by the \ ``seed`` \ method of the environment .
 - The test environment uses a static random seed, that is, the random seed of each episode is the same, and is specified by the \ ``seed`` \ method.
@@ -299,7 +299,7 @@ Here, for a specific configuration file, such as \ ``metadrive_onppo_config.py``
     main(main_config)

 Benchmark Algorithm Performance
-==============
+================================

 - MetaDrive (the average episode return of the test episodes is greater than or equal to 250, which is regarded as the algorithm converges to an approximate optimal value).

@@ -307,4 +307,3 @@ Benchmark Algorithm Performance
 .. image:: images/metadrive_train1.png
    :align: center

-
diff --git a/source/13_envs/metadrive_zh.rst b/source/13_envs/metadrive_zh.rst
index 51e6f7be..5eda7f71 100644
--- a/source/13_envs/metadrive_zh.rst
+++ b/source/13_envs/metadrive_zh.rst
@@ -1,4 +1,4 @@
-Metadrive 中文文档
+Metadrive
 ~~~~~~~~~~~~~~~~~~

 概述
diff --git a/source/13_envs/mujoco.rst b/source/13_envs/mujoco.rst
index 222a0627..38905ddf 100644
--- a/source/13_envs/mujoco.rst
+++ b/source/13_envs/mujoco.rst
@@ -1,4 +1,4 @@
-Mujoco
+MuJoCo
 ~~~~~~~

 Overview
diff --git a/source/13_envs/mujoco_zh.rst b/source/13_envs/mujoco_zh.rst
index 99ca4336..04568771 100644
--- a/source/13_envs/mujoco_zh.rst
+++ b/source/13_envs/mujoco_zh.rst
@@ -1,4 +1,4 @@
-Mujoco
+MuJoCo
 ~~~~~~~

 概述