Commit

polish(nyz): polish env doc details
PaParaZz1 committed May 9, 2023
1 parent 3736109 commit 8ce19b2
Showing 8 changed files with 24 additions and 25 deletions.
2 changes: 1 addition & 1 deletion source/13_envs/d4rl.rst
@@ -1,4 +1,4 @@
D4RL (Mujoco)
D4RL (MuJoCo)
~~~~~~~~~~~~~~~

Abstract
2 changes: 1 addition & 1 deletion source/13_envs/d4rl_zh.rst
@@ -1,4 +1,4 @@
D4RL (Mujoco)
D4RL (MuJoCo)
~~~~~~~~~~~~~

概述
File renamed without changes.
2 changes: 1 addition & 1 deletion source/13_envs/index_zh.rst
@@ -50,5 +50,5 @@

image_cls_zh
sokoban_zh
Evogym_zh.rst
evogym_zh.rst
metadrive_zh
37 changes: 18 additions & 19 deletions source/13_envs/metadrive.rst
@@ -2,7 +2,7 @@ Metadrive
~~~~~~~~~~~~~~~~~~

Overview
=======
==========

`MetaDrive Env <https://metadrive-simulator.readthedocs.io/en/latest/index.html>`_ is an efficient compositional driving simulator. The goal of the environment is to control a car (or multiple vehicles) from the starting point to the finishing point safely and on time. It has the following properties:

@@ -14,10 +14,10 @@ Overview
:align: center

Install
====
=========

Installation Method
--------
-------------------

Users can choose to install it with one click through pip, or install it from source.

@@ -35,7 +35,7 @@ Note: If the user does not have root privileges, please add \ ``--user`` \ after
pip install -e .
Verify Installation
--------
-------------------

After the installation is complete, you can verify that the installation was successful by running the following command on the Python command line:

@@ -47,19 +47,19 @@ After the installation is complete, you can verify that the installation was suc
print(obs.shape) # output (259,)
Image
----
------

The DI-engine image ships with the framework itself and can be obtained with \ ``docker pull opendilab/ding:nightly`` \,
or you can visit \ `docker hub <https://hub.docker.com/r/opendilab/ding>`__ \ for more images. Currently, MetaDrive does not have its own image.


Space before Transformation (Original Environment)
========================
===================================================

For details, please refer to the code implementation of `MetaDrive <https://github.com/metadriverse/metadrive/blob/main/metadrive/envs/metadrive_env.py>`_ .

Observation space
--------
-----------------

The observation space of the vehicle is a 259-dimensional numpy array, the data type is float32, and the obs shape is (259,). The observation space consists of the following three parts:

@@ -70,15 +70,15 @@ The observation space of the vehicle is a 259-dimensional numpy array, the data


Action space
--------
------------
The action space of the MetaDrive environment is a 2-dimensional continuous action, and its valid range is [-1, 1]. Through this design, the action space of each agent is fixed as gym.spaces.Box(low=-1.0, high=1.0, shape=(2, )).

- The first dimension represents the steering angle. A value of 1 or -1 means that the steering wheel is turned fully to the left or right (the maximum steering angle), and a value of 0 means that the steering wheel points straight ahead.
- The second dimension represents acceleration or braking. A value in the (0,1) interval means acceleration, a value in (-1,0) means braking, and 0 means no action is taken.
- At the same time, it also provides a configuration called extra_action_dim (int). For example, if we set config["extra_action_dim"] = 1, then the action space of each agent will become Box(-1.0, 1.0, shape =(3, )). This allows users to write environment wrappers that introduce more dimensions of input operations.
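
The meaning of the two action dimensions can be sketched in plain Python. This is only an illustration of the description above, not MetaDrive code: the helper name ``describe_action`` and the returned dictionary keys are hypothetical, and clipping out-of-range values to [-1, 1] is an assumption made for the sketch.

```python
def describe_action(action):
    """Interpret a 2-dimensional action (illustrative helper, not part of MetaDrive)."""
    if len(action) < 2:
        raise ValueError("expected at least two action dimensions")
    # Assumption for this sketch: values outside [-1, 1] are clipped to the valid range.
    steering, throttle = (max(-1.0, min(1.0, float(a))) for a in action[:2])
    if throttle > 0:
        longitudinal = "accelerate"   # second dimension in (0, 1]
    elif throttle < 0:
        longitudinal = "brake"        # second dimension in [-1, 0)
    else:
        longitudinal = "none"         # exactly 0: no action is taken
    return {"steering": steering, "throttle_brake": throttle, "longitudinal": longitudinal}


print(describe_action([0.0, 0.5]))   # straight ahead while accelerating
print(describe_action([1.0, -2.0]))  # full steering lock, braking value clipped to -1.0
```

Any extra dimensions introduced through ``extra_action_dim`` are simply ignored by this helper; a wrapper that uses them would need its own interpretation.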

Reward space
--------
--------------

The default reward function in MetaDrive consists of a dense (obtained during driving) reward and a sparse final reward.

@@ -116,11 +116,11 @@ Randomness:


Transformed space (RL environment)
=========================
===================================


Observation space
--------
-----------------
Unlike the original environment, where the observation is a 259-dimensional vector, DI-engine defines
the observation of the car as a top view with a size of 5x84x84, where 5 represents the number of channels, and the last two dimensions (84x84) represent the size of the image for each channel.
The semantics of the five channels are:
@@ -141,11 +141,11 @@ In the current scenario, the observation of the vehicle can be represented by th


Action Space
--------
--------------
- no change

Reward Space
--------
-------------
- no change

Other
@@ -158,18 +158,18 @@ Other
========

Lazy Initialization
----------
--------------------

In order to support parallel operations such as environment vectorization, the specific environment instance generally adopts the lazy initialization method, that is, the \ ``__init__`` \ method of the environment does not initialize the real original environment instance, but only sets relevant parameters and configuration values.
Instead, the concrete original environment instance is initialized when the \ ``reset`` \ method is called for the first time.
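
The pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not DI-engine's actual environment class: ``LazyEnv``, ``DummyEnv``, and the ``make_env`` factory argument are hypothetical names introduced only for the example.

```python
class LazyEnv:
    """Hypothetical wrapper illustrating lazy initialization."""

    def __init__(self, cfg, make_env):
        # Only store parameters and configuration; no real environment is built here.
        self._cfg = cfg
        self._make_env = make_env
        self._env = None
        self._init_flag = False

    def reset(self):
        if not self._init_flag:
            # First call to reset: create the concrete environment instance.
            self._env = self._make_env(self._cfg)
            self._init_flag = True
        return self._env.reset()


class DummyEnv:
    """Stand-in for a real simulator so the sketch is self-contained."""

    created = 0  # counts how many instances were actually constructed

    def __init__(self, cfg):
        DummyEnv.created += 1

    def reset(self):
        return [0.0, 0.0, 0.0]


env = LazyEnv({"name": "demo"}, DummyEnv)
created_before_reset = DummyEnv.created  # constructor of LazyEnv built nothing
env.reset()
env.reset()
created_after_resets = DummyEnv.created  # built once, on the first reset only
```

Because construction is deferred, the lightweight wrapper can be created in one process and the heavy simulator built later in the worker that first calls ``reset``.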

Random Seed
--------
-------------

- You can use the _reset_global_seed method to set the random seed of the environment. If you do not set it manually, the environment will sample a random seed itself.

The difference between training and testing environments
--------------------
The difference between training and evaluation environments
------------------------------------------------------------

- The training environment uses a dynamic random seed, that is, the random seed of each episode is different and is generated by a random number generator, but the seed of this random number generator is fixed by the \ ``seed`` \ method of the environment.
- The test environment uses a static random seed, that is, the random seed of each episode is the same, and is specified by the \ ``seed`` \ method.
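
The two seeding modes can be sketched in plain Python. The class ``SeededEnv`` and its ``dynamic_seed`` flag are hypothetical names for illustration, and returning the episode seed from ``reset`` is a simplification made only so the behaviour is visible; this is not the DI-engine implementation.

```python
import random


class SeededEnv:
    """Hypothetical sketch of dynamic vs. static per-episode seeds."""

    def __init__(self, dynamic_seed):
        self._dynamic_seed = dynamic_seed
        self._seed = 0
        self._rng = random.Random(0)

    def seed(self, seed):
        # Fixes both the static seed and the generator that produces dynamic seeds.
        self._seed = seed
        self._rng = random.Random(seed)

    def reset(self):
        if self._dynamic_seed:
            # Training: draw a fresh episode seed from the fixed generator.
            return self._rng.randrange(2 ** 31)
        # Testing: every episode reuses the seed given to seed().
        return self._seed


train_a, train_b = SeededEnv(dynamic_seed=True), SeededEnv(dynamic_seed=True)
train_a.seed(42)
train_b.seed(42)
seeds_a = [train_a.reset() for _ in range(3)]
seeds_b = [train_b.reset() for _ in range(3)]

test_env = SeededEnv(dynamic_seed=False)
test_env.seed(42)
test_seeds = [test_env.reset() for _ in range(3)]
```

In this sketch, two training environments seeded identically produce the same sequence of episode seeds, so training runs stay reproducible even though individual episodes differ.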
@@ -299,12 +299,11 @@ Here, for a specific configuration file, such as \ ``metadrive_onppo_config.py``
main(main_config)
Benchmark Algorithm Performance
==============
================================

- MetaDrive (an average episode return of at least 250 over the test episodes is regarded as the algorithm having converged to a near-optimal value).

- MetaDrive + PPO

.. image:: images/metadrive_train1.png
:align: center

2 changes: 1 addition & 1 deletion source/13_envs/metadrive_zh.rst
@@ -1,4 +1,4 @@
Metadrive 中文文档
Metadrive
~~~~~~~~~~~~~~~~~~

概述
2 changes: 1 addition & 1 deletion source/13_envs/mujoco.rst
@@ -1,4 +1,4 @@
Mujoco
MuJoCo
~~~~~~~

Overview
2 changes: 1 addition & 1 deletion source/13_envs/mujoco_zh.rst
@@ -1,4 +1,4 @@
Mujoco
MuJoCo
~~~~~~~

概述
