Self-driving car benchmark based on the original DeepDrive simulator, with added support for multi-objective RL.
To the best of our knowledge, development of both the DeepDrive and DeepDrive-Zero benchmarks has been abandoned: the DeepDrive benchmark has not been updated since June 2020 and the DeepDrive-Zero benchmark since November 2021, and both fail to install due to outdated libraries and API calls. Nevertheless, the DeepDrive-Zero benchmark remains a very interesting simulator for developing, and getting introduced to, single-agent, multi-agent, single-objective, and multi-objective reinforcement learning. Moreover, a closer look at the objective in the DeepDrive benchmark reveals that it is constructed from multiple objectives that are combined into a scalar reward. We therefore deconstruct this benchmark, reveal its underlying multi-objective nature, and update, extend, and modify it to serve as a novel benchmark for the multi-objective (multi-agent) reinforcement learning literature.
from deepdrive_zero.envs.variants import OneWaypointEnv
config = {"multi_objective": True}
env = OneWaypointEnv(env_configuration=config, render_mode=None)
The OneWaypoint environment is the simplest environment variation in the benchmark. In this scenario, a single vehicle must drive to the destination, highlighted as an orange circle. The placement of the orange circle is random and changes every episode.
The OneWaypoint environment. (Ignore the blue lines/path)
from deepdrive_zero.envs.variants import StaticObstacleEnv
config = {"multi_objective": True}
env = StaticObstacleEnv(env_configuration=config, render_mode=None)
The StaticObstacle environment is the second most challenging environment. This scenario adds a static obstacle, represented as a bike. Here, a single vehicle must drive to the destination without colliding with the bike. Both the placement of the obstacle and the destination are random.
The StaticObstacle environment. (Ignore the blue lines/path)
from deepdrive_zero.envs.variants import IntersectionEnv
config = {"multi_objective": True}
env = IntersectionEnv(env_configuration=config, render_mode=None)
The Intersection environment is the most difficult environment and the only multi-agent one. It represents the unprotected left turn, one of the most challenging scenarios for both autonomous vehicles and human drivers. In this scenario, two vehicles need to cross an intersection to reach two different destinations without colliding with each other, and neither vehicle should deviate from its lane. Vehicle one (at the bottom right of the intersection) has the easiest trajectory, as it only needs to drive straight until it reaches the orange circle. Vehicle two has the most difficult trajectory, as it needs to drive straight, turn left, and reach the green circle on the left side of the intersection. A collision, or failing to reach the destination within a certain number of timesteps, ends the episode.
The multi-agent Intersection environment.
You can choose between a discrete or continuous action space.
- Discrete actions (between 0 and 21):
  - 0: 'IDLE'
  - 1: 'DECAY_STEERING_MAINTAIN_SPEED'
  - ...
  - 20: 'LARGE_STEER_RIGHT_DECREASE_SPEED'
  - 21: 'LARGE_STEER_RIGHT_INCREASE_SPEED'
- Continuous actions: [steer, accel, brake], a continuous value between -1 and 1 for each
  - steer: heading angle of the ego vehicle
  - accel: acceleration of the ego vehicle in m/s², positive for forward, negative for reverse
  - brake: brake force, from 0 g at -1 to 1 g at 1
| Action Space Variable | Range |
| --- | --- |
| steer | [-1, 1] |
| accel | [-1, 1] |
| brake | [-1, 1] |
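The action space is selected through the environment configuration. Below is a minimal sketch, assuming the default action space is continuous and that passing a discrete action set via the discrete_actions property (as in the full example further down) switches the environment to discrete control:

from deepdrive_zero.envs.variants import OneWaypointEnv
from deepdrive_zero.discrete_actions.comfortable_actions2 import COMFORTABLE_ACTIONS2

# Continuous control (assumed default): actions are [steer, accel, brake], each in [-1, 1]
continuous_env = OneWaypointEnv(env_configuration={"multi_objective": True}, render_mode=None)
print(continuous_env.action_space)

# Discrete control: actions are indices (0-21) into the COMFORTABLE_ACTIONS2 set
discrete_config = {"multi_objective": True, "discrete_actions": COMFORTABLE_ACTIONS2}
discrete_env = OneWaypointEnv(env_configuration=discrete_config, render_mode=None)
print(discrete_env.action_space)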
This benchmark supports multi-objective RL and can be configured to return a vectorized reward instead of a scalar reward.
This can be done by adding the property multi_objective=True to the environment configuration dictionary that is passed to the environment (see the examples above and below).
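A minimal sketch of the difference, assuming the scalar reward is the default (multi_objective=False) and the vectorized reward is returned as an array-like of objective values:

from deepdrive_zero.envs.variants import OneWaypointEnv

# Scalar reward (assumed default): step() returns a single number
scalar_env = OneWaypointEnv(env_configuration={"multi_objective": False}, render_mode=None)
scalar_env.reset()
_, scalar_reward, _, _, _ = scalar_env.step(scalar_env.action_space.sample())

# Vectorized reward: step() returns one value per objective
vector_env = OneWaypointEnv(env_configuration={"multi_objective": True}, render_mode=None)
vector_env.reset()
_, vectorial_reward, _, _, _ = vector_env.step(vector_env.action_space.sample())

print(scalar_reward, vectorial_reward)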
Each environment has the following objectives:
OneWaypoint:
StaticObstacle:
Intersection:
The distance objective measures how close the vehicle is to the destination, and the destination objective returns 1 if the destination was reached and 0 otherwise. Instead of enforcing a penalty for g-force, jerk, and lane violations, we return the actual g-force and jerk values and the distance margin of the lane violation. As a general rule, we want to maximise the distance and destination objectives and minimise the g-force, jerk, collision, and lane violation objectives.
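For agents that expect a scalar signal, this rule can be applied with a simple linear scalarization of the vectorized reward. The sketch below is illustrative only; the number and ordering of objectives shown is an assumption and differs per environment (check the environment source for the actual layout):

import numpy as np

# Hypothetical objective ordering: [distance, destination, gforce, jerk, collision, lane_violation]
# Positive weights for objectives to maximise, negative weights for objectives to minimise.
weights = np.array([1.0, 1.0, -0.1, -0.1, -1.0, -0.5])

def scalarize(vectorial_reward):
    # A weighted sum collapses the multi-objective reward back into a scalar
    return float(np.dot(weights, np.asarray(vectorial_reward)))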
git clone https://github.com/sofyanajridi/mo-deepdrive
cd mo-deepdrive
pip install .
from deepdrive_zero.envs.variants import OneWaypointEnv, StaticObstacleEnv, IntersectionEnv
from deepdrive_zero.discrete_actions.comfortable_actions2 import COMFORTABLE_ACTIONS2

# Return a vectorized reward and use the discrete action set
config = {"multi_objective": True,
          'discrete_actions': COMFORTABLE_ACTIONS2}

env = OneWaypointEnv(env_configuration=config, render_mode=None)

observation, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # replace with your agent's policy
    observation, vectorial_reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
Read the documentation online.
- env.render() does not work anymore, see Issues.
Each environment can be further configured via the following properties:
Variables
- jerk_penalty_coeff (default 0)
- gforce_penalty_coeff (default 0)
- lane_penalty_coeff (default 0.02)
- collision_penalty_coeff (default 0.31)
- speed_reward_coeff (default 0.5)
- win_coefficient (default 1)
- gforce_threshold (default 1)
- jerk_threshold (default None)
- constrain_controls (default False)
- ignore_brake (default False)
- forbid_deceleration (default False)
- expect_normalized_action_deltas (default False)
- discrete_actions
- incent_win (default True)
- dummy_accel_agent_indices (default None)
- wait_for_action (default False)
- incent_yield_to_oncoming_traffic (default True)
- physics_steps_per_observation (default 6)
- end_on_lane_violation (default False)
- lane_margin (default 0)
- is_intersection_map (default True)
- end_on_harmful_gs (default False)
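As an illustration, several of these properties can be combined in a single configuration dictionary; the values below are arbitrary examples, and the comments reflect the behaviour suggested by the property names and defaults rather than a verified specification:

from deepdrive_zero.envs.variants import IntersectionEnv

config = {
    "jerk_penalty_coeff": 0.1,           # weight of the jerk penalty
    "gforce_penalty_coeff": 0.05,        # weight of the g-force penalty
    "end_on_lane_violation": True,       # end the episode when a lane violation occurs
    "physics_steps_per_observation": 6,  # physics sub-steps per environment observation
}

env = IntersectionEnv(env_configuration=config, render_mode=None)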