ValentinaZangirolami/DRL

Deep Recurrent Q-Network with different exploration strategies for self-driving cars (using AirSim)

Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning

Description

This repo contains an implementation of a Double Dueling Deep Recurrent Q-Network that can be enhanced with several exploration strategies, such as deterministic epsilon-greedy, adaptive epsilon-greedy (VDBE and BMC) [1], softmax, Max-Boltzmann exploration, and VDBE-softmax, together with an error-masking strategy [2], [4].
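
As a rough illustration of how two of these strategies differ, the sketch below contrasts a fixed epsilon-greedy rule with Max-Boltzmann exploration. It is only a minimal sketch: the function and parameter names are illustrative and are not taken from DRQN_classes.py. Adaptive variants such as VDBE and BMC keep the same structure but replace the fixed epsilon with a state-dependent or Bayesian estimate.

```python
import numpy as np

# Illustrative only: q is a NumPy vector of Q-values for one state
# (e.g. five steering actions, as in AirsimEnv.py).

def epsilon_greedy(q, epsilon=0.1):
    # With probability epsilon take a random action, otherwise the greedy one.
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(q)))
    return int(np.argmax(q))

def max_boltzmann(q, epsilon=0.1, tau=0.5):
    # Max-Boltzmann exploration: act greedily with probability 1 - epsilon;
    # otherwise sample from a softmax (Boltzmann) distribution over Q-values.
    if np.random.rand() >= epsilon:
        return int(np.argmax(q))
    prefs = np.exp((q - q.max()) / tau)   # numerically stable softmax
    probs = prefs / prefs.sum()
    return int(np.random.choice(len(q), p=probs))
```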

Code Structure:

  • ./AirsimEnv/: folder containing the two environments (AirsimEnv.py and AirsimEnv_9actions.py); the former uses five steering angles and the latter nine. This folder also contains:
    • DRQN_classes.py: defines the agent, experience replay, exploration strategies, neural network, and the connection with the AirSim NH environment (a rough sketch of such a network is given after this list)
    • bayesian.py: support code for the BMC epsilon-greedy strategy
    • final_reward_points.csv: support data for reward calculation (required by the environment scripts)
  • DRQN_airsim_training.py: contains the training loop; it requires all of the files listed above (main script for the training process)
  • DRQN_evaluation.py: contains training and test evaluation; each subset is defined with a different set of starting points to evaluate model performance
  • A new implementation in TensorFlow 2.x is now available. The previous version contains the implementation of all exploration strategies, while the new code contains the updated neural network.
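
The sketch below gives a rough idea of what a dueling recurrent Q-network of this kind can look like in TensorFlow 2.x. The input shape, layer sizes, and number of actions are assumptions for illustration and do not come from the repository's model definition.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed shapes: traces of 4 grayscale 84x84 frames, 5 steering actions.
TRACE_LEN, H, W, C, N_ACTIONS = 4, 84, 84, 1, 5

def build_dueling_drqn():
    frames = tf.keras.Input(shape=(TRACE_LEN, H, W, C))
    # Convolutional encoder applied to every frame in the trace.
    x = layers.TimeDistributed(layers.Conv2D(32, 8, strides=4, activation="relu"))(frames)
    x = layers.TimeDistributed(layers.Conv2D(64, 4, strides=2, activation="relu"))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    # Recurrent layer integrates information across the trace (the "R" in DRQN).
    x = layers.LSTM(256)(x)
    # Dueling head: separate state-value and advantage streams.
    value = layers.Dense(1)(layers.Dense(128, activation="relu")(x))
    advantage = layers.Dense(N_ACTIONS)(layers.Dense(128, activation="relu")(x))
    # Combine the streams: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    q_values = value + advantage - tf.reduce_mean(advantage, axis=1, keepdims=True)
    return tf.keras.Model(frames, q_values)
```

In the double variant of the Q-learning update, an online network of this form selects the next action (the argmax over its Q-values) while a periodically synchronized target copy evaluates it, which is what distinguishes Double DQN-style training from the vanilla target.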

Prerequisites

  • Python 3.7.6
  • TensorFlow 2.5.0
  • Tornado 4.5.3
  • OpenCV 4.5.2.54
  • OpenAI Gym 0.18.3
  • AirSim 1.5.0

Hardware

  • 2 NVIDIA Tesla M60 GPUs with 8 GB of memory each

References

[1] Gimelfarb, M., S. Sanner, and C.-G. Lee, 2020: ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning. CoRR

[2] Juliani, A., 2016: Simple Reinforcement Learning with Tensorflow Part 6: Partial Observability and Deep Recurrent Q-Networks. URL: https://github.com/awjuliani/DeepRL-Agents

[3] Riboni, A., A. Candelieri, and M. Borrotti, 2021: Deep Autonomous Agents comparison for Self-Driving Cars. Proceedings of The 7th International Conference on Machine Learning, Optimization and Big Data - LOD

[4] Welcome to AirSim, https://microsoft.github.io/AirSim/

How to cite

Zangirolami, V. and M. Borrotti, 2024: Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning. In: Knowledge-Based Systems 293.

Acknowledgements

I acknowledge the Data Science Lab of the Department of Economics, Management and Statistics (DEMS) of the University of Milan-Bicocca for providing a virtual machine.

DEMO

DRQN-bmc.mp4