Note: All summaries/insights (found in the Python notebooks) are written assuming the reader is conversant with the basics of RL and the standard RL literature.
- Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013. Algorithm: DQN. [paper] [Summary]
- Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm: DRQN. [paper] [Summary]
- Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. Algorithm: Dueling DQN. [paper] [Summary]
- Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015. Algorithm: Double DQN (see the target-computation sketch after this list). [paper] [Summary]
- Prioritized Experience Replay, Schaul et al, 2015. Algorithm: Prioritized Experience Replay (PER). [paper] [Summary]
- Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. Algorithm: Rainbow DQN. [paper] [Summary]
- Asynchronous Methods for Deep Reinforcement Learning, Mnih et al, 2016. Algorithm: A3C. [paper] [Summary]
- Trust Region Policy Optimization, Schulman et al, 2015. Algorithm: TRPO. [paper] [Summary]
- High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2015. Algorithm: GAE (see the recursion sketch after this list). [paper] [Summary]
- A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017. Algorithm: C51. [paper] [Summary]
- Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017. Algorithm: QR-DQN. [paper] [Summary]
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop. [paper] [Summary]
- VIME: Variational Information Maximizing Exploration, Houthooft et al, 2016. Algorithm: VIME. [paper] [Summary]
- Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare et al, 2016. Algorithm: CTS-based Pseudocounts. [paper] [Summary]
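
For quick reference, the core difference between the DQN and Double DQN entries above is a one-line change in how the bootstrap target is built: Double DQN lets the online network pick the next action and the target network evaluate it, removing the max-operator overestimation bias. A minimal PyTorch-style sketch (the names `q_net`, `target_net`, and the tensor arguments are hypothetical placeholders, not code from the notebooks):

```python
import torch

def td_targets(q_net, target_net, rewards, next_states, dones,
               gamma=0.99, double=True):
    """Compute TD targets for a batch of transitions.

    q_net / target_net: any modules mapping states -> Q-values
    of shape (batch, n_actions); rewards, dones: shape (batch,).
    """
    with torch.no_grad():
        next_q_target = target_net(next_states)  # (batch, n_actions)
        if double:
            # Double DQN: online net selects the action,
            # target net evaluates it.
            best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
            next_values = next_q_target.gather(1, best_actions).squeeze(1)
        else:
            # Vanilla DQN: target net both selects and evaluates (max),
            # which is the source of the overestimation bias.
            next_values = next_q_target.max(dim=1).values
        return rewards + gamma * (1.0 - dones) * next_values
```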
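
Likewise, the GAE estimator from the Schulman et al paper reduces to a short backward recursion over a trajectory: A_t = delta_t + gamma * lambda * A_{t+1}, with delta_t the one-step TD error. A minimal NumPy sketch, assuming `values` carries one extra bootstrap entry V(s_T) and `dones[t]` flags episode ends (all names hypothetical):

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards, dones: length T; values: length T + 1 (bootstrap appended).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages
```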
Last Updated: 20/9/2020 ✔️