Summaries of Key Papers in Deep RL

Note: All summaries/insights (found in the Python notebooks) are written assuming the reader is conversant with the basics of RL and standard RL literature. :bowtie:

  1. Model-Free RL

  2. Exploration

Model-Free RL

Deep Q-Learning

  • Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013. Algorithm: DQN. [paper] [Summary]

  • Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm: Deep Recurrent Q-Learning. [paper] [Summary]

  • Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. Algorithm: Dueling DQN. [paper] [Summary]

  • Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015. Algorithm: Double DQN (see the target-computation sketch after this list). [paper] [Summary]

  • Prioritized Experience Replay, Schaul et al, 2015. Algorithm: Prioritized Experience Replay (PER). [paper] [Summary]

  • Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. Algorithm: Rainbow DQN. [paper] [Summary]
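
The DQN and Double DQN entries above differ mainly in how the bootstrap target is built. Below is a minimal NumPy sketch of that difference, assuming batched arrays of next-state Q-values from an online and a target network; the function name `td_targets` and its arguments are illustrative, not taken from the notebooks.

```python
import numpy as np

def td_targets(rewards, dones, q_next_online, q_next_target,
               gamma=0.99, double=True):
    """One-step TD targets for a batch of transitions.

    q_next_online / q_next_target: (batch, n_actions) arrays of Q(s', .)
    from the online and target networks respectively.
    """
    if double:
        # Double DQN: the online network selects the action,
        # the target network evaluates it (reduces overestimation).
        best_actions = q_next_online.argmax(axis=1)
        next_q = q_next_target[np.arange(len(best_actions)), best_actions]
    else:
        # Vanilla DQN: the target network both selects and evaluates.
        next_q = q_next_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * next_q
```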

Policy Gradient

  • Asynchronous Methods for Deep Reinforcement Learning, Mnih et al, 2016. Algorithm: A3C. [paper] [Summary]

  • Trust Region Policy Optimization, Schulman et al, 2015. Algorithm: TRPO. [paper] [Summary]

  • High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2015. Algorithm: GAE (see the sketch after this list). [paper] [Summary]
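
As a companion to the GAE entry, here is a minimal sketch of the advantage recursion, assuming a single trajectory of length T with a bootstrap value for the final state appended to `values`; the function name and the defaults shown are illustrative.

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    rewards, dones: length-T arrays; values: length T+1 (includes the
    bootstrap value V(s_T) for the state after the last transition).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # A_t = delta_t + (gamma * lambda) * A_{t+1}
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages  # value targets: advantages + values[:-1]
```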

Distributional RL

  • A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017. Algorithm: C51 (see the sketch after this list). [paper] [Summary]

  • Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017. Algorithm: QR-DQN. [paper] [Summary]
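
For the C51 entry, the core idea is to represent the return as a categorical distribution over a fixed support of atoms and act greedily with respect to its expectation. A minimal sketch, assuming the Atari support bounds reported in the paper (V_min = -10, V_max = 10, 51 atoms); the helper name `expected_q` is illustrative.

```python
import numpy as np

# C51 models Z(s, a) as a categorical distribution over fixed atoms z_i.
V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
support = np.linspace(V_MIN, V_MAX, N_ATOMS)   # atoms z_0 ... z_50

def expected_q(atom_probs):
    """atom_probs: (n_actions, N_ATOMS) atom probabilities for one state.

    Greedy action selection uses Q(s, a) = sum_i p_i(s, a) * z_i.
    """
    return atom_probs @ support
```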

Policy Gradients with Action-Dependent Baselines

  • Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop. [paper] [Summary]

Exploration

Intrinsic Motivation

  • VIME: Variational Information Maximizing Exploration, Houthooft et al, 2016. Algorithm: VIME. [paper] [Summary]

  • Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare et al, 2016. Algorithm: CTS-based Pseudocounts (see the sketch after this list). [paper] [Summary]
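
The pseudocount paper derives a visit "count" from a density model and adds an inverse-square-root exploration bonus to the reward. A minimal sketch of those two pieces, assuming `rho` and `rho_prime` are the density model's probabilities of a state before and after observing it once more; `beta` and the function names are illustrative.

```python
import numpy as np

def pseudocount(rho, rho_prime):
    """Pseudocount N_hat(s) implied by the density model's probabilities
    of state s before (rho) and after (rho_prime) one more observation."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(n_hat, beta=0.05):
    """Intrinsic reward added to the environment reward;
    the small constant keeps the bonus finite for N_hat near zero."""
    return beta / np.sqrt(n_hat + 0.01)
```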

Unsupervised RL

  • Variational Intrinsic Control, Gregor et al, 2016. Algorithm: VIC. [paper] [Summary]

Last Updated: 20/9/2020 ✔️