A curated list of papers that apply representation learning (RepL) in reinforcement learning (RL).
A major reason to apply RepL in RL is to handle high-dimensional state-action spaces; another is to improve sample efficiency. Specifically, we usually want to incorporate inductive biases (i.e., structural information about the tasks/environments) into the representations to obtain better performance.
- Prevalent RL methods require large amounts of supervision.
- Instead of only learning from reward signals, we can also learn from the collected data.
- Prior methods are sample-inefficient in vision-based RL.
- Good representations can accelerate learning from images.
- Most current RL agents are task-specific.
- Good representations can generalize well across different tasks, or adapt quickly to new tasks.
- Effective exploration is challenging in many RL tasks.
- Good representations can accelerate exploration.
- RL also differs from standard RepL settings in what the data looks like:
  - Sequential data
  - Interactive learning tasks
Some popular methods of applying RepL in RL (a minimal contrastive-learning sketch follows this list):
- Auxiliary tasks, e.g., reconstruction, mutual information (MI) maximization, entropy maximization, and dynamics prediction.
- ACL, APS, AVFs, CIC, CPC, DBC, Dreamer, DreamerV2, DynE, IDAAC, PBL, PI-SAC, PlaNet, RCRL, SLAC, SAC-AE, SPR, ST-DIM, TIA, UNREAL, Value-Improvement Path, World Model.
- Contrastive learning.
- ACL, ATC, Contrastive Fourier, CURL, RCRL, CoBERL.
- Data augmentation.
- DrQ, DrQ-v2, PSEs, RAD.
- Bisimulation.
- DBC, PSEs.
- Causal inference.
- MISA.
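
Contrastive learning plus data augmentation is the recipe behind several entries below (CPC, CURL, ATC, RAD, DrQ). Here is a minimal sketch of that idea in PyTorch, assuming a toy encoder, batch shape, and the helper names `random_shift` and `info_nce_loss` of our own choosing; it is illustrative only, not the exact implementation from any listed paper.

```python
# A minimal CURL/RAD-style sketch: two random-shift augmentations of the
# same observation form a positive pair for an InfoNCE contrastive loss.
# Shapes, layer sizes, and names below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_shift(imgs, pad=4):
    """RAD/DrQ-style augmentation: replicate-pad, then crop at a random offset."""
    n, _, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    tops = torch.randint(0, 2 * pad + 1, (n,)).tolist()
    lefts = torch.randint(0, 2 * pad + 1, (n,)).tolist()
    return torch.stack([
        padded[i, :, t:t + h, l:l + w]
        for i, (t, l) in enumerate(zip(tops, lefts))
    ])


class Encoder(nn.Module):
    """Small conv encoder mapping image observations to feature vectors."""
    def __init__(self, in_channels=3, feat_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(feat_dim),
        )

    def forward(self, x):
        return self.net(x)


def info_nce_loss(queries, keys, bilinear_w):
    """InfoNCE with a bilinear similarity; positive pairs sit on the diagonal."""
    logits = queries @ bilinear_w @ keys.T          # (N, N) similarity matrix
    labels = torch.arange(logits.shape[0])          # i-th query matches i-th key
    return F.cross_entropy(logits, labels)


encoder = Encoder()
bilinear_w = nn.Parameter(torch.eye(50))            # learned similarity metric
obs = torch.rand(8, 3, 84, 84)                      # toy batch of image observations

queries = encoder(random_shift(obs))                # query view
with torch.no_grad():                               # CURL uses a momentum key encoder;
    keys = encoder(random_shift(obs))               # a stop-gradient pass stands in here
loss = info_nce_loss(queries, keys, bilinear_w)
loss.backward()
```

The two random shifts of the same observation are the positive pair, and InfoNCE classifies the matching key among all keys in the batch; CURL additionally maintains a momentum-averaged key encoder, which the stop-gradient pass above only approximates.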
- Self-supervision for Reinforcement Learning @ ICLR 2021
- Unsupervised Reinforcement Learning @ ICML 2021
- Self-Supervised Learning
- Invariant Representation Learning
- [arXiv' 18][CPC] Representation Learning with Contrastive Predictive Coding
- [NeurIPS' 19][AVFs] A Geometric Perspective on Optimal Representations for Reinforcement Learning
- [NeurIPS' 19] Discovery of Useful Questions as Auxiliary Tasks
- [NeurIPS' 19][ST-DIM] Unsupervised State Representation Learning in Atari (Code)
- [NeurIPS' 20][PI-SAC] Predictive Information Accelerates Learning in RL (Code)
- [NeurIPS' 20][SLAC] Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model (Code)
- [NeurIPS' 20][RAD] Reinforcement Learning with Augmented Data (Code)
- [ICML' 20][CURL] Contrastive Unsupervised Representations for Reinforcement Learning (Code)
- [ICLR' 20][DynE] Dynamics-aware Embeddings (Code)
- [NeurIPS' 21] An Empirical Investigation of Representation Learning for Imitation (Code)
- [NeurIPS' 21][SGI] Pretraining Representations for Data-Efficient Reinforcement Learning (Code)
- [AAAI' 21][SAC-AE] Improving Sample Efficiency in Model-Free Reinforcement Learning from Images (Code)
- [AAAI' 21][Value-Improvement Path] Towards Better Representations for Reinforcement Learning
- [AISTATS' 21] On The Effect of Auxiliary Tasks on Representation Dynamics
- [ICLR' 21][SPR] Data-Efficient RL with Self-Predictive Representations (Code)
- [ICLR' 21][DBC] Learning Invariant Representations for Reinforcement Learning without Reconstruction (Code)
- [ICLR' 21][DrQ] Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels (Code)
- [ICLR' 21][RCRL] Return-based Contrastive Representation Learning for RL
- [ICML' 21][ATC] Decoupling Representation Learning from Reinforcement Learning (Code)
- [ICML' 21][APS] Active Pretraining with Successor Features
- [ICML' 21][IDAAC] Decoupling Value and Policy for Generalization in Reinforcement Learning (Code)
- [ICLR' 22][DrQ-v2] Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning (Code)
- [ICLR' 22][CoBERL] Contrastive BERT for Reinforcement Learning
- [arXiv' 22][R3M] A Universal Visual Representation for Robot Manipulation (Code)
- [ICML' 19] DeepMDP: Learning Continuous Latent Space Models for Representation Learning
- [ICML' 20] Learning with Good Feature Representations in Bandits and in RL with a Generative Model
- [ICML' 20] Representations for Stable Off-Policy Reinforcement Learning ❤️
- [ICLR' 20] Is a good representation sufficient for sample efficient reinforcement learning?
- [ICLR' 21] Impact of Representation Learning in Linear Bandits
- [arXiv' 21] Model-free Representation Learning and Exploration in Low-rank MDPs
- [arXiv' 21] Representation Learning for Online and Offline RL in Low-rank MDPs ❤️
- [arXiv' 21] Action-Sufficient State Representation Learning for Control with Structural Constraints
- [arXiv' 21] Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
- [NeurIPS' 20][FLAMBE] Structural Complexity and Representation Learning of Low Rank MDPs
- [arXiv' 21] Provably Efficient Representation Learning in Low-rank Markov Decision Processes
- [NeurIPS' 21][Contrastive Fourier] Provable Representation Learning for Imitation with Contrastive Fourier Features (Code)
- [NeurIPS' 21][DR3] DR3: Value-Based Deep RL Requires Explicit Regularization ❤️
- [ICLR' 21] Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning ❤️
- [ICML' 21][ACL] Representation Matters: Offline Pretraining for Sequential Decision Making (Code)
- [ICML' 21] Instabilities of Offline RL with Pre-Trained Neural Representation
- [NeurIPS' 18][World Model] Recurrent World Models Facilitate Policy Evolution
- [ICML' 19][PlaNet] Learning Latent Dynamics for Planning from Pixels (Code)
- [ICLR' 20][Dreamer] Dream to Control: Learning Behaviors by Latent Imagination (Code)
- [ICLR' 21][DreamerV2] Mastering Atari with Discrete World Models (Code)
- [ICML' 21][TIA] Learning Task Informed Abstractions (Code)
- [ICLR' 17][UNREAL] Reinforcement Learning with Unsupervised Auxiliary Tasks
- [ICML' 20][PBL] Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
- [NeurIPS' 20] Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning (Code)
- [ICML' 21][RL-Proto] Reinforcement Learning with Prototypical Representations (Code)
- [ICML WS' 21][FittedKDE] Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning
- [ICML' 20][MISA] Invariant Causal Prediction for Block MDPs (Code)
- [ICML' 21][IDAAC] Decoupling Value and Policy for Generalization in Reinforcement Learning
- [arXiv' 22][CIC] Contrastive Intrinsic Control for Unsupervised Skill Discovery (Code)
- [AISTATS' 22] On the Generalization of Representations in Reinforcement Learning ❤️