Paper Collection for Batch RL with brief introductions.
Batch RL learns a policy purely from an arbitrary, previously collected dataset (the batch data) and requires no interaction with the environment during training.
It is also known as offline RL.
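To make the setting concrete, the sketch below runs tabular fitted Q-iteration (in the spirit of the FQI/NFQ entries under Early Work) on a fixed batch of transitions, with no environment calls during learning. It is a minimal, illustrative sketch only: the toy MDP, the synthetic dataset, and all names are made up rather than taken from any listed paper.

```python
# Minimal sketch of the batch / offline RL setting: learn a Q-function from a
# fixed dataset of transitions (s, a, r, s') and never query the environment.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

# A fixed batch collected beforehand by some unknown behavior policy.
batch = [(rng.integers(n_states), rng.integers(n_actions),
          rng.normal(), rng.integers(n_states)) for _ in range(1000)]

Q = np.zeros((n_states, n_actions))
for _ in range(50):  # fitted Q-iteration: repeatedly regress Q onto Bellman targets
    targets = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    for s, a, r, s_next in batch:
        targets[s, a] += r + gamma * Q[s_next].max()
        counts[s, a] += 1
    Q = np.divide(targets, counts, out=Q, where=counts > 0)

policy = Q.argmax(axis=1)  # greedy policy extracted purely from the batch
print(policy)
```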
- Tutorial
- Early Work
- Imitation without interacting with environments
- General Batch RL
- Meta/Multi-task
- Data Augmentation
- Benchmarks
- Applied Batch RL
- Representation Learning
- Theoretical Batch RL
- <Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems> by Sergey Levine, Aviral Kumar, George Tucker, Justin Fu, 2020.
- <Batch Reinforcement Learning> by Sascha Lange, Thomas Gabel, Martin Riedmiller, 2012.
- [LSPI] <Least-squares policy iteration> by Michail G. Lagoudakis, Ronald Parr, 2003.
- [FQI] <Tree-based batch mode reinforcement learning> by Damien Ernst, Pierre Geurts, Louis Wehenkel, 2005.
- [NFQ] <Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method> by Martin Riedmiller, 2005.
- <Exponentially Weighted Imitation Learning for Batched Historical Data> by Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu and Tong Zhang, NeurIPS 2018.
- [EDM] <Strictly Batch Imitation Learning by Energy-based Distribution Matching> by Daniel Jarrett, Ioana Bica and Mihaela van der Schaar, NeurIPS 2020.
- [MILO] <Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage> by Jonathan Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun, NeurIPS 2021.
- [DQfD] <Deep Q-learning from Demonstrations> by Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys, 2017.
- [NAC] <Reinforcement Learning from Imperfect Demonstrations> by Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell, ICML 2018.
- [BEAR] <Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction> by Aviral Kumar, Justin Fu, George Tucker and Sergey Levine, NeurIPS 2019.
- [DualDICE] <DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections> by Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li, NeurIPS 2019.
- [SPIBB] <Safe policy improvement with baseline bootstrapping> by Romain Laroche, Paul Trichelair, Remi Tachet des Combes, ICML 2019.
- <Batch Policy Learning under Constraints> by Hoang M. Le, Cameron Voloshin, Yisong Yue, ICML 2019.
- [BCQ] <Off-Policy Deep Reinforcement Learning without Exploration> by Scott Fujimoto, David Meger and Doina Precup, ICML 2019.
- [2IWIL] <Imitation Learning from Imperfect Demonstration> by Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama, ICML 2019.
- <Truly Batch Apprenticeship Learning with Deep Successor Features>, Donghun Lee, Srivatsan Srinivasan and Finale Doshi-Velez, IJCAI 2019.
- [BCQ-Discrete] <Benchmarking Batch Deep Reinforcement Learning Algorithms> by Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau, 2019.
- [AWR] <Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning> by Xue Bin Peng, Aviral Kumar, Grace Zhang and Sergey Levine, 2019.
- [BRAC] <Behavior Regularized Offline Reinforcement Learning> by Yifan Wu, George Tucker, Ofir Nachum, 2019.
- [AlgaeDICE] <AlgaeDICE: Policy Gradient from Arbitrary Experience> by Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans, 2019.
- [ABM] <Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning> by Siegel et al., ICLR 2020.
- [GenDICE] <GenDICE: Generalized Offline Estimation of Stationary Values> by Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans, ICLR 2020.
- [GradientDICE] <GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values> by Shangtong Zhang, Bo Liu, Shimon Whiteson, ICML 2020.
- [REM] <An Optimistic Perspective on Offline Reinforcement Learning> by Rishabh Agarwal, Dale Schuurmans and Mohammad Norouzi, ICML 2020.
- [BOPAH] <Batch Reinforcement Learning with Hyperparameter Gradients> by Byung-Jun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim, ICML 2020.
- [OFENet] <Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?> by Kei Ota, Tomoaki Oiki, Devesh K. Jha, Toshisada Mariyama, Daniel Nikovski, ICML 2020.
- [PFQI] <Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning> by Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli, ICML 2020.
- [PSEC-TD(0)] <Reducing Sampling Error in Batch Temporal Difference Learning> by Brahma S. Pavse, Ishan Durugkar, Josiah P. Hanna, Peter Stone, ICML 2020.
- <Provably Good Batch Reinforcement Learning Without Great Exploration> by Yao Liu, Adith Swaminathan, Alekh Agarwal and Emma Brunskill, NeurIPS 2020.
- [ESRL] <Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation> by Aaron Sonabend-W, Junwei Lu, Leo A. Celi, Tianxi Cai, Peter Szolovits, NeurIPS 2020.
- [BAIL] <BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning> by Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross, NeurIPS 2020.
- [AWAC] <Accelerating Online Reinforcement Learning with Offline Datasets> by Ashvin Nair, Murtaza Dalal, Abhishek Gupta, Sergey Levine, 2020.
- [CQL] <Conservative Q-Learning for Offline Reinforcement Learning> by Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine, NeurIPS 2020.
- [UWAC] <Uncertainty Weighted Offline Reinforcement Learning> by Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh, 2020.
- [CRR] <Critic Regularized Regression> by Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas, NeurIPS 2020.
- [DAC-MDP] <DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs> by Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern, ICLR 2021.
- [OPAL] <OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning> by Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum, ICLR 2021.
- [O-RAAC] <Risk-Averse Offline Reinforcement Learning> by Nuria Armengol Urpi, Sebastian Curi, Andreas Krause, ICLR 2021.
- [BREMEN] <Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization> by Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Shane Gu, ICLR 2021.
- [PEBL] <PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning> by Jordi Smit, Canmanie T. Ponnambalam, Matthijs T. J. Spaan, Frans A. Oliehoek, IJCAI R2AW 2021.
- [R-BVE] <Regularized Behavior Value Estimation> by Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas, 2021.
- [COIL] <Curriculum Offline Imitation Learning> by Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu, NeurIPS 2021.
- [EDAC] <Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble> by Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song, NeurIPS 2021.
- [TD3-BC] <A Minimalist Approach to Offline Reinforcement Learning> by Scott Fujimoto, Shixiang (Shane) Gu, NeurIPS 2021. (A minimal sketch of its behavior-cloning-regularized actor update appears after this list.)
- [MOReL] <MOReL: Model-Based Offline Reinforcement Learning> by Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli and Thorsten Joachims, NeurIPS 2020.
- [MOPO] <MOPO: Model-based Offline Policy Optimization> by Tianhe Yu et al., NeurIPS 2020.
- [MOOSE] <Overcoming Model Bias for Robust Offline Deep Reinforcement Learning> by Phillip Swazinna, Steffen Udluft, Thomas Runkler, EAAI 2021.
- [MBOP] <Model-Based Offline Planning> by Arthur Argenson, Gabriel Dulac-Arnold, ICLR 2021.
- <Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization> by Michael R Zhang, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi, ICLR 2021.
- [COMBO] <COMBO: Conservative Offline Model-Based Policy Optimization> by Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn, NeurIPS 2021.
- [MAPLE] <Offline Model-based Adaptable Policy Learning> by Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye, NeurIPS 2021.
- <Weighted model estimation for offline model-based reinforcement learning> by Toru Hishinuma, Kei Senda, NeurIPS 2021.
- [DT] <Decision Transformer: Reinforcement Learning via Sequence Modeling> by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch, NeurIPS 2021.
- [TT] <Offline Reinforcement Learning as One Big Sequence Modeling Problem> by Michael Janner, Qiyang Li, Sergey Levine, NeurIPS 2021.
- [MBML] <Multi-task Batch Reinforcement Learning with Metric Learning> by Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Henrik Christensen, Hao Su, NeurIPS 2020.
- [SMAC] <Offline Meta-Reinforcement Learning with Online Self-Supervision> by Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine, 2021.
- [MACAW] <Offline Meta-Reinforcement Learning with Advantage Weighting> by Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn, ICML 2021.
- [FOCAL] <FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization> by Lanqing Li, Rui Yang, Dijun Luo, ICLR 2021.
- [CDS] <Conservative Data Sharing for Multi-Task Offline Reinforcement Learning> by Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn, NeurIPS 2021.
- [BOReL] <Offline Meta Reinforcement Learning – Identifiability Challenges and Effective Data Collection Strategies> by Ron Dorfman, Idan Shenfeld, Aviv Tamar, NeurIPS 2021.
- [DrQ] <Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels> by Denis Yarats, Ilya Kostrikov, Rob Fergus, ICLR 2021.
- [DrQ-v2] <Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning> by Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto, 2021.
- [S4RL] <S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning in Robotics> by Samarth Sinha, Ajay Mandlekar, Animesh Garg, CoRL 2021.
- [D4RL] <D4RL: Datasets for Deep Data-Driven Reinforcement Learning> by Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine, 2020.
- <RL Unplugged: Benchmarks for Offline Reinforcement Learning> by Caglar Gulcehre, et al., 2020.
- [NeoRL] <NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning> by Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu, 2021.
- <Scaling data-driven robotics with reward sketching and batch reinforcement learning> by Serkan Cabi, et al., 2019.
- <Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog> by Natasha Jaques et al., 2019.
- <IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data> by Ajay Mandlekar, et al., 2020.
- <Representation Matters: Offline Pretraining for Sequential Decision Making> by Mengjiao Yang, Ofir Nachum, ICML 2021.
- <The Importance of Pessimism in Fixed-Dataset Policy Optimization> by Jacob Buckman, Carles Gelada, Marc G. Bellemare, ICLR 2021.
- <Is Pessimism Provably Efficient for Offline RL?> by Ying Jin, Zhuoran Yang, Zhaoran Wang, ICML 2021.
- <Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism> by Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell, NeurIPS 2021.
- <What are the Statistical Limits of Offline RL with Linear Function Approximation?> by Ruosong Wang, Dean Foster, Sham M. Kakade, ICLR 2021.
- <Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration> by Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang, NeurIPS 2021.
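As noted in the [TD3-BC] entry above, a recurring recipe in offline RL is to maximize value while keeping the learned policy close to the actions that actually appear in the batch. The snippet below is a minimal, hedged sketch of that idea in the style of the TD3+BC actor update, assuming PyTorch; the actor, critic, and mini-batch are placeholders, and the critic's own TD training (as well as the target networks and policy smoothing from TD3) is omitted.

```python
import torch
import torch.nn as nn

state_dim, action_dim, alpha = 17, 6, 2.5  # alpha = 2.5 is the default reported in the TD3+BC paper

# Placeholder networks; in the full algorithm the critic is trained with TD3-style updates.
actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

# One mini-batch drawn from the fixed offline dataset (random placeholders here).
states = torch.randn(256, state_dim)
dataset_actions = torch.randn(256, action_dim).clamp(-1, 1)

pi = actor(states)
q = critic(torch.cat([states, pi], dim=1))
lam = alpha / q.abs().mean().detach()  # normalize the value term by the Q scale
actor_loss = -lam * q.mean() + ((pi - dataset_actions) ** 2).mean()  # value term + BC term

actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```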