Awesome Explainable Reinforcement Learning

A curated list of the papers covered in our review paper A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges, with links to the corresponding code where available.

If you notice a missing paper or a possible mistake in our survey, please feel free to email me (qingyunpeng@zju.edu.cn) or open a pull request here. I am more than glad to receive your advice. Thanks!

Citation

If you find this survey useful for your research, please consider citing:

@article{qing2022XRLsurvey,
  title={A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges},
  author={Qing, Yunpeng and Liu, Shunyu and Song, Jie and Wang, Huiqiong and Song, Mingli},
  journal={arXiv preprint arXiv:2211.06665},
  year={2022}
}

🔥 News

2023.11.01: We have updated our review paper with the latest revisions and incorporated newly published research from 2022 to 2023.

✨ Overview

📖 RL paradigm-based Explainable RL Taxonomy
👓 Review of human knowledge-based RL explainability
🚀 List of current XRL research literature with code links

In this survey, we provide a comprehensive review of existing works on eXplainable Reinforcement Learning (XRL) and introduce a new taxonomy in which prior works are categorized into agent model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that conversely leverage human knowledge to improve the learning efficiency and performance of agents, a line of work that is often overlooked in the XRL field.

To give an overview of the existing XRL framework and our taxonomy, the existing XRL papers of each type are listed below and summarized in the following figure. The papers are categorized according to our taxonomy. For each paper, we also include a link to its open-source code if available.

📝 Surveys

Explainable Reinforcement Learning: A Survey
- E. Puiutta and E. Veith. CD-MAKE 2020. [paper]
A Survey on Interpretable Reinforcement Learning
- C. Glanois, P. Weng, M. Zimmer, D. Li, T. Yang, J. Hao and W. Liu. arXiv 2021. [paper]
Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey
- R. Dazeley, P. Vamplew and F. Cruz. arXiv 2021. [paper]
Explainable AI and Reinforcement Learning—A Systematic Review of Current Approaches and Trends
- L. Wells and T. Bednarz. FRAI 2021. [paper]
Explainability in deep reinforcement learning
- A. Heuillet, F. Couthouis and N. Díaz-Rodríguez. KBS 2021. [paper]
Explainable Deep Reinforcement Learning: State of the Art and Challenges
- G. Vouros. CSUR 2022. [paper]
Explainable Reinforcement Learning: A Survey and Comparative Review
- S. Milani, N. Topin, M. Veloso and F. Fang. CSUR 2023. [paper]

⬆ back to top

Explainability in RL

Agent Model-Explaining

Self-Explainable

MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning
- S. Milani, Z. Zhang, N. Topin, Z. Shi, C. Kamhoua, E. Papalexakis and F. Fang. arXiv 2022. [paper] [code]
Learning to synthesize programs as interpretable and generalizable policies
- D. Trivedi, J. Zhang, S. Sun and J. Lim. NeurIPS 2021. [paper] [code]
Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding
- T. Mundhenk, M. Landajuela, R. Glatt, C. Santiago, D. Faissol and B. Petersen. NeurIPS 2021. [paper] [code]
Discovering symbolic policies with deep reinforcement learning
- M. Landajuela, B. Petersen, S. Kim, C. Santiago, R. Glatt, T. Mundhenk, J. Pettit and D. Faissol. ICML 2021. [paper] [code]
Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods
- N. Topin, S. Milani, F. Fang and M. Veloso. AAAI 2021. [paper]
Incorporating relational background knowledge into reinforcement learning via differentiable inductive logic programming
- A. Payani and F. Fekri. arXiv 2020. [paper] [code]
Evolutionary learning of interpretable decision trees
- LL. Custode and G. Iacca. arXiv 2020. [paper] [code]
Optimization methods for interpretable differentiable decision trees applied to reinforcement learning
- A. Silva, M. Gombolay, T. Killian, I. Jimenez and S. Son. AISTATS 2020. [paper]
Neurosymbolic transformers for multi-agent communication
- J. Inala, Y. Yang, J. Paulos, Y. Pu, O. Bastani, V. Kumar, M. Rinard and A. Solar-Lezama. NeurIPS 2020. [paper] [code]
Generating interpretable reinforcement learning policies using genetic programming
- D. Hein, S. Udluft and T. Runkler. GECCO 2019. [paper]
Imitation-projected programmatic reinforcement learning
- A. Verma, H. Le, Y. Yue and S. Chaudhuri. NeurIPS 2019. [paper] [code]
Towards Reinforcement Learning of Human Readable Policies
- R. Akrour, D. Tateo and J. Peters. ECML-PKDD workshop 2019. [paper]
Neural Logic Reinforcement Learning
- Z. Jiang and S. Luo. ICML 2019. [paper] [code]
Inductive logic programming via differentiable deep neural logic networks
- A. Payani and F. Fekri. arXiv 2019. [paper] [code]
Conservative q-improvement: Reinforcement learning for an interpretable decision-tree policy
- A. Roth, N. Topin, P. Jamshidi and M. Veloso. arXiv 2019. [paper] [code]
Generation of policy-level explanations for reinforcement learning
- N. Topin and M. Veloso. AAAI 2019. [paper]
Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
- G. Liu, O. Schulte, W. Zhu and Q. Li. ECML-PKDD 2018. [paper] [code]
Programmatically Interpretable Reinforcement Learning
- A. Verma, V. Murali, R. Singh, P. Kohli and S. Chaudhuri. ICML 2018. [paper] [code]
Interpretable policies for reinforcement learning by genetic programming
- D. Hein, S. Udluft and T. Runkler. EAAI 2018. [paper]
Verifiable Reinforcement Learning via Policy Extraction
- O. Bastani, Y. Pu and A. Solar-Lezama. NeurIPS 2018. [paper] [code]
Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies
- D. Hein, A. Hentschel, T. Runkler and S. Udluft. EAAI 2017. [paper]
Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning
- F. Maes, R. Fonteneau, L. Wehenkel and D. Ernst. DS 2012. [paper]
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- S. Ross, G. Gordon and D. Bagnell. JMLR 2011. [paper] [code]

Explanation Generating

Explainable Multi-Agent Reinforcement Learning for Temporal Queries
- K. Boggess, S. Kraus and L. Feng. IJCAI 2023. [paper] [code]
Explainable Reinforcement Learning via a Causal World Model
- Z. Yu, J. Ruan and D. Xing. IJCAI 2023. [paper] [code]
A CEGAR-Driven Training and Verification Framework for Safe Deep Reinforcement Learning
- P. Jin, J. Tian, D. Zhi, X. Wen and M. Zhang. CAV 2022. [paper] [code]
Toward Policy Explanations for Multi-Agent Reinforcement Learning
- K. Boggess, S. Kraus and L. Feng. IJCAI 2022. [paper] [code]
Counterfactual state explanations for reinforcement learning agents via generative deep learning
- M. Olson, R. Khanna, L. Neal, F. Li and W. Wong. AI 2021. [paper] [code]
Generating high-quality explanations for navigation in partially-revealed environments
- G. Stein. NeurIPS 2021. [paper] [code]
Explainable Reinforcement Learning through a Causal Lens
- P. Madumal, T. Miller, L. Sonenberg and F. Vetere. AAAI 2020. [paper] [code]
Neurosymbolic reinforcement learning with formally verified exploration
- G. Anderson, A. Verma, I. Dillig and S. Chaudhuri. NeurIPS 2020. [paper] [code]
An inductive synthesis framework for verifiable reinforcement learning
- H. Zhu, Z. Xiong, S. Magill and S. Jagannathan. PLDI 2019. [paper]
Verifying Deep-RL-Driven Systems
- Y. Kazak, C. Barrett, G. Katz and M. Schapira. SIGCOMM 2019 workshop. [paper]
Autonomous self-explanation of behavior for interactive reinforcement learning agents
- Y. Fukuchi, M. Osawa, H. Yamakawa and M. Imai. HAI 2017. [paper]
Application of Instruction-Based Behavior Explanation to a Reinforcement Learning Agent with Changing Policy
- Y. Fukuchi, M. Osawa, H. Yamakawa and M. Imai. ICONIP 2017. [paper]
Improving Robot Controller Transparency Through Autonomous Policy Explanation
- B. Hayes and J. Shah. HRI 2017. [paper]

⬆ back to top

Reward Explaining

Reward Decomposition

Shapley counterfactual credits for multi-agent reinforcement learning
- J. Li, K. Kuang, B. Wang, F. Liu, L. Chen, F. Wu and J. Xiao. SIGKDD 2021. [paper]
Shapley Q-value: A local reward approach to solve global reward games
- J. Wang, Y. Zhang, T. Kim and Y. Gu. AAAI 2020. [paper] [code]
Explainable reinforcement learning via reward decomposition
- Z. Juozapaitis, A. Koul, A. Fern, M. Erwig and F. Doshi-Velez. IJCAI/ECAI workshop 2019. [paper]
Counterfactual multi-agent policy gradients
- J. Foerster, G. Farquhar, T. Afouras, N. Nardelli and S. Whiteson. AAAI 2018. [paper] [code]

Reward Shaping

Creativity of AI: Automatic Symbolic Option Discovery for Facilitating Deep Reinforcement Learning
- M. Jin, Z. Ma, K. Jin, H. Zhuo, C. Chen and C. Yu. AAAI 2022. [paper]
Dynamic Inverse Reinforcement Learning for Characterizing Animal Behavior
- Z. Ashwood, A. Jha and JW. Pillow. NeurIPS 2022. [paper] [code]
Self-supervised attention-aware reinforcement learning
- H. Wu, K. Khetarpal and D. Precup. AAAI 2021. [paper]
Ella: Exploration through learned language abstraction
- S. Mirchandani, S. Karamcheti and D. Sadigh. NeurIPS 2021. [paper] [code]
Tree-structured policy based progressive reinforcement learning for temporally language grounding in video
- J. Wu, G. Li, S. Liu and L. Lin. AAAI 2020. [paper] [code]
Improving Human-Robot Interaction Through Explainable Reinforcement Learning
- A. Tabrez and B. Hayes. HRI 2019. [paper]
SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
- D. Lyu, F. Yang, B. Liu and S. Gustafson. AAAI 2019. [paper]

⬆ back to top

State Explaining

History Trajectory

Towards interpretable deep reinforcement learning with human-friendly prototypes
- EM. Kenny, M. Tucker and J. Shah. ICLR 2023. [paper] [code]
Collective explainable AI: Explaining cooperative strategies and agent contribution in multiagent reinforcement learning with shapley values
- A. Heuillet, F. Couthouis and N. Díaz-Rodríguez. CIM 2022. [paper] [code]
ProtoX: Explaining a Reinforcement Learning Agent via Prototyping
- R. Ragodos, T. Wang, Q. Lin and X. Zhou. NeurIPS 2022. [paper] [code]
Explainable ai in deep reinforcement learning models for power system emergency control
- K. Zhang, J. Zhang, P. Xu, T. Gao and D. Gao. TCSS 2021. [paper]
Edge: Explaining deep reinforcement learning policies
- W. Guo, X. Wu, U. Khan and X. Xing. NeurIPS 2021. [paper] [code]
Interestingness elements for explainable reinforcement learning: Understanding agents' capabilities and limitations
- P. Sequeira and M. Gervasio. AI 2020. [paper] [code]
Visual sparse Bayesian reinforcement learning: a framework for interpreting what an agent has learned
- I. Mishra, G. Dao and M. Lee. SSCI 2018. [paper]
Robust bayesian inverse reinforcement learning with sparse behavior noise
- J. Zheng, S. Liu and L. Ni. AAAI 2014. [paper]

Current Observation

Explaining reinforcement learning with shapley values
- D. Beechey, TMS. Smith and Ö. Şimşek. ICML 2023. [paper] [code]
Training characteristic functions with reinforcement learning: Xai-methods play connect four
- S. Waldchen, F. Huber and S. Pokutta. ICML 2022. [paper]
Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning
- D. Bertoin, A. Zouitine, M. Zouitine and E. Rachelson. NeurIPS 2022. [paper]
Inherently explainable reinforcement learning in natural language
- X. Peng, M. Riedl and P. Ammanabrolu. NeurIPS 2022. [paper] [code]
Machine versus human attention in deep reinforcement learning tasks
- S. Guo, R. Zhang, B. Liu, Y. Zhu, D. Ballard, M. Hayhoe and P. Stone. NeurIPS 2021. [paper]
The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning
- Y. Tang and D. Ha. NeurIPS 2021. [paper] [code]
Neuroevolution of self-interpretable agents
- Y. Tang, D. Nguyen and D. Ha. GECCO 2020. [paper] [code]
Deep reinforcement learning with stacked hierarchical attention for text-based games
- Y. Xu, M. Fang, L. Chen, Y. Du, J. Zhou and C. Zhang. NeurIPS 2020. [paper] [code]
Xgail: Explainable generative adversarial imitation learning for explainable human decision analysis
- M. Pan, W. Huang, Y. Li, X. Zhou and J. Luo. SIGKDD 2020. [paper] [code]
Towards better interpretability in deep q-networks
- R. Annasamy and K. Sycara. AAAI 2019. [paper] [code]
Alphastock: A buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks
- J. Wang, Y. Zhang, K. Tang, J. Wu and Z. Xiong. SIGKDD 2019. [paper]
Social attention for autonomous decision-making in dense traffic
- E. Leurent and J. Mercat. arXiv 2019. [paper] [code]
DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks
- J. Wang, L. Gou, H. Shen and H. Yang. TVCG 2018. [paper]
Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
- G. Liu, O. Schulte, W. Zhu and Q. Li. ECML-PKDD 2018. [paper] [code]
Learn to interpret atari agents
- Z. Yang, S. Bai, L. Zhang and P. Torr. arXiv 2018. [paper] [code]
Unsupervised video object segmentation for deep reinforcement learning
- V. Goel, J. Weng and P. Poupart. NeurIPS 2018. [paper] [code]
Visualizing and Understanding Atari Agents
- S. Greydanus, A. Koul, J. Dodge and A. Fern. ICML 2018. [paper] [code]
Rise: Randomized input sampling for explanation of black-box models
- V. Petsiuk, A. Das and K. Saenko. BMVC 2018. [paper] [code]
Transparency and Explanation in Deep Reinforcement Learning Neural Networks
- R. Iyer, Y. Li, H. Li, M. Lewis, R. Sundar and K. Sycara. AIES 2018. [paper] [code]

Future Prediction

What did you think would happen? explaining agent behaviour through intended outcomes
- H. Yau, C. Russell and S. Hadfield. NeurIPS 2020. [paper] [code]
Weakly-supervised reinforcement learning for controllable behavior
- L. Lee, B. Eysenbach, R. Salakhutdinov, S. Gu and C. Finn. NeurIPS 2020. [paper]
Semantic Predictive Control for Explainable and Efficient Policy Learning
- X. Pan, X. Chen, Q. Cai, J. Canny and F. Yu. ICRA 2019. [paper]
Safe Reinforcement Learning With Model Uncertainty Estimates
- B. Lütjens, M. Everett and J. How. ICRA 2019. [paper]
Contrastive explanations for reinforcement learning in terms of expected consequences
- J. Waa, J. Diggelen, K. Bosch and M. Neerincx. arXiv 2018. [paper]

⬆ back to top

Task Explaining

Whole Top-down Structure

A Boolean task algebra for reinforcement learning
- G. Nangue Tasse, S. James and B. Rosman. NeurIPS 2020. [paper] [code]
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning
- T. Shu, C. Xiong and R. Socher. ICLR 2018. [paper]

Simple Task Division

Multi-task reinforcement learning with context-based representations
- S. Sodhani, A. Zhang and J. Pineau. ICML 2021. [paper] [code]
Model primitives for hierarchical lifelong reinforcement learning
- B. Wu, J. Gupta and M. Kochenderfer. AAMAS 2020. [paper] [code]
Language as an abstraction for hierarchical deep reinforcement learning
- Y. Jiang, S. Gu, K. Murphy and C. Finn. NeurIPS 2019. [paper] [code]
SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
- D. Lyu, F. Yang, B. Liu and S. Gustafson. AAAI 2019. [paper]
Dot-to-dot: Explainable hierarchical reinforcement learning for robotic manipulation
- B. Beyret, A. Shafti and A. Faisal. IROS 2019. [paper]

⬆ back to top

🤖️ Human knowledge for RL paradigm

Fuzzy Controller Representing Human Knowledge

Fuzzy Action-Masked Reinforcement Learning Behavior Planning for Highly Automated Driving
- T. Rudolf, M. Gao, T. Schürmann, S. Schwab and S. Hohmann. ICCAR 2022. [paper]
Efficient hierarchical policy network with fuzzy rules
- W. Shi, Y. Feng, H. Huang, Z. Liu, J. Huang and G. Cheng. IJMLC 2022. [paper]
KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge
- P. Zhang, J. Hao, W. Wang, H. Tang, Y. Ma, Y. Duan and Y. Zheng. arXiv 2020. [paper]

Dense Reward on Human Command

Using Natural Language for Reward Shaping in Reinforcement Learning
- P. Goyal, S. Niekum and R. Mooney. arXiv 2019. [paper] [code]

Learn Mattered Features from Human Interactions

Curricular Subgoals for Inverse Reinforcement Learning
- S. Liu, Y. Qing, S. Xu, H. Wu, J. Zhang, J. Cong, T. Chen, Y. Liu and M. Song. arXiv. [paper] [code]
Local explanations for reinforcement learning
- R. Luss, A. Dhurandhar and M. Liu. AAAI 2023. [paper]
Textual Explanations for Self-Driving Vehicles
- J. Kim, A. Rohrbach, T. Darrell, J. Canny and Z. Akata. ECCV 2018. [paper] [code]

Subtask Scheduling with Human Annotation

Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective
- Y. Gao, F. Liu, L. Wang, Z. Lian, W. Wang, S. Li, X. Wang, X. Zeng, R. Wang, J. Wang, Q. Fu, W. Yang, L. Huang and W. Liu. ICLR 2023. [paper] [code]
LISA: Learning interpretable skill abstractions from language
- D. Garg, S. Vaidyanath, K. Kim, J. Song and S. Ermon. NeurIPS 2022. [paper]
Perceiving the world: Question-guided reinforcement learning for text-based games
- Y. Xu, M. Fang, L. Chen, Y. Du, JT. Zhou and C. Zhang. ACL 2022. [paper] [code]
Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning
- V. Chen, A. Gupta and K. Marino. arXiv 2020. [paper] [code]

⬆ back to top

🏠 Explainable AI Library

For completeness, we also list libraries of explainable AI methods that tackle the black-box problem of AI models. They can enhance an AI model with transparency and explainability.

Explainable AI library	GitHub Stars
Aequitas
Alibi Explain
Captum
DeepVis Toolbox
ELI5
InterpretML
IBM AI Explainability 360
iModels
LIME
OmniXAI
SHAP

Show your support

Please ⭐️ this repository if this project helped you!

Starchart

⬆ back to top
