Awesome Explainable Reinforcement Learning

A list of selected papers, with links to their code where available, covered in our review paper A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges.

If you notice a missing paper or a possible mistake in our survey, please feel free to email me (qingyunpeng@zju.edu.cn) or open a pull request here. I would be glad to receive your advice. Thanks!

Citation

If you find this survey useful for your research, please consider citing:

@article{qing2022XRLsurvey,
  title={A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges},
  author={Qing, Yunpeng and Liu, Shunyu and Song, Jie and Wang, Huiqiong and Song, Mingli},
  journal={arXiv preprint arXiv:2211.06665},
  year={2022}
}

🔥 News

  • 2023.11.01: We have updated our review paper with the latest revisions and incorporated newly published research from 2022 to 2023.

✨ Overview

  • 📖 RL paradigm-based Explainable RL Taxonomy
  • 👓 Review of human knowledge-based RL explainability
  • 🚀 List of current XRL research literature and code links

In this survey, we provide a comprehensive review of existing work on eXplainable Reinforcement Learning (XRL) and introduce a new taxonomy in which prior work is categorized into agent model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that, conversely, leverage human knowledge to improve agents' learning efficiency and performance, a line of work that is often overlooked in the XRL field.

To give a clearer picture of existing XRL frameworks and our taxonomy, the existing XRL papers of each type are listed below and summarized in the following figure. The papers are categorized according to our taxonomy. For each paper, we also include a link to its open-source code if available.
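For readers who want to index the list programmatically, the taxonomy can be written down as a small mapping. This is only an illustrative sketch: the category and sub-category names are the section headings used in this list, while `XRL_TAXONOMY` and `find_category` are hypothetical names introduced here and are not part of the survey or of any released code.

```python
from typing import Optional

# Top-level categories and sub-categories, copied from the section
# headings of this list. The variable name is an illustrative assumption.
XRL_TAXONOMY = {
    "Agent Model-Explaining": ["Self-Explainable", "Explanation Generating"],
    "Reward Explaining": ["Reward Decomposition", "Reward Shaping"],
    "State Explaining": [
        "History Trajectory",
        "Current Observation",
        "Future Prediction",
    ],
    "Task Explaining": ["Whole Top-down Structure", "Simple Task Division"],
}


def find_category(subcategory: str) -> Optional[str]:
    """Return the top-level category containing `subcategory`, or None."""
    for category, subcategories in XRL_TAXONOMY.items():
        if subcategory in subcategories:
            return category
    return None


if __name__ == "__main__":
    # Print the taxonomy as an indented outline.
    for category, subcategories in XRL_TAXONOMY.items():
        print(category)
        for name in subcategories:
            print(f"  - {name}")
```

A lookup such as `find_category("Reward Shaping")` then returns the section under which a paper tagged with that sub-category would be filed.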

📝 Surveys

  • Explainable Reinforcement Learning: A Survey
    • E. Puiutta and E. Veith. CD-MAKE 2020. [paper]
  • A Survey on Interpretable Reinforcement Learning
    • C. Glanois, P. Weng, M. Zimmer, D. Li, T. Yang, J. Hao and W. Liu. arXiv 2021. [paper]
  • Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey
    • R. Dazeley, P. Vamplew and F. Cruz. arXiv 2021. [paper]
  • Explainable AI and Reinforcement Learning—A Systematic Review of Current Approaches and Trends
    • L. Wells and T. Bednarz. FRAI 2021. [paper]
  • Explainability in deep reinforcement learning
    • A. Heuillet, F. Couthouis and N. Díaz-Rodríguez. KBS 2021. [paper]
  • Explainable Deep Reinforcement Learning: State of the Art and Challenges
  • Explainable Reinforcement Learning: A Survey and Comparative Review
    • S. Milani, N. Topin, M. Veloso and F. Fang. CSUR 2023. [paper]

⬆ back to top

Explainability in RL

Agent Model-Explaining

Self-Explainable

  • MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning
    • S. Milani, Z. Zhang, N. Topin, Z. Shi, C. Kamhoua, E. Papalexakis and F. Fang. arXiv 2022. [paper] [code]
  • Learning to synthesize programs as interpretable and generalizable policies
    • D. Trivedi, J. Zhang, S. Sun and J. Lim. NeurIPS 2021. [paper] [code]
  • Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding
    • T. Mundhenk, M. Landajuela, R. Glatt, C. Santiago, D. Faissol and B. Petersen. NeurIPS 2021. [paper] [code]
  • Discovering symbolic policies with deep reinforcement learning
    • M. Landajuela, B. Petersen, S. Kim, C. Santiago, R. Glatt, T. Mundhenk, J. Pettit and D. Faissol. ICML 2021. [paper] [code]
  • Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods
    • N. Topin, S. Milani, F. Fang and M. Veloso. AAAI 2021. [paper]
  • Incorporating relational background knowledge into reinforcement learning via differentiable inductive logic programming
  • Evolutionary learning of interpretable decision trees
  • Optimization methods for interpretable differentiable decision trees applied to reinforcement learning
    • A. Silva, M. Gombolay, T. Killian, I. Jimenez and S. Son. AISTATS 2020. [paper]
  • Neurosymbolic transformers for multi-agent communication
    • J. Inala, Y. Yang, J. Paulos, Y. Pu, O. Bastani, V. Kumar, M. Rinard and A. Solar-Lezama. NeurIPS 2020. [paper] [code]
  • Generating interpretable reinforcement learning policies using genetic programming
    • D. Hein, S. Udluft and T. Runkler. GECCO 2019. [paper]
  • Imitation-projected programmatic reinforcement learning
    • A. Verma, H. Le, Y. Yue and S. Chaudhuri. NeurIPS 2019. [paper] [code]
  • Towards Reinforcement Learning of Human Readable Policies
    • R. Akrour, D. Tateo and J. Peters. ECML-PKDD workshop 2019. [paper]
  • Neural Logic Reinforcement Learning
  • Inductive logic programming via differentiable deep neural logic networks
  • Conservative q-improvement: Reinforcement learning for an interpretable decision-tree policy
    • A. Roth, N. Topin, P. Jamshidi and M. Veloso. arXiv 2019. [paper] [code]
  • Generation of policy-level explanations for reinforcement learning
    • N. Topin and M. Veloso. AAAI 2019. [paper]
  • Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
    • G. Liu, O. Schulte, W. Zhu and Q. Li. ECML-PKDD 2018. [paper] [code]
  • Programmatically Interpretable Reinforcement Learning
    • A. Verma, V. Murali, R. Singh, P. Kohli and S. Chaudhuri. ICML 2018. [paper] [code]
  • Interpretable policies for reinforcement learning by genetic programming
    • D. Hein, S. Udluft and T. Runkler. EAAI 2018. [paper]
  • Verifiable Reinforcement Learning via Policy Extraction
    • O. Bastani, Y. Pu and A. Solar-Lezama. NeurIPS 2018. [paper] [code]
  • Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies
    • D. Hein, A. Hentschel, T. Runkler and S. Udluft. EAAI 2017. [paper]
  • Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning
    • F. Maes, R. Fonteneau, L. Wehenkel and D. Ernst. DS 2012. [paper]
  • A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Explanation Generating

  • Explainable Multi-Agent Reinforcement Learning for Temporal Queries
  • Explainable Reinforcement Learning via a Causal World Model
  • A CEGAR-Driven Training and Verification Framework for Safe Deep Reinforcement Learning
    • P. Jin, J. Tian, D. Zhi, X. Wen and M. Zhang. CAV 2022. [paper] [code]
  • Toward Policy Explanations for Multi-Agent Reinforcement Learning
  • Counterfactual state explanations for reinforcement learning agents via generative deep learning
    • M. Olson, R. Khanna, L. Neal, F. Li and W. Wong. AI 2021. [paper] [code]
  • Generating high-quality explanations for navigation in partially-revealed environments
  • Explainable Reinforcement Learning through a Causal Lens
    • P. Madumal, T. Miller, L. Sonenberg and F. Vetere. AAAI 2020. [paper] [code]
  • Neurosymbolic reinforcement learning with formally verified exploration
    • G. Anderson, A. Verma, I. Dillig and S. Chaudhuri. NeurIPS 2020. [paper] [code]
  • An inductive synthesis framework for verifiable reinforcement learning
    • H. Zhu, Z. Xiong, S. Magill and S. Jagannathan. PLDI 2019. [paper]
  • Verifying Deep-RL-Driven Systems
    • Y. Kazak, C. Barrett, G. Katz and M. Schapira. SIGCOMM 2019 workshop. [paper]
  • Autonomous self-explanation of behavior for interactive reinforcement learning agents
    • Y. Fukuchi, M. Osawa, H. Yamakawa and M. Imai. HAI 2017. [paper]
  • Application of Instruction-Based Behavior Explanation to a Reinforcement Learning Agent with Changing Policy
    • Y. Fukuchi, M. Osawa, H. Yamakawa and M. Imai. ICONIP 2017. [paper]
  • Improving Robot Controller Transparency Through Autonomous Policy Explanation
    • B. Hayes and J. Shah. HRI 2017. [paper]

⬆ back to top

Reward Explaining

Reward Decomposition

  • Shapley counterfactual credits for multi-agent reinforcement learning
    • J. Li, K. Kuang, B. Wang, F. Liu, L. Chen, F. Wu and J. Xiao. SIGKDD 2021. [paper]
  • Shapley Q-value: A local reward approach to solve global reward games
  • Explainable reinforcement learning via reward decomposition
    • Z. Juozapaitis, A. Koul, A. Fern, M. Erwig and F. Doshi-Velez. IJCAI/ECAI workshop 2019. [paper]
  • Counterfactual multi-agent policy gradients
    • J. Foerster, G. Farquhar, T. Afouras, N. Nardelli and S. Whiteson. AAAI 2018. [paper] [code]

Reward Shaping

  • Creativity of AI: Automatic Symbolic Option Discovery for Facilitating Deep Reinforcement Learning
    • M. Jin, Z. Ma, K. Jin, H. Zhuo, C. Chen and C. Yu. AAAI 2022. [paper]
  • Dynamic Inverse Reinforcement Learning for Characterizing Animal Behavior
    • Z. Ashwood, A. Jha and JW. Pillow. NeurIPS 2022. [paper] [code]
  • Self-supervised attention-aware reinforcement learning
    • H. Wu, K. Khetarpal and D. Precup. AAAI 2021. [paper]
  • Ella: Exploration through learned language abstraction
    • S. Mirchandani, S. Karamcheti and D. Sadigh. NeurIPS 2021. [paper] [code]
  • Tree-structured policy based progressive reinforcement learning for temporally language grounding in video
  • Improving Human-Robot Interaction Through Explainable Reinforcement Learning
    • A. Tabrez and B. Hayes. HRI 2019. [paper]
  • SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
    • D. Lyu, F. Yang, B. Liu and S. Gustafson. AAAI 2019. [paper]

⬆ back to top

State Explaining

History Trajectory

  • Towards interpretable deep reinforcement learning with human-friendly prototypes
  • Collective explainable AI: Explaining cooperative strategies and agent contribution in multiagent reinforcement learning with shapley values
    • A. Heuillet, F. Couthouis and N. Díaz-Rodríguez. CIM 2022. [paper] [code]
  • ProtoX: Explaining a Reinforcement Learning Agent via Prototyping
    • R. Ragodos, T. Wang, Q. Lin and X. Zhou. NeurIPS 2022. [paper] [code]
  • Explainable ai in deep reinforcement learning models for power system emergency control
    • K. Zhang, J. Zhang, P. Xu, T. Gao and D. Gao. TCSS 2021. [paper]
  • Edge: Explaining deep reinforcement learning policies
    • W. Guo, X. Wu, U. Khan and X. Xing. NeurIPS 2021. [paper] [code]
  • Interestingness elements for explainable reinforcement learning: Understanding agents' capabilities and limitations
  • Visual sparse Bayesian reinforcement learning: a framework for interpreting what an agent has learned
    • I. Mishra, G. Dao and M. Lee. SSCI 2018. [paper]
  • Robust bayesian inverse reinforcement learning with sparse behavior noise
    • J. Zheng, S. Liu and L. Ni. AAAI 2014. [paper]

Current Observation

  • Explaining reinforcement learning with shapley values
    • D. Beechey, TMS. Smith and Ö. Şimşek. ICML 2023. [paper] [code]
  • Training characteristic functions with reinforcement learning: Xai-methods play connect four
    • S. Waldchen, F. Huber and S. Pokutta. ICML 2022. [paper]
  • Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning
    • D. Bertoin, A. Zouitine, M. Zouitine and E. Rachelson. NeurIPS 2022. [paper]
  • Inherently explainable reinforcement learning in natural language
    • X. Peng, M. Riedl and P. Ammanabrolu. NeurIPS 2022. [paper] [code]
  • Machine versus human attention in deep reinforcement learning tasks
    • S. Guo, R. Zhang, B. Liu, Y. Zhu, D. Ballard, M. Hayhoe and P. Stone. NeurIPS 2021. [paper]
  • The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning
  • Neuroevolution of self-interpretable agents
  • Deep reinforcement learning with stacked hierarchical attention for text-based games
    • Y. Xu, M. Fang, L. Chen, Y. Du, J. Zhou and C. Zhang. NeurIPS 2020. [paper] [code]
  • Xgail: Explainable generative adversarial imitation learning for explainable human decision analysis
    • M. Pan, W. Huang, Y. Li, X. Zhou and J. Luo. SIGKDD 2020. [paper] [code]
  • Towards better interpretability in deep q-networks
  • Alphastock: A buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks
    • J. Wang, Y. Zhang, K. Tang, J. Wu and Z. Xiong. SIGKDD 2019. [paper]
  • Social attention for autonomous decision-making in dense traffic
  • DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks
    • J. Wang, L. Gou, H. Shen and H. Yang. TVCG 2018. [paper]
  • Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
    • G. Liu, O. Schulte, W. Zhu and Q. Li. ECML-PKDD 2018. [paper] [code]
  • Learn to interpret atari agents
    • Z. Yang, S. Bai, L. Zhang and P. Torr. arXiv 2018. [paper] [code]
  • Unsupervised video object segmentation for deep reinforcement learning
  • Visualizing and Understanding Atari Agents
    • S. Greydanus, A. Koul, J. Dodge and A. Fern. ICML 2018. [paper] [code]
  • Rise: Randomized input sampling for explanation of black-box models
  • Transparency and Explanation in Deep Reinforcement Learning Neural Networks
    • R. Iyer, Y. Li, H. Li, M. Lewis, R. Sundar and K. Sycara. AIES 2018. [paper] [code]

Future Prediction

  • What did you think would happen? explaining agent behaviour through intended outcomes
    • H. Yau, C. Russell and S. Hadfield. NeurIPS 2020. [paper] [code]
  • Weakly-supervised reinforcement learning for controllable behavior
    • L. Lee, B. Eysenbach, R. Salakhutdinov, S. Gu and C. Finn. NeurIPS 2020. [paper]
  • Semantic Predictive Control for Explainable and Efficient Policy Learning
    • X. Pan, X. Chen, Q. Cai, J. Canny and F. Yu. ICRA 2019. [paper]
  • Safe Reinforcement Learning With Model Uncertainty Estimates
    • B. Lütjens, M. Everett and J. How. ICRA 2019. [paper]
  • Contrastive explanations for reinforcement learning in terms of expected consequences
    • J. Waa, J. Diggelen, K. Bosch and M. Neerincx. arXiv 2018. [paper]

⬆ back to top

Task Explaining

Whole Top-down Structure

  • A Boolean task algebra for reinforcement learning
    • G. Nangue Tasse, S. James and B. Rosman. NeurIPS 2020. [paper] [code]
  • Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning
    • T. Shu, C. Xiong and R. Socher. ICLR 2018. [paper]

Simple Task Division

  • Multi-task reinforcement learning with context-based representations
  • Model primitives for hierarchical lifelong reinforcement learning
    • B. Wu, J. Gupta and M. Kochenderfer. AAMAS 2020. [paper] [code]
  • Language as an abstraction for hierarchical deep reinforcement learning
    • Y. Jiang, S. Gu, K. Murphy and C. Finn. NeurIPS 2019. [paper] [code]
  • SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
    • D. Lyu, F. Yang, B. Liu and S. Gustafson. AAAI 2019. [paper]
  • Dot-to-dot: Explainable hierarchical reinforcement learning for robotic manipulation
    • B. Beyret, A. Shafti and A. Faisal. IROS 2019. [paper]

⬆ back to top

🤖️ Human knowledge for RL paradigm

Fuzzy Controller Representing Human Knowledge

  • Fuzzy Action-Masked Reinforcement Learning Behavior Planning for Highly Automated Driving
    • T. Rudolf, M. Gao, T. Schürmann, S. Schwab and S. Hohmann. ICCAR 2022. [paper]
  • Efficient hierarchical policy network with fuzzy rules
    • W. Shi, Y. Feng, H. Huang, Z. Liu, J. Huang and G. Cheng. IJMLC 2022. [paper]
  • KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge
    • P. Zhang, J. Hao, W. Wang, H. Tang, Y. Ma, Y. Duan and Y. Zheng. arXiv 2020. [paper]

Dense Reward on Human Command

  • Using Natural Language for Reward Shaping in Reinforcement Learning

Learn Mattered Features from Human Interactions

  • Curricular Subgoals for Inverse Reinforcement Learning
    • S. Liu, Y. Qing, S. Xu, H. Wu, J. Zhang, J. Cong, T. Chen, Y. Liu and M. Song. arXiv. [paper] [code]
  • Local explanations for reinforcement learning
    • R. Luss, A. Dhurandhar and M. Liu. AAAI 2023. [paper]
  • Textual Explanations for Self-Driving Vehicles
    • J. Kim, A. Rohrbach, T. Darrell, J. Canny and Z. Akata. ECCV 2018. [paper] [code]

Subtask Scheduling with Human Annotation

  • Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective
    • Y. Gao, F. Liu, L. Wang, Z. Lian, W. Wang, S. Li, X. Wang, X. Zeng, R. Wang, J. Wang, Q. Fu, W. Yang, L. Huang and W. Liu. ICLR 2023. [paper] [code]
  • LISA: Learning interpretable skill abstractions from language
    • D. Garg, S. Vaidyanath, K. Kim, J. Song and S. Ermon. NeurIPS 2022. [paper]
  • Perceiving the world: Question-guided reinforcement learning for text-based games
    • Y. Xu, M. Fang, L. Chen, Y. Du, JT. Zhou and C. Zhang. ACL 2022. [paper] [code]
  • Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning

⬆ back to top

🏠 Explainable AI Library

For completeness, we also list libraries of explainable AI methods that tackle the black-box problem of AI models. They can enhance AI models with transparency and explainability.

Explainable AI libraries:

  • Aequitas
  • Alibi Explain
  • Captum
  • DeepVis Toolbox
  • ELI5
  • InterpretML
  • IBM AI Explainability 360
  • iModels
  • LIME
  • OmniXAI
  • SHAP

Show your support

Please ⭐️ this repository if this project helped you!

⬆ back to top