Explainable Reinforcement Learning (XRL) Resources

This repository aims to keep an up-to-date list of research on explainable reinforcement learning (XRL). The repository supplements the survey paper found here. If you find this helpful, please give this repository a star, share it, and cite the survey paper.

Missing resources, issues, or questions? Please open an issue here, or feel free to email me.

Resources

  • Awesome Explainable Reinforcement Learning. Link

Survey Papers

#/Link Title Venue/Journal Year
1 A Survey of Global Explanations in Reinforcement Learning Explainable Agency in Artificial Intelligence 2024
2 Explainable reinforcement learning (XRL): a systematic literature review and taxonomy Mach. Learn. 2024
3 Explainability in Deep Reinforcement Learning, a Review into Current Methods and Applications ACM Comput. Surv. 2023
4 Explainable Reinforcement Learning: A Survey and Comparative Review ACM Comput. Surv. 2023
5 A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges CoRR 2023
6 Explainable reinforcement learning for broad-XAI: a conceptual framework and survey Neural Comput. Appl. 2023
7 Explainable Deep Reinforcement Learning: State of the Art and Challenges ACM Comput. Surv. 2022
8 A Survey on Interpretable Reinforcement Learning CoRR 2022
9 Explainability in reinforcement learning: perspective and position CoRR 2022
10 Explainable AI and Reinforcement Learning - A Systematic Review of Current Approaches and Trends Frontiers Artif. Intell. 2021
11 Explainability in deep reinforcement learning Knowl. Based Syst. 2021
12 Explainable Reinforcement Learning: A Survey CD-MAKE 2020
13 Reinforcement Learning Interpretation Methods: A Survey IEEE Access 2020

Papers

#/Link Title Venue/Journal Year
1 Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes AAAI 2024
2 Detection of Important States through an Iterative Q-value Algorithm for Explainable Reinforcement Learning HICSS 2024
3 Local Explanations for Reinforcement Learning AAAI 2023
4 Explainable Reinforcement Learning Based on Q-Value Decomposition by Expected State Transitions AAAI-MAKE 2023
5 GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies through Visual Counterfactual Explanations AAMAS 2023
6 Interpreting a deep reinforcement learning model with conceptual embedding and performance analysis Appl. Intell. 2023
7 Deep Explainable Relational Reinforcement Learning: A Neuro-Symbolic Approach ECML PKDD 2023
8 Inherently Interpretable Deep Reinforcement Learning Through Online Mimicking EXTRAAMAS 2023
9 Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes ICLR 2023
10 Explaining Reinforcement Learning with Shapley Values ICML 2023
11 Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies IDA 2023
12 Extracting Decision Tree From Trained Deep Reinforcement Learning in Traffic Signal Control IEEE Trans. Comput. Soc. Syst. 2023
13 Explainable Reinforcement Learning via a Causal World Model IJCAI 2023
14 Unveiling Concepts Learned by a World-Class Chess-Playing Agent IJCAI 2023
15 Learning state importance for preference-based reinforcement learning Mach. Learn. 2023
16 Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction NeurIPS 2023
17 State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding NeurIPS 2023
18 StateMask: Explaining Deep Reinforcement Learning through State Mask NeurIPS 2023
19 Comparing explanations in RL Neural Comput. Appl. 2023
20 Explainable robotic systems: understanding goal-driven actions in a reinforcement learning scenario Neural Comput. Appl. 2023
21 Hierarchical goals contextualize local reward decomposition explanations Neural Comput. Appl. 2023
22 Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization Neural Networks 2023
23 IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit Based on Analyses of Interestingness xAI 2023
24 "I Don't Think So": Summarizing Policy Disagreements for Agent Comparison AAAI 2022
25 CAPS: Comprehensible Abstract Policy Summaries for Explaining Reinforcement Learning Agents AAMAS 2022
26 Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions AAMAS 2022
27 Lazy-MDPs: Towards Interpretable RL by Learning When to Act AAMAS 2022
28 Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems ACSOS 2022
29 Analysis of Explainable Goal-Driven Reinforcement Learning in a Continuous Simulated Environment Algorithms 2022
30 BEERL: Both Ends Explanations for Reinforcement Learning Applied Sciences 2022
31 Energy-Efficient Driving for Adaptive Traffic Signal Control Environment via Explainable Reinforcement Learning Applied Sciences 2022
32 Comparing Strategies for Visualizing the High-Dimensional Exploration Behavior of CPS Design Agents DESTION 2022
33 InAction: Interpretable Action Decision Making for Autonomous Driving ECCV 2022
34 Enhanced Oblique Decision Tree Enabled Policy Extraction for Deep Reinforcement Learning in Power System Emergency Control Electric Power Systems Research 2022
35 Attributation Analysis of Reinforcement Learning-Based Highway Driver Electronics 2022
36 Multi-objective Genetic Programming for Explainable Reinforcement Learning EuroGP 2022
37 Deep-Learning-based Fuzzy Symbolic Processing with Agents Capable of Knowledge Communication ICAART 2022
38 Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations ICLR 2022
39 POETREE: Interpretable Policy Learning with Adaptive Decision Trees ICLR 2022
40 Programmatic Reinforcement Learning without Oracles ICLR 2022
41 Explaining Reinforcement Learning Policies through Counterfactual Trajectories ICML Workshop on HILL 2022
42 Mean-variance Based Risk-sensitive Reinforcement Learning with Interpretable Attention ICMVA 2022
43 Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning ICPR 2022
44 Explaining Intelligent Agent's Future Motion on Basis of Vocabulary Learning With Human Goal Inference IEEE Access 2022
45 Interpretable Autonomous Flight Via Compact Visualizable Neural Circuit Policies IEEE Robotics Autom. Lett. 2022
46 Explainable AI in Deep Reinforcement Learning Models for Power System Emergency Control IEEE Trans. Comput. Soc. Syst. 2022
47 Hierarchical Program-Triggered Reinforcement Learning Agents for Automated Driving IEEE Trans. Intell. Transp. Syst. 2022
48 Interpretable End-to-End Urban Autonomous Driving With Latent Deep Reinforcement Learning IEEE Trans. Intell. Transp. Syst. 2022
49 Continuous Action Reinforcement Learning From a Mixture of Interpretable Experts IEEE Trans. Pattern Anal. Mach. Intell. 2022
50 Self-Supervised Discovering of Interpretable Features for Reinforcement Learning IEEE Trans. Pattern Anal. Mach. Intell. 2022
51 Temporal-Spatial Causal Interpretations for Vision-Based Reinforcement Learning IEEE Trans. Pattern Anal. Mach. Intell. 2022
52 Visual Analytics for RNN-Based Deep Reinforcement Learning IEEE Trans. Vis. Comput. Graph. 2022
53 Toward Interpretable-AI Policies Using Evolutionary Nonlinear Decision Trees for Discrete-Action Systems IEEE Transactions on Cybernetics 2022
54 Understanding via Exploration: Discovery of Interpretable Features With Deep Reinforcement Learning IEEE Transactions on Neural Networks and Learning Systems 2022
55 Summarising and Comparing Agent Dynamics with Contrastive Spatiotemporal Abstraction IJCAI Workshop on XAI 2022
56 ACMViz: a visual analytics approach to understand DRL-based autonomous control model J. Vis. 2022
57 Incorporating Explanations to Balance the Exploration and Exploitation of Deep Reinforcement Learning KSEM 2022
58 Towards Explainable Reinforcement Learning Using Scoring Mechanism Augmented Agents KSEM 2022
59 Explainable Reinforcement Learning via Model Transforms NeurIPS 2022
60 GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis NeurIPS 2022
61 Inherently Explainable Reinforcement Learning in Natural Language NeurIPS 2022
62 Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning NeurIPS 2022
63 ProtoX: Explaining a Reinforcement Learning Agent via Prototyping NeurIPS 2022
64 MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning Neural Networks 2022
65 Analysing deep reinforcement learning agents trained with domain randomisation Neurocomputing 2022
66 Why? Why not? When? Visual Explanations of Agent Behaviour in Reinforcement Learning PacificVis 2022
67 Driving behavior explanation with multi-level fusion Pattern Recognit. 2022
68 Acquisition of chess knowledge in AlphaZero Proc. Natl. Acad. Sci. U.S.A. 2022
69 Learning Interpretable, High-Performing Policies for Autonomous Driving Robotics: Science and Systems 2022
70 Event-driven temporal models for explanations - ETeMoX: explaining reinforcement learning Softw. Syst. Model. 2022
71 Toward a Psychology of Deep Reinforcement Learning Agents Using a Cognitive Architecture Top. Cogn. Sci. 2022
72 DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning AAAI 2021
73 Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods AAAI 2021
74 TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments AAAI 2021
75 Explaining Deep Reinforcement Learning Agents in the Atari Domain through a Surrogate Model AIIDE 2021
76 A framework of explanation generation toward reliable autonomous robots Adv. Robotics 2021
77 Explainable Deep Reinforcement Learning for UAV autonomous path planning Aerospace Science and Technology 2021
78 Explaining robot policies Applied AI Letters 2021
79 Counterfactual state explanations for reinforcement learning agents via generative deep learning Artif. Intell. 2021
80 Local and global explanations of agent behavior: Integrating strategy summaries with saliency maps Artif. Intell. 2021
81 XPM: An Explainable Deep Reinforcement Learning Framework for Portfolio Management CIKM 2021
82 Interactive Explanations: Diagnosis and Repair of Reinforcement Learning Based Agent Behaviors CoG 2021
83 CDT: Cascading Decision Trees for Explainable Reinforcement Learning CoRR 2021
84 Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents CoRR 2021
85 Approximating a deep reinforcement learning docking agent using linear model trees ECC 2021
86 Robotic Lever Manipulation using Hindsight Experience Replay and Shapley Additive Explanations ECC 2021
87 Off-Policy Differentiable Logic Reinforcement Learning ECML PKDD 2021
88 Neuro-Symbolic Reinforcement Learning with First-Order Logic EMNLP 2021
89 Explainable Reinforcement Learning for Longitudinal Control ICAART 2021
90 Explainable deep reinforcement learning for portfolio management: an empirical approach ICAIF 2021
91 Explainable Reinforcement Learning for Human-Robot Collaboration ICAR 2021
92 DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation ICCV 2021
93 Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions ICLR 2021
94 Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning ICLR 2021
95 Learning "What-if" Explanations for Sequential Decision-Making ICLR 2021
96 Discovering symbolic policies with deep reinforcement learning ICML 2021
97 Re-understanding Finite-State Representations of Recurrent Policy Networks ICML 2021
98 Explainable Reinforcement Learning with the Tsetlin Machine IEA/AIE 2021
99 A Blood Glucose Control Framework Based on Reinforcement Learning With Safety and Interpretability: In Silico Validation IEEE Access 2021
100 Symbolic Regression Methods for Reinforcement Learning IEEE Access 2021
101 Efficient Robotic Object Search Via HIEM: Hierarchical Policy Learning With Intrinsic-Extrinsic Modeling IEEE Robotics Autom. Lett. 2021
102 Learning to Discover Task-Relevant Features for Interpretable Reinforcement Learning IEEE Robotics Autom. Lett. 2021
103 Explaining Deep Learning Models Through Rule-Based Approximation and Visualization IEEE Trans. Fuzzy Syst. 2021
104 Interpretable Decision-Making for Autonomous Vehicles at Highway On-Ramps With Latent Space Reinforcement Learning IEEE Trans. Veh. Technol. 2021
105 Explainable AI methods on a deep reinforcement learning agent for automatic docking IFAC-PapersOnLine 2021
106 Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning IJCNN 2021
107 Programmatic Policy Extraction by Iterative Local Search ILP 2021
108 Explaining the Decisions of Deep Policy Networks for Robotic Manipulations IROS 2021
109 XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees IROS 2021
110 Mixed Autonomous Supervision in Traffic Signal Control ITSC 2021
111 Can You Trust Your Autonomous Car? Interpretable and Verifiably Safe Reinforcement Learning IV 2021
112 Explaining a Deep Reinforcement Learning Docking Agent Using Linear Model Trees with User Adapted Visualization Journal of Marine Science and Engineering 2021
113 Visual Analysis of Deep Q-network KSII Trans. Internet Inf. Syst. 2021
114 Automatic discovery of interpretable planning strategies Mach. Learn. 2021
115 EDGE: Explaining Deep Reinforcement Learning Policies NeurIPS 2021
116 Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning NeurIPS 2021
117 Learning to Synthesize Programs as Interpretable and Generalizable Policies NeurIPS 2021
118 Machine versus Human Attention in Deep Reinforcement Learning Tasks NeurIPS 2021
119 Explainable Artificial Intelligence (XAI) for Increasing User Trust in Deep Reinforcement Learning Driven Autonomous Systems NeurIPS Workshop on Deep RL 2021
120 Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment NeurIPS Workshop on Machine Learning for Health 2021
121 Feature-Based Interpretable Reinforcement Learning based on State-Transition Models SMC 2021
122 A co-evolutionary approach to interpretable reinforcement learning in environments with continuous action spaces SSCI 2021
123 Interpretable AI Agent Through Nonlinear Decision Trees for Lane Change Problem SSCI 2021
124 Learning Sparse Evidence-Driven Interpretation to Understand Deep Reinforcement Learning Agents SSCI 2021
125 Explainable Reinforcement Learning through a Causal Lens AAAI 2020
126 Attribution-based Salience Method towards Interpretable Reinforcement Learning AAAI-MAKE 2020
127 Learning an Interpretable Traffic Signal Control Policy AAMAS 2020
128 Optimization Methods for Interpretable Differentiable Decision Trees Applied to Reinforcement Learning AISTATS 2020
129 Interestingness elements for explainable reinforcement learning: Understanding agents' capabilities and limitations Artif. Intell. 2020
130 Model primitives for hierarchical lifelong reinforcement learning Auton. Agents Multi Agent Syst. 2020
131 Understanding the Behavior of Reinforcement Learning Agents BIOMA 2020
132 Methodology for Interpretable Reinforcement Learning Model for HVAC Energy Control Big Data 2020
133 Explaining Autonomous Driving by Learning End-to-End Visual Attention CVPRW 2020
134 Understanding Learned Reward Functions CoRR 2020
135 Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis Complex & Intelligent Systems 2020
136 DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning Comput. Graph. Forum 2020
137 Understanding RL Vision Distill 2020
138 Interpretable policies for reinforcement learning by empirical fuzzy sets Eng. Appl. Artif. Intell. 2020
139 Neuroevolution of self-interpretable agents GECCO 2020
140 Topological Visualization Method for Understanding the Landscape of Value Functions and Structure of the State Space in Reinforcement Learning ICAART 2020
141 Identifying Critical States by the Action-Based Variance of Expected Return ICANN 2020
142 TLdR: Policy Summarization for Factored SSP Problems Using Temporal Abstractions ICAPS 2020
143 Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution ICLR 2020
144 Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning ICLR 2020
145 Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents ICLR 2020
146 Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions ICML 2020
147 Deep Reinforcement Learning for Safe Local Planning of a Ground Vehicle in Unknown Rough Terrain IEEE Robotics Autom. Lett. 2020
148 Towards Interpretable Reinforcement Learning with State Abstraction Driven by External Knowledge IEICE Trans. Inf. Syst. 2020
149 Improved Policy Extraction via Online Q-Value Distillation IJCNN 2020
150 Visualization of topographical internal representation of learning robots IJCNN 2020
151 Explainable navigation system using fuzzy reinforcement learning IJIDeM 2020
152 Explainability of Intelligent Transportation Systems using Knowledge Compilation: a Traffic Light Controller Case ITSC 2020
153 xGAIL: Explainable Generative Adversarial Imitation Learning for Explainable Human Decision Analysis KDD 2020
154 What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes NeurIPS 2020
155 Explaining Conditions for Reinforcement Learning Behaviors from Real and Imagined Data NeurIPS Workshop on Challenges of Real-World RL 2020
156 DynamicsExplorer: Visual Analytics for Robot Control Tasks involving Dynamics and LSTM-based Control Policies PacificVis 2020
157 Combining reinforcement learning with rule-based controllers for transparent and general decision-making in autonomous driving Robotics Auton. Syst. 2020
158 Modelling Agent Policies with Interpretable Imitation Learning TAILOR 2020
159 Interpretable, Verifiable, and Robust Reinforcement Learning via Program Synthesis xxAI - Beyond Explainable AI 2020
160 Generation of Policy-Level Explanations for Reinforcement Learning AAAI 2019
161 SDRL: Interpretable and Data-Efficient Deep Reinforcement Learning Leveraging Symbolic Planning AAAI 2019
162 Towards Better Interpretability in Deep Q-Networks AAAI 2019
163 Toward Robust Policy Summarization AAMAS 2019
164 Towards Governing Agent's Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning ACML 2019
165 Memory-Based Explainable Reinforcement Learning AI 2019
166 Summarizing agent strategies Auton. Agents Multi Agent Syst. 2019
167 Enabling robots to communicate their objectives Auton. Robots 2019
168 Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games? CoG 2019
169 Explaining Reward Functions in Markov Decision Processes FLAIRS 2019
170 Explanation-Based Reward Coaching to Improve Human Performance via Reinforcement Learning HRI 2019
171 Free-Lunch Saliency via Attention in Atari Agents ICCVW 2019
172 Deep reinforcement learning with relational inductive biases ICLR 2019
173 Learning Finite State Representations of Recurrent Policy Networks ICLR 2019
174 Neural Logic Reinforcement Learning ICML 2019
175 Interpretable Approximation of a Deep Reinforcement Learning Agent as a Set of If-Then Rules ICMLA 2019
176 Semantic Predictive Control for Explainable and Efficient Policy Learning ICRA 2019
177 DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks IEEE Trans. Vis. Comput. Graph. 2019
178 Visualizing Deep Q-Learning to Understanding Behavior of Swarm Robotic System IES 2019
179 Exploring Computational User Models for Agent Policy Summarization IJCAI 2019
180 Explaining Reinforcement Learning to Mere Mortals: An Empirical Study IJCAI 2019
181 Counterfactual States for Atari Agents via Generative Deep Learning IJCAI Workshop on XAI 2019
182 Distilling Deep Reinforcement Learning Policies in Soft Decision Trees IJCAI Workshop on XAI 2019
183 Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation IROS 2019
184 Reinforcement Learning with Explainability for Traffic Signal Control ITSC 2019
185 Interestingness Elements for Explainable Reinforcement Learning through Introspection IUI Workshops 2019
186 Explainable Reinforcement Learning via Reward Decomposition IJCAI Workshop on XAI 2019
187 Enhancing Explainability of Deep Reinforcement Learning Through Selective Layer-Wise Relevance Propagation KI 2019
188 Imitation-Projected Programmatic Reinforcement Learning NeurIPS 2019
189 Towards Interpretable Reinforcement Learning Using Attention Augmented Agents NeurIPS 2019
190 Verbal Explanations for Deep Reinforcement Learning Neural Networks with Attention on Extracted Features RO-MAN 2019
191 A formal methods approach to interpretable reinforcement learning for robotic planning Sci. Robotics 2019
192 HIGHLIGHTS: Summarizing Agent Behavior to People AAMAS 2018
193 Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations AIES 2018
194 Transparency and Explanation in Deep Reinforcement Learning Neural Networks AIES 2018
195 Visual Rationalizations in Deep Reinforcement Learning for Atari Games BNAIC 2018
196 Textual Explanations for Self-Driving Vehicles ECCV 2018
197 Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees ECML PKDD 2018
198 Interpretable policies for reinforcement learning by genetic programming Eng. Appl. Artif. Intell. 2018
199 Generating interpretable fuzzy controllers using particle swarm optimization and genetic programming GECCO 2018
200 Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning ICLR 2018
201 Programmatically Interpretable Reinforcement Learning ICML 2018
202 Visualizing and Understanding Atari Agents ICML 2018
203 Deep Reinforcement Learning Monitor for Snapshot Recording ICMLA 2018
204 Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences IJCAI Workshop on XAI 2018
205 Explaining Deep Adaptive Programs via Reward Decomposition IJCAI/ECAI Workshop XAI 2018
206 Establishing Appropriate Trust via Critical States IROS 2018
207 Unsupervised Video Object Segmentation for Deep Reinforcement Learning NeurIPS 2018
208 Verifiable Reinforcement Learning via Policy Extraction NeurIPS 2018
209 Visual Sparse Bayesian Reinforcement Learning: A Framework for Interpreting What an Agent Has Learned SSCI 2018
210 Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies Eng. Appl. Artif. Intell. 2017
211 Autonomous Self-Explanation of Behavior for Interactive Reinforcement Learning Agents HAI 2017
212 Improving Robot Controller Transparency Through Autonomous Policy Explanation HRI 2017
213 Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention ICCV 2017
214 Application of Instruction-Based Behavior Explanation to a Reinforcement Learning Agent with Changing Policy ICONIP 2017
215 Graying the black box: Understanding DQNs ICML 2016

Citation

@article{Bekkemoen24,
  author       = {Yanzhe Bekkemoen},
  title        = {Explainable reinforcement learning {(XRL):} a systematic literature
                  review and taxonomy},
  journal      = {Mach. Learn.},
  volume       = {113},
  number       = {1},
  pages        = {355--441},
  year         = {2024},
  url          = {https://doi.org/10.1007/s10994-023-06479-7},
  doi          = {10.1007/S10994-023-06479-7},
}