Redefining Counterfactual Explanations for Reinforcement Learning: Overview,
Challenges and Opportunities
JASMINA GAJCIN and IVANA DUSPARIC, Trinity College Dublin, Ireland
While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While extensively researched in supervised learning, there are few methods applying them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of this powerful explanation method in RL. We start by reviewing the current work on counterfactual explanations in supervised learning. Additionally, we explore the differences between counterfactual explanations in supervised learning and RL and identify the main challenges that prevent the adoption of methods from supervised learning in reinforcement learning. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.
CCS Concepts: • Computing methodologies → Reinforcement learning.
Additional Key Words and Phrases: Reinforcement Learning, Explainability, Interpretability, Counterfactual Explanations
ACM Reference Format:
Jasmina Gajcin and Ivana Dusparic. 2023. Redening Counterfactual Explanations for Reinforcement Learning: Overview, Challenges
and Opportunities. 1, 1 (February 2023), 32 pages. https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
Articial intelligence (AI) solutions have become pervasive in various elds in the last decades, thanks in part to the
adoption of deep learning algorithms. In particular, deep learning has shown remarkable success in supervised learning
tasks, where the goal is to learn patterns in a labeled training data set and use them to accurately predict labels on
unseen data [
71
]. Deep learning algorithms rely on neural networks, which allow for ecient processing of large
amounts of unstructured data. However, they also rely on a large number of parameters, making their decision-making
process dicult to understand. These models are often referred to as black-box due to the lack of transparency in their
inner workings.
Reinforcement learning (RL) [85] is a sub-field of AI that focuses on developing intelligent agents for sequential decision-making tasks. RL employs a trial-and-error learning approach in which an agent learns a task from scratch through interactions with its environment. An agent can observe the environment, perform actions that alter its state, and receive rewards from the environment, which guide it towards an optimal behavior. The goal of RL is to obtain an optimal policy $\pi$, which maps the agent's states to optimal actions. This bears some similarity to supervised learning approaches, where the goal is to classify an instance into the correct class according to the input features.
Authors' address: Jasmina Gajcin, gajcinj@tcd.ie; Ivana Dusparic, ivana.dusparic@tcd.ie, Trinity College Dublin, College Green, Dublin, D02, Ireland.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
©2023 Association for Computing Machinery.
Manuscript submitted to ACM
Fig. 1. A summary of goals of XAI depending on the target audience: while the focus of developers is to better understand the abilities of the system and enable successful deployment, experts using the system require explanations to better collaborate with the system. Explanations are necessary for non-expert users to develop trust, ensure system decisions are fair, and give users actionable feedback on how to elicit a different decision from the system.
However, while supervised learning algorithms rely on labeled training instances to learn patterns in the data, RL agents approach the task without prior knowledge and learn it through interactions with the environment. Deep reinforcement learning (DRL) algorithms [5], which employ a neural network to represent an agent's policy, are currently the most popular approach for learning RL policies [5]. DRL algorithms have shown remarkable success in navigating sequential decision-making problems in games, autonomous driving, healthcare, and robotics [17, 46, 48, 64]. Although they can process large amounts of unstructured, high-dimensional data, their reliance on neural networks makes the agent's decisions difficult to explain.
Depending on the task and the user, AI systems require explainability for a variety of reasons (Figure 1). From the perspective of the developer, explainability is necessary to verify the system's behavior before deployment. Understanding how the input features influence the decision of the AI system is necessary to avoid deployment of models that rely on spurious correlations [8, 27] and to ensure robustness to adversarial attacks [86]. From the perspective of fairness, understanding the decision-making process of an AI system is necessary to prevent automated discrimination. Lack of transparency of AI models can cause them to inadvertently adopt historical biases ingrained in training data and use them in their decision logic [39]. To prevent discrimination, users of autonomous decision-making systems are now
legally entitled to an explanation under the regulation within the GDPR in the EU [31]. From the perspective of expert and non-expert users of the system, explainability is necessary to ensure trust. For experts that use AI systems as an aid in their everyday tasks, trust is a crucial component necessary for successful collaboration. For example, a medical doctor using an AI system for diagnostics needs to understand it to trust its decisions and use them for this high-risk task [49]. Similarly, for non-expert users, trust is needed to encourage interaction with the system. If an AI system is used to make potentially life-altering decisions for the user, they need to understand how the system operates to maintain their confidence and trust in the system.
The eld of explainable AI (XAI) explores methods for interpreting decisions of black-box systems in various elds
such as machine learning, reinforcement learning, explainable planning [
2
,
16
,
26
,
32
,
50
,
60
,
74
,
76
,
78
,
83
,
90
,
91
,
97
]. In
recent years, the focus of XAI has mostly been on explaining decisions of supervised learning models [10]. Specifically, the majority of XAI methods have focused on explaining the decisions of neural networks, due to the emergence of deep learning as the state-of-the-art approach to many supervised learning tasks [1, 13, 99]. In contrast, explainable RL (XRL) is a fairly novel field that has not yet received an equal amount of attention. Most often, existing XRL methods focus on explaining DRL algorithms, which rely on neural networks to represent the agent's policy, due to their prevalence and success [95]. However, as RL algorithms are becoming more prominent and are being considered for use in real-life tasks,
there is a need for understanding their decisions [24, 73]. For example, RL algorithms are being developed for different tasks in healthcare, such as dynamic treatment design [53, 59, 101]. Without rigorous verification and understanding of such systems, medical experts will be reluctant to collaborate with and rely on them [75]. Similarly, RL algorithms have been explored for enabling autonomous driving [4]. To understand and prevent mistakes such as the 2017 Uber accident [47], where a self-driving car failed to stop before a pedestrian, the underlying decision-making systems have to be scrutable. Specific to the RL framework, explainability is also necessary to correct and prevent "reward hacking" – a phenomenon where an RL agent learns to trick a potentially misspecified reward function, such as a vacuum cleaner ejecting collected dust to increase its cleaning time [3, 68].
In this work, we explore counterfactual explanations in supervised and reinforcement learning. Counterfactual explanations answer the question: "Given that the black-box model made decision y for input x, how can x be changed for the model to output alternative decision y'?" [93]. Counterfactual explanations offer actionable advice to users of black-box systems by generating counterfactual instances – instances as similar as possible to the original instance being explained but producing a desired outcome. If the user is not satisfied with the decision of a black-box system, a counterfactual explanation offers them a recipe for altering their input features to obtain a different output. For example, if a user is denied a loan by an AI system, they might be interested to know how they can change their application so that it gets accepted in the future. Counterfactual explanations are targeted at non-expert users, as they often deal in high-level terms and offer actionable advice to the user. They are also selective, aiming to change as few features as possible to achieve the desired output. As explanations that can suggest potentially life-altering actions to users, counterfactuals carry great responsibility. A useful counterfactual explanation can help a user achieve a desired outcome and increase their trust and confidence in the system. However, an ill-defined counterfactual that proposes unrealistic changes to the input features or does not deliver the desired outcome can waste the user's time and effort and erode their trust in the system. For this reason, careful selection of counterfactual explanations is essential for maintaining user trust and encouraging their collaboration with the system.
Although they have been explored in supervised learning [14, 19, 40, 58, 72, 96], counterfactual explanations are rarely applied to RL tasks [67]. In supervised learning, methods for generating counterfactual explanations often follow a similar pattern. Firstly, a loss function is defined, taking into account different properties of counterfactual instances,
such as the prediction of the desired class or similarity to the original instance. The loss function is then optimized over the training data to find the most suitable counterfactual instance. While the exact design of the loss function and the optimization algorithm vary between approaches, the high-level approach often takes the same form.
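To make this shared pattern concrete, the following is a minimal sketch assuming a hypothetical differentiable black-box (here a toy logistic-regression scorer `predict_proba` with made-up weights) and an illustrative squared-distance penalty weighted by `lam`; none of these names, weights, or hyperparameters come from a specific method surveyed in this work.

```python
import numpy as np

# Minimal sketch of the loss-based counterfactual search pattern:
# minimize L(x') = (f(x') - target)^2 + lam * ||x' - x||^2 by gradient descent.
# The model, weights, and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
W = rng.normal(size=4)   # assumed weights of a toy logistic-regression "black box"
b = -0.5

def predict_proba(x):
    """Probability of the desired class (e.g., 'loan approved') for input x."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def counterfactual(x_orig, target_proba=0.7, lam=0.5, lr=0.05, steps=500):
    """Gradient descent over the input to find a nearby instance with the desired output."""
    x = x_orig.copy()
    for _ in range(steps):
        p = predict_proba(x)
        grad_pred = 2.0 * (p - target_proba) * p * (1.0 - p) * W  # prediction term
        grad_dist = 2.0 * lam * (x - x_orig)                      # proximity term
        x -= lr * (grad_pred + grad_dist)
    return x

x = np.array([0.2, -1.0, 0.5, 0.1])        # original (e.g., rejected) input
x_cf = counterfactual(x)
print("original prediction:      ", predict_proba(x))
print("counterfactual prediction:", predict_proba(x_cf))
print("suggested feature changes:", x_cf - x)
```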
In this work, we challenge the denition of counterfactuals inherited from supervised learning for explaining RL
agents. We examine the similarities and dierences between the supervised and RL from the perspective of counterfactual
explanations and argue that the same denition of counterfactual explanations cannot be directly translated from
supervised to RL. Even though the two learning paradigms share similarities, in this work we demonstrate that
the sequential nature of RL tasks, as well as the agent’s goals, plans, and motivations, make these two approaches
substantially dierent from the perspective of counterfactual explanations. We start by reviewing the existing state-
of-the-art methods for generating counterfactual explanations in supervised learning. Furthermore, we identify the
main dierences between supervised and reinforcement learning from the perspective of counterfactual explanations
and redene them for RL use. Finally, we identify research questions that need to be answered before counterfactual
explanation methods can be applied to RL and propose potential solutions.
Previous surveys of XRL recognize counterfactual explanations as an important method but do not offer an in-depth review of methods for generating this type of explanation [36, 73, 98]. Previous surveys of counterfactual explanations, however, focus only on methods for explaining supervised learning models and offer a theoretical background and review of state-of-the-art approaches [81, 82, 92]. Similarly, Guidotti [33] reviews counterfactuals for supervised learning and offers a demonstration and comparison of different approaches. On the other hand, in this work, we focus specifically on counterfactual explanations from the perspective of RL. Additionally, while previous work has explored differences between supervised and RL for causal explanations [20], we utilize this to redefine counterfactual explanations for RL use, as well as explore the challenges of applying supervised learning methods for generating counterfactual explanations directly in RL.
The rest of the work is organized as follows. Section 2 provides a taxonomy and a short overview of methods for explaining the behavior of RL agents. In Section 3 we identify the key similarities and differences between supervised and reinforcement learning from the perspective of explainability. Properties of counterfactual explanations are explored in Section 4. Furthermore, Section 5 offers a review of the state-of-the-art methods for generating counterfactual explanations in supervised and reinforcement learning. Finally, Section 6 focuses on redefining counterfactual explanations for RL and identifying challenges and open questions in this field.
2 EXPLAINABLE REINFORCEMENT LEARNING (XRL)
Recent years have seen a rise in the development of explainable methods for RL tasks [73]. In this section, we provide an overview of the RL framework and the XRL taxonomy, and offer a condensed review of current state-of-the-art XRL methods.
2.1 Reinforcement Learning (RL)
Reinforcement learning (RL) [85] is a trial-and-error learning paradigm for navigating sequential decision-making tasks. RL problems are usually presented in the form of a Markov Decision Process (MDP) $M = (S, A, P, R, \gamma)$, where $S$ is the set of states an agent can observe and $A$ the set of actions it can perform. The transition function $P : S \times A \rightarrow S$ defines how actions change the current state and produce a new state. Actions can elicit rewards $r \in R$, whose purpose is to guide the agent toward the desired behavior. The parameter $\gamma$ is the discount factor. The agent's performance, starting from time step $t$, can be calculated as the return:
Table 1. Categorization of methods for explainable RL based on the XRL taxonomy (Section 2.2). Methods are classified based on their scope, reliance on the black-box model, time of explanation, and intended audience.

Method | Subcategory | Scope | Model-reliance | Explanation time | Audience | Publications
Global surrogates | Decision trees | Global | Model-specific | Post-hoc | Developers/Experts | Coppens et al. [16], Liu et al. [57]
Global surrogates | Custom model | Global | Model-specific | Post-hoc | Developers/Experts | Verma [90], Verma et al. [91]
Saliency maps | Gradient-based | Local | Model-specific | Post-hoc | Developers/Experts | Wang et al. [97]
Saliency maps | Perturbation-based | Local | Model-agnostic | Post-hoc | Developers/Experts | Greydanus et al. [32], Puri et al. [74]
Summaries | – | Global | Model-agnostic | Post-hoc | All | Amir and Amir [2], Sequeira and Gervasio [78]
Contrastive Expl. | Contrasting outcomes | Local | Model-agnostic | Post-hoc | All | Madumal et al. [61], van der Waa et al. [89]
Contrastive Expl. | Contrasting objectives | Local/Global | Model-agnostic | Post-hoc | All | Gajcin et al. [28], Juozapaitis et al. [41], Sukkerd et al. [84]
Hierarchical RL | – | Global | Model-specific | Interpretable | Developers/Experts | Beyret et al. [9]
Visualization | t-SNE | Global | Model-agnostic | Post-hoc | Developers/Experts | Mnih et al. [64], Zrihem et al. [102]
Other | Abstract graphs | Global | Model-specific | Interpretable | Developers/Experts | Topin and Veloso [87]
Other | Symbolic policies | Global | Model-specific | Interpretable | Developers/Experts | Landajuela et al. [51]
Other | Causal confusion | Global | Model-agnostic | Post-hoc | Developers | Gajcin and Dusparic [27]
$$G_t = \sum_{i=t+1}^{T} \gamma^{i-t-1} \cdot r_i \qquad (1)$$

where $\gamma$ balances between the importance of short and long-term rewards.
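As a small worked example of Equation (1), the snippet below computes the discounted return for an illustrative reward sequence; the rewards and the value of $\gamma$ are assumptions chosen only to show the computation.

```python
import numpy as np

# Worked example of Equation (1): G_t = sum_{i=t+1}^{T} gamma^(i-t-1) * r_i.
# The reward sequence and gamma are illustrative assumptions.

def discounted_return(rewards, gamma, t=0):
    """Return G_t for rewards r_{t+1}, ..., r_T (rewards[0] is r_1)."""
    future = rewards[t:]                          # r_{t+1}, ..., r_T
    discounts = gamma ** np.arange(len(future))   # gamma^0, gamma^1, ...
    return float(np.dot(discounts, future))

rewards = np.array([0.0, 0.0, 1.0, 0.0, 10.0])    # r_1 .. r_5 (assumed)
print(discounted_return(rewards, gamma=0.9))      # 0.9^2 * 1 + 0.9^4 * 10 ≈ 7.37
```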
The goal of the RL agent is to learn a policy $\pi : S \rightarrow A$, which maps the agent's states to optimal actions. One of the most notable approaches for learning the optimal policy is deep reinforcement learning (DRL), a field at the intersection of deep learning and RL. DRL algorithms use neural networks to represent an agent's policy. This allows easier processing of unstructured observations, such as images, and overcomes scalability limitations of tabular methods such as Q-learning. DRL algorithms have been successfully applied to various domains, such as games [64, 94], autonomous driving [46] and robotics [48, 70]. Despite their success, DRL algorithms cannot be safely deployed to real-life tasks before their behavior is fully verified and understood. However, since these methods rely on neural networks to learn the optimal policy, the agent's behavior is difficult to explain and the reasoning behind its decisions is not transparent.
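For illustration, a tabular policy of the kind that DRL replaces with a neural network can be represented directly as a table of action values and a greedy rule; the state and action counts and the Q-values in this sketch are purely hypothetical placeholders.

```python
import numpy as np

# Minimal sketch of a tabular policy pi: S -> A read greedily from a Q-table.
# State/action counts and Q-values are hypothetical placeholders.

n_states, n_actions = 5, 3
rng = np.random.default_rng(1)
Q = rng.normal(size=(n_states, n_actions))   # stands in for learned action values

def policy(state):
    """Greedy policy: choose the action with the highest estimated value."""
    return int(np.argmax(Q[state]))

print([policy(s) for s in range(n_states)])  # the induced state-to-action mapping
```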
Additionally, there are a number of extensions to the RL framework. In multi-goal RL, for example, an agent operates in an environment with more than one goal and must navigate them successfully to finish the task. A simple taxi environment in which an agent needs to pick up and drop off passengers is an example of a multi-goal problem, with two subtasks. Similarly, multi-objective RL applies to tasks where the agent needs to balance between different, often conflicting desiderata. In autonomous driving, for example, an agent needs to consider different objectives such as speed, safety, and user comfort.
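One common (though not the only) way to handle such conflicting objectives is linear scalarisation of a vector-valued reward; the sketch below uses assumed objective names and trade-off weights for the driving example rather than values from any specific system.

```python
import numpy as np

# Minimal sketch of linear scalarisation for multi-objective RL:
# a vector reward is collapsed into a scalar using a preference weight vector.
# Objective names and weights are illustrative assumptions.

objectives = ["speed", "safety", "comfort"]
weights = np.array([0.3, 0.5, 0.2])               # assumed trade-off preferences

def scalarise(reward_vector, w=weights):
    """Weighted sum of the per-objective rewards."""
    return float(np.dot(w, reward_vector))

# One step's vector reward: fast, small safety penalty, fairly comfortable.
r = np.array([1.0, -0.4, 0.6])
print(scalarise(r))   # 0.3*1.0 + 0.5*(-0.4) + 0.2*0.6 = 0.22
```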
2.2 XRL Taxonomy
According to a previous survey of XRL by Puiutta and Veith [73], the taxonomy of XRL methods largely overlaps with the taxonomy of XAI approaches, and classifies methods based on their reliance on the black-box model, time of interpretation, scope, and target audience.
Depending on their reliance on the black-box model, XRL methods can be model-agnostic or model-specific. Model-specific explanations are developed for a particular type of black-box algorithm, while model-agnostic approaches can explain any model.