While an RL agent learns from its experiences during training, those experiences are inaccessible after training is over [4]. This is because Q(s, a) summarizes future experiences as a single, real-valued number; this is all that is needed to execute a policy π(s) = argmax_a Q(s, a). For example, consider a robot that receives a negative
reward for being close to the stairs and thus learns that states along an alternative
route have higher utility. When it comes time to figure out why the agent executed
one action trajectory over another, the policy is devoid of any information from which
to construct an explanation beyond the fact that some actions have lower expected
utility than others.
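To make this concrete, the following minimal sketch (with invented Q-values; not the authors' code) shows greedy action selection from a learned Q-function: a single scalar per state-action pair is all the policy retains, so nothing about the experiences that produced those values is available for explanation.

```python
# A minimal sketch (not the authors' code) of greedy action selection from a
# learned Q-function. The numeric values below are invented for illustration;
# at execution time, these scalars are all that remains of training.
q_values = {"up": 0.42, "down": 0.57, "left": 0.31, "right": 0.35}

def greedy_policy(q_for_state):
    # pi(s) = argmax_a Q(s, a): the policy only needs the largest scalar.
    return max(q_for_state, key=q_for_state.get)

print(greedy_policy(q_values))  # "down" -- but why down beats up is not recoverable here
```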
Explanations need not only address agents’ failures. The user’s need for explanations can also arise when the agent makes an unexpected decision. Requests for
explanations thus often manifest themselves as “why-not” questions referencing a
counterfactual: “why did the agent not make a different decision?” Explanations in
sequential decision-making environments can help users update their mental models
of the agent or identify how to change the environment so that the agent performs as
expected.
We propose a technique, Experiential Explanations, in which deep RL agents generate explanations in response to on-demand, local counterfactuals proposed by users. These explanations qualitatively contrast the agent’s intended action trajectory with a trajectory proposed by the user, and specifically link agent behavior
to environmental contexts. The agent maintains a set of models—called influence
predictors—of how different sparse reward states influence the agent’s state-action
utilities and, thus, its policy. Influence predictors are trained alongside an RL policy
with a specific focus on retaining details about the training experience. The influence
predictors are “outside the black box” but provide information with which to interpret
the agent’s black-box policy in a human-understandable fashion.
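As an illustration of what such a predictor could look like, the following PyTorch-style sketch assumes each sparse reward source (e.g., the negative reward near the stairs) gets its own small network trained with a TD-style update on only that source’s reward component; the class and function names are illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class InfluencePredictor(nn.Module):
    """Estimates how one sparse reward source influences the utility of (s, a)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # one influence estimate per action

def update_influence(predictor, optimizer, batch, gamma=0.99):
    """TD-style update using only the reward component from this source.
    batch = (states, actions, source_rewards, next_states, dones) as tensors."""
    s, a, r_source, s_next, done = batch
    with torch.no_grad():
        target = r_source + gamma * (1 - done) * predictor(s_next).max(dim=1).values
    pred = predictor(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this assumption, each predictor observes the same transitions as the policy but only its own reward channel, which is how details about specific training experiences (such as penalties near the stairs) could be retained without altering the policy itself.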
Consider the illustrative example in Figure 1, where the user observes the agent
go down around the wall. The user expects the agent to go up, but the agent goes
down instead. The user asks why the agent did not choose up, which appears to lead
to a shorter route to the goal. One possible and correct explanation would be that
the agent’s estimated expected utility for the down action is higher than that for
the up action. However, this is not information that a user can easily act upon. The
alternative, an experiential explanation, states that going up would pass through a region in proximity to dangerous locations. The user can update their understanding that the agent prefers to avoid stairs. The user can also understand how to take an explicit action to change the environment, such as blocking the stairs. Our technique focuses on
post-hoc, real-time explanations of agents operating in situ, instead of pre-explanation
of agent plans [11], although the two are closely related.
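Continuing the hypothetical sketch above, one way to produce such a contrastive statement is to query each influence predictor along both the agent’s trajectory and the user’s counterfactual and report the source whose influence most disfavors the counterfactual; the aggregation and templated wording here are our illustrative assumptions, not the paper’s generation procedure.

```python
import torch

def contrast_trajectories(predictors, agent_traj, counterfactual_traj):
    """predictors: dict mapping a reward-source name (e.g., "stairs") to a
    trained InfluencePredictor; each trajectory is a list of
    (state_tensor, action_index) pairs."""
    def total_influence(predictor, traj):
        with torch.no_grad():
            return sum(predictor(s.unsqueeze(0))[0, a].item() for s, a in traj)

    # Difference in accumulated influence: counterfactual route minus agent route.
    diff = {name: total_influence(p, counterfactual_traj) - total_influence(p, agent_traj)
            for name, p in predictors.items()}
    # The most negative difference marks the source that most penalizes the
    # user's proposed route, e.g., "stairs" for the upward path in Figure 1.
    worst = min(diff, key=diff.get)
    return (f"The suggested route passes through states strongly influenced by "
            f"{worst}, which the agent learned to avoid.")
```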
We evaluate our Experiential Explanations technique in studies with human participants. We show that explanations generated by our technique allow users to better understand and predict agent actions, and that users find them more useful. We additionally perform a qualitative analysis of participant responses to see how users employ the explanations from our technique and from baseline alternatives to reason about the agent’s behavior, which not only provides evidence for Experiential Explanations but also provides a blueprint for understanding the human factors of other explanation techniques.