Experiential Explanations for Reinforcement Learning
Amal Alabdulkarim, Madhuri Singh, Gennie Mansi, Kaely Hall, Mark O. Riedl
School of Interactive Computing, Georgia Institute of Technology.
*Corresponding author(s). E-mail(s): amal@gatech.edu;
Contributing authors: msingh365@gatech.edu; gennie.mansi@gatech.edu; khall33@gatech.edu; riedl@gatech.edu;
Abstract
Reinforcement Learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL, in which actions are chosen because of future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable information about "why" an action is chosen. We propose a technique, Experiential Explanations, to generate counterfactual explanations by training influence predictors along with the RL policy. Influence predictors are models that learn how sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. A human evaluation study revealed that participants presented with experiential explanations were better able to correctly guess what an agent would do than those presented with other standard types of explanation. Participants also found experiential explanations more understandable, satisfying, complete, useful, and accurate. Qualitative analysis provides insights into the factors of experiential explanations that are most useful.
arXiv:2210.04723v4 [cs.AI] 13 Dec 2023
Fig. 1 In an interaction between the user and the agent, the user expected the agent to go up, but the agent went down instead. The user asks, "Why did you go down instead of up?" The figure shows two types of explanations the user can receive. (A) is an explanation that can be derived directly from the agent's policy: "My expected future reward for going down is 0.95, while my future reward for going up is 0.80." (B) is an explanation that can be derived from the agent's policy with the assistance of additional models called negative and positive influence predictors: "If I go up, I fear falling down the stairs; going down feels safer."
1 Introduction
Reinforcement Learning (RL) techniques are becoming increasingly popular in critical domains such as robotics and autonomous vehicles. However, applications such as autonomous cars and socially assistive robots in healthcare and home settings are expected to interact with non-AI experts. People often seek explanations for abnormal events or observations to improve their understanding of the agent and better predict and control its actions [1]. This need for an explanation will likely arise when interacting with these systems, whether it is a car taking an unexpected turn or a robot performing a task unconventionally. Without an explanation, users can find it difficult to trust the agent's ability to act safely and reasonably [2], especially if they have limited artificial intelligence expertise. This can be the case even when the agent is operating optimally and without failure; the agent's optimal behavior may not match the user's expectations, resulting in confusion and a lack of trust. Therefore, RL systems need to provide good explanations without compromising their performance. To address this need, explainable RL has become a rapidly growing area of research with many open challenges [39].
But what makes a good explanation for end-users who cannot change the underlying model? How do we generate explanations if we cannot modify the underlying RL model or degrade its performance? In RL, an agent learns dynamically to maximize its reward through a series of experiences interacting with the environment [10]. Through these interactions, the agent builds up a model of utility, Q(s, a), which estimates the future reward that can be expected if a particular action a is performed from a particular state s.
Code repository available at:
https://github.com/amal994/Experiential-Explanations-RL
While an RL agent learns from its experiences during training, those experiences are inaccessible after training is over [4]. This is because Q(s, a) summarizes future experiences as a single, real-valued number; this is all that is needed to execute a policy π(s) = argmax_a Q(s, a). For example, consider a robot that receives a negative reward for being close to the stairs and thus learns that states along an alternative route have higher utility. When it comes time to figure out why the agent executed one action trajectory over another, the policy is devoid of any information from which to construct an explanation beyond the fact that some actions have lower expected utility than others.
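As a concrete illustration, consider the minimal tabular sketch below (the states, actions, and values are invented for illustration and are not taken from our experiments). Greedy execution consults only the scalar utilities, so nothing about why one action is preferred survives into the policy:

```python
# Hypothetical Q-table for one state of a grid-world agent.
# The scalars encode *how good* each action is, but not *why*
# (e.g., that "up" passes near the stairs).
q_table = {
    ("room_entrance", "up"): 0.80,
    ("room_entrance", "down"): 0.95,
}

def greedy_action(state, actions=("up", "down")):
    """Pick the action with the highest stored utility."""
    return max(actions, key=lambda a: q_table[(state, a)])

print(greedy_action("room_entrance"))  # -> "down", with no record of the reason
```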
Explanations need not only address agents' failures. The user's need for explanations can also arise when the agent makes an unexpected decision. Requests for explanations thus often manifest as "why-not" questions referencing a counterfactual: "why did the agent not make a different decision?" Explanations in sequential decision-making environments can help users update their mental models of the agent or identify how to change the environment so that the agent performs as expected.
We propose a technique, Experiential Explanations, in which deep RL agents generate explanations in response to on-demand, local counterfactuals proposed by users. These explanations qualitatively contrast the agent's intended action trajectory with a trajectory proposed by the user, and specifically link agent behavior to environmental contexts. The agent maintains a set of models, called influence predictors, of how different sparse reward states influence the agent's state-action utilities and, thus, its policy. Influence predictors are trained alongside an RL policy with a specific focus on retaining details about the training experience. The influence models are "outside the black box" but provide information with which to interpret the agent's black-box policy in a human-understandable fashion.
Consider the illustrative example in Figure 1, where the user observes the agent go down and around the wall. The user expects the agent to go up, but the agent goes down instead. The user asks why the agent did not choose up, which appears to lead to a shorter route to the goal. One possible and correct explanation would be that the agent's estimated expected utility for the down action is higher than that for the up action. However, this is not information that a user can easily act upon. The alternative, experiential explanation states that going up will pass through a region in proximity to dangerous locations. The user can update their understanding that the agent prefers to avoid stairs. The user can also understand how to take an explicit action to change the environment, such as blocking the stairs. Our technique focuses on post-hoc, real-time explanations of agents operating in situ, instead of pre-explanation of agent plans [11], although the two are closely related.
We evaluate our Experiential Explanations technique in studies with human participants. We show that explanations generated by our technique allow users to better understand and predict agent actions, and that these explanations are found to be more useful. We additionally perform a qualitative analysis of participant responses to see how users use the explanations from our technique and from baseline alternatives to reason about the agent's behavior, which not only provides evidence for Experiential Explanations but also provides a blueprint for understanding the human factors of other explanation techniques.
2 Background and Related Work
Reinforcement learning is an approach to learning in which an agent attempts to maximize its reward through feedback from a trial-and-error process. RL is suitable for Markov Decision Processes, which can be represented as a tuple M = ⟨S, A, T, R, γ⟩, where S is the set of possible world states, A is the set of possible actions, T : S × A → P(S) is the state transition function, which determines the probability of the following state as a function of the current state and action, R : S × A → ℝ is the reward function, and γ is a discount factor with 0 ≤ γ ≤ 1. RL learns a policy π : S → A, which defines which actions should be taken in each state. Deep RL uses deep neural networks to estimate the expected future utility such that π(s) = argmax_a Q_θ(s, a), where θ are the model parameters.
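As an illustrative sketch of this greedy policy with a learned Q-network (the state dimension, action count, and architecture below are placeholders rather than the networks used in this work):

```python
import torch
import torch.nn as nn

# A small Q-network: maps a state vector to one utility estimate per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim=8, num_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state):
        return self.net(state)  # Q_theta(s, a) for every action a

def greedy_policy(q_net, state):
    """pi(s) = argmax_a Q_theta(s, a)."""
    with torch.no_grad():
        q_values = q_net(state)
    return int(q_values.argmax())
```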
As deep reinforcement learning is increasingly used in sensitive areas such as healthcare and robotics, Explainable RL (XRL) is becoming a crucial research field. For an overview of XRL, please refer to Milani et al. [8]. We highlight the XRL work most relevant to our approach.
Rationale generation [12] is an approach to explanation generation for sequential environments in which explanations approximate how humans would explain what they would do in the same situation as the agent. Rationale generation does not "open the black box" but looks at the agent's state and action and applies a model trained from human explanations. Rationales do not guarantee faithfulness. However, our models train alongside the agent and achieve a high degree of faithfulness.
Some XRL techniques generate explanations by decomposing utilities and showing information about the components of the utility score for an agent's state [13–16]. Our technique is related to decompositions in that we learn the influence of different sources of reward on utilities, avoiding the trade-off between using a semi-interpretable system or a better black-box system.
Our work focuses on counterfactual explanations, but we differ from other works in explanation goals and approach. For example, Frost et al. [17] trained an exploration policy to generate test-time trajectories to explain how the agent behaves in unseen states after training. Explanation trajectories help users understand how the agent would perform under new conditions. While this method gives users a global understanding of the agent, our method focuses on providing local counterfactual explanations of an agent's action. Contrastive approaches such as van der Waa et al. [18] leverage an interpretable model and a learned transition model. Madumal et al. [19] explain local actions with a causal structural model, in the form of an action influence graph, to encode the causal relations between variables. Sreedharan et al. [20] generate contrastive explanations using a partial symbolic model approximation, including knowledge about concepts, preconditions, and action costs. These methods provide answers to users' local "why not" questions. Our approach minimizes the use of predefined tasks or a priori environment knowledge.
Olson et al. [21] explain the local decisions of an agent by demonstrating a counterfactual state in which the agent takes a different action, illustrating the minimal alteration necessary for a different result. Huber et al. [22] further this concept by creating counterfactual states with an adversarial learning model. Our technique, however, concentrates on the qualitative distinctions between trajectories rather than on the states.
3 Experiential Explanations
Experiential Explanations are contrastive explanations generated for local decisions with minimal reliance on structured inputs and without imposing limitations on the agent or the RL algorithm. Our explanation technique uses additional models called influence predictors that learn how sparse rewards affect the agent's utility predictions. In essence, influence predictors tell us how strongly or weakly any source of reward (positive or negative) impacts states in the state space that the agent believes it will pass through. This is in contrast to the agent's learned policy, which aggregates the utilities with respect to all rewards. This enables an agent that uses our explanation technique to provide additional context for the choices it makes, because it knows about its utility landscape in finer-grained detail. As in Figure 1, we see the explanation referencing the agent's relationship to the environment in terms of negative and positive elements.
Once these influence predictor models have been trained, explanation generation proceeds in two phases. First, given a decision made by the agent and a request for an explanation, we generate the state-action trajectory that the agent will take to maximize the expected reward, plus a counterfactual trajectory based on the user's expectation. For example, in Figure 1 the agent's trajectory goes down, but the user asks about going up. Second, we use influence predictors to compare the two trajectories and extract information about the different influences along each. In the following sections, we describe the influence predictors in detail: what they are, how we train them, how we use them to generate explanations, and how they fit with the other components of the explanation generation system.
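The second phase can be sketched in simplified form as follows (the function and predictor names, toy values, and trajectories below are illustrative placeholders, not our exact implementation; see the code repository for the full version):

```python
def contrast_trajectories(influence_predictors, agent_traj, counter_traj):
    """Phase 2 of explanation generation: compare reward-source influences.

    Phase 1 (not shown) produces the two state-action trajectories: the
    agent's own rollout from the queried state, and a counterfactual rollout
    that begins with the action the user expected. `influence_predictors`
    maps each reward class c to a callable estimating U_c(s, a).
    """
    contrast = {}
    for c, predictor in influence_predictors.items():
        contrast[c] = {
            "agent": sum(predictor(s, a) for s, a in agent_traj),
            "counterfactual": sum(predictor(s, a) for s, a in counter_traj),
        }
    return contrast

# Toy example: the "stairs" penalty weighs on the counterfactual (up)
# trajectory far more than on the agent's chosen (down) trajectory.
toy_predictors = {
    "goal": lambda s, a: 0.5,
    "stairs": lambda s, a: -0.8 if s.startswith("near_stairs") else 0.0,
}
agent_traj = [("start", "down"), ("hall", "right"), ("goal_room", "right")]
counter_traj = [("start", "up"), ("near_stairs_1", "right"), ("goal_room", "right")]
print(contrast_trajectories(toy_predictors, agent_traj, counter_traj))
```

The per-class differences returned by this comparison are what the explanation text ultimately verbalizes, for example that the counterfactual route is more strongly influenced by the stairs.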
3.1 Influence Predictors
Influence predictors reconstruct the effects of the received rewards on utility during training. These models are trained alongside the agent to predict the strength of the influence of different sources of reward on the agent: U_c(s, a), where each c ∈ C is a distinct source (or class) of reward. For example, a terminal goal state might be a source of positive reward, stairs might be a source of negative reward, and other dangerous objects may be other sources of negative reward. In a more complex environment, multiple sources of positive and negative rewards can help outline the agent's plan. For instance, a robot attempting to make a cup of coffee would receive positive rewards for obtaining a clean mug, hot milk, and coffee, then combining the milk and coffee in the mug and stirring them. Negative influences could include getting a dirty cup or adding salt.
In principle, any agent architecture and optimization algorithm can be used with
our explanation technique as long as we can observe its transitions during training.
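One way such a predictor could be trained alongside the agent is sketched below (a simplified, generic TD-style update per reward class on observed transitions, with placeholder network sizes; the exact training procedure and architecture are detailed later and in the code repository):

```python
import torch
import torch.nn as nn

class InfluencePredictor(nn.Module):
    """Estimates U_c(s, a): the influence of one reward class c on utility."""
    def __init__(self, state_dim=8, num_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state):
        return self.net(state)  # one influence estimate per action

def update_influence(predictor, optimizer, s, a, r_c, s_next, a_next, gamma=0.99):
    """One illustrative TD-style update for reward class c.

    r_c is the portion of the observed reward attributable to class c
    (e.g., only the stairs penalty); a_next is the action the agent's policy
    takes in s_next, so the predictor tracks that class's influence under
    the agent's behavior.
    """
    with torch.no_grad():
        target = r_c + gamma * predictor(s_next)[a_next]
    prediction = predictor(s)[a]
    loss = nn.functional.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```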