While an RL agent learns from its experiences during training, those experiences are inaccessible after training is over [4]. This is because Q(s, a) summarizes future experiences as a single, real-valued number; this is all that is needed to execute a policy π(s) = argmax_a Q(s, a). For example, consider a robot that receives a negative
reward for being close to the stairs and thus learns that states along an alternative
route have higher utility. When it comes time to figure out why the agent executed
one action trajectory over another, the policy is devoid of any information from which
to construct an explanation beyond the fact that some actions have lower expected
utility than others.
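To make this concrete, the following minimal sketch (with invented Q-values; not the authors' code) shows greedy action selection from a learned Q-function: a single scalar per state-action pair is all the policy retains, so nothing about the experiences that produced those values is available for explanation.

```python
# A minimal sketch (not the authors' code) of greedy action selection from a
# learned Q-function. The numeric values below are invented for illustration;
# at execution time, these scalars are all that remains of training.
q_values = {"up": 0.42, "down": 0.57, "left": 0.31, "right": 0.35}

def greedy_policy(q_for_state):
    # pi(s) = argmax_a Q(s, a): the policy only needs the largest scalar.
    return max(q_for_state, key=q_for_state.get)

print(greedy_policy(q_values))  # "down" -- but why down beats up is not recoverable here
```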
Explanations need not only address agents’ failures. The user’s need for explanations can also arise when the agent makes an unexpected decision. Requests for
explanations thus often manifest themselves as “why-not” questions referencing a
counterfactual: “why did the agent not make a different decision?” Explanations in
sequential decision-making environments can help users update their mental models
of the agent or identify how to change the environment so that the agent performs as
expected.
We propose a technique, Experiential Explanations, in which deep RL agents generate explanations in response to on-demand, local counterfactuals proposed by users. These explanations qualitatively contrast the agent’s intended action trajectory with a trajectory proposed by the user, and specifically link agent behavior
to environmental contexts. The agent maintains a set of models—called influence
predictors—of how different sparse reward states influence the agent’s state-action
utilities and, thus, its policy. Influence predictors are trained alongside an RL policy
with a specific focus on retaining details about the training experience. The influence
predictors are “outside the black box” but provide information with which to interpret
the agent’s black-box policy in a human-understandable fashion.
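As an illustration of what such a predictor could look like, the following PyTorch-style sketch assumes each sparse reward source (e.g., the negative reward near the stairs) gets its own small network trained with a TD-style update on only that source’s reward component; the class and function names are illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class InfluencePredictor(nn.Module):
    """Estimates how one sparse reward source influences the utility of (s, a)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # one influence estimate per action

def update_influence(predictor, optimizer, batch, gamma=0.99):
    """TD-style update using only the reward component from this source.
    batch = (states, actions, source_rewards, next_states, dones) as tensors."""
    s, a, r_source, s_next, done = batch
    with torch.no_grad():
        target = r_source + gamma * (1 - done) * predictor(s_next).max(dim=1).values
    pred = predictor(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this assumption, each predictor observes the same transitions as the policy but only its own reward channel, which is how details about specific training experiences (such as penalties near the stairs) could be retained without altering the policy itself.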
Consider the illustrative example in Figure 1, where the user observes the agent
go down around the wall. The user expects the agent to go up, but the agent goes
down instead. The user asks why the agent did not choose up, which appears to lead
to a shorter route to the goal. One possible and correct explanation would be that
the agent’s estimated expected utility for the down action is higher than that for
the up action. However, this is not information that a user can easily act upon. The
alternative, an experiential explanation, states that going up would pass through a region in proximity to dangerous locations. The user can update their understanding that the agent prefers to avoid stairs. The user can also understand how to take an explicit action to change the environment, such as blocking the stairs. Our technique focuses on
post-hoc, real-time explanations of agents operating in situ, instead of pre-explanation
of agent plans [11], although the two are closely related.
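Continuing the hypothetical sketch above, one way to produce such a contrastive statement is to query each influence predictor along both the agent’s trajectory and the user’s counterfactual and report the source whose influence most disfavors the counterfactual; the aggregation and templated wording here are our illustrative assumptions, not the paper’s generation procedure.

```python
import torch

def contrast_trajectories(predictors, agent_traj, counterfactual_traj):
    """predictors: dict mapping a reward-source name (e.g., "stairs") to a
    trained InfluencePredictor; each trajectory is a list of
    (state_tensor, action_index) pairs."""
    def total_influence(predictor, traj):
        with torch.no_grad():
            return sum(predictor(s.unsqueeze(0))[0, a].item() for s, a in traj)

    # Difference in accumulated influence: counterfactual route minus agent route.
    diff = {name: total_influence(p, counterfactual_traj) - total_influence(p, agent_traj)
            for name, p in predictors.items()}
    # The most negative difference marks the source that most penalizes the
    # user's proposed route, e.g., "stairs" for the upward path in Figure 1.
    worst = min(diff, key=diff.get)
    return (f"The suggested route passes through states strongly influenced by "
            f"{worst}, which the agent learned to avoid.")
```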
We evaluate our Experiential Explanations technique in studies with human participants. We show that explanations generated by our technique allow users to better understand and predict agent actions, and that users find them more useful. We additionally perform a qualitative analysis of participant responses to see how users employ the explanations from our technique and from baseline alternatives to reason about the agent’s behavior, which not only provides evidence for Experiential Explanations but also provides a blueprint for understanding the human factors of other explanation techniques.