systems) and low-to-high domain background (representing both
users who know less and more about birding than the AI system).
With each participant, we conducted an hour-long interview,
which included a survey and an interactive feedback session, to un-
derstand their XAI needs, uses, and perceptions. Our study bridges
the gap between XAI research done in the HCI and AI communities
by directly connecting end-users of a real-world AI application
with the XAI methods literature. We do so by mocking up four
XAI approaches that could be potentially implemented into Merlin,
i.e., heatmap, example, concept, and prototype-based explanations
of the AI’s outputs. The mock-up explanations enabled us to get
concrete and detailed data about how participants intended to use
XAI explanations, as well as how they perceived each approach, in
an actual AI use context.
Through our study, we found:
• Participants’ XAI needs varied depending on their domain/AI background and interest level. While participants were generally curious about AI system details, those with high-AI background or notably high interest in birds had higher XAI needs. However, participants unanimously expressed a need for practically useful information that can improve their collaboration with the AI, suggesting an important area of focus for future XAI development (RQ1, Sec. 5.1).
• Participants intended to use XAI explanations for various purposes beyond understanding the AI’s outputs: determining when to trust the AI, learning to perform the task better on their own without needing to consult the AI, changing their behavior to supply better inputs to the AI, and giving constructive feedback to the developers to improve the AI. This highlights the broad range of XAI needs that should be considered in XAI development (RQ2, Sec. 5.2).
• Among existing XAI approaches, participants preferred part-based explanations, i.e., concept [105, 144] and prototype [24, 88] based explanations. Participants found them similar to human reasoning and explanations, and the most useful for the aforementioned purposes. This suggests that to the extent possible, the XAI community should pay particular attention to these methods, despite the challenges with their development and evaluation (RQ3, Sec. 5.3).
Following our findings, we discuss XAI’s potential as a medium for enhancing human-AI collaboration, and conclude with a set of recommendations for future XAI design. However, as with any case study, our findings and recommendations may have limited generalizability. This is an intentional trade-off made to gain an in-depth understanding of end-users’ XAI needs, uses, and perceptions in a real-world context, in line with growing calls for human-centered XAI research [34–36, 67, 68]. We are hopeful that our study design and insights will aid future XAI research in other contexts.
2 RELATED WORK
2.1 From algorithm-centered to human-centered XAI
With the growing adoption of AI, there has been a surge of interest
in explainable AI (XAI) research that aims to make AI systems
more understandable to people. XAI is one of the fastest growing
fields with hundreds of new papers published each year. See [1, 2, 7, 27, 29, 41, 46, 48, 49, 84, 107, 109, 110, 121] for in-depth surveys, and the following for examples of XAI research done in different disciplines: AI [42, 56, 59, 97, 118], HCI [50, 120, 133, 139], social and cognitive science [18, 26, 78, 80, 122, 124], and philosophy [14, 54, 82]. XAI is also increasingly being researched and applied in various domains, including but not limited to healthcare [5, 71, 75, 100, 117, 119, 135, 141], autonomous driving [9, 77, 94], energy and power systems [72], and climate science [73].
Much of the field’s efforts originally focused on the algorithms, i.e., on providing explanations of AI systems’ inner workings and outputs, rather than the people or the context where these systems are deployed. Recently, there has been a growing recognition that XAI methods cannot be developed “in a vacuum” without an understanding of people’s needs in specific contexts [34–36, 66–68]. In response, researchers have proposed conceptual frameworks to characterize XAI needs based on people’s roles [64, 102, 126], expertise [83], or more fine-grained axes of knowledge and objectives [123]. Others interviewed industry practitioners who work on AI products to identify their common XAI needs [15, 52, 66].
We join this relatively new line of research, called “human-centered XAI” [34–36, 66–68], and foreground the people who use AI systems and their needs, goals, and contexts in understanding how explainability can support human-AI interaction. In doing so, we build on the aforementioned frameworks to study end-users’ explainability needs. Concretely, we developed a survey based on Liao and colleagues’ XAI Question Bank [66] to collect concrete data on which aspects of AI end-users want to know about.
2.2 Understanding end-users’ XAI needs
Although human-centered XAI is an actively growing area of re-
search, much of the work still focuses on developers rather than
end-users of AI systems [15, 52, 66]. This gap is unsurprising, since XAI methods have been primarily developed for and used by developers to inspect AI systems [15, 80]. But it is critical because end-users may have different explainability needs that XAI methods should but don’t yet support.
Recently, some researchers began looking at end-users’ XAI needs in the context of specific applications [22, 23, 127]. Tonekaboni and colleagues [127] placed clinicians in hypothetical scenarios where AI models are used for health risk assessment, and found that clinicians wanted to know what features the model uses so they can understand and rationalize the model’s outputs. In a lab setting, Cai and colleagues [23] studied clinicians’ needs in their interaction with a prototype AI model that can assist with cancer diagnoses, and found that clinicians desired overall information about the model (e.g., capabilities and limitations, design objective) in addition to explanations of the model’s individual outputs. In another lab setting, Cai and colleagues [22] examined what needs pathologists have when using a prototype AI model for retrieving similar medical images. They also studied how pathologists use their proposed refinement tools, finding that pathologists often re-purposed them to test and understand the underlying search algorithm and to disambiguate AI errors from their own errors.
These studies delivered rich insights. However, they studied hypothetical or prototype AI applications. Hence, an important question remains, which we tackle in this work: What are end-users’