"Help Me Help the AI": Understanding How Explainability Can
Support Human-AI Interaction
Sunnie S. Y. Kim
Princeton University
Princeton, New Jersey, USA
Elizabeth Anne Watkins*
Intel Labs
Santa Clara, California, USA
Olga Russakovsky
Princeton University
Princeton, New Jersey, USA
Ruth Fong
Princeton University
Princeton, New Jersey, USA
Andrés Monroy-Hernández
Princeton University
Princeton, New Jersey, USA
ABSTRACT
Despite the proliferation of explainable AI (XAI) methods, little is
understood about end-users’ explainability needs and behaviors
around XAI explanations. To address this gap and contribute to
understanding how explainability can support human-AI interac-
tion, we conducted a mixed-methods study with 20 end-users of a
real-world AI application, the Merlin bird identification app, and in-
quired about their XAI needs, uses, and perceptions. We found that
participants desire practically useful information that can improve
their collaboration with the AI, more so than technical system de-
tails. Relatedly, participants intended to use XAI explanations for
various purposes beyond understanding the AI’s outputs: calibrat-
ing trust, improving their task skills, changing their behavior to
supply better inputs to the AI, and giving constructive feedback to
developers. Finally, among existing XAI approaches, participants
preferred part-based explanations that resemble human reasoning
and explanations. We discuss the implications of our findings and
provide recommendations for future XAI design.
CCS CONCEPTS
• Human-centered computing → Empirical studies in HCI; User studies; • Computing methodologies → Artificial intelligence.
KEYWORDS
Explainable AI (XAI), Interpretability, Human-Centered XAI, Human-
AI Interaction, Human-AI Collaboration, XAI for Computer Vision,
Local Explanations
ACM Reference Format:
Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, and Andrés Monroy-Hernández. 2023. "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3544548.3581001

*Most work was done during Postdoctoral Research Appointment at the Princeton Center for Information Technology Policy and Human-Computer Interaction Program.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
CHI ’23, April 23–28, 2023, Hamburg, Germany
© 2023 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9421-5/23/04.
https://doi.org/10.1145/3544548.3581001
1 INTRODUCTION
Articial Intelligence (AI) systems are ubiquitous: from unlocking
our phones with face identication, to reducing trac accidents
with autonomous cars, to assisting radiologists with medical image
analysis. Being able to better understand these AI systems is be-
coming increasingly important—although what exactly that means
is dierent in dierent settings: a smartphone user may want to
understand how best to position their face to quickly unlock their
phone, a researcher may want to understand what particular de-
sign decisions led to an autonomous car accident, and a radiologist
may want to understand where the medical decision support tool
is looking in suggesting a particular diagnosis.
Over the past years, numerous explainable AI (XAI) methods
have been developed to provide transparency into these AI sys-
tems and make them more understandable to people (see [2, 7, 27, 41, 46, 48, 49, 109] for surveys). However, arguably these are being developed without embracing the full spectrum of end-user needs. Particularly for computer vision AI systems (such as the ones described above), with millions of model parameters processing thousands of low-level image pixels, translating model outputs into understandable insights is so challenging that proposed XAI methods are frequently limited by what XAI researchers can do rather than what AI end-users might need.
In this work, we connect XAI development with end-users and
study a real-world context in which XAI methods might be deployed.
Concretely, we set out to answer three research questions:
• RQ1: What are end-users’ XAI needs in real-world AI applications?
• RQ2: How do end-users intend to use XAI explanations¹?
• RQ3: How are existing XAI approaches perceived by end-users?
In scoping our study, we focus on Merlin, an AI-based mobile
phone application that uses computer vision to identify birds in user-
uploaded photos and audio recordings. We chose Merlin because it
is a widely-used application that allows us to connect with a diverse
set of active end-users. Concretely, we conducted a mixed-methods
study with 20 Merlin users who span the range from low-to-high
AI background (representing both consumers and creators of AI
¹In this paper, we use the term “XAI explanations” to refer to explanations produced by XAI methods to explain specific AI system outputs.
systems) and low-to-high domain background (representing both
users who know less and more about birding than the AI system).
With each participant, we conducted an hour-long interview,
which included a survey and an interactive feedback session, to un-
derstand their XAI needs, uses, and perceptions. Our study bridges
the gap between XAI research done in the HCI and AI communities
by directly connecting end-users of a real-world AI application
with the XAI methods literature. We do so by mocking up four
XAI approaches that could be potentially implemented into Merlin,
i.e., heatmap, example, concept, and prototype-based explanations
of the AI’s outputs. The mock-up explanations enabled us to get
concrete and detailed data about how participants intended to use
XAI explanations, as well as how they perceived each approach, in
an actual AI use context.
Through our study, we found:
• Participants’ XAI needs varied depending on their domain/AI background and interest level. While participants were generally curious about AI system details, those with high-AI background or notably high interest in birds had higher XAI needs. However, participants unanimously expressed a need for practically useful information that can improve their collaboration with the AI, suggesting an important area of focus for future XAI development (RQ1, Sec. 5.1).
• Participants intended to use XAI explanations for various purposes beyond understanding the AI’s outputs: determining when to trust the AI, learning to perform the task better on their own without needing to consult the AI, changing their behavior to supply better inputs to the AI, and giving constructive feedback to the developers to improve the AI. This highlights the broad range of XAI needs that should be considered in XAI development (RQ2, Sec. 5.2).
• Among existing XAI approaches, participants preferred part-based explanations, i.e., concept [105, 144] and prototype [24, 88] based explanations. Participants found them similar to human reasoning and explanations, and the most useful for the aforementioned purposes. This suggests that to the extent possible, the XAI community should pay particular attention to these methods, despite the challenges with their development and evaluation (RQ3, Sec. 5.3).
Following our ndings, we discuss XAI’s potential as a medium
for enhancing human-AI collaboration, and conclude with a set of
recommendations for future XAI design. However, as with any case
study, our ndings and recommendations may have limited gener-
alizability. This is an intentional trade-o made to gain an in-depth
understanding of end-users’ XAI needs, uses, and perceptions in a
real-world context, in line with growing calls for human-centered
XAI research [
34
36
,
67
,
68
]. We are hopeful that our study design
and insights will aid future XAI research in other contexts.
2 RELATED WORK
2.1 From algorithm-centered to
human-centered XAI
With the growing adoption of AI, there has been a surge of interest
in explainable AI (XAI) research that aims to make AI systems
more understandable to people. XAI is one of the fastest growing
fields with hundreds of new papers published each year. See [1, 2, 7, 27, 29, 41, 46, 48, 49, 84, 107, 109, 110, 121] for in-depth surveys, and the following for examples of XAI research done in different disciplines: AI [42, 56, 59, 97, 118], HCI [50, 120, 133, 139], social and cognitive science [18, 26, 78, 80, 122, 124], and philosophy [14, 54, 82]. XAI is also increasingly being researched and applied in various domains, including but not limited to healthcare [5, 71, 75, 100, 117, 119, 135, 141], autonomous driving [9, 77, 94], energy and power systems [72], and climate science [73].
Much of the eld’s eorts originally focused on the algorithms,
i.e., on providing explanations of AI systems’ inner workings and
outputs, rather than the people or the context where these sys-
tems are deployed. Recently, there has been a growing recognition
that XAI methods cannot be developed “in a vacuum” without an
understanding of people’s needs in specic contexts [
34
36
,
66
68
]. In response, researchers have proposed conceptual frameworks
to characterize XAI needs based on people’s roles [
64
,
102
,
126
],
expertise [
83
], or more ne-grained axes of knowledge and objec-
tives [
123
]. Others interviewed industry practitioners who work on
AI products to identify their common XAI needs [15, 52, 66].
We join this relatively new line of research, called “human-centered XAI” [34–36, 66–68], and foreground the people who use AI systems and their needs, goals, and contexts in understanding how explainability can support human-AI interaction. In doing so, we build on the aforementioned frameworks to study end-users’ explainability needs. Concretely, we developed a survey based on Liao and colleagues’ XAI Question Bank [66] to collect concrete data on which aspects of AI end-users want to know about.
2.2 Understanding end-users’ XAI needs
Although human-centered XAI is an actively growing area of re-
search, much of the work still focuses on developers rather than
end-users of AI systems [15, 52, 66]. This gap is unsurprising, since XAI methods have been primarily developed for and used by developers to inspect AI systems [15, 80]. But it is critical because end-users may have different explainability needs that XAI methods should but don’t yet support.
Recently, some researchers began looking at end-users’ XAI needs in the context of specific applications [22, 23, 127]. Tonekaboni and colleagues [127] placed clinicians in hypothetical scenarios where AI models are used for health risk assessment, and found that clinicians wanted to know what features the model uses so they can understand and rationalize the model’s outputs. In a lab setting, Cai and colleagues [23] studied clinicians’ needs in their interaction with a prototype AI model that can assist with cancer diagnoses, and found that clinicians desired overall information about the model (e.g., capabilities and limitations, design objective) in addition to explanations of the model’s individual outputs. In another lab setting, Cai and colleagues [22] examined what needs pathologists have when using a prototype AI model for retrieving similar medical images. They also studied how pathologists use their proposed refinement tools, finding that pathologists often re-purposed them to test and understand the underlying search algorithm and to disambiguate AI errors from their own errors.
These studies delivered rich insights. However, they studied hy-
pothetical or prototype AI applications. Hence, an important ques-
tion remains, which we tackle in this work: What are end-users’
XAI needs in real-world AI applications? (RQ1). Elish and Watkins
[37] recently provided insights into this question through an in-situ study of a deployed, real-world AI system. Specifically, they documented the types of inquiries physicians asked of nurses tasked with monitoring Sepsis Watch [113], an AI system designed to predict patients’ risk of sepsis development. However, they did not
study how XAI methods could answer the physicians’ inquiries. In
this paper, we take a step further and contribute to understanding
how XAI methods can satisfy (or not satisfy) end-users’ needs by
studying: How do end-users intend to use XAI explanations? (RQ2)
and How are existing XAI approaches perceived by end-users? (RQ3).
Our work extends prior work in three more ways. First, while
all aforementioned work [22, 23, 37, 127] studies AI applications that make or support high-stakes medical decisions, we focus on an ordinary application that a diverse set of people use in everyday life. Second, while prior work does not differentiate their participants, we study group differences with respect to domain and AI background levels. We are inspired by recent findings of Ehsan and colleagues [33] on how people’s perceptions of XAI explanations differed based on their AI background. Third, we connect to the XAI methods literature directly, by mocking up XAI explanations in the studied application. These in-situ mock-up explanations allowed us to gather detailed data on how end-users perceive and intend to use XAI explanations in their actual use of the AI.
2.3 XAI’s role in human-AI collaboration
Our work also connects to the literature of human-AI collabo-
ration [6, 8, 23, 60, 62, 132], sometimes called human-AI teaming [10, 11, 95] or human-AI partnership [89], that studies how people work together with AI to achieve shared goals. In this work, we didn’t set out to study human-AI collaboration. Our use of this term emerged from our findings: while studying participants’ XAI needs, uses, and perceptions, we found that participants described a process for which the language of “collaboration” proved the best fit. Participants described a two-way exchange, where they help Merlin succeed in bird identification and obtain more accurate results in return, and expressed a strong desire to improve their collaboration with XAI explanations and other information. Hence, we give a brief overview of the human-AI collaboration literature and describe how our work connects to existing work.
Prior work has studied how people collaborate with different types of AI systems (e.g., robots [38, 61, 90, 91, 93, 130], virtual agents [8, 25, 31, 92], embedded systems [4, 23, 38, 40, 53, 57, 62, 63, 89, 91, 128]) in different task contexts (e.g., content generation [65, 70, 142], medical diagnosis [23, 40, 128], content moderation [53, 62], deception detection [63, 89], cooperative games [8], and fine-grained visual recognition [38, 57, 90, 91]). Among these, our work is most closely related to [38, 57, 63, 90, 91] that studied XAI’s role in AI-assisted decision making, where AI makes a recommendation and a human makes the final decision. In this work, we explored what role XAI explanations could play in Merlin, where for each bird identification, end-users make the final call based on the app’s output and their knowledge of birds and the app.
However, dierent from our work, [
38
,
57
,
63
,
90
,
91
] focused on
measuring the usefulness of specic XAI methods in AI-assisted
Figure 1: Screenshots of Merlin, our study application.
Merlin
is an AI-based bird identication mobile phone app. Users upload
photos on the Photo ID feature (top) or sounds on the Sound ID
feature (bottom) to get a list of birds that best match the input.
Users also share optional location and season data. The resulting
bird list comes with example photos and sounds.
decision making through lab experiments. These experiments typi-
cally consisted of simple tasks (e.g., binary choice) and were con-
ducted with participants recruited from Amazon Mechanical Turk.
Further, because they were lab experiments, it was well-defined
in advance how participants should use XAI explanations in their
collaboration with AI (e.g., look at the provided explanation and
judge whether or not to accept the AI’s output). On the other hand,
our qualitative descriptive study allowed us to find that participants
intended to use XAI explanations for various purposes, highlighting
a broad range of XAI needs and uses that should be considered in
XAI development.
2.4 XAI methods for computer vision
Finally, we review the XAI methods literature to provide back-
ground on how we mocked up XAI explanations for Merlin. We
focus on methods developed for computer vision AI models because
Merlin uses computer vision to identify birds in user-input pho-
tos and audio recordings. See [7, 19, 41, 46, 49, 107, 109] for more comprehensive overviews.
XAI methods can be categorized along several axes: first, whether a method is post-hoc or interpretable-by-design; second, whether it provides a global or local explanation; and third, by the explanation form. To begin, the majority of existing XAI methods are post-hoc methods that explain certain aspects of already-trained models [12, 13, 42, 44, 55, 58, 98, 105, 106, 112, 115, 116, 136, 138, 143, 144]. Recently, more interpretable-by-design methods are being proposed; these are typically new types of computer vision models with an explicitly-interpretable reasoning process [17, 20, 21, 24, 28, 30, 59, 88, 103]. Second, XAI methods provide either a local explanation of a model’s individual output or a global explanation of a model and its behavior. Local, post-hoc methods include feature attribution [42, 98, 112, 115, 116, 138, 143], approximation [106], and sample importance [58, 136] methods. Global, post-hoc methods include methods that generate class-level explanations [105, 144] and summaries of what a model has learned [12, 13, 44, 55]. Interpretable-by-design models can provide local and/or global explanations, depending on the model type. Lastly, explanations come in a variety of forms. Representative ones are heatmaps [17, 43, 98, 112, 115, 116, 134, 138, 143], examples [58, 136], concepts [59, 105, 144], and prototypes [24, 28, 88, 91]. To the best of our knowledge, these cover the range of XAI methods for computer vision.
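To make the heatmap category concrete, below is a minimal sketch of how a local, post-hoc saliency heatmap could be computed for a generic image classifier from plain input gradients. It is an illustration only: the off-the-shelf torchvision ResNet-18 and the file path are stand-ins (we have no access to Merlin’s models), and the sketch does not reproduce any specific method from the works cited above.

# Minimal sketch: a gradient-based saliency heatmap (local, post-hoc explanation)
# for a generic image classifier. Illustration only; the model is an off-the-shelf
# torchvision ResNet-18, not Merlin's actual model.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def saliency_heatmap(image_path: str) -> torch.Tensor:
    """Return a (224, 224) map of |d(top-class score)/d(pixel)| for one photo."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    img.requires_grad_(True)
    logits = model(img)                      # forward pass
    top_class = int(logits.argmax())         # explain the predicted class
    logits[0, top_class].backward()          # gradients of that score w.r.t. input pixels
    heatmap = img.grad.abs().max(dim=1).values.squeeze(0)  # collapse RGB channels
    return heatmap / heatmap.max()           # normalize to [0, 1] for display

# e.g., heat = saliency_heatmap("bird_photo.jpg")  # a hypothetical input photo

In an app like Merlin, such a map would be overlaid on the uploaded photo to indicate which regions most influenced the suggested species.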
Since we are not affiliated with the Merlin development team and do not have access to its AI models, it was not possible to produce actual explanations of how Merlin identifies birds. Hence, we created mock-up explanations. For comprehensiveness, we mocked up all four aforementioned explanation forms. We know they all are plausible XAI approaches for Merlin because they have been demonstrated on bird image classification models in prior work (e.g., heatmaps in [57, 96, 134], examples in [91], concepts in [59, 104, 105], prototypes in [24, 28, 88, 91]). See Fig. 2 and Sec. 4.2 for the mock-ups and their descriptions, and the supplementary material for details about how we created the mock-ups.
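For comparison, the example-based form can also be illustrated generically: a common way to produce “example” explanations is to retrieve the training photos whose features lie closest to the input photo in a pretrained embedding space. The sketch below shows this idea with an off-the-shelf backbone and a hypothetical list of gallery photo paths; it is not how the mock-ups in this paper were produced (see the supplementary material for that).

# Generic sketch of an "examples" explanation: retrieve the gallery photos whose
# features are closest to the input photo. Illustration only; the backbone and
# file paths are stand-ins, not the paper's actual mock-up pipeline.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the classifier; keep 512-d features
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Map one photo to a unit-length 512-d feature vector."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return torch.nn.functional.normalize(backbone(img), dim=1).squeeze(0)

@torch.no_grad()
def nearest_examples(query_path: str, gallery_paths: list[str], k: int = 3) -> list[str]:
    """Return the k gallery photos most similar to the query in feature space."""
    query = embed(query_path)
    gallery = torch.stack([embed(p) for p in gallery_paths])
    scores = gallery @ query                              # cosine similarity (unit vectors)
    top = scores.topk(min(k, len(gallery_paths))).indices
    return [gallery_paths[int(i)] for i in top]

For instance, nearest_examples("query.jpg", gallery_paths) would return the three most visually similar gallery photos, which could then be displayed alongside the AI’s output as its supporting examples.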
3 STUDY APPLICATION: MERLIN BIRD
IDENTIFICATION APP
As described in Sec. 2, we looked for a research setting that in-
volves real-world AI use by end-users with a diverse domain and
AI knowledge base, and that people use in ordinary, everyday life
scenarios. Furthermore, we looked for a domain with significant AI and XAI research. We found Merlin [125] fit what we were looking for. Merlin is a mobile phone app (Fig. 1) with over a million downloads that end-users, with diverse birding and AI knowledge, use for bird identification as they go out and about outdoors. Most birding apps are digital field guides that don’t use AI (e.g., Audubon Bird Guide [87], iBird Pro Guide [81], Birdadvisor 360° [99]). Merlin is unique in that it uses computer vision AI models to identify birds in user-input photos and audio recordings.
Merlin provided a grounded context with real end-users whose
experience we can augment with mock-ups of XAI explanations.
Furthermore, a large proportion of XAI methods for computer
vision have been developed and evaluated on bird image classification [24, 28, 30, 47, 59, 88, 91, 96, 103, 129] using the Caltech-UCSD Birds (CUB) dataset [131]. Hence, the feedback we collect on the mock-up explanations for Merlin can provide concrete and immediate insights to XAI researchers.
4 METHODS
In this section, we describe our study methods, all of which were
reviewed and approved by our Institutional Review Board prior to
conducting the study.
4.1 Participant recruitment and selection
We recruited participants who are end-users of Merlin’s Photo ID
and/or Sound ID, the app’s AI-based bird identification features,
Table 1: Participants’ domain (bird) and AI background. See Sec. 4.1 for a description of the background levels.

                 | Low-AI       | Medium-AI   | High-AI
Low-domain       | P7, P12, P16 | P8, P14     | P11, P13
Medium-domain    | P2, P20      | P1, P4, P10 | P6
High-domain      | P5, P17      | P3, P9, P15 | P18, P19
with considerations for diversity in participants’ domain and AI
background. Concretely, we created a screening survey with ques-
tions about the respondent’s domain background, AI background,
and app usage pattern (e.g., regularly used app features, frequency
of app use). We posted the survey on a variety of channels: Bird-
ing International Discord, AI for Conservation Slack, various Slack
workspaces within our institution, and Twitter. On Twitter, in addi-
tion to posting the survey, we reached out to accounts with tweets
about Merlin via @mentions and Direct Messages.
Based on the screening survey responses, we selectively enrolled
participants to maximize the diversity of domain and AI background
of the study sample. See Tab. 1 for a summary of participants’
background. The subgroups were defined based on participants’ survey responses and interview answers. We refer to individual participants by identifier P#.
Low-domain: From “don’t know anything about birds” (P11,
P12) to “recently started birding” (P7, P8, P13, P14, P16).
Participants who selected the latter option typically have
been birding for a few months or more than a year but in an
on-and-o way, and were able to identify some local birds.
Medium-domain: Have been birding for a few years and/or
can identify most local birds (P1, P2, P4, P6, P10, P20).
High-domain: Have been birding for more than a few years
and/or do bird-related work (e.g., ornithologist) (P3, P5, P9,
P15, P17, P18, P19).
Low-AI: From “don’t know anything about AI” (P16, P17) to
“have heard about a few AI concepts or applications” (P2, P5,
P7, P12, P20). Participants in this group either didn’t know
that Merlin uses AI (P12, P16) or knew but weren’t familiar
with the technical aspects of AI (P2, P5, P7, P17, P20).
Medium-AI: From “know the basics of AI and can hold a
short conversation about it” (P1, P3, P8, P9, P14) to “have
taken a course in AI or have experience working with an
AI system” (P4, P10, P15). Participants in this group had a
general idea of how Merlin’s AI might work, e.g., it is neural
network based and has learned to identify birds based on
large amounts of labeled examples.
High-AI: Use, study, or work with AI in day-to-day life (P6,
P11, P13, P18, P19). Participants in this group were extremely
familiar with AI in general and had detailed ideas of how
Merlin’s AI might work at the level of specic data process-
ing techniques, model architectures, and training algorithms.
Note that our referral here and elsewhere to “high-AI background”
participants describes their expertise with AI in general, not neces-
sarily with Merlin’s AI. All participants were active Merlin users
who could provide vivid anecdotes of when the app worked well