systems) and low-to-high domain background (representing both
users who know less and more about birding than the AI system).
With each participant, we conducted an hour-long interview,
which included a survey and an interactive feedback session, to un-
derstand their XAI needs, uses, and perceptions. Our study bridges
the gap between XAI research done in the HCI and AI communities
by directly connecting end-users of a real-world AI application
with the XAI methods literature. We do so by mocking up four
XAI approaches that could be potentially implemented into Merlin,
i.e., heatmap, example, concept, and prototype-based explanations
of the AI’s outputs. The mock-up explanations enabled us to get
concrete and detailed data about how participants intended to use
XAI explanations, as well as how they perceived each approach, in
an actual AI use context.
Through our study, we found:
• Participants’ XAI needs varied depending on their domain/AI background and interest level. While participants were generally curious about AI system details, those with high-AI background or notably high interest in birds had higher XAI needs. However, participants unanimously expressed a need for practically useful information that can improve their collaboration with the AI, suggesting an important area of focus for future XAI development (RQ1, Sec. 5.1).
• Participants intended to use XAI explanations for various purposes beyond understanding the AI’s outputs: determining when to trust the AI, learning to perform the task better on their own without needing to consult the AI, changing their behavior to supply better inputs to the AI, and giving constructive feedback to the developers to improve the AI. This highlights the broad range of XAI needs that should be considered in XAI development (RQ2, Sec. 5.2).
• Among existing XAI approaches, participants preferred part-based explanations, i.e., concept [105, 144] and prototype [24, 88] based explanations. Participants found them similar to human reasoning and explanations, and the most useful for the aforementioned purposes. This suggests that to the extent possible, the XAI community should pay particular attention to these methods, despite the challenges with their development and evaluation (RQ3, Sec. 5.3).
Following our findings, we discuss XAI’s potential as a medium for enhancing human-AI collaboration, and conclude with a set of recommendations for future XAI design. However, as with any case study, our findings and recommendations may have limited generalizability. This is an intentional trade-off made to gain an in-depth understanding of end-users’ XAI needs, uses, and perceptions in a real-world context, in line with growing calls for human-centered XAI research [34–36, 67, 68]. We are hopeful that our study design and insights will aid future XAI research in other contexts.
2 RELATED WORK
2.1 From algorithm-centered to human-centered XAI
With the growing adoption of AI, there has been a surge of interest
in explainable AI (XAI) research that aims to make AI systems
more understandable to people. XAI is one of the fastest growing
fields with hundreds of new papers published each year. See [1, 2, 7, 27, 29, 41, 46, 48, 49, 84, 107, 109, 110, 121] for in-depth surveys, and the following for examples of XAI research done in different disciplines: AI [42, 56, 59, 97, 118], HCI [50, 120, 133, 139], social and cognitive science [18, 26, 78, 80, 122, 124], and philosophy [14, 54, 82]. XAI is also increasingly being researched and applied in various domains, including but not limited to healthcare [5, 71, 75, 100, 117, 119, 135, 141], autonomous driving [9, 77, 94], energy and power systems [72], and climate science [73].
Much of the field’s efforts originally focused on the algorithms, i.e., on providing explanations of AI systems’ inner workings and outputs, rather than the people or the context where these systems are deployed. Recently, there has been a growing recognition that XAI methods cannot be developed “in a vacuum” without an understanding of people’s needs in specific contexts [34–36, 66–68]. In response, researchers have proposed conceptual frameworks to characterize XAI needs based on people’s roles [64, 102, 126], expertise [83], or more fine-grained axes of knowledge and objectives [123]. Others interviewed industry practitioners who work on AI products to identify their common XAI needs [15, 52, 66].
We join this relatively new line of research, called “human-centered XAI” [34–36, 66–68], and foreground the people who use AI systems and their needs, goals, and contexts in understanding how explainability can support human-AI interaction. In doing so, we build on the aforementioned frameworks to study end-users’ explainability needs. Concretely, we developed a survey based on Liao and colleagues’ XAI Question Bank [66] to collect concrete data on which aspects of AI end-users want to know about.
2.2 Understanding end-users’ XAI needs
Although human-centered XAI is an actively growing area of re-
search, much of the work still focuses on developers rather than
end-users of AI systems [15, 52, 66]. This gap is unsurprising, since XAI methods have been primarily developed for and used by developers to inspect AI systems [15, 80]. But it is critical because end-users may have different explainability needs that XAI methods should but don’t yet support.
Recently, some researchers began looking at end-users’ XAI needs in the context of specific applications [22, 23, 127]. Tonekaboni and colleagues [127] placed clinicians in hypothetical scenarios where AI models are used for health risk assessment, and found that clinicians wanted to know what features the model uses so they can understand and rationalize the model’s outputs. In a lab setting, Cai and colleagues [23] studied clinicians’ needs in their interaction with a prototype AI model that can assist with cancer diagnoses, and found that clinicians desired overall information about the model (e.g., capabilities and limitations, design objective) in addition to explanations of the model’s individual outputs. In another lab setting, Cai and colleagues [22] examined what needs pathologists have when using a prototype AI model for retrieving similar medical images. They also studied how pathologists use their proposed refinement tools, finding that pathologists often re-purposed them to test and understand the underlying search algorithm and to disambiguate AI errors from their own errors.
These studies delivered rich insights. However, they studied hypothetical or prototype AI applications. Hence, an important question remains, which we tackle in this work: What are end-users’