Understanding Practices, Challenges, and Opportunities for
User-Engaged Algorithm Auditing in Industry Practice
Wesley Hanwen Deng
hanwend@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, PA, USA
Bill Boyuan Guo
boyuang@andrew.cmu.edu
Carnegie Mellon University
Pittsburgh, PA, USA
Alicia DeVrio
adevos@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, PA, USA
Hong Shen
hongs@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, PA, USA
Motahhare Eslami
meslami@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, PA, USA
Kenneth Holstein
kjholste@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, PA, USA
ABSTRACT
Recent years have seen growing interest among both researchers
and practitioners in user-engaged approaches to algorithm auditing,
which directly engage users in detecting problematic behaviors in
algorithmic systems. However, we know little about industry prac-
titioners’ current practices and challenges around user-engaged
auditing, nor what opportunities exist for them to better leverage
such approaches in practice. To investigate, we conducted a series
of interviews and iterative co-design activities with practitioners
who employ user-engaged auditing approaches in their work. Our
findings reveal several challenges practitioners face in appropriately recruiting and incentivizing user auditors, scaffolding user audits, and deriving actionable insights from user-engaged audit reports. Furthermore, practitioners shared organizational obstacles to user-engaged auditing, surfacing a complex relationship between practitioners and user auditors. Based on these findings, we discuss
opportunities for future HCI research to help realize the potential
(and mitigate risks) of user-engaged auditing in industry practice.
KEYWORDS
user-engaged algorithm auditing, responsible AI, industry practi-
tioners, fairness, bias
ACM Reference Format:
Wesley Hanwen Deng, Bill Boyuan Guo, Alicia DeVrio, Hong Shen, Motah-
hare Eslami, and Kenneth Holstein. 2023. Understanding Practices, Chal-
lenges, and Opportunities for User-Engaged Algorithm Auditing in Industry
Practice. In CHI ’23: ACM Conference on Human Factors in Computing Sys-
tems, April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA,
18 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Both authors contributed equally to this research.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for prot or commercial advantage and that copies bear this notice and the full citation
on the rst page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specic permission and/or a
fee. Request permissions from permissions@acm.org.
CHI ’23, April 23–28, 2023, Hamburg, Germany
©2023 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . .$15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
In recent years, algorithm audits have risen to prominence as an
approach to uncover biased, discriminatory, or otherwise harmful
behaviors in algorithmic systems [16, 22, 29, 31, 46, 60, 68, 77, 86, 95].
Today, algorithm audits are typically conducted by small groups
of experts such as industry practitioners, researchers, and activists
[60, 77]. Although expert-led approaches have been highly impactful, they often suffer from major blindspots and fail to detect
critical issues. For example, expert-led audits can fail when those
conducting the audit lack the relevant cultural knowledge and lived
experience to recognize and know where to look for certain kinds
of harmful algorithmic behaviors [25, 42, 80, 92].
To overcome limitations of current algorithm auditing tech-
niques, researchers in HCI and AI have begun to explore the po-
tential of more user-engaged approaches to algorithm auditing,
which directly engage users of AI products and services in surfac-
ing harmful algorithmic behaviors. Recent years have seen many
cases in which users organically came together to uncover and
raise awareness about harmful behaviors in algorithmic systems
they use day-to-day, which had eluded detection by industry teams
or other expert auditors [80]. Inspired by these observations, re-
searchers have begun to explore the design of systems that can
leverage the power of everyday users and crowds to surface harm-
ful algorithmic behaviors that might otherwise go undetected (e.g.,
[8, 17, 48, 53, 63, 64]). The designs of existing research systems
span a spectrum of user engagement, from more practitioner-led
approaches—such as crowdsourcing workflows in which users’ test-
ing and auditing activities are more heavily guided and constrained
by requesters—to more user-led approaches in which users take
greater initiative in directing their own activities.
In parallel to these research eorts, several major technology
companies have begun to experiment with approaches that engage
users in auditing their AI products and services for problematic
behaviors. For example, in 2021 Twitter introduced its first “al-
gorithmic bias bounty” challenge to engage users in identifying
harmful biases in its image cropping algorithm [21]. In another effort, Google launched the “AI Test Kitchen,” a web-based applica-
tion that invites users to experiment with some of Google’s latest
AI-based conversational agents, and to report any problematic be-
haviors they encounter [90]. More recently, inspired by Twitter’s
“algorithmic bias bounty,” OpenAI initiated a “Feedback Contest”
to encourage users to “provide feedback on problematic model
outputs” during their interactions with the ChatGPT chatbot [4].
Despite growing interest from industry, there remains a gulf
between the academic research literature on user-engaged auditing
and current industry practice. In particular, we still know little
about industry AI practitioners’ current practices and challenges
around user-engaged auditing, and what opportunities exist for
them to better leverage such approaches in practice. To investigate,
in this paper we explore the following research questions:
RQ1: What are AI practitioners’ current motivations and practices around user engagement in auditing their AI products and services for problematic algorithmic behaviors?
RQ2: What opportunities and challenges do practitioners envision for user-engaged approaches to better support their algorithm auditing efforts?
We conducted a two-stage study with 12 industry practitioners
from 9 technology companies, all of whom have experimented with
engaging users in auditing their AI systems for problematic algo-
rithmic behaviors. We first conducted semi-structured interviews to
understand practitioners’ current practices and challenges around
engaging users in AI testing and auditing. We then conducted co-
design activities, working with practitioners to iteratively co-design
three design artifacts as a way to further probe challenges perceived
by practitioners and opportunities to better support user-engaged
approaches to algorithm auditing in industry practice.
Overall, our participants shared three major motivations for
engaging users in AI testing and auditing: understanding users’
subjective experiences of problematic machine behaviors, over-
coming their teams’ blindspots when auditing their products and
services, and gathering evidence from users to help them advocate
for fairness work within their organizations. Participants shared
prior experiences engaging users on different scales, from individ-
ual user study sessions, to focus group workshops, to large-scale
user feedback and crowdsourcing activities. However, in doing so,
practitioners encountered various challenges in engaging users
eectively. For instance, practitioners discussed challenges they
faced in recruiting and incentivizing the “right” group of auditors
for a given task, with relevant identities and lived experiences. Par-
ticipants also discussed the difficulties in scaffolding users towards
productive auditing strategies, without biasing them to simply repli-
cate industry teams’ own blindspots. Finally, practitioners discussed
the challenges of quantication when deriving actionable insights
from user-engaged auditing reports: relying upon the majority vote
runs the risk of masking the very biases an audit is intended to
uncover. In addition, participants shared broader organizational
obstacles to user-engaged auditing, highlighting key tensions that
arise in practice when involving users in algorithm auditing ef-
forts, such as potential PR risks, profit motives that work against
protecting marginalized groups, and privacy and legal concerns.
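To make the quantification challenge concrete, consider how an aggregation choice can hide exactly the harms a user-engaged audit is meant to surface. The sketch below is illustrative only: the report data, field names, and group labels are invented for this example rather than drawn from any participant's pipeline. A simple majority vote over all auditors judges an output acceptable, while disaggregating the same reports by auditors' self-described identity groups reveals that one group overwhelmingly flags it as harmful.

```python
from collections import Counter, defaultdict

# Hypothetical user-audit reports: (auditor's self-described group, verdict).
# Groups, verdicts, and counts are made up for illustration only.
reports = (
    [("majority_group", "acceptable")] * 88
    + [("majority_group", "harmful")] * 2
    + [("marginalized_group", "harmful")] * 9
    + [("marginalized_group", "acceptable")] * 1
)

# Naive aggregation: a simple majority vote across all auditors.
overall = Counter(verdict for _, verdict in reports)
print("Majority vote:", overall.most_common(1)[0][0])  # -> "acceptable"

# Disaggregated view: harm rate within each self-described group.
by_group = defaultdict(Counter)
for group, verdict in reports:
    by_group[group][verdict] += 1

for group, counts in by_group.items():
    harm_rate = counts["harmful"] / sum(counts.values())
    print(f"{group}: {harm_rate:.0%} of reports flag harm")

# The 90% harm rate among marginalized_group auditors is invisible to the
# majority vote, which is the masking effect described above.
```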
As private companies increasingly experiment with user-
engaged approaches to algorithm auditing, HCI research has a
critical role to play in shaping more effective and responsible prac-
tices. To this end, this work contributes:
• An in-depth understanding of industry practitioners’ mo-
tivations, current practices, and challenges in effectively
engaging users in testing and auditing AI products and ser-
vices. Our findings shed light on the types of problems prac-
titioners aim to address through user engagement around
algorithm auditing, as well as the ways practitioners
navigate organizational tensions around user involvement
in AI development processes.
• A set of design implications for user-engaged algorithm au-
diting, beyond standard considerations for human computa-
tion or user feedback systems.
• Insights into the complex relationship between user auditors
and industry practitioners working on responsible AI, sug-
gesting opportunities for future HCI research to help realize
the potential (and mitigate risks) of user-engaged auditing
in industry practice.
2 RELATED WORK
2.1 Understanding and supporting responsible
AI practices in industry contexts
In recent years, signicant eort has been directed towards the
development of approaches, guidelines, and tools to help indus-
try practitioners audit their AI products and services for un-
fair, biased, or otherwise harmful algorithmic behaviors (e.g.,
[3, 11, 13, 14, 61, 71, 72]). Early work in this area has largely
been guided by advances in academic research on AI fairness
[1, 6, 10, 32, 52, 67]. Yet in a series of interview studies and surveys
with industry AI practitioners, Holstein et al. [42] found that there
were major disconnects between the tools offered by the research
community versus the actual on-the-ground needs of industry AI
practitioners. To address such gaps, a growing line of research in
HCI has focused on better understanding industry AI practitioners’
needs and designing to support responsible AI practices in industry.
For example, studies from Madaio et al. [56] and Rakova et al. [70]
investigated the organizational challenges and barriers that practi-
tioners face in practice when attempting to build more responsible
AI systems.
Meanwhile, to better support responsible AI practices, companies
have been developing responsible AI guidelines such as the People + AI Guidebook [72], trustworthy AI principles [87], AI fairness checklists [71], and responsible AI toolkits such as AI Explainability 360 [3] and Fairlearn [13]. However, recent HCI research has surfaced
gaps between fairness toolkits’ capabilities and practitioners’ needs
[24, 54, 73]. For example, Kaur et al. [47] found that AI practi-
tioners often over-trust and misuse AI explainability toolkits. Other
work from Lee et al. and Deng et al. identified misalignments between the designs of existing fairness toolkits and practitioners’ actual desires and usage of these toolkits [24, 54, 73]. In interviews
with AI practitioners, these authors found that, beyond the func-
tionality provided by current toolkits, practitioners desired tools
that could help them bring in perspectives from relevant domain
experts and users, in order to aid them in auditing their AI systems
[24]. In the next sections, we discuss emerging work that aims to
harness the power of users in algorithm auditing.
2.2 The power of users in algorithm auditing
Metaxa et al. [60] define an algorithm audit as “a method of re-
peatedly querying an algorithm and observing its output to draw
conclusions about the algorithm’s opaque inner workings and pos-
sible external impact.” A growing body of work in HCI, AI, and
related communities has developed tools and processes to audit al-
gorithmic systems for biased, discriminatory, or otherwise harmful
behaviors (e.g., [
16
,
60
,
77
]). Past work in algorithm auditing has
uncovered harmful algorithmic behaviors across a wide range of
algorithmic systems, from search engines to hiring algorithms to
computer vision applications [7, 16, 37, 62, 66, 86].
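As a concrete illustration of the definition quoted above, an audit in this sense can be as simple as a loop that sends systematically varied inputs to a system and records its outputs for later comparison. The sketch below is not taken from any of the cited audits; it assumes a hypothetical score_resume function standing in for an opaque hiring algorithm and probes it with paired inputs that differ only in a name associated with different perceived genders.

```python
import statistics

def score_resume(text: str) -> float:
    """Stand-in for the opaque system under audit (assumed interface)."""
    raise NotImplementedError("replace with a call to the real system")

# Paired queries: identical resumes except for the applicant's name.
resume_template = "{name}, 5 years of software engineering experience ..."
name_groups = {
    "group_a": ["Emily", "Anna", "Sarah"],
    "group_b": ["Greg", "Brad", "Tom"],
}

def audit(system, template, groups):
    """Repeatedly query the system and summarize its outputs per group."""
    results = {}
    for group, names in groups.items():
        scores = [system(template.format(name=name)) for name in names]
        results[group] = statistics.mean(scores)
    return results

# Example usage, once a real system is wired in:
# print(audit(score_resume, resume_template, name_groups))
# A persistent gap between the group means would be one signal of bias
# worth investigating further, not proof of harm on its own.
```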
Today, algorithm audits are typically conducted by small groups
of experts such as industry practitioners, researchers, activists, and
government agencies [60]. However, such expert-led audits often
fail to surface serious issues that everyday users of algorithmic
systems are quickly able to detect once a system is deployed in the
real world [42, 80]. For instance, this approach can fail when those
conducting the audit lack the relevant cultural knowledge and lived
experience to recognize and know where to look for certain kinds
of harmful algorithmic behaviors [25, 42, 80, 92]. In addition, expert-
led audits may fail to detect certain harmful algorithmic behaviors
because these behaviors only arise—or are only recognized as harm-
ful—when a system is used in a particular context or in particular
ways, which auditors may fail to anticipate [22, 30, 34, 42, 79, 80].
Recent years have seen many real-world cases in which users
have uncovered and raised awareness around harmful algorith-
mic behaviors in systems they use day-to-day (e.g., search engines
[16], online rating/review systems [29, 86], and machine translation systems [66]), although expert auditors had failed to detect these
issues. Shen et al. [80] developed the concept of “everyday algo-
rithm auditing” to describe how everyday users detect, understand,
and interrogate problematic machine behaviors via their daily in-
teractions with algorithmic systems. In the cases these authors
reviewed, regular users of a wide range of algorithmic systems and
platforms came together organically to hypothesize and test for
potential biases. More recently, DeVos et al. [25] conducted a series
of behavioral studies to better understand how users are often able
to be so eective, both individually and collectively, in surfacing
harmful algorithmic behaviors that more formal or expert-led audit-
ing approaches fail to detect. As discussed next, recent research is
beginning to explore ways to harness the power users in algorithm
auditing to overcome limitations of expert-led approaches.
2.3 Supporting user-engaged algorithm
auditing
Recognizing the power of users in algorithm auditing, researchers
have begun to explore the design of systems to support more user-
engaged approaches [25, 53] to algorithm auditing, which directly
engage users in surfacing harmful algorithmic behaviors that might
otherwise go undetected.
A line of work has developed interfaces, interactive visual-
izations, and crowdsourcing pipelines to support people in ac-
tively searching for algorithmic biases and harmful behaviors
[8, 17, 48, 63]. The designs of these research systems span a spec-
trum of user-engagement, from more practitioner-led approaches
to more user-led approaches in which users take greater initiative
and control in directing their efforts. For example, Ochigame and
Ye developed a web-based tool called Search Atlas, which enables
users to easily conduct side-by-side comparisons of the Google
search results they might see if they were located in different countries [64]. Kiela et al. developed a general research platform
called Dynabench, which invites users to try to identify erroneous
and potentially harmful behaviors in AI models [48]. Using Dynabench, users can generate test inputs to a model to try to find problematic behaviors, flag behaviors they identify, and provide brief open-text responses if they wish to offer additional context.
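To give a sense of what such user contributions might look like as data, the sketch below models one flagged interaction as a simple record. This is not Dynabench's or any vendor's actual schema; the field names are assumptions meant to capture the three elements just described: the user-generated test input, the flagged model behavior, and an optional open-text note giving context.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UserAuditFlag:
    """One user-submitted report of a potentially harmful model behavior.

    Hypothetical record layout for illustration; real platforms will differ.
    """
    test_input: str    # prompt or example the user tried
    model_output: str  # what the system returned
    harm_category: str # e.g., "stereotyping", "erasure", "toxicity"
    user_note: str = ""  # optional open-text context from the user
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Example usage with made-up content:
flag = UserAuditFlag(
    test_input="Translate 'She is a doctor' into Hungarian and back.",
    model_output="He is a doctor.",
    harm_category="stereotyping",
    user_note="Gender flipped after round-trip translation.",
)
print(flag.harm_category, "-", flag.user_note)
```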
More recently, Lam et al. developed a tool called “IndieLabel,” in
order to empower end users to detect and flag potential algorith-
mic biases and then author audit reports to communicate these to
relevant decision-makers [53].
In parallel, several major technology companies have begun to
experiment with approaches that engage users in auditing their AI
products and services for problematic behaviors. For example, in
2021 Twitter introduced its first “algorithmic bias bounty” challenge
to engage users in identifying harmful biases in its image cropping
algorithm [21]. In another effort, Meta adopted the Dynabench
platform described above, to discover potentially harmful behav-
iors in natural language processing models [48]. More recently,
Google launched the “AI Test Kitchen,” a web-based application
that invites users to experiment with Google’s latest LLM-powered
conversational agents, and to report any problematic behaviors
they encounter, with the stated goal of engaging users in “learn-
ing, improving, and innovating responsibly on AI together” [90]. In
addition, organizations like OpenAI and HuggingFace are begin-
ning to include built-in interface features that invite users to report
harmful algorithmic behaviors they encounter while interacting
with LLM-powered applications like text-to-image generation tools.
HuggingFace developed features to engage end users in flagging ethical/legal issues on their API [65]. In addition, OpenAI initiated
a feedback contest around their LLM-based tool ChatGPT, with
the goal of encouraging users to “provide feedback on problematic
model outputs” [4].
Despite growing interest in both academia and industry, there
remains a gulf between the academic research literature on user-
engaged auditing and current industry practice. To date, little is
known about industry AI practitioners’ current practices and chal-
lenges around user-engaged auditing, nor what opportunities exist
for them to better leverage such approaches in practice. In this
paper, we take a first step towards understanding current practices,
challenges, and design opportunities for user-engaged approaches
to algorithm auditing in industry practice.
3 METHOD
3.1 Study design
We conducted a two-stage study involving semi-structured inter-
views followed by iterative co-design activities. We first conducted
semi-structured interviews to understand participants’ current prac-
tices and challenges around engaging users in AI testing and au-
diting. In the next stage, we engaged participants in a co-design
activity to further probe the opportunities and challenges in sup-
porting user-engaged algorithm auditing in industry practice.

Figure 1: Two potential user-engaged auditing pipeline designs that were iteratively co-designed with participants. The left image shows the “developer-led” pipeline design, and the right image shows the “user-led” pipeline design. Each figure illustrates a possible interaction flow between user auditors and AI product teams, showing how auditing tasks are created, how background information on user auditors is shared, how user auditing reports are generated based on auditors’ findings, and how these reports are shared with AI product teams. During the co-design activity, participants could zoom in, annotate, and modify the details. We used these pipeline flowcharts as probes, not as final products, to investigate practitioners’ challenges and desires more deeply.

We worked with participants to iteratively design three artifacts: a user-engaged audit report, representing a “wish list” of types of information that they would ideally want to solicit through a user-engaged auditing approach, and two user-engaged auditing pipelines, building upon initial designs informed by interview findings and insights from prior literature [25, 60, 77, 80]. Throughout
the study, we iterated on these design artifacts based on feedback
and design ideas from prior participants. We used these artifacts
and the process of co-designing them to provoke deeper conversa-
tions around participants’ desires, as well as potential risks they
anticipate, for new systems that support user-engaged auditing.
Following an iterative co-design process similar to prior work (cf.
[41, 57]), for our first five participants, we ran the two stages of our
study in separate sessions in order to better inform the design of the
initial versions of the artifacts based on the needs and desires these
participants expressed in the first set of interviews. However, we
soon encountered difficulty in retaining industry participants due to
their busy schedules (e.g., one participant was not able to return to
complete the co-design). Therefore, after our first five participants,
we ran both stages in a single session. We then continued to iterate
on the artifacts during the study sessions themselves. Below, we
describe each of these activities in more detail.¹

¹ We also provide our interview and co-design protocol in the supplementary material.
3.1.1 Stage one: Semi-structured interviews. To understand practi-
tioners’ current practices around engaging users in AI testing and
auditing, we conducted semi-structured interviews, each lasting up
to an hour. We adopted a directed storytelling approach [33]. We
first asked participants to describe their team’s prior experiences in
trying to detect or address biased or harmful behaviors in their AI
products or services, with a specific focus on whether, why, and how
they engaged users in the process. For example, we asked “Could
you describe how your team attempted to engage users in auditing
the AI products and services you mentioned” and “What motivated
you or your team to engage users in this way?” Through follow-up
questions, we probed deeper into challenges participants had en-
countered when attempting to engage users in the auditing process.
As participants shared specic challenges they had encountered,
we also invited them to share ideas for potential solutions to these
challenges. For example, in response to specic challenges raised
by participants, we asked “How did your team attempt to tackle these
challenges?” and “How effective were your team’s approaches?”
3.1.2 Stage two: Iterative co-design activities. To further envision
future opportunities and solicit potential challenges and risks for
user-engaged algorithm auditing approaches, following the inter-
views, we then involved participants in a series of co-design ac-
tivities, following an iterative co-design process similar to prior
work (cf. [41, 57]). This stage of the study lasted up to 45 minutes,
and involved participants in co-design around three design arti-
facts: a user-engaged audit report and two user-engaged audit pipeline flowcharts. We first designed initial versions of these
artifacts based on participant needs and desires expressed during
stage one, as well as prior research on user-engaged algorithm au-
diting [25, 80]. We then iterated on their designs with practitioners
throughout the co-design activities. We note that these design arti-
facts were not the goal of our study, but rather served as tools to
probe more deeply into practitioners’ challenges and desires. Below,
we describe the process of co-designing these three artifacts, and
how we used this process to probe future opportunities and
risks of user-engaged audits.
User-engaged audit report: We invited each participant to
contribute to the design of a report that they would ideally like
to see as the output of a user-engaged auditing process. We first
asked participants open-ended questions such as “What information
would you ideally want the service to report back to your team?” and
encouraged them to sketch as they generated new ideas. To help
participants come up with ideas, we presented participants with