conclusions about the algorithm’s opaque inner workings and possible external impact.” A growing body of work in HCI, AI, and related communities has developed tools and processes to audit algorithmic systems for biased, discriminatory, or otherwise harmful behaviors (e.g., [16, 60, 77]). Past work in algorithm auditing has uncovered harmful algorithmic behaviors across a wide range of algorithmic systems, from search engines to hiring algorithms to computer vision applications [7, 16, 37, 62, 66, 86].
Today, algorithm audits are typically conducted by small groups of experts such as industry practitioners, researchers, activists, and government agencies [60]. However, such expert-led audits often fail to surface serious issues that everyday users of algorithmic systems are quickly able to detect once a system is deployed in the real world [42, 80]. For instance, this approach can fail when those conducting the audit lack the relevant cultural knowledge and lived experience to recognize and know where to look for certain kinds of harmful algorithmic behaviors [25, 42, 80, 92]. In addition, expert-led audits may fail to detect certain harmful algorithmic behaviors because these behaviors only arise, or are only recognized as harmful, when a system is used in particular contexts or in particular ways, which auditors may fail to anticipate [22, 30, 34, 42, 79, 80].
Recent years have seen many real-world cases in which users have uncovered and raised awareness around harmful algorithmic behaviors in systems they use day-to-day (e.g., search engines [16], online rating/review systems [29, 86], and machine translation systems [66]), although expert auditors had failed to detect these issues. Shen et al. [80] developed the concept of “everyday algorithm auditing” to describe how everyday users detect, understand, and interrogate problematic machine behaviors via their daily interactions with algorithmic systems. In the cases these authors reviewed, regular users of a wide range of algorithmic systems and platforms came together organically to hypothesize and test for potential biases. More recently, DeVos et al. [25] conducted a series of behavioral studies to better understand how users are often able to be so effective, both individually and collectively, in surfacing harmful algorithmic behaviors that more formal or expert-led auditing approaches fail to detect. As discussed next, recent research is beginning to explore ways to harness the power of users in algorithm auditing to overcome limitations of expert-led approaches.
2.3 Supporting user-engaged algorithm auditing
Recognizing the power of users in algorithm auditing, researchers
have begun to explore the design of systems to support more user-
engaged approaches [25, 53] to algorithm auditing, which directly
engage users in surfacing harmful algorithmic behaviors that might
otherwise go undetected.
A line of work has developed interfaces, interactive visualizations, and crowdsourcing pipelines to support people in actively searching for algorithmic biases and harmful behaviors [8, 17, 48, 63]. The designs of these research systems span a spectrum of user engagement, from more practitioner-led approaches to more user-led approaches in which users take greater initiative and control in directing their efforts. For example, Ochigame and Ye developed a web-based tool called Search Atlas, which enables users to easily conduct side-by-side comparisons of the Google search results they might see if they were located in different countries, to spot differences [64]. Kiela et al. developed a general research platform called Dynabench, which invites users to try to identify erroneous and potentially harmful behaviors in AI models [48]. Using Dynabench, users can generate test inputs to a model to try to find problematic behaviors, flag behaviors they identify, and provide brief open-text responses if they wish to offer additional context. More recently, Lam et al. developed a tool called “IndieLabel,” in order to empower end users to detect and flag potential algorithmic biases and then author audit reports to communicate these to relevant decision-makers [53].
In parallel, several major technology companies have begun to experiment with approaches that engage users in auditing their AI products and services for problematic behaviors. For example, in 2021 Twitter introduced its first “algorithmic bias bounty” challenge to engage users in identifying harmful biases in its image cropping algorithm [21]. In another effort, Meta adopted the Dynabench platform described above to discover potentially harmful behaviors in natural language processing models [48]. More recently, Google launched the “AI Test Kitchen,” a web-based application that invites users to experiment with Google’s latest LLM-powered conversational agents and to report any problematic behaviors they encounter, with the stated goal of engaging users in “learning, improving, and innovating responsibly on AI together” [90]. In addition, organizations like OpenAI and HuggingFace are beginning to include built-in interface features that invite users to report harmful algorithmic behaviors they encounter while interacting with LLM-powered applications like text-to-image generation tools. HuggingFace developed features to engage end users in flagging ethical/legal issues on their API [65]. Likewise, OpenAI initiated a feedback contest around their LLM-based tool ChatGPT, with the goal of encouraging users to “provide feedback on problematic model outputs” [4].
Despite growing interest in both academia and industry, there remains a gulf between the academic research literature on user-engaged auditing and current industry practice. To date, little is known about industry AI practitioners’ current practices and challenges around user-engaged auditing, nor what opportunities exist for them to better leverage such approaches in practice. In this paper, we take a first step towards understanding current practices, challenges, and design opportunities for user-engaged approaches to algorithm auditing in industry practice.
3 METHOD
3.1 Study design
We conducted a two-stage study involving semi-structured interviews followed by iterative co-design activities. We first conducted semi-structured interviews to understand participants’ current practices and challenges around engaging users in AI testing and auditing. In the next stage, we engaged participants in a co-design activity to further probe the opportunities and challenges in supporting user-engaged algorithm auditing in industry practice. We worked with participants to iteratively design three artifacts: a user-engaged audit report, representing a “wish list” of types of information that they would ideally want to solicit through a user-engaged auditing approach, and two user-engaged auditing