conclusions about the algorithm’s opaque inner workings and possible external impact.” A growing body of work in HCI, AI, and related communities has developed tools and processes to audit algorithmic systems for biased, discriminatory, or otherwise harmful behaviors (e.g., [16, 60, 77]). Past work in algorithm auditing has uncovered harmful algorithmic behaviors across a wide range of algorithmic systems, from search engines to hiring algorithms to computer vision applications [7, 16, 37, 62, 66, 86].
Today, algorithm audits are typically conducted by small groups of experts such as industry practitioners, researchers, activists, and government agencies [60]. However, such expert-led audits often fail to surface serious issues that everyday users of algorithmic systems are quickly able to detect once a system is deployed in the real world [42, 80]. For instance, this approach can fail when those conducting the audit lack the relevant cultural knowledge and lived experience to recognize and know where to look for certain kinds of harmful algorithmic behaviors [25, 42, 80, 92]. In addition, expert-led audits may fail to detect certain harmful algorithmic behaviors because these behaviors only arise, or are only recognized as harmful, when a system is used in particular contexts or in particular ways, which auditors may fail to anticipate [22, 30, 34, 42, 79, 80].
Recent years have seen many real-world cases in which users have uncovered and raised awareness around harmful algorithmic behaviors in systems they use day-to-day (e.g., search engines [16], online rating/review systems [29, 86], and machine translation systems [66]), although expert auditors had failed to detect these issues. Shen et al. [80] developed the concept of “everyday algorithm auditing” to describe how everyday users detect, understand, and interrogate problematic machine behaviors via their daily interactions with algorithmic systems. In the cases these authors reviewed, regular users of a wide range of algorithmic systems and platforms came together organically to hypothesize and test for potential biases. More recently, DeVos et al. [25] conducted a series of behavioral studies to better understand how users are often able to be so effective, both individually and collectively, in surfacing harmful algorithmic behaviors that more formal or expert-led auditing approaches fail to detect. As discussed next, recent research is beginning to explore ways to harness the power of users in algorithm auditing to overcome limitations of expert-led approaches.
2.3 Supporting user-engaged algorithm auditing
Recognizing the power of users in algorithm auditing, researchers
have begun to explore the design of systems to support more user-
engaged approaches [25, 53] to algorithm auditing, which directly
engage users in surfacing harmful algorithmic behaviors that might
otherwise go undetected.
A line of work has developed interfaces, interactive visualizations, and crowdsourcing pipelines to support people in actively searching for algorithmic biases and harmful behaviors [8, 17, 48, 63]. The designs of these research systems span a spectrum of user engagement, from more practitioner-led approaches to more user-led approaches in which users take greater initiative and control in directing their efforts. For example, Ochigame and Ye developed a web-based tool called Search Atlas, which enables users to easily conduct side-by-side comparisons of the Google search results they might see if they were located in different countries, to spot differences [64]. Kiela et al. developed a general research platform called Dynabench, which invites users to try to identify erroneous and potentially harmful behaviors in AI models [48]. Using Dynabench, users can generate test inputs to a model to try to find problematic behaviors, flag behaviors they identify, and provide brief open-text responses if they wish to offer additional context. More recently, Lam et al. developed a tool called “IndieLabel,” in order to empower end users to detect and flag potential algorithmic biases and then author audit reports to communicate these to relevant decision-makers [53].
In parallel, several major technology companies have begun to experiment with approaches that engage users in auditing their AI products and services for problematic behaviors. For example, in 2021 Twitter introduced its first “algorithmic bias bounty” challenge to engage users in identifying harmful biases in its image cropping algorithm [21]. In another effort, Meta adopted the Dynabench platform described above to discover potentially harmful behaviors in natural language processing models [48]. More recently, Google launched the “AI Test Kitchen,” a web-based application that invites users to experiment with Google’s latest LLM-powered conversational agents and to report any problematic behaviors they encounter, with the stated goal of engaging users in “learning, improving, and innovating responsibly on AI together” [90]. In addition, organizations like OpenAI and HuggingFace are beginning to include built-in interface features that invite users to report harmful algorithmic behaviors they encounter while interacting with LLM-powered applications like text-to-image generation tools. HuggingFace developed features to engage end users in flagging ethical/legal issues on their API [65]. Likewise, OpenAI initiated a feedback contest around their LLM-based tool ChatGPT, with the goal of encouraging users to “provide feedback on problematic model outputs” [4].
Despite growing interest in both academia and industry, there remains a gulf between the academic research literature on user-engaged auditing and current industry practice. To date, little is known about industry AI practitioners’ current practices and challenges around user-engaged auditing, nor what opportunities exist for them to better leverage such approaches in practice. In this paper, we take a first step towards understanding current practices, challenges, and design opportunities for user-engaged approaches to algorithm auditing in industry practice.
3 METHOD
3.1 Study design
We conducted a two-stage study involving semi-structured interviews followed by iterative co-design activities. We first conducted semi-structured interviews to understand participants’ current practices and challenges around engaging users in AI testing and auditing. In the next stage, we engaged participants in a co-design activity to further probe the opportunities and challenges in supporting user-engaged algorithm auditing in industry practice. We worked with participants to iteratively design three artifacts: a user-engaged audit report, representing a “wish list” of types of information that they would ideally want to solicit through a user-engaged auditing approach, and two user-engaged auditing