LEARNING TO ADVISE HUMANS IN HIGH-STAKES SETTINGS
Nicholas Wolczynski
University of Texas at Austin
nicholas@mccombs.utexas.edu
Maytal Saar-Tsechansky
University of Texas at Austin
maytal@mail.utexas.edu
Tong Wang
University of Iowa
tong-wang@uiowa.edu
ABSTRACT
Expert decision-makers (DMs) in high-stakes AI-assisted decision-making (AIaDM) settings receive
and reconcile recommendations from AI systems before making their final decisions. We identify
distinct properties of these settings which are key to developing AIaDM models that effectively benefit
team performance. First, DMs incur reconciliation costs from exerting decision-making resources
(e.g., time and effort) when reconciling AI recommendations that contradict their own judgment.
Second, DMs in AIaDM settings exhibit algorithm discretion behavior (ADB), i.e., an idiosyncratic
tendency to imperfectly accept or reject algorithmic recommendations for any given decision task.
The human’s reconciliation costs and imperfect discretion behavior introduce the need to develop
AI systems which (1) provide recommendations selectively, (2) leverage the human partner’s ADB
to maximize the team’s decision accuracy while regularizing for reconciliation costs, and (3) are
inherently interpretable. We refer to the task of developing AI to advise humans in AIaDM settings as
learning to advise and we address this task by first introducing the AI-assisted Team (AIaT)-Learning
Framework. We instantiate our framework to develop TeamRules (TR): an algorithm that produces
rule-based models and recommendations for AIaDM settings. TR is optimized to selectively advise a
human and to trade off reconciliation costs and team accuracy for a given environment by leveraging
the human partner’s ADB. Evaluations on synthetic and real-world benchmark datasets with a variety
of simulated human accuracy and discretion behaviors show that TR robustly improves the team’s
objective across settings over interpretable, rule-based alternatives.
1 Introduction
Advances in machine learning performance and interpretability across domains have brought about a growing focus
on human-AI (HAI) systems to enhance human decision-making [Cai et al., 2019; Soares and Angelov, 2019; Green
and Chen, 2021; Basu et al., 2021; Lebovitz et al., 2022]. Most prior work that developed AI methods for human-AI
teams focused on settings where the AI can either make all decisions autonomously or can decide to defer to the human
for some tasks [Madras et al., 2018; Gao et al., 2021; Keswani et al., 2021]. In this work, we consider the task of
learning to advise in high-stakes AI-assisted decision making (AIaDM) settings where the human must act as the final
decision-maker (DM) for all instances. In such settings, the AI does not undertake any decisions autonomously; rather,
the AI may only advise the human on some or all instances. The task of learning to advise has proved challenging
in practice, and existing AI systems often do not significantly improve teams’ final decisions in high-stakes AIaDM
settings [Green and Chen, 2019a; Lebovitz et al., 2022; Green and Chen, 2021].
In this work, we first outline key properties of AIaDM contexts that pose challenges to team performance and then
propose an AIaDM method that addresses them. In particular, prior work that closely studied experts advised by AI in
high-stakes settings highlights that experts incur costs from exerting time and effort to reconcile AI recommendations
that contradict their own initial judgment [Lebovitz et al., 2022]. Consequently, DMs in AIaDM settings incur costs
Equal contribution
when the AI system takes action and offers recommendations that contradict the DMs’ initial judgments. Each DM in a
given context has a tolerance for additional reconciliation costs, and when the effort required to reconcile contradicting
advice is excessive given their tolerance, DMs may ignore beneficial advice or disengage with the AI entirely [Lebovitz
et al., 2022]. Second, DMs exhibit imperfect algorithm discretion behavior (ADB), i.e., an idiosyncratic tendency to
imperfectly accept or reject algorithmic recommendations for any given decision task. Bansal et al. [2021a] demonstrate
that AI can improve team performance in AIaDM settings when the human DM optimally reconciles algorithmic
recommendations. However, prior work has established that humans’ discretion towards algorithmic recommendations
is unlikely to always be optimal, to the detriment of the team [Dietvorst et al., 2015; Green and Chen, 2019b,c; Chiang
and Yin, 2021; Bansal et al., 2021b; Zhang et al., 2022]. Thus, how AI can best improve team performance in the
presence of sub-optimal human discretion remains an open problem. Finally, high-stakes AIaDM settings also require
AI recommendations that are inherently interpretable [Vellido, 2020; Chiang and Yin, 2021].² Interpretable advice
allows experts to reason about and justify the AI-advised decisions that contradict their initial judgment and to edit the
patterns underlying the recommendations [Caruana et al., 2015; Balagopal et al., 2021].
To overcome the challenges posed by high-stakes AIaDM properties, AIaDM systems must offer complementary advice
that can effectively enhance the team’s performance given any human’s arbitrary and imperfect ADB, decision-making
ability, and tolerance for incurring reconciliation costs, while also producing inherently interpretable advice that experts
can establish the reasoning for. We first propose that AIaDM systems make recommendations selectively, not only when
the AI is likely to be more accurate than the human, but also when the expected benefit to decision-making outweighs
the costs incurred from the human’s reconciliation of contradictory AI advice. Importantly, the cost-effectiveness
of advice is also informed by the human’s discretion behavior, namely, the likelihood of the human accepting the
advice. Second, we provide theoretical justification for the potential benefits of leveraging the human’s ADB, and we
empirically show that AIaDM systems can improve team outcomes by simultaneously leveraging the human’s ADB,
decision history, and tolerance for reconciliation costs within the AI training objective, allowing for direct optimization
over the team’s final decision.
We propose a framework and an algorithm to address the challenge of learning to advise humans in high-stakes settings.
Our framework and algorithm produce a personalized and inherently interpretable rule-based model that provides
complementary advice to an individual human DM. Specifically, we make the following contributions:
• We identify and consider three key properties of high-stakes AIaDM settings where an imperfect human DM
may be advised by an AI but always undertakes the final decision. The human DM (1) incurs costs from
reconciling contradicting algorithmic recommendations, (2) can exhibit imperfect ADB towards the AI's
advice, and (3) requires interpretable explanations to reason about and edit the AI model's recommendations.
• We propose that methods for high-stakes AIaDM settings can produce complementary advice beneficial
to team performance when they: (1) provide recommendations selectively, (2) are informed by the human
DM's algorithm discretion behavior (ADB), decision history, and tolerance for reconciling advice, and (3) are
inherently interpretable.
• We develop a framework for learning to advise humans in high-stakes AIaDM settings with the properties
above, and, based on our proposed framework, we develop and evaluate the TeamRules (TR) algorithm, which
generates an inherently interpretable rule-based model designed to complement a human DM by providing
selective advice that leverages the human DM's ADB, decision history, and tolerance for reconciling advice.
We evaluate TR's performance relative to alternative rule-based methods on three real-world datasets and two synthetic
datasets, and when paired with different simulated human behaviors. The results show that TR reliably and effectively
leverages the human’s ADB to selectively provide recommendations that improve the team’s objective over alternatives.
Additionally, TR can adapt to the partner's tolerance for incurring additional reconciliation costs by trading away some
team decision accuracy for lower reconciliation costs. We empirically demonstrate that our method yields
superior team performance relative to benchmarks even under imperfect estimates of the human’s ADB, showcasing the
robustness of our method.
2 Related Work
In this section, we review related work on human-AI teams, algorithm discretion, and rule-based interpretable models.
² In some key high-risk and regulated domains, model interpretability is required by law [Doshi-Velez and Kim, 2017; Goodman and Flaxman, 2017].
2.1 Human-AI Teams
Existing literature on human-AI (HAI) teams is broad and considers a variety of perspectives and applications. HAI
teams are increasingly deployed for decision-making in a multitude of high-stakes settings, including medicine [Cai et
al., 2019; Balagopal et al., 2021] and criminal justice [Soares and Angelov, 2019]. However, AI systems deployed in
high-stakes settings often do not lead to complementary team performance that is superior to both the human’s and AI’s
standalone performance [Green and Chen, 2019b,c; Lebovitz et al., 2022].
Expert DMs in high-stakes settings make many complex decisions under time constraints. When an AI system offers
the DM advice that contradicts their initial judgment, the DM incurs reconciliation costs due to the additional time and
effort required to reason about the contradicting recommendation [Lebovitz et al., 2022]. If the AI system over-burdens
the DM with excessive reconciliation costs, the DM may begin to disregard the AI advice entirely, gaining no benefit
from the AI.
Prior work on HAI teams demonstrated that directly optimizing the team’s objectives is key to producing AI systems
that complement human DMs, often by accounting for their decision history [Madras et al., 2018; Bansal et al., 2021a].
Specifically, Bansal et al. [2021a] consider a setting in which a DM is assumed to exhibit optimal algorithm discretion
behavior, such that the DM always accepts the AI decision if it will lead to a higher expected team outcome, and rejects
otherwise. However, humans’ discretion of algorithmic recommendations is not always optimal [Chiang and Yin, 2021;
Dietvorst et al., 2015; Castelo et al., 2019], and, thus, how AI can best improve team performance in the presence
of arbitrary and sub-optimal human discretion behavior is an open problem that we aim to address. In this work, we
propose that AI systems designed for AIaDM settings should effectively complement humans exhibiting arbitrary
ADB; we propose how a human’s arbitrary ADB can be brought to bear and demonstrate the subsequent potential
benefits to the team’s performance.
We note that the AIaDM setting is distinct from the HAI team deferral setting in which the AI system can make
decisions autonomously, without human involvement, but may defer to the human on some decisions [Madras et al.,
2018; Wang and Saar-Tsechansky, 2020; Wilder et al., 2021; Keswani et al., 2021; Gao et al., 2021; Bondi et al.,
2022]. Methods developed for the deferral setting do not need to address the human’s discretion behavior and costs of
reconciling contradictory advice to improve and evaluate the team’s performance.
2.2 Algorithm Discretion
The term algorithm aversion was proposed to characterize humans’ tendency to avoid relying on an algorithmic
recommendation in favor of human judgment, even when the algorithm’s performance is superior [Dietvorst et al., 2015;
Castelo et al., 2019]. However, a human's reconciliation of algorithmic recommendations can also exhibit over-reliance
on these recommendations [Logg et al., 2019; Mahmud et al., 2022], or be based on an adequate judgment of relevant
factors [Kim et al., 2020; Jussupow et al., 2020]. Consequently, in this work, we use the term Algorithm Discretion to
refer to the human's arbitrary and idiosyncratic tendency to accept or reject algorithmic recommendations for any given
decision instance. Existing work has demonstrated that algorithm discretion behavior is predictable and is largely a
function of human self-confidence in their own decisions [Chong et al., 2022; Wang et al., 2022a]. Our method models
and aims to leverage a human’s arbitrary algorithm discretion behavior to improve human-AI team performance.
2.3 Rule-Based Models
Rule-based models are inherently interpretable and easy to understand because they take the form of sparse decision lists,
consisting of a series of if... then statements [Yildiz, 2014; Rudin, 2019; Wang et al., 2021; Wang, 2018]. This model
form offers an inherent reason for every prediction-based recommendation [Letham et al., 2015], and rule-based models
are widely recognized as one of the most intuitively understandable models for their transparent inner structures and
good model expressivity [Rudin, 2019; Wang et al., 2021]. In many high-stakes domains, experts require interpretability
to vet the reasoning underlying the model's predictions [Caruana et al., 2015]. Additionally, AIaDM systems yield
more productive outcomes when experts can directly edit the patterns underlying the recommendations, so as to reflect
the knowledge that the AI cannot otherwise learn [Caruana et al., 2015; Balagopal et al., 2021; Wang et al., 2022b].
Consequently, in this work, we provide an inherently interpretable, rule-based approach.
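To make the form of such models concrete, the following is a minimal, hypothetical rule-set sketch for a loan-approval-style binary task; the features, thresholds, and rules are illustrative assumptions and are not taken from any dataset or model used in this paper.

```python
# A hypothetical rule set for a binary decision task (illustrative only).
# Each rule is a conjunction of simple conditions; the model predicts the
# positive class if ANY rule fires, and the default (negative) class otherwise.

def rule_set_predict(x: dict) -> int:
    """Predict 1 if any rule in the set is satisfied, else 0."""
    rules = [
        # Rule 1: IF income > 50k AND debt_ratio < 0.3 THEN predict 1
        lambda r: r["income"] > 50_000 and r["debt_ratio"] < 0.3,
        # Rule 2: IF years_employed >= 5 AND prior_defaults == 0 THEN predict 1
        lambda r: r["years_employed"] >= 5 and r["prior_defaults"] == 0,
    ]
    return int(any(rule(x) for rule in rules))

# Example: the prediction can be traced to the specific rule that fired.
print(rule_set_predict({"income": 62_000, "debt_ratio": 0.2,
                        "years_employed": 3, "prior_defaults": 0}))  # -> 1
```

Because every positive recommendation can be traced to the rule that fired, a DM can justify, override, or directly edit the underlying pattern.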
The method we introduce in this paper builds on the HyRS method [Wang, 2019], originally proposed to offer partial
interpretability for black-box models. HyRS is an extension of the Bayesian Rule Sets (BRS) method [Wang et al., 2017]
for creating rule set classifiers. While these works do not consider the problem of advising a human, we adapted the
HyRS and BRS methods as benchmarks to evaluate the benefits of leveraging ADB and considering reconciliation costs
within a class of methods.
Figure 1: AI-assisted Team-Learning Framework
3 Leveraging Human Behavior
We propose the AI-assisted Team (AIaT)-Learning Framework, shown in Figure 1. This framework includes three
iterative phases that can be implemented in practice to develop HAI-team models that leverage a DM's behavior to
benefit the team’s performance. We first provide theoretical motivation for the potential benefits of leveraging ADB
along with decision history and follow with a discussion on the phases of the AIaT-Learning framework.
3.1 Theoretical Motivation
We provide theoretical motivation for the potential benefits of leveraging ADB to improve team performance defined by
an arbitrary loss function. Let $\mathcal{X}$ be the set of possible examples with labels $\mathcal{Y} = \{0, 1\}$, where $\mathcal{D}$ is a distribution over $\mathcal{X} \times \mathcal{Y}$. We define
a human's ADB as their tendency to accept or reject algorithmic recommendations for arbitrary decision instances
and feature values thereof. The human's ADB can be expressed as the probability $p(a|x)$ that the human would accept
a contradicting recommendation for any given decision task defined by a feature vector $x$. The event of accepting a
contradicting recommendation is defined by the binary indicator variable $a$.

The overarching goal of our machine learning task is to learn a classifier $c : \mathcal{X} \rightarrow \mathcal{Y}$, where $c \in \mathcal{C}$ is selected by
minimizing the expected loss $L(y, c(x))$ as follows:

$$c = \arg\min_{c \in \mathcal{C}} \mathbb{E}_{x,y \sim \mathcal{D}}\left[L(y, c(x))\right]. \quad (1)$$
However, for every instance defined by $x$, the classifier's recommendation $c(x)$ may be rejected by the human
decision-maker with probability $1 - p(a|x)$. When the human rejects the classifier's recommendation, $c(x)$ is replaced
by the human's initial decision on the task, $h$. We can thus obtain a classifier $c' \in \mathcal{C}$ by minimizing the loss from the
final team decision:

$$\begin{aligned}
c' &= \arg\min_{c \in \mathcal{C}} \mathbb{E}_{x,y \sim \mathcal{D}}\left[p(a|x) L(y, c(x)) + (1 - p(a|x)) L(y, h)\right] \\
   &= \arg\min_{c \in \mathcal{C}} \mathbb{E}_{x,y \sim \mathcal{D}}\left[p(a|x) L(y, c(x))\right] + \mathbb{E}_{x,y \sim \mathcal{D}}\left[(1 - p(a|x)) L(y, h)\right] \\
   &= \arg\min_{c \in \mathcal{C}} \mathbb{E}_{x,y \sim \mathcal{D}}\left[p(a|x) L(y, c(x))\right] \quad (2)
\end{aligned}$$

In the above equation, we drop $\mathbb{E}_{x,y \sim \mathcal{D}}\left[(1 - p(a|x)) L(y, h)\right]$ because it does not vary with the choice of classifier $c \in \mathcal{C}$.
By definition:

$$\mathbb{E}_{x,y \sim \mathcal{D}}\left[p(a|x) L(y, c'(x))\right] \leq \mathbb{E}_{x,y \sim \mathcal{D}}\left[p(a|x) L(y, c(x))\right] \quad (3)$$

The above inequality demonstrates that $c'$ is a superior classifier in expectation because it directly minimizes expected
loss over the instances that the human decision-maker would accept.

In practice, however, the human partner's ADB is unknown and must be learned. We discuss how to obtain an estimate
$\hat{p}(a|x)$ in the next section.
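As a concrete illustration of the objective in Eq. (2), the sketch below trains an ordinary classifier with per-instance weights set to an estimate $\hat{p}(a|x)$ of the acceptance probability. This is a minimal sketch of the weighted-loss idea, not the TeamRules algorithm itself; the choice of logistic regression and the `adb_weights` array are assumptions made only for the example.

```python
# Minimal sketch: minimize the ADB-weighted loss of Eq. (2) by passing the
# estimated acceptance probabilities p_hat(a|x) as per-instance sample weights.
# Illustrative only -- this is not the TeamRules (TR) algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_adb_weighted_classifier(X, y, adb_weights):
    """X: (n, d) features, y: (n,) labels in {0, 1},
    adb_weights: (n,) estimated probabilities that the human accepts advice."""
    clf = LogisticRegression(max_iter=1000)
    # Instances the human is unlikely to accept contribute little to the loss,
    # mirroring the p(a|x) * L(y, c(x)) term in Eq. (2).
    clf.fit(X, y, sample_weight=adb_weights)
    return clf

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
adb_weights = rng.uniform(0.1, 1.0, size=200)  # stand-in for p_hat(a|x)
model = fit_adb_weighted_classifier(X, y, adb_weights)
```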
3.2 AI-assisted Team (AIaT)-Learning Framework
Our AIaT-Learning framework consists of three phases. The Human Interaction Phase serves as the data acquisition
step during which we obtain information on the human partner's decisions and ADB. Given training data $\{X, Y\}$, we
conduct two tasks involving the human. First, either historical data of the human's past decisions is obtained, or, in
the absence of such history, the human records their decisions for a set of training instances; we refer to the resulting
vector of the human's decisions as $H$. The second task involves acquiring data and modeling the human's ADB. Prior
work established how a human's ADB can be predicted [Wang et al., 2022a; Chong et al., 2022]. In particular, prior
work has found that a human's inherent self-reported confidence in their own decision, prior to receiving an algorithmic
recommendation, is predictive of their ADB [Wang et al., 2022a; Chong et al., 2022]. In general, the greater the human's
confidence in their own initial decision, the less likely they are to accept a contradictory algorithmic recommendation,
independently of their confidence in the AI or the AI's explanation.³ Thus, the human-interaction phase includes
the acquisition of the DM's confidence in each of their decisions, denoted by the vector $C$, for all training instances $X$.
Additionally, following prior work on learning humans' ADB [Wang et al., 2022a; Chong et al., 2022], the human's
decisions to accept or reject recommendations, denoted by $A$, are recorded whenever the human is presented with AI
advice that contradicts their initial judgment.
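A minimal sketch of the data collected in this phase, assuming the quantities are stored as parallel arrays; the container and field names are hypothetical and used only to make the notation concrete.

```python
# Hypothetical container for the Human Interaction Phase data.
# A is only observed for instances where contradicting advice was shown,
# so it is paired with a mask; names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class InteractionData:
    X: np.ndarray        # (n, d) instance features
    Y: np.ndarray        # (n,) ground-truth labels
    H: np.ndarray        # (n,) human's initial decisions
    C: np.ndarray        # (n,) human's self-reported confidence in H
    A: np.ndarray        # (n,) 1 = accepted contradicting advice, 0 = rejected
    advised: np.ndarray  # (n,) bool mask: contradicting advice was shown

    def discretion_training_set(self):
        """Return (features, confidence, accept labels) for the instances
        where acceptance or rejection was actually observed."""
        m = self.advised
        return self.X[m], self.C[m], self.A[m]
```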
In the Human-Modeling Phase, the data acquired in the preceding steps is used to learn the discretion model of the
human partner's ADB. Specifically, as discussed above, prior work has established that humans' discretion outcomes
are predictable given the human's confidence in their own decisions [Wang et al., 2022a; Chong et al., 2022]. Given
that prior work established how to learn a mapping onto the human's discretion behavior, forming $\hat{p}(a|c, x)$, in this
work we propose how this discretion behavior can be brought to bear towards learning to advise humans, so as to
study its potential benefits. We leave the development of superior discretion data acquisition and discretion modeling
methods to future work, and we discuss related challenges in the Future Work section. For brevity, henceforth we
denote $p(a|c, x)$ and $\hat{p}(a|c, x)$ as $p(a)$ and $\hat{p}(a)$, respectively. Additionally, our approach also brings to bear the human's
decision behavior with respect to the underlying decision task, so as to complement the human's decision-making. In
principle, this behavior can be directly observed in the historical data, as well as during deployment. In contexts where
the human's decisions cannot be observed for all training instances in $X$, a model $\hat{h}(x)$ can be learned to infer the
human's decisions (e.g., [Bansal et al., 2021b; Madras et al., 2018]).
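A minimal sketch of one way the discretion model $\hat{p}(a|c, x)$ could be fit, following the prior finding that self-confidence is predictive of acceptance. Logistic regression on confidence plus features is an assumption made for illustration, not the specific estimator used in the cited work.

```python
# Sketch: fit p_hat(a | c, x) from observed accept/reject outcomes.
# The logistic-regression choice is illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_discretion_model(X_adv, conf_adv, A_adv):
    """X_adv: features of instances where contradicting advice was shown,
    conf_adv: the DM's self-reported confidence on those instances,
    A_adv: 1 if the DM accepted the advice, 0 if rejected."""
    Z = np.column_stack([conf_adv, X_adv])  # prepend confidence as a feature
    return LogisticRegression(max_iter=1000).fit(Z, A_adv)

def predict_acceptance(model, X, conf):
    """Return p_hat(a) for new instances, given the DM's confidence."""
    Z = np.column_stack([conf, X])
    return model.predict_proba(Z)[:, 1]
```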
Finally, given training data $D = \{X, Y, H, \hat{p}(a)\}$, the Learning to Advise Phase corresponds to simultaneously learning
when to advise the human and what interpretable advice to offer by leveraging the human's decision history, discretion
model, and tolerance for reconciliation costs, with the goal of optimizing the overall HAI team's performance. Our goal
also entails defining the team performance objective and metric that can reflect any given decision-making context.
Next, we develop an algorithm for Learning to Advise.
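To make the team objective concrete, the sketch below evaluates a candidate advising policy under a simulated DM: the AI advises only on a chosen subset of instances, the DM accepts contradicting advice with probability $\hat{p}(a)$, and the objective combines expected team accuracy with a penalty per contradiction shown. The scalar cost `alpha` and this exact functional form are illustrative assumptions; TR optimizes its own formalization of the trade-off, introduced in the next section.

```python
# Sketch: expected team objective for a candidate advising policy.
# advise: bool mask of instances the AI chooses to advise on.
# c_pred: AI recommendations; h: human's initial decisions; y: ground truth.
# p_accept: estimated probability the human accepts contradicting advice.
# alpha: per-instance reconciliation cost (an assumed, illustrative penalty).
import numpy as np

def expected_team_objective(y, h, c_pred, advise, p_accept, alpha=0.1):
    contradiction = advise & (c_pred != h)
    acc_advice = (c_pred == y).astype(float)
    acc_human = (h == y).astype(float)
    # Where advice contradicts, the human follows the AI with probability
    # p_accept and otherwise keeps their own decision h.
    expected_acc = np.where(
        contradiction,
        p_accept * acc_advice + (1 - p_accept) * acc_human,
        acc_human,
    )
    # Reconciliation cost is incurred whenever contradicting advice is shown.
    return expected_acc.mean() - alpha * contradiction.mean()
```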
³ One may consider that the human's ADB may be predicted exclusively from their decision history, given this history can predict
their decision accuracy for a given instance. However, DMs' confidence, i.e., their assessment of their own accuracy, while shown
to be predictive of their ADB, is rarely well-calibrated with respect to their true accuracy [Klayman et al., 1999; Green and Chen,
2019b].