
\begin{aligned}
c_0 &= \min_{c \in \mathcal{C}} \; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c(x)) + (1 - p(a|x))\, L(y, h) \right] \\
    &= \min_{c \in \mathcal{C}} \; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c(x)) \right] + \mathbb{E}_{x,y \sim D}\left[ (1 - p(a|x))\, L(y, h) \right] \\
    &= \min_{c \in \mathcal{C}} \; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c(x)) \right] \qquad (2)
\end{aligned}
In the above equation, we drop $\mathbb{E}_{x,y \sim D}\left[ (1 - p(a|x))\, L(y, h) \right]$ because it does not vary with the choice of classifier $c \in \mathcal{C}$.
By definition:
$$\mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c_0(x)) \right] \;\le\; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c^*(x)) \right] \qquad (3)$$
The above inequality demonstrates that $c_0$ is a superior classifier in expectation, because it directly minimizes the expected loss over the instances on which the human decision-maker would accept the algorithmic recommendation.
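To make the argument concrete, the following minimal sketch (not from the paper; it uses synthetic data, a 0-1 loss, and two hypothetical candidate classifiers) verifies numerically that the rejected-instance term is a constant that does not affect which classifier attains the lower team loss:

import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins (all hypothetical): labels, human decisions, two
# candidate classifiers' predictions, and the acceptance probability p(a|x).
y = rng.integers(0, 2, n)           # ground-truth labels
h = rng.integers(0, 2, n)           # human decisions
c1 = rng.integers(0, 2, n)          # candidate classifier 1
c2 = rng.integers(0, 2, n)          # candidate classifier 2
p_a = rng.uniform(0, 1, n)          # acceptance probability p(a|x)

def loss(y_true, y_pred):
    return (y_true != y_pred).astype(float)   # 0-1 loss

def team_loss(c_pred):
    # Classifier's loss where advice is accepted, human's loss where rejected.
    return np.mean(p_a * loss(y, c_pred) + (1 - p_a) * loss(y, h))

constant_term = np.mean((1 - p_a) * loss(y, h))   # identical for every c
for name, c_pred in [("c1", c1), ("c2", c2)]:
    accepted_term = np.mean(p_a * loss(y, c_pred))
    print(name, round(team_loss(c_pred), 4), round(accepted_term + constant_term, 4))

Because the rejected-instance term is identical for every candidate, ranking classifiers by the full team loss and by the accepted-instance term alone yields the same minimizer $c_0$.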
In practice, however, the human partner's ADB is unknown and must be learned. We discuss how to obtain an estimate $\hat{p}(a|x)$ in the next section.
3.2 AI-assisted Team (AIaT)-Learning Framework
Our AIaT-Learning framework consists of three phases. The Human Interaction Phase serves as the data acquisition step during which we obtain information on the human partner's decisions and ADB. Given training data $\{X, Y\}$, we conduct two tasks involving the human. First, either historical data of the human's past decisions is obtained or, in the absence of such history, the human records their decisions for a set of training instances; we refer to the resulting vector of the human's decisions as $H$. The second task involves acquiring data on and modeling the human's ADB. Prior work established how a human's ADB can be predicted [Wang et al., 2022a; Chong et al., 2022]. In particular, prior work has found that a human's inherent self-reported confidence in their own decision, prior to receiving an algorithmic recommendation, is predictive of their ADB [Wang et al., 2022a; Chong et al., 2022]. In general, the greater the human's confidence in their own initial decision, the less likely they are to accept a contradictory algorithmic recommendation, independently of their confidence in the AI or in the AI's explanation³. Thus, the Human Interaction Phase includes the acquisition of the DM's confidence in each of their decisions, denoted by the vector $C$, for all training instances $X$. Additionally, following prior work on learning humans' ADB [Wang et al., 2022a; Chong et al., 2022], the human's decisions to accept or reject recommendations, denoted by $A$, are recorded whenever the human is presented with AI advice that contradicts their initial judgment.
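For concreteness, one possible record layout for the data collected in this phase is sketched below; the field names and types are illustrative assumptions, not part of the paper.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InteractionRecord:
    x: List[float]            # instance features (a row of X)
    y: int                    # ground-truth label (from Y)
    h: int                    # the human's own decision (an entry of H)
    confidence: float         # self-reported confidence in h (an entry of C)
    accepted: Optional[bool]  # whether contradicting AI advice was accepted (an entry of A);
                              # None when no contradicting advice was shown

records = [
    InteractionRecord(x=[0.2, 1.4], y=1, h=0, confidence=0.55, accepted=True),
    InteractionRecord(x=[1.1, 0.3], y=0, h=0, confidence=0.90, accepted=None),
]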
In the Human-Modeling Phase, the data acquired in the preceding steps is used to learn a discretion model of the human partner's ADB. Specifically, as discussed above, prior work has established that humans' discretion outcomes are predictable given the human's confidence in their own decisions [Wang et al., 2022a; Chong et al., 2022]. Given that prior work established how to learn a mapping onto the human's discretion behavior, yielding $\hat{p}(a|c, x)$, in this work we propose how this discretion behavior can be brought to bear on learning to advise humans, so as to study its potential benefits. We leave the development of improved methods for discretion data acquisition and discretion modeling to future work, and we discuss related challenges in the Future Work section. For brevity, henceforth we denote $p(a|c, x)$ and $\hat{p}(a|c, x)$ as $p(a)$ and $\hat{p}(a)$, respectively. Our approach also brings to bear the human's decision behavior with respect to the underlying decision task, so as to complement the human's decision-making. In principle, this behavior can be directly observed in the historical data, as well as during deployment. In contexts where the human's decisions cannot be observed for all training instances in $X$, a model $\hat{h}(x)$ can be learned to infer the human's decisions (e.g., [Bansal et al., 2021b; Madras et al., 2018]).
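As a minimal sketch of this phase, the snippet below fits a discretion model $\hat{p}(a|c, x)$ on the instance features augmented with the human's confidence, and a decision model $\hat{h}(x)$ for settings where the human's decisions are not fully observed. The use of logistic regression and the synthetic data are assumptions made for illustration; the paper does not prescribe a particular model class.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the acquired data (shapes are illustrative).
X = rng.normal(size=(500, 5))             # instance features
confidence = rng.uniform(0, 1, size=500)  # self-reported confidence C
A = rng.integers(0, 2, size=500)          # accept/reject of contradicting advice
H = rng.integers(0, 2, size=500)          # the human's own decisions

# Discretion model p_hat(a | c, x): instance features augmented with confidence.
discretion_features = np.column_stack([X, confidence])
discretion_model = LogisticRegression().fit(discretion_features, A)
p_hat_a = discretion_model.predict_proba(discretion_features)[:, 1]

# Decision model h_hat(x), used when H is not observed for every instance.
h_hat_model = LogisticRegression().fit(X, H)
h_hat = h_hat_model.predict(X)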
Finally, given training data $D = \{X, Y, H, \hat{p}(a)\}$, the Learning to Advise Phase corresponds to simultaneously learning when to advise the human and what interpretable advice to offer, by leveraging the human's decision history, discretion model, and tolerance for reconciliation costs, with the goal of optimizing the overall HAI team's performance. This goal also entails defining a team-performance objective and metric that can reflect any given decision-making context.
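As a deliberately simplified illustration of how such a training set could be used (this is not the algorithm developed in the next section; the shallow decision tree and all names below are assumptions), one could fit an interpretable advisor with instances weighted by the estimated acceptance probability, echoing Eq. (2), and score it with an estimated team loss:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for D = {X, Y, H, p_hat(a)} (shapes are illustrative).
X = rng.normal(size=(500, 5))
Y = rng.integers(0, 2, size=500)
H = rng.integers(0, 2, size=500)
p_hat_a = rng.uniform(0, 1, size=500)     # estimated acceptance probabilities

# Weight each instance by the estimated probability that advice is accepted.
advisor = DecisionTreeClassifier(max_depth=3, random_state=0)
advisor.fit(X, Y, sample_weight=p_hat_a)

# Estimated team loss: advisor's error where advice is accepted,
# the human's error where it is not.
est_team_loss = np.mean(
    p_hat_a * (advisor.predict(X) != Y) + (1 - p_hat_a) * (H != Y)
)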
Next, we develop an algorithm for Learning to Advise.
³ One might expect that the human's ADB could be predicted exclusively from their decision history, given that this history can predict their decision accuracy for a given instance. However, DMs' confidence, i.e., their assessment of their own accuracy, while shown to be predictive of their ADB, is rarely well calibrated with respect to their true accuracy [Klayman et al., 1999; Green and Chen, 2019b].