
\begin{aligned}
c_0 &= \min_{c \in \mathcal{C}} \; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c(x)) + (1 - p(a|x))\, L(y, h) \right] \\
    &= \min_{c \in \mathcal{C}} \; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c(x)) \right] + \mathbb{E}_{x,y \sim D}\left[ (1 - p(a|x))\, L(y, h) \right] \\
    &= \min_{c \in \mathcal{C}} \; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c(x)) \right] \qquad (2)
\end{aligned}
In the above equation, we drop $\mathbb{E}_{x,y \sim D}\left[ (1 - p(a|x))\, L(y, h) \right]$ because it does not vary with the choice of classifier $c \in \mathcal{C}$.
By definition:
$$\mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c_0(x)) \right] \;\le\; \mathbb{E}_{x,y \sim D}\left[ p(a|x)\, L(y, c^*(x)) \right] \qquad (3)$$
The above inequality demonstrates that $c_0$ is a superior classifier in expectation, because it directly minimizes the expected loss over the instances on which the human decision-maker would accept the algorithmic recommendation.
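To make the argument concrete, the following minimal sketch (not from the paper; it uses synthetic data, a 0-1 loss, and two hypothetical candidate classifiers) verifies numerically that the rejected-instance term is a constant that does not affect which classifier attains the lower team loss:

import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins (all hypothetical): labels, human decisions, two
# candidate classifiers' predictions, and the acceptance probability p(a|x).
y = rng.integers(0, 2, n)           # ground-truth labels
h = rng.integers(0, 2, n)           # human decisions
c1 = rng.integers(0, 2, n)          # candidate classifier 1
c2 = rng.integers(0, 2, n)          # candidate classifier 2
p_a = rng.uniform(0, 1, n)          # acceptance probability p(a|x)

def loss(y_true, y_pred):
    return (y_true != y_pred).astype(float)   # 0-1 loss

def team_loss(c_pred):
    # Classifier's loss where advice is accepted, human's loss where rejected.
    return np.mean(p_a * loss(y, c_pred) + (1 - p_a) * loss(y, h))

constant_term = np.mean((1 - p_a) * loss(y, h))   # identical for every c
for name, c_pred in [("c1", c1), ("c2", c2)]:
    accepted_term = np.mean(p_a * loss(y, c_pred))
    print(name, round(team_loss(c_pred), 4), round(accepted_term + constant_term, 4))

Because the rejected-instance term is identical for every candidate, ranking classifiers by the full team loss and by the accepted-instance term alone yields the same minimizer $c_0$.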
In practice, however, the human partner's ADB is unknown and must be learned. We discuss how to obtain an estimate $\hat{p}(a|x)$ in the next section.
3.2 AI-assisted Team (AIaT)-Learning Framework
Our AIaT-Learning framework consists of three phases. The Human Interaction Phase serves as the data acquisition step during which we obtain information on the human partner's decisions and ADB. Given training data $\{X, Y\}$, we conduct two tasks involving the human. First, either historical data of the human's past decisions is obtained or, in the absence of such history, the human records their decisions for a set of training instances; we refer to the resulting vector of the human's decisions as $H$. The second task involves acquiring data on and modeling the human's ADB. Prior work established how a human's ADB can be predicted [Wang et al., 2022a; Chong et al., 2022]. In particular, prior work has found that a human's inherent self-reported confidence in their own decision, prior to receiving an algorithmic recommendation, is predictive of their ADB [Wang et al., 2022a; Chong et al., 2022]. In general, the greater the human's confidence in their own initial decision, the less likely they are to accept a contradictory algorithmic recommendation, independently of their confidence in the AI or in the AI's explanation³. Thus, the Human Interaction Phase includes the acquisition of the DM's confidence in each of their decisions, denoted by the vector $C$, for all training instances $X$. Additionally, following prior work on learning humans' ADB [Wang et al., 2022a; Chong et al., 2022], the human's decisions to accept or reject recommendations, denoted by $A$, are recorded whenever the human is presented with AI advice that contradicts their initial judgment.
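For concreteness, one possible record layout for the data collected in this phase is sketched below; the field names and types are illustrative assumptions, not part of the paper.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InteractionRecord:
    x: List[float]            # instance features (a row of X)
    y: int                    # ground-truth label (from Y)
    h: int                    # the human's own decision (an entry of H)
    confidence: float         # self-reported confidence in h (an entry of C)
    accepted: Optional[bool]  # whether contradicting AI advice was accepted (an entry of A);
                              # None when no contradicting advice was shown

records = [
    InteractionRecord(x=[0.2, 1.4], y=1, h=0, confidence=0.55, accepted=True),
    InteractionRecord(x=[1.1, 0.3], y=0, h=0, confidence=0.90, accepted=None),
]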
In the Human-Modeling Phase, the data acquired in the preceding steps is used to learn a discretion model of the human partner's ADB. Specifically, as discussed above, prior work has established that humans' discretion outcomes are predictable given the human's confidence in their own decisions [Wang et al., 2022a; Chong et al., 2022]. Given that prior work established how to learn a mapping onto the human's discretion behavior, yielding $\hat{p}(a|c, x)$, in this work we propose how this discretion behavior can be brought to bear on learning to advise humans, so as to study its potential benefits. We leave the development of improved methods for discretion data acquisition and discretion modeling to future work, and we discuss related challenges in the Future Work section. For brevity, henceforth we denote $p(a|c, x)$ and $\hat{p}(a|c, x)$ as $p(a)$ and $\hat{p}(a)$, respectively. Our approach also brings to bear the human's decision behavior with respect to the underlying decision task, so as to complement the human's decision-making. In principle, this behavior can be directly observed in the historical data, as well as during deployment. In contexts where the human's decisions cannot be observed for all training instances in $X$, a model $\hat{h}(x)$ can be learned to infer the human's decisions (e.g., [Bansal et al., 2021b; Madras et al., 2018]).
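As a minimal sketch of this phase, the snippet below fits a discretion model $\hat{p}(a|c, x)$ on the instance features augmented with the human's confidence, and a decision model $\hat{h}(x)$ for settings where the human's decisions are not fully observed. The use of logistic regression and the synthetic data are assumptions made for illustration; the paper does not prescribe a particular model class.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the acquired data (shapes are illustrative).
X = rng.normal(size=(500, 5))             # instance features
confidence = rng.uniform(0, 1, size=500)  # self-reported confidence C
A = rng.integers(0, 2, size=500)          # accept/reject of contradicting advice
H = rng.integers(0, 2, size=500)          # the human's own decisions

# Discretion model p_hat(a | c, x): instance features augmented with confidence.
discretion_features = np.column_stack([X, confidence])
discretion_model = LogisticRegression().fit(discretion_features, A)
p_hat_a = discretion_model.predict_proba(discretion_features)[:, 1]

# Decision model h_hat(x), used when H is not observed for every instance.
h_hat_model = LogisticRegression().fit(X, H)
h_hat = h_hat_model.predict(X)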
Finally, given training data $D = \{X, Y, H, \hat{p}(a)\}$, the Learning to Advise Phase corresponds to simultaneously learning when to advise the human and what interpretable advice to offer, by leveraging the human's decision history, discretion model, and tolerance for reconciliation costs, with the goal of optimizing the overall HAI team's performance. This goal also entails defining a team-performance objective and metric that can reflect any given decision-making context.
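As a deliberately simplified illustration of how such a training set could be used (this is not the algorithm developed in the next section; the shallow decision tree and all names below are assumptions), one could fit an interpretable advisor with instances weighted by the estimated acceptance probability, echoing Eq. (2), and score it with an estimated team loss:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for D = {X, Y, H, p_hat(a)} (shapes are illustrative).
X = rng.normal(size=(500, 5))
Y = rng.integers(0, 2, size=500)
H = rng.integers(0, 2, size=500)
p_hat_a = rng.uniform(0, 1, size=500)     # estimated acceptance probabilities

# Weight each instance by the estimated probability that advice is accepted.
advisor = DecisionTreeClassifier(max_depth=3, random_state=0)
advisor.fit(X, Y, sample_weight=p_hat_a)

# Estimated team loss: advisor's error where advice is accepted,
# the human's error where it is not.
est_team_loss = np.mean(
    p_hat_a * (advisor.predict(X) != Y) + (1 - p_hat_a) * (H != Y)
)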
Next, we develop an algorithm for Learning to Advise.
³ One might expect that the human's ADB could be predicted exclusively from their decision history, given that this history can predict their decision accuracy for a given instance. However, DMs' confidence, i.e., their assessment of their own accuracy, while shown to be predictive of their ADB, is rarely well calibrated with respect to their true accuracy [Klayman et al., 1999; Green and Chen, 2019b].