
in significantly different attributions than their contributions
to the shift in the joint distribution between environments.
In this work, we focus on explaining the discrepancy in
model performance between two environments as measured
by some metric such as prediction accuracy. We emphasize
the non-trivial nature of this problem, as many distribution
shifts will have no impact on a particular model or metric,
and some distribution shifts may even increase model perfor-
mance. Moreover, the root cause of the performance change may lie in distribution shifts in variables external to the model input. Thus, explaining the performance discrepancy requires us to develop specialized methods. Specifically, we
want to quantify the contribution to the performance change
of a fixed set of distributions that may change across the
environments. Given such a set, we develop a model-free
importance sampling approach to quantify this contribution.
We then use the Shapley value framework to estimate the at-
tribution for each distribution shift. This framework allows
us to expand the settings where our method is applicable.
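As a minimal sketch of the Shapley-value step, the snippet below computes exact Shapley values for a small candidate set of distribution shifts, treating each shift as a "player". The value function here is a made-up placeholder (the hypothetical numbers in toy_v are not from the paper); in our method, an importance-sampling estimate of the performance change explained by a subset of shifts would play that role.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values for a small set of 'players' (candidate distribution shifts).

    value_fn maps a frozenset of players to a real number; in our setting it would
    return the performance change explained when only those distributions shift.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for subset in combinations(others, k):
                S = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value_fn(S | {p}) - value_fn(S))
    return phi

# Hypothetical (made-up) values for two candidate shifts; the attributions sum to
# v(full set) - v(empty set), which is the Shapley efficiency property.
toy_v = {frozenset(): 0.0, frozenset({"D_X"}): 0.08,
         frozenset({"D_Y|X"}): 0.02, frozenset({"D_X", "D_Y|X"}): 0.10}
print(shapley_values(["D_X", "D_Y|X"], lambda S: toy_v[S]))
```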
We make the following contributions (code available at https://github.com/MLforHealth/expl_perf_drop):
• We formalize the problem of attributing model performance changes due to distribution shifts.
• We propose a principled approach based on Shapley values for attribution, and show that it satisfies several desirable properties.
• We validate the correctness and utility of our method on synthetic and real-world datasets.
2. Problem Setup
Notation. Consider a learning setup where we have some system variables denoted by $V$, consisting of two types of variables, $V = (X, Y)$, which comprise features $X$ and labels $Y$ such that $V \sim \mathcal{D}$. Realizations of the variables are denoted in lower case. We assume access to samples from two environments. We use $\mathcal{D}_{\text{source}}$ to denote the source distribution and $\mathcal{D}_{\text{target}}$ for the target distribution. Subscripts on $\mathcal{D}$ refer to the distribution of specific variables. For example, $\mathcal{D}_{X_1}$ is the distribution of feature $X_1 \subset X$, and $\mathcal{D}_{Y \mid X}$ is the conditional distribution of labels given all features $X$.
Let $X_M \subseteq X$ be the subset of features utilized by a given model $f$. We are given a loss function $\ell((x, y), f) \mapsto \mathbb{R}$ which assigns a real value to the model evaluated at a specific setting $(x, y)$ of the variables. For example, in the case of supervised learning, the model $f$ maps $X_M$ into the label space, and a loss function such as the squared error $\ell((x, y), f) := (y - f(x_M))^2$ can be used to evaluate model performance. We assume that the loss function can be computed separately for each data point.
Then, the performance of the model in some environment with distribution $\mathcal{D}$ is summarized by the average of the losses:
$$\mathrm{Perf}(\mathcal{D}) := \mathbb{E}_{(x,y)\sim \mathcal{D}}\left[\ell((x, y), f)\right].$$
This implies that a shift in any of the variables $V$ in the system may result in performance change across environments, including those that are not directly used by the model but drive changes to the features $X_M$ used by the model for learning.
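To make this concrete, here is a minimal sketch of estimating $\mathrm{Perf}(\mathcal{D})$ by averaging per-example squared errors over samples from each environment, and of the performance difference we wish to attribute. The data-generating process and the fixed (misspecified) model below are assumptions for illustration only, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def perf(f, X, y):
    """Monte-Carlo estimate of Perf(D) = E[(y - f(x_M))^2] from samples of D."""
    return np.mean((y - f(X)) ** 2)

# Hypothetical fixed (and misspecified) model that uses only feature X_1.
f = lambda X: 1.5 * X[:, 0]

def make_env(mean_x1, n=50_000):
    """Assumed data-generating process: y = 2*X_1 + noise; only D_X1 differs."""
    X = np.column_stack([rng.normal(mean_x1, 1.0, n), rng.normal(0.0, 1.0, n)])
    y = 2.0 * X[:, 0] + rng.normal(0.0, 0.1, n)
    return X, y

X_src, y_src = make_env(mean_x1=0.0)   # source environment
X_tgt, y_tgt = make_env(mean_x1=2.0)   # target environment with a shifted D_X1

print(f"Perf(source) = {perf(f, X_src, y_src):.3f}")
print(f"Perf(target) = {perf(f, X_tgt, y_tgt):.3f}")
print(f"difference to attribute = {perf(f, X_tgt, y_tgt) - perf(f, X_src, y_src):.3f}")
```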
Setup. Suppose we are given a candidate set of (marginal and/or conditional) distributions $\mathcal{C}_{\mathcal{D}}$ over $V$ that may account for the model performance change from $\mathcal{D}_{\text{source}}$ to $\mathcal{D}_{\text{target}}$: $\mathrm{Perf}(\mathcal{D}_{\text{target}}) - \mathrm{Perf}(\mathcal{D}_{\text{source}})$. Our goal is to attribute this change to each distribution in the candidate set $\mathcal{C}_{\mathcal{D}}$. For our method, we assume access to the model $f$, and samples from $\mathcal{D}_{\text{source}}$ as well as $\mathcal{D}_{\text{target}}$ (see Figure 1).
We assume that the dependence between variables $V$ is described by a causal system (Pearl, 2009). For every variable $X_i \in V$, this dependence is captured by a functional relationship between $X_i$ and the so-called “causal parents” of $X_i$ (denoted as $\mathrm{parent}(X_i)$) driving the variation in $X_i$. The causal dependence induces a Markov distribution over the variables in this system. That is, the joint distribution $\mathcal{D}_V$ can be factorized as $\mathcal{D}_V = \prod_{X_i \in V} \mathcal{D}_{X_i \mid \mathrm{parent}(X_i)}$. This dependence can be summarized graphically using a Directed Acyclic Graph (DAG) with nodes corresponding to the system variables and directed edges ($\mathrm{parent}(X_i) \to X_i$) in the direction of the causal mechanisms in the system (see Figure 1 for an example).
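As an illustration, the sketch below samples from an assumed toy causal system with DAG $X_1 \to X_2 \to Y$ (chosen for this sketch; not the system of Figure 1). Each mechanism depends only on the variable's causal parents, so the joint distribution factorizes into those mechanisms, and a distribution shift amounts to replacing one factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy causal system with DAG X1 -> X2 -> Y (not the system of Figure 1).
# Each mechanism samples a variable from its causal parents, so the joint
# distribution factorizes as D_V = D_X1 * D_X2|X1 * D_Y|X2.
def sample(n, mechanisms):
    x1 = mechanisms["X1"](n)
    x2 = mechanisms["X2|X1"](x1)
    y = mechanisms["Y|X2"](x2)
    return x1, x2, y

source = {
    "X1":    lambda n:  rng.normal(0.0, 1.0, n),
    "X2|X1": lambda x1: 0.5 * x1 + rng.normal(0.0, 0.1, x1.shape),
    "Y|X2":  lambda x2: np.sin(x2) + rng.normal(0.0, 0.1, x2.shape),
}
# Target environment: only the mechanism (factor) for X1 is replaced;
# the conditional mechanisms for X2 and Y are reused unchanged.
target = dict(source, X1=lambda n: rng.normal(1.5, 1.0, n))

x1_s, _, y_s = sample(10_000, source)
x1_t, _, y_t = sample(10_000, target)
print(f"E[X1]: source {x1_s.mean():.2f} vs target {x1_t.mean():.2f}")
print(f"E[Y]:  source {y_s.mean():.2f} vs target {y_t.mean():.2f}  (shifts downstream)")
```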
Example. We provide an example illustrating that the performance attribution problem is ill-specified without knowing how the mechanisms can change to produce the observed performance difference. Suppose we are predicting $Y$ from $X$ with a linear model $f(x) := \phi x$ under the squared loss. Consider two possible scenarios for data generation: (1) $X \leftarrow Y$, where $\mathcal{D}_Y$ changes from source to target while $\mathcal{D}_{X \mid Y}$ remains the same; and (2) $X \to Y$, where $\mathcal{D}_X$ changes from source to target while $\mathcal{D}_{Y \mid X}$ remains the same. The performance difference of $f(x)$ is the same in both cases. Naturally, we want an attribution method to assign all of the difference to the mechanism for $Y$ in the first case and to the mechanism for $X$ in the second case. Thus, for the same performance difference between source and target data, we would like a method to output different attributions depending on whether the data-generating process is case (1) or (2). Note that, in general, it is impossible to find the appropriate attributions by first inferring the direction of the causal mechanisms, since learning the causal structure purely from observational data is in general impossible (Peters et al., 2017). Hence, knowledge of the data-generating mechanisms is necessary for appropriate attribution.
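The sketch below makes this ambiguity numerical under assumed Gaussian mechanisms (chosen here for illustration; not necessarily the concrete processes specified next). In scenario (1) only the mechanism for $Y$ shifts, in scenario (2) only the mechanism for $X$ shifts, yet the performance change of the same fixed model $f(x) = \phi x$ is identical in expectation, so the observed drop alone cannot identify which mechanism to blame.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi, shift = 200_000, 0.5, 2.0            # fixed model f(x) = phi * x
mse = lambda x, y: np.mean((y - phi * x) ** 2)

def scenario_1(mean_y):                      # X <- Y: only D_Y shifts, D_X|Y fixed
    y = rng.normal(mean_y, 1.0, n)
    x = y + rng.normal(0.0, 1.0, n)
    return x, y

def scenario_2(mean_x):                      # X -> Y: only D_X shifts, D_Y|X fixed
    x = rng.normal(mean_x, 1.0, n)
    y = x + rng.normal(0.0, 1.0, n)
    return x, y

for name, gen in [("(1) X <- Y, shift in D_Y", scenario_1),
                  ("(2) X -> Y, shift in D_X", scenario_2)]:
    drop = mse(*gen(shift)) - mse(*gen(0.0))
    print(f"{name}: Perf(target) - Perf(source) = {drop:.3f}")
# Both print roughly (1 - phi)^2 * shift^2 = 1.0, despite different shifted mechanisms.
```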
More concretely, suppose the processes are (1) $Y \sim$