Boundary-Aware Uncertainty for Feature Attribution Explainers
Davin Hill (Northeastern University, dhill@ece.neu.edu)
Aria Masoomi (Northeastern University, masoomi.a@northeastern.edu)
Max Torop (Northeastern University, torop.m@northeastern.edu)
Sandesh Ghimire (Northeastern University, drsandeshghimire@gmail.com)
Jennifer Dy (Northeastern University, jdy@ece.neu.edu)
Abstract
Post-hoc explanation methods have become a critical tool for understanding black-box classifiers in high-stakes applications. However, high-performing classifiers are often highly nonlinear and can exhibit complex behavior around the decision boundary, leading to brittle or misleading local explanations. There is therefore a pressing need to quantify the uncertainty of such explanation methods in order to understand when explanations are trustworthy. In this work we propose the Gaussian Process Explanation UnCertainty (GPEC) framework, which generates a unified uncertainty estimate combining decision boundary-aware uncertainty with explanation function approximation uncertainty. We introduce a novel geodesic-based kernel, which captures the complexity of the target black-box decision boundary. We show theoretically that the proposed kernel similarity increases with decision boundary complexity. The proposed framework is highly flexible; it can be used with any black-box classifier and feature attribution method. Empirical results on multiple tabular and image datasets show that the GPEC uncertainty estimate improves understanding of explanations compared to existing methods.
Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238. Copyright 2024 by the author(s).
1 INTRODUCTION
Post-hoc explainability methods have become a crucial tool for understanding and diagnosing black-box model predictions. Recently, many such explainers have been introduced in the category of local feature attribution methods; that is, methods that return a real-valued score representing each feature's relative importance for the model prediction. These explainers are local in that they are not limited to using the same decision rules throughout the data distribution; they are therefore better able to represent nonlinear and complex black-box models.
However, recent works have shown that local explainers can be inconsistent or unstable. For example, explainers may yield highly dissimilar explanations for similar samples (Alvarez-Melis and Jaakkola, 2018; Khan et al., 2023), exhibit sensitivity to imperceptible perturbations (Dombrowski et al., 2019; Ghorbani et al., 2019; Slack et al., 2020), or lack stability under repeated application (Visani et al., 2022). When working in high-stakes applications, it is imperative to provide the user with an understanding of whether an explanation is reliable, potentially problematic, or even misleading. One way to guide users regarding an explainer's reliability is to provide corresponding uncertainty quantification estimates.
One can consider explainers as function approximators; as such, standard techniques for quantifying the uncertainty of estimators can be utilized to quantify the uncertainty of explainers. This is the strategy adopted by existing methods that estimate explainer uncertainty (e.g., Slack et al., 2021; Schwab and Karlen, 2019). However, we observe that for explainers this is not sufficient; in addition to the uncertainty due to function approximation, explainers must also contend with uncertainty due to the complexity of the decision boundary (DB) of the black-box model in the local region being explained.
Figure 1: Illustrative example of potential pitfalls when relying on local explainers for samples near complex regions of the decision boundary (left) as compared with a smoothed decision boundary (right).
Previous works investigating DB geometry have related higher DB complexity to increased model generalization error (Valle-Perez et al., 2019) and increased adversarial vulnerability (Moosavi-Dezfooli et al., 2019; Fawzi et al., 2018). Smoother DBs have been shown to improve feature attributions (Wang et al., 2020) and produce more consistent counterfactual explanations (Black et al., 2022). Dombrowski et al. (2019) show that, in ReLU networks, samples with similar predictions can yield widely disparate explanations, which can be regulated through model smoothing. Consider the following example (Fig. 1): a prediction model is used for a medical diagnosis based on two features, cholesterol level and sodium intake, and we use the gradient with respect to each feature as an estimate of feature importance. Patients A and B have similar cholesterol and sodium levels and receive the same prediction; however, the complex decision boundary (left) results in a different top feature for each patient. In contrast, the smoothed decision boundary (right) yields more consistent explanations.
We approach this problem from the perspective of similarity: given two samples and their respective explanations, how closely related should the explanations be? From the previous intuition, we define this similarity based on a geometric perspective of the DB complexity between these two points. Specifically, we propose the novel Weighted Exponential Geodesic (WEG) kernel, which encodes our expectation that two samples close in Euclidean space may not actually be similar if the DB within a local neighborhood of the samples is highly complex.
Using this similarity formulation, we propose the Gaussian Process Explanation UnCertainty (GPEC) framework (Fig. 2), an instance-wise, model-agnostic, and explainer-agnostic method for quantifying explanation uncertainty. The proposed notion of uncertainty is complementary to existing quantification methods, which primarily estimate the uncertainty related to the choice of model parameters and the fitting of the explainer, which we call function approximation uncertainty, and which do not capture uncertainty related to DB complexity. GPEC can combine the DB-based uncertainty with function approximation uncertainty derived from any local feature attribution method.
In summary, we make the following contributions:
• We introduce a novel geometric perspective on capturing explanation uncertainty and define a geodesic-based similarity between explanations. We prove theoretically that the proposed similarity captures the complexity of the decision boundary of a given black-box classifier.
• We propose a novel Gaussian Process-based framework that combines 1) uncertainty from decision boundary complexity and 2) explainer-specific function approximation uncertainty to generate uncertainty estimates for any given feature attribution method and black-box model.
• Empirical results show that GPEC uncertainty improves understanding of feature attribution methods.
2 RELATED WORKS
Explanation Methods. A variety of methods have been proposed for improving the transparency of pretrained black-box prediction models (Guidotti et al., 2018; Barredo Arrieta et al., 2020). Within this category of post-hoc methods, many focus on local explanations, that is, explaining individual predictions rather than the entire model. Some of these methods implement local feature selection (Chen et al., 2018; Masoomi et al., 2020); others return a real-valued score for each feature, termed feature attribution methods, which are the primary focus of this work. For example, LIME (Ribeiro et al., 2016) trains a local linear regression model to approximate the black-box model. Lundberg and Lee (2017) generalize LIME and five other feature attribution methods using the SHAP framework, which fulfills a number of desirable axioms. While LIME and SHAP are model-agnostic, other methods are model-specific, e.g. for neural networks (Bach et al., 2015; Shrikumar et al., 2017; Sundararajan et al., 2017; Erion et al., 2021), tree ensembles (Lundberg et al., 2020), or Bayesian neural networks (Bykov et al., 2020). Another class of methods involves training surrogate models to explain the black-box model (Dabkowski and Gal, 2017; Chen et al., 2018; Schwab and Karlen, 2019; Guo et al., 2018; Jethani et al., 2022).
Explanation Uncertainty. One option for improving explainer trustworthiness is to quantify the associated uncertainty. Bootstrap resampling techniques have been proposed to estimate uncertainty from surrogate-based explainers (Schwab and Karlen, 2019; Schulz et al., 2022).
Figure 2: Overview of the GPEC framework. GPEC takes samples from the classifier's decision boundary plus (possibly noisy) explanations and fits a GP model with the novel WEG kernel. The GPEC estimate incorporates both the uncertainty derived from the decision boundary complexity and the explanation approximation uncertainty from the explainer.
Guo et al. (2018) also propose a surrogate explainer parameterized with a Bayesian mixture model. Alternatively, Bykov et al. (2020) and Patro et al. (2019) introduce methods for explaining Bayesian neural networks, which can be transferred to their non-Bayesian counterparts. Covert and Lee (2021) derive an unbiased version of KernelSHAP and investigate an efficient way of estimating its uncertainty. Zhang et al. (2019) categorize different sources of variance in LIME estimates. Several methods also investigate LIME and KernelSHAP in a Bayesian context; for example, calculating a posterior over attributions (Slack et al., 2021), investigating priors for explanations (Zhao et al., 2021), or using active learning during sampling (Saini and Prasad, 2022).
However, existing methods for quantifying explanation uncertainty only consider the uncertainty of the explainer as a function approximator. This work introduces an additional notion of uncertainty for explainers that accounts for the complexity of the classifier DB.
3 UNCERTAINTY FRAMEWORK FOR EXPLAINERS
We now outline the GPEC framework (Fig. 2), which is parametrized with a Gaussian Process (GP) regression model (a brief review of GP regression is provided in App. B). Consider a sample $x \in \mathcal{X}$ that we want to explain in the context of a black-box classifier $F: \mathcal{X} \to [0,1]$, where $\mathcal{X} \subseteq \mathbb{R}^D$ is the data space and $D$ is the number of features. For convenience we consider the binary classification case; this is extended to the multiclass case in App. C. We apply a local feature attribution explainer $H: \mathcal{X} \to \mathbb{R}^D$.
Recent works (e.g., Alvarez-Melis and Jaakkola (2018); Dombrowski et al. (2019)) have shown that local explanations can lack robustness and stability related to model complexity. Therefore, when explaining samples in high-stakes applications, it is critical to understand the behavior of the explainer, especially in relation to other samples near $x$. More concretely, let $X \in \mathbb{R}^{N \times D}$ represent a dataset of $N$ samples, where each row vector $X_n \in \mathbb{R}^D$, $n \in \{1, \dots, N\}$, represents a data point. We apply $H$ to the rows of $X$, generating $N$ observed explanations $E_n \in \mathbb{R}^D$, $n \in \{1, \dots, N\}$, which are grouped into $E \in \mathbb{R}^{N \times D}$. We can use these observed sample-explanation pairs to infer the behavior of $H$ around $x$; however, there are two main challenges.
First, we expect the similarity between the explanations of $X$ and $x$ to be dependent on $F$. In particular, we expect that as the DB in a neighborhood around $x$ and a given sample $X_n$ becomes increasingly complex, $H(x)$ and $H(X_n)$ may become more dissimilar; i.e., $H(X_n)$ may not contain useful information towards inferring $H(x)$. In this situation, the user should be prompted to either draw additional samples near $x$ or otherwise be warned of higher uncertainty. Second, the observed explanations $E$ can be noisy; many explainers are stochastic and approximated with sampling methods or a learned function.
To solve these challenges, we model the explainer with a vector-valued GP regression, treating the explainer as a latent function inferred using samples $X$ and explanations $E$. We model each explanation $E_n$ as being generated from a latent function $H$ plus independent Gaussian noise $\eta_n$. For convenience, we consider each feature $d$ independently; see App. C for extensions.

$$E_{n,d} = H_d(X_n) + \eta_{n,d} \quad \text{s.t.} \quad \underbrace{H_d(X_n) \sim \mathcal{GP}(0, k(\cdot,\cdot))}_{\text{Decision Boundary-Aware Uncertainty}}, \quad \underbrace{\eta_{n,d} \sim \mathcal{N}(0, \sigma^2_{n,d})}_{\text{Function Approximation Uncertainty}} \tag{1}$$
where k(·,·) is the specified kernel function for the GP
prior. We disentangle each explanation into two components, $H(X_n)$ and $\eta_n$, which represent two separate sources of uncertainty: 1) a decision boundary-aware uncertainty, which we capture using the kernel similarity, and 2) a function approximation uncertainty from the explainer. After specifying $H(X_n)$ and $\eta_n$, we can combine the two sources by calculating the predictive distribution for $x$. We take the variance of this distribution as the GPEC uncertainty estimate:

$$\mathbb{V}_d[x] = k(x, x) - k(X, x)^\top \left[K + \sigma^2_d I_N\right]^{-1} k(X, x) \tag{2}$$

where $K \in \mathbb{R}^{N \times N}$ is the kernel matrix such that $K_{ij} = k(X_i, X_j)\ \forall i, j \in \{1, \dots, N\}$, $k(X, x) \in \mathbb{R}^{N \times 1}$ has elements $k(X, x)_i = k(X_i, x)\ \forall i \in \{1, \dots, N\}$, $\sigma^2_d \in \mathbb{R}^N_{+}$ is the variance parameter for the explanation noise, and $I_N$ is the identity matrix. From Eq. (2) we see that the predictive variance captures DB-aware uncertainty through the kernel function $k(\cdot,\cdot)$, and the function approximation uncertainty through the $\sigma^2_d I_N$ term.
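As a concrete illustration, the predictive variance in Eq. (2) can be computed directly once a kernel and per-sample noise estimates are available. The following is a minimal NumPy sketch, not the authors' implementation; `kernel_fn`, `noise_var`, and `x_query` are placeholder names for whatever kernel (e.g. an approximate WEG kernel) and explainer-noise estimates are actually used.

```python
import numpy as np

def gpec_variance(X, x_query, kernel_fn, noise_var):
    """Predictive variance of a zero-mean GP (Eq. 2) for one output feature.

    X         : (N, D) array of explained samples.
    x_query   : (D,) query point whose explanation uncertainty we want.
    kernel_fn : callable k(a, b) -> float, e.g. an (approximate) WEG kernel.
    noise_var : (N,) per-sample explanation noise variances (sigma^2_d).
    """
    N = X.shape[0]
    # Kernel matrix over the observed samples: K_ij = k(X_i, X_j).
    K = np.array([[kernel_fn(X[i], X[j]) for j in range(N)] for i in range(N)])
    # Cross-covariance between observed samples and the query point.
    k_star = np.array([kernel_fn(X[i], x_query) for i in range(N)])
    # Prior variance at the query point.
    k_ss = kernel_fn(x_query, x_query)
    # V_d[x] = k(x, x) - k(X, x)^T [K + sigma^2 I_N]^{-1} k(X, x).
    A = K + np.diag(noise_var)
    return float(k_ss - k_star @ np.linalg.solve(A, k_star))
```

In practice a Cholesky factorization with a small diagonal jitter would be preferable to a direct solve for numerical stability.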
Function Approximation Uncertainty. The $\eta_n$ component of Eq. (1) represents the uncertainty stemming from explainer estimation. For example, $\eta_n$ can represent the variance due to sampling (e.g., perturbation-based explainers) or explainer training (e.g., surrogate-based explainers). Explainers that include some estimate of uncertainty (e.g., BayesLIME, BayesSHAP, CXPlain) can be used directly to estimate $\sigma^2_n$. For other stochastic explanation methods, we can estimate $\sigma^2_n$ empirically by resampling $J$ explanations for the same sample $X_n$:

$$\hat{\sigma}^2_n = \frac{1}{J} \sum_{i=1}^{J} \left( H^{(i)}(X_n) - \frac{1}{J} \sum_{j=1}^{J} H^{(j)}(X_n) \right)^2 \tag{3}$$

where each $H^{(i)}(X_n)$ is a sampled explanation. Alternatively, for deterministic explanation methods we can omit the $\eta_n$ term and assume noiseless explanations.
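For stochastic explainers without a built-in uncertainty estimate, the resampling estimate of Eq. (3) amounts to the per-feature empirical variance over repeated explanations. A minimal sketch, where `explainer` is a hypothetical callable returning an attribution vector:

```python
import numpy as np

def empirical_explanation_variance(explainer, x_n, J=50):
    """Estimate the explanation noise variance of Eq. (3) by resampling.

    explainer : callable mapping a sample (D,) to an attribution vector (D,);
                assumed stochastic (e.g. perturbation- or sampling-based).
    x_n       : (D,) sample whose explanation variance we estimate.
    J         : number of repeated explanations to draw.
    Returns a (D,) vector of per-feature variances sigma^2_{n,d}.
    """
    samples = np.stack([explainer(x_n) for _ in range(J)])  # (J, D)
    # Mean squared deviation from the sample mean, per feature (Eq. 3).
    return samples.var(axis=0)
```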
Decision Boundary-Aware Uncertainty. In contrast, the $H(X_n)$ component of Eq. (1) represents the distribution of functions that could have generated the observed explanations. The choice of kernel $k(\cdot,\cdot)$ encodes our a priori assumption regarding the similarity between explanations based on the similarity of their corresponding inputs. In other words, given two samples $x, x' \in \mathcal{X}$, how much information do we expect a given explanation $H(x)$ to provide about a nearby explanation $H(x')$? As the DB between $x$ and $x'$ becomes more complex, we would expect this information to decrease. In Section 4, we introduce a novel kernel formulation that reflects the complexity of the DB in a local neighborhood of the samples.
4 WEG KERNEL
Intuitively, the GP kernel encodes the assumption that each explanation provides some information about other nearby explanations, where proximity is defined through kernel similarity. To capture boundary-aware uncertainty, we want to define a similarity $k(x, x')$ that decreases with the complexity (equivalently, increases with the smoothness) of the DB between $x, x' \in \mathcal{X}$.
4.1 Geometry of the Decision Boundary
We represent the DB as a hypersurface embedded in $\mathbb{R}^D$ with co-dimension one. Given the classifier $F$, we define the DB as $\mathcal{M}_F = \{m \in \mathbb{R}^D : F(m) = \frac{1}{2}\}$ (without loss of generality, we assume a classification threshold of $\frac{1}{2}$). For any two points $m, m' \in \mathcal{M}_F$, let $\gamma: [0,1] \to \mathcal{M}_F$ be a differentiable map such that $\gamma(0) = m$ and $\gamma(1) = m'$, representing a one-dimensional curve on $\mathcal{M}_F$. We can then define distances along the DB as geodesic distances in $\mathcal{M}_F$ (Fig. 3A):

$$d_{geo}(m, m') = \min_{\gamma} \int_0^1 \|\dot{\gamma}(t)\|\, dt \qquad \forall\, m, m' \in \mathcal{M}_F \tag{4}$$
The relative complexity of the DB can be characterized by the geodesic distances between points on the DB. For example, the simplest form that the DB can take is a linear boundary. Consider a black-box model with linear DB $\mathcal{M}_1$. For two points $z, z' \in \mathcal{M}_1$, $d_{geo}(z, z') = \|z - z'\|_2$, which corresponds to the minimum geodesic distance in the ambient space. For any nonlinear DB $\mathcal{M}_2$ that also contains $z, z'$, it follows that $d_{geo}(z, z') > \|z - z'\|_2$. As the complexity of the DB increases, there is a general corresponding increase in geodesic distances between fixed points on the DB. We can incorporate geodesic distance in our kernel selection through the exponential geodesic (EG) kernel (Feragen et al., 2015):
$$k_{EG}(m, m') = \exp\left[-\lambda\, d_{geo}(m, m')\right] \tag{5}$$
The EG kernel has been previously investigated in the context of Riemannian manifolds (Feragen et al., 2015; Feragen and Hauberg, 2016). In particular, while prior work shows that the EG kernel fails to be positive definite for all values of $\lambda$ in non-Euclidean space, there exist large intervals of $\lambda > 0$ for which the EG kernel is positive definite. Appropriate values can be selected through grid search and cross validation; we assume that a valid value of $\lambda$ has been selected.
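In practice the geodesic distances in Eq. (5) must be approximated from a finite set of boundary samples. One standard approach (used here purely as an illustration, not necessarily the authors' procedure) is to build a k-nearest-neighbor graph over the samples and take shortest-path distances before exponentiating; `k_neighbors` and `lam` below are illustrative hyperparameters.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def eg_kernel_matrix(boundary_samples, lam=1.0, k_neighbors=10):
    """Approximate EG kernel (Eq. 5) over sampled decision-boundary points.

    boundary_samples : (J, D) array of points m_1, ..., m_J on the DB.
    lam              : kernel hyperparameter lambda (chosen so the kernel is PD).
    k_neighbors      : neighborhood size for the geodesic graph approximation.
    """
    # k-NN graph with Euclidean edge weights over the boundary samples.
    graph = kneighbors_graph(boundary_samples, n_neighbors=k_neighbors,
                             mode="distance", include_self=False)
    # Shortest-path distances on the graph approximate geodesic distances
    # along the boundary; disconnected points get infinite distance,
    # which maps to zero similarity below.
    d_geo = shortest_path(graph, method="D", directed=False)
    return np.exp(-lam * d_geo)  # k_EG(m, m') = exp(-lambda * d_geo(m, m'))
```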
Therefore, by sampling $\mathcal{M}_F$, we can use the EG kernel matrix to capture DB complexity. However, a challenge remains in relating points $x, x' \in \mathcal{X} \setminus \mathcal{M}_F$ to the nearby DB. In Section 4.2 we consider a continuous weighting over $\mathcal{M}_F$ based on distance to $x, x'$.
Figure 3: Consider a classifier with DB defined as $\mathcal{M}_0 = \{(x_1, f(x_1)) : x_1 \in \mathbb{R}_{>0}\}$, where $f(x_1) = 2\cos(\frac{10}{x_1})$. (A) Illustration of the geodesic distance $d_{geo}(m, m')$ between two points $m, m' \in \mathcal{M}_0$. (B) Evaluation of the WEG kernel for $\mathcal{M}_0$ (top) and a linear DB (below). The gray region highlights the set $\{x' : k(x, x') \ge 0.9\}$ for a given $x$ (red). This region increases as the local DB becomes more linear. (C) During WEG approximation, we calculate Euclidean distances between $x, x'$ (red, green) and DB samples $m_1, \dots, m_J \in \mathcal{M}_0$ (blue). When appropriately normalized (Eq. (6)), these act as weights for the elements of the EG kernel.
4.2 Weighting Decision Boundary Samples
Let $p(m)$ denote a distribution with support defined over $\mathcal{M}_F$ such that we can draw DB samples $m_1, \dots, m_J \sim p(m)$ using a DB sampling algorithm (see Sec. 4.4). We weight $p(m)$ according to the $\ell_2$ norm between $m$ and a fixed data sample $x$ to create a weighted distribution $q(m \mid x, \rho)$:

$$q(m \mid x, \rho) \propto \exp\left[-\rho \|x - m\|_2^2\right] p(m) \tag{6}$$

where $\rho$ is a hyperparameter that controls the sensitivity of the weighting. We can then define the kernel function $k_{WEG}(x, x')$ by taking the expected value over the weighted distributions:
$$k_{WEG}(x, x') = \int\!\!\int k_{EG}(m, m')\, q(m \mid x, \rho)\, q(m' \mid x', \rho)\, dm\, dm' \tag{7}$$

$$= \frac{1}{Z_m Z_{m'}} \int\!\!\int \exp\left[-\lambda\, d_{geo}(m, m')\right] \exp\left[-\rho\left(\|x - m\|_2^2 + \|x' - m'\|_2^2\right)\right] p(m)\, p(m')\, dm\, dm' \tag{8}$$
where $Z_m$ and $Z_{m'}$ are the normalizing constants for $q(m \mid x, \rho)$ and $q(m' \mid x', \rho)$, respectively. Eq. (8) is an example of a marginalized kernel (Tsuda et al., 2002): a kernel defined by the expected value of observed samples $x, x'$ over latent variables $m, m'$. Given that the underlying EG kernel is positive definite, it follows that the WEG kernel is a valid kernel.
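The double integral in Eq. (8) is generally intractable, but it can be approximated with the same boundary samples $m_1, \dots, m_J \sim p(m)$ by self-normalizing the weights $\exp[-\rho\|x - m_j\|_2^2]$, so that the normalizing constants $Z_m, Z_{m'}$ cancel (this is the weighting depicted in Fig. 3C). Below is a minimal Monte Carlo sketch under those assumptions, reusing the EG kernel matrix from the previous snippet; it is an illustration rather than the authors' exact estimator.

```python
import numpy as np

def weg_kernel(x, x_prime, boundary_samples, K_eg, rho=1.0):
    """Monte Carlo approximation of the WEG kernel (Eqs. 6-8).

    x, x_prime       : (D,) inputs to compare.
    boundary_samples : (J, D) DB samples m_1, ..., m_J drawn from p(m).
    K_eg             : (J, J) EG kernel matrix over the boundary samples.
    rho              : weighting hyperparameter from Eq. (6).
    """
    sq_x = np.sum((boundary_samples - x) ** 2, axis=1)
    sq_xp = np.sum((boundary_samples - x_prime) ** 2, axis=1)
    # Self-normalized weights approximating q(m | x, rho) and q(m' | x', rho);
    # subtracting the minimum before exponentiating avoids underflow and
    # cancels in the normalization.
    w_x = np.exp(-rho * (sq_x - sq_x.min()))
    w_xp = np.exp(-rho * (sq_xp - sq_xp.min()))
    w_x /= w_x.sum()
    w_xp /= w_xp.sum()
    # E_{m ~ q(.|x), m' ~ q(.|x')}[ k_EG(m, m') ]
    return float(w_x @ K_eg @ w_xp)
```

Once the EG kernel matrix is precomputed, evaluating the WEG kernel for many pairs reduces to inexpensive weighted matrix products.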
With the WEG kernel, we can calculate a similarity between $x, x' \in \mathcal{X}$ that decreases as the complexity of the DB segments between the two points increases. In Fig. 3B we evaluate the WEG kernel similarity on a nonlinear and a linear DB. We observe that the WEG similarity reflects the complexity of the DB; as the decision boundary becomes more linear in a local region, the similarity between neighboring points increases. To evaluate the WEG kernel theoretically, we consider two properties. Theorem 1 establishes that the EG kernel is a special case of the WEG kernel when $x, x' \in \mathcal{X} \cap \mathcal{M}_F$.
Theorem 1. Given two points $x, x' \in \mathcal{X} \cap \mathcal{M}_F$, $\lim_{\rho \to \infty} k_{WEG}(x, x') = k_{EG}(x, x')$.
Proof details are shown in App. C.1. Intuitively, as $\rho$ increases, the part of the manifold distribution closest to the points $x, x'$ becomes weighted increasingly heavily. In the limit, the weighting concentrates entirely on $x, x'$ themselves, which recovers the EG kernel. We therefore see that the WEG kernel is a generalization of the EG kernel with a weighting controlled by $\rho$.
Theorem 2 establishes the inverse relationship between DB complexity and WEG similarity. Given a classifier with a piecewise linear DB, we show that this DB represents a local maximum with respect to WEG kernel similarity; i.e., as we perturb the DB to be nonlinear, the kernel similarity decreases. We first define perturbations on the DB. Note that $\text{int}(S)$ indicates the interior of a set $S$ and $\text{id}$ indicates the identity mapping.
Definition 1 (Manifold Perturbation). Let $\{U_\alpha\}_{\alpha \in I}$ be the charts of an atlas for a manifold $\mathcal{P} \subset \mathbb{R}^D$, where $I$ is a set of indices. Let $\mathcal{P}$ and $\widetilde{\mathcal{P}}$ be differentiable manifolds embedded in $\mathbb{R}^D$, where $\mathcal{P}$ is a piecewise linear manifold. Let $R: \mathcal{P} \to \widetilde{\mathcal{P}}$ be a diffeomorphism. We say $\widetilde{\mathcal{P}}$ is a perturbation of $\mathcal{P}$ on the $i$-th chart if $R$ satisfies the following two conditions: (1) there exists a compact subset $K_i \subset U_i$ such that $R|_{\mathcal{P} \setminus \text{int}(K_i)} = \text{id}|_{\mathcal{P} \setminus \text{int}(K_i)}$ and $R|_{\text{int}(K_i)} \neq \text{id}|_{\text{int}(K_i)}$; (2) there exists a linear homeomorphism between an open subset $\widetilde{U}_i \subseteq U_i$ containing $K_i$ and $\mathbb{R}^{d-1}$.
Theorem 2. Let $\mathcal{P}$ be a $(d-1)$-dimensional piecewise linear manifold embedded in $\mathbb{R}^D$. Let $\widetilde{\mathcal{P}}$ be a perturbation of $\mathcal{P}$, and define $\tilde{k}(x, x')$ and $k(x, x')$ as the