Boundary-Aware Uncertainty for Feature Attribution Explainers
Davin Hill (Northeastern University, dhill@ece.neu.edu)
Aria Masoomi (Northeastern University, masoomi.a@northeastern.edu)
Max Torop (Northeastern University, torop.m@northeastern.edu)
Sandesh Ghimire (Northeastern University, drsandeshghimire@gmail.com)
Jennifer Dy (Northeastern University, jdy@ece.neu.edu)
Abstract
Post-hoc explanation methods have become a critical tool for understanding black-box classifiers in high-stakes applications. However, high-performing classifiers are often highly nonlinear and can exhibit complex behavior around the decision boundary, leading to brittle or misleading local explanations. There is therefore a pressing need to quantify the uncertainty of such explanation methods in order to understand when explanations are trustworthy. In this work we propose the Gaussian Process Explanation UnCertainty (GPEC) framework, which generates a unified uncertainty estimate combining decision boundary-aware uncertainty with explanation function approximation uncertainty. We introduce a novel geodesic-based kernel, which captures the complexity of the target black-box decision boundary. We show theoretically that the proposed kernel similarity increases with decision boundary complexity. The proposed framework is highly flexible; it can be used with any black-box classifier and feature attribution method. Empirical results on multiple tabular and image datasets show that the GPEC uncertainty estimate improves understanding of explanations compared to existing methods.
Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238. Copyright 2024 by the author(s).
1 INTRODUCTION
Post-hoc explainability methods have become a crucial tool for understanding and diagnosing black-box model predictions. Recently, many such explainers have been introduced in the category of local feature attribution methods; that is, methods that return a real-valued score representing each feature's relative importance for the model prediction. These explainers are local in that they are not limited to using the same decision rules throughout the data distribution; they are therefore better able to represent nonlinear and complex black-box models.
However, recent works have shown that local explainers can be inconsistent or unstable. For example, explainers may yield highly dissimilar explanations for similar samples (Alvarez-Melis and Jaakkola, 2018; Khan et al., 2023), exhibit sensitivity to imperceptible perturbations (Dombrowski et al., 2019; Ghorbani et al., 2019; Slack et al., 2020), or lack stability under repeated application (Visani et al., 2022). When working in high-stakes applications, it is imperative to provide the user with an understanding of whether an explanation is reliable, potentially problematic, or even misleading. One way to guide users regarding an explainer's reliability is to provide corresponding uncertainty quantification estimates.
One can consider explainers as function approximators; as such, standard techniques for quantifying the uncertainty of estimators can be utilized to quantify the uncertainty of explainers. This is the strategy adopted by existing methods that estimate explainer uncertainty (e.g., Slack et al., 2021; Schwab and Karlen, 2019). However, we observe that for explainers this is not sufficient; in addition to the uncertainty due to function approximation, explainers must also contend with uncertainty due to the complexity of the decision boundary (DB) of the black-box model in the local region being explained.
Figure 1: Illustrative example of potential pitfalls when relying on local explainers for samples near complex regions of the decision boundary (left) as compared with a smoothed decision boundary (right).
Previous works investigating DB geometry have related higher DB complexity to increased model generalization error (Valle-Perez et al., 2019) and increased adversarial vulnerability (Moosavi-Dezfooli et al., 2019; Fawzi et al., 2018). Smoother DBs have been shown to improve feature attributions (Wang et al., 2020) and produce more consistent counterfactual explanations (Black et al., 2022). Dombrowski et al. (2019) show that, in ReLU networks, samples with similar predictions can yield widely disparate explanations, which can be regulated through model smoothing. Consider the following example (Fig. 1): a prediction model is used for a medical diagnosis based on two features, cholesterol level and sodium intake, and we use the gradient with respect to each feature as an estimate of feature importance. Patients A and B have similar cholesterol and sodium levels and receive the same prediction; however, the complex decision boundary (left) results in a different top feature for each patient. In contrast, the smoothed decision boundary (right) yields more consistent explanations.
We approach this problem from the perspective of similarity: given two samples and their respective explanations, how closely related should the explanations be? From the previous intuition, we define this similarity based on a geometric perspective of the DB complexity between these two points. Specifically, we propose the novel Weighted Exponential Geodesic (WEG) kernel, which encodes our expectation that two samples close in Euclidean space may not actually be similar if the DB within a local neighborhood of the samples is highly complex.
Using this similarity formulation, we propose the Gaussian Process Explanation UnCertainty (GPEC) framework (Fig. 2), an instance-wise, model-agnostic, and explainer-agnostic method for quantifying explanation uncertainty. The proposed notion of uncertainty is complementary to existing quantification methods, which primarily estimate the uncertainty related to the choice of model parameters and the fitting of the explainer, which we call function approximation uncertainty, and which do not capture uncertainty related to DB complexity. GPEC can combine the DB-based uncertainty with function approximation uncertainty derived from any local feature attribution method.
In summary, we make the following contributions:
• We introduce a novel geometric perspective on capturing explanation uncertainty and define a geodesic-based similarity between explanations. We prove theoretically that the proposed similarity captures the complexity of the decision boundary of a given black-box classifier.
• We propose a novel Gaussian Process-based framework that combines 1) uncertainty from decision boundary complexity and 2) explainer-specific function approximation uncertainty to generate uncertainty estimates for any given feature attribution method and black-box model.
• Empirical results show that GPEC uncertainty improves understanding of feature attribution methods.
2 RELATED WORKS
Explanation Methods. A variety of methods have been proposed for improving the transparency of pretrained black-box prediction models (Guidotti et al., 2018; Barredo Arrieta et al., 2020). Within this category of post-hoc methods, many focus on local explanations, that is, explaining individual predictions rather than the entire model. Some of these methods implement local feature selection (Chen et al., 2018; Masoomi et al., 2020); others return a real-valued score for each feature, termed feature attribution methods, which are the primary focus of this work. For example, LIME (Ribeiro et al., 2016) trains a local linear regression model to approximate the black-box model. Lundberg and Lee (2017) generalize LIME and five other feature attribution methods using the SHAP framework, which fulfills a number of desirable axioms. While LIME and SHAP are model-agnostic, other methods are model-specific, e.g. for neural networks (Bach et al., 2015; Shrikumar et al., 2017; Sundararajan et al., 2017; Erion et al., 2021), tree ensembles (Lundberg et al., 2020), or Bayesian neural networks (Bykov et al., 2020). Another class of methods involves training surrogate models to explain the black-box model (Dabkowski and Gal, 2017; Chen et al., 2018; Schwab and Karlen, 2019; Guo et al., 2018; Jethani et al., 2022).
Explanation Uncertainty. One option for improving explainer trustworthiness is to quantify the associated uncertainty. Bootstrap resampling techniques have been proposed to estimate uncertainty from surrogate-based explainers (Schwab and Karlen, 2019; Schulz et al., 2022).
Figure 2: Overview of the GPEC framework. GPEC takes samples from the classifier's decision boundary plus (possibly noisy) explanations and fits a GP model with the novel WEG kernel. The GPEC estimate incorporates both the uncertainty derived from the decision boundary complexity and the explanation approximation uncertainty from the explainer.
Guo et al. (2018) also propose a surrogate explainer parameterized with a Bayesian mixture model. Alternatively, Bykov et al. (2020) and Patro et al. (2019) introduce methods for explaining Bayesian neural networks, which can be transferred to their non-Bayesian counterparts. Covert and Lee (2021) derive an unbiased version of KernelSHAP and investigate an efficient way of estimating its uncertainty. Zhang et al. (2019) categorize different sources of variance in LIME estimates. Several methods also investigate LIME and KernelSHAP in a Bayesian context; for example, calculating a posterior over attributions (Slack et al., 2021), investigating priors for explanations (Zhao et al., 2021), or using active learning during sampling (Saini and Prasad, 2022).
However, existing methods for quantifying explanation uncertainty only consider the uncertainty of the explainer as a function approximator. This work introduces an additional notion of uncertainty for explainers that accounts for the complexity of the classifier DB.
3 UNCERTAINTY FRAMEWORK FOR EXPLAINERS
We now outline the GPEC framework (Fig. 2), which is parametrized with a Gaussian Process (GP) regression model (a brief review of GP regression is provided in App. B). Consider a sample $x \in \mathcal{X}$ that we want to explain in the context of a black-box classifier $F: \mathcal{X} \to [0,1]$, where $\mathcal{X} \subseteq \mathbb{R}^D$ is the data space and $D$ is the number of features. For convenience we consider the binary classification case; this is extended to the multiclass case in App. C. We apply a local feature attribution explainer $H: \mathcal{X} \to \mathbb{R}^D$.
Recent works (e.g., Alvarez-Melis and Jaakkola (2018); Dombrowski et al. (2019)) have shown that local explanations can lack robustness and stability related to model complexity. Therefore, when explaining samples in high-stakes applications, it is critical to understand the behavior of the explainer, especially in relation to other samples near $x$. More concretely, let $X \in \mathbb{R}^{N \times D}$ represent a dataset of $N$ samples, where each row vector $X_n \in \mathbb{R}^D$, $n \in \{1, \dots, N\}$, represents a data point. We apply $H$ to the rows of $X$, generating $N$ observed explanations $E_n \in \mathbb{R}^D$, $n \in \{1, \dots, N\}$, which are grouped into $E \in \mathbb{R}^{N \times D}$. We can use these observed sample-explanation pairs to infer the behavior of $H$ around $x$; however, there are two main challenges.
First, we expect the similarity between the explanations of $X$ and $x$ to be dependent on $F$. In particular, we expect that as the DB in a neighborhood around $x$ and a given sample $X_n$ becomes increasingly complex, $H(x)$ and $H(X_n)$ may become more dissimilar; i.e., $H(X_n)$ may not contain useful information towards inferring $H(x)$. In this situation, the user should be prompted to either draw additional samples near $x$ or otherwise be warned of higher uncertainty. Second, the observed explanations $E$ can be noisy; many explainers are stochastic and approximated with sampling methods or a learned function.
To solve these challenges, we model the explainer with a vector-valued GP regression, treating the explainer as a latent function inferred using samples $X$ and explanations $E$. We model each explanation $E_n$ as being generated from a latent function $H$ plus independent Gaussian noise $\eta_n$. For convenience, we consider each feature $d$ independently; see App. C for extensions.

$$E_{n,d} = H_d(X_n) + \eta_{n,d} \quad \text{s.t.} \quad \underbrace{H_d(X_n) \sim \mathcal{GP}(0, k(\cdot,\cdot))}_{\text{Decision Boundary-Aware Uncertainty}}, \quad \underbrace{\eta_{n,d} \sim \mathcal{N}(0, \sigma^2_{n,d})}_{\text{Function Approximation Uncertainty}} \tag{1}$$
where k(·,·) is the specified kernel function for the GP
prior. We disentangle each explanation into two components, $H(X_n)$ and $\eta_n$, which represent two separate sources of uncertainty: 1) a decision boundary-aware uncertainty, which we capture using the kernel similarity, and 2) a function approximation uncertainty from the explainer. After specifying $H(X_n)$ and $\eta_n$, we can combine the two sources by calculating the predictive distribution for $x$. We take the variance of this distribution as the GPEC uncertainty estimate:

$$\mathbb{V}_d[x] = k(x, x) - k(X, x)^\top \left[K + \sigma^2_d I_N\right]^{-1} k(X, x) \tag{2}$$

where $K \in \mathbb{R}^{N \times N}$ is the kernel matrix such that $K_{ij} = k(X_i, X_j)\ \forall i, j \in \{1, \dots, N\}$, $k(X, x) \in \mathbb{R}^{N \times 1}$ has elements $k(X, x)_i = k(X_i, x)\ \forall i \in \{1, \dots, N\}$, $\sigma^2_d \in \mathbb{R}^N_{+}$ is the variance parameter for the explanation noise, and $I_N$ is the identity matrix. From Eq. (2) we see that the predictive variance captures DB-aware uncertainty through the kernel function $k(\cdot,\cdot)$, and the function approximation uncertainty through the $\sigma^2_d I_N$ term.
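As a concrete illustration, the predictive variance in Eq. (2) can be computed directly once a kernel and per-sample noise estimates are available. The following is a minimal NumPy sketch, not the authors' implementation; `kernel_fn`, `noise_var`, and `x_query` are placeholder names for whatever kernel (e.g. an approximate WEG kernel) and explainer-noise estimates are actually used.

```python
import numpy as np

def gpec_variance(X, x_query, kernel_fn, noise_var):
    """Predictive variance of a zero-mean GP (Eq. 2) for one output feature.

    X         : (N, D) array of explained samples.
    x_query   : (D,) query point whose explanation uncertainty we want.
    kernel_fn : callable k(a, b) -> float, e.g. an (approximate) WEG kernel.
    noise_var : (N,) per-sample explanation noise variances (sigma^2_d).
    """
    N = X.shape[0]
    # Kernel matrix over the observed samples: K_ij = k(X_i, X_j).
    K = np.array([[kernel_fn(X[i], X[j]) for j in range(N)] for i in range(N)])
    # Cross-covariance between observed samples and the query point.
    k_star = np.array([kernel_fn(X[i], x_query) for i in range(N)])
    # Prior variance at the query point.
    k_ss = kernel_fn(x_query, x_query)
    # V_d[x] = k(x, x) - k(X, x)^T [K + sigma^2 I_N]^{-1} k(X, x).
    A = K + np.diag(noise_var)
    return float(k_ss - k_star @ np.linalg.solve(A, k_star))
```

In practice a Cholesky factorization with a small diagonal jitter would be preferable to a direct solve for numerical stability.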
Function Approximation Uncertainty. The $\eta_n$ component of Eq. (1) represents the uncertainty stemming from explainer estimation. For example, $\eta_n$ can represent the variance due to sampling (e.g., perturbation-based explainers) or explainer training (e.g., surrogate-based explainers). Explainers that include some estimate of uncertainty (e.g., BayesLIME, BayesSHAP, CXPlain) can be used directly to estimate $\sigma^2_n$. For other stochastic explanation methods, we can estimate $\sigma^2_n$ empirically by resampling $J$ explanations for the same sample $X_n$:

$$\hat{\sigma}^2_n = \frac{1}{J} \sum_{i=1}^{J} \left( H^{(i)}(X_n) - \frac{1}{J} \sum_{j=1}^{J} H^{(j)}(X_n) \right)^2 \tag{3}$$

where each $H^{(i)}(X_n)$ is a sampled explanation. Alternatively, for deterministic explanation methods we can omit the $\eta_n$ term and assume noiseless explanations.
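For stochastic explainers without a built-in uncertainty estimate, the resampling estimate of Eq. (3) amounts to the per-feature empirical variance over repeated explanations. A minimal sketch, where `explainer` is a hypothetical callable returning an attribution vector:

```python
import numpy as np

def empirical_explanation_variance(explainer, x_n, J=50):
    """Estimate the explanation noise variance of Eq. (3) by resampling.

    explainer : callable mapping a sample (D,) to an attribution vector (D,);
                assumed stochastic (e.g. perturbation- or sampling-based).
    x_n       : (D,) sample whose explanation variance we estimate.
    J         : number of repeated explanations to draw.
    Returns a (D,) vector of per-feature variances sigma^2_{n,d}.
    """
    samples = np.stack([explainer(x_n) for _ in range(J)])  # (J, D)
    # Mean squared deviation from the sample mean, per feature (Eq. 3).
    return samples.var(axis=0)
```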
Decision Boundary-Aware Uncertainty. In contrast, the $H(X_n)$ component of Eq. (1) represents the distribution of functions that could have generated the observed explanations. The choice of kernel $k(\cdot,\cdot)$ encodes our a priori assumption regarding the similarity between explanations based on the similarity of their corresponding inputs. In other words, given two samples $x, x' \in \mathcal{X}$, how much information do we expect a given explanation $H(x)$ to provide about a nearby explanation $H(x')$? As the DB between $x$ and $x'$ becomes more complex, we would expect this information to decrease. In Section 4, we introduce a novel kernel formulation that reflects the complexity of the DB in a local neighborhood of the samples.
4 WEG KERNEL
Intuitively, the GP kernel encodes the assumption that each explanation provides some information about other nearby explanations, where proximity is defined through kernel similarity. To capture boundary-aware uncertainty, we want to define a similarity $k(x, x')$ that decreases with the complexity (equivalently, increases with the smoothness) of the DB between $x, x' \in \mathcal{X}$.
4.1 Geometry of the Decision Boundary
We represent the DB as a hypersurface embedded in $\mathbb{R}^D$ with co-dimension one. Given the classifier $F$, we define the DB as $\mathcal{M}_F = \{m \in \mathbb{R}^D : F(m) = \frac{1}{2}\}$ (without loss of generality, we assume a classification threshold of $\frac{1}{2}$). For any two points $m, m' \in \mathcal{M}_F$, let $\gamma: [0,1] \to \mathcal{M}_F$ be a differentiable map such that $\gamma(0) = m$ and $\gamma(1) = m'$, representing a one-dimensional curve on $\mathcal{M}_F$. We can then define distances along the DB as geodesic distances in $\mathcal{M}_F$ (Fig. 3A):

$$d_{geo}(m, m') = \min_{\gamma} \int_0^1 \|\dot{\gamma}(t)\|\, dt \qquad \forall\, m, m' \in \mathcal{M}_F \tag{4}$$
The relative complexity of the DB can be characterized by the geodesic distances between points on the DB. For example, the simplest form that the DB can take is a linear boundary. Consider a black-box model with linear DB $\mathcal{M}_1$. For two points $z, z' \in \mathcal{M}_1$, $d_{geo}(z, z') = \|z - z'\|_2$, which corresponds to the minimum geodesic distance in the ambient space. For any nonlinear DB $\mathcal{M}_2$ that also contains $z, z'$, it follows that $d_{geo}(z, z') > \|z - z'\|_2$. As the complexity of the DB increases, there is a general corresponding increase in geodesic distances between fixed points on the DB. We can incorporate geodesic distance in our kernel selection through the exponential geodesic (EG) kernel (Feragen et al., 2015):
$$k_{EG}(m, m') = \exp\left[-\lambda\, d_{geo}(m, m')\right] \tag{5}$$
The EG kernel has been previously investigated in the context of Riemannian manifolds (Feragen et al., 2015; Feragen and Hauberg, 2016). In particular, while prior work shows that the EG kernel fails to be positive definite for all values of $\lambda$ in non-Euclidean space, there exist large intervals of $\lambda > 0$ for which the EG kernel is positive definite. Appropriate values can be selected through grid search and cross validation; we assume that a valid value of $\lambda$ has been selected.
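In practice the geodesic distances in Eq. (5) must be approximated from a finite set of boundary samples. One standard approach (used here purely as an illustration, not necessarily the authors' procedure) is to build a k-nearest-neighbor graph over the samples and take shortest-path distances before exponentiating; `k_neighbors` and `lam` below are illustrative hyperparameters.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def eg_kernel_matrix(boundary_samples, lam=1.0, k_neighbors=10):
    """Approximate EG kernel (Eq. 5) over sampled decision-boundary points.

    boundary_samples : (J, D) array of points m_1, ..., m_J on the DB.
    lam              : kernel hyperparameter lambda (chosen so the kernel is PD).
    k_neighbors      : neighborhood size for the geodesic graph approximation.
    """
    # k-NN graph with Euclidean edge weights over the boundary samples.
    graph = kneighbors_graph(boundary_samples, n_neighbors=k_neighbors,
                             mode="distance", include_self=False)
    # Shortest-path distances on the graph approximate geodesic distances
    # along the boundary; disconnected points get infinite distance,
    # which maps to zero similarity below.
    d_geo = shortest_path(graph, method="D", directed=False)
    return np.exp(-lam * d_geo)  # k_EG(m, m') = exp(-lambda * d_geo(m, m'))
```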
Therefore, by sampling $\mathcal{M}_F$, we can use the EG kernel matrix to capture DB complexity. However, a challenge remains in relating points $x, x' \in \mathcal{X} \setminus \mathcal{M}_F$ to the nearby DB. In Section 4.2 we consider a continuous weighting over $\mathcal{M}_F$ based on distance to $x, x'$.
Figure 3: Consider a classifier with DB defined as $\mathcal{M}_0 = \{(x_1, f(x_1)) : x_1 \in \mathbb{R}_{>0}\}$, where $f(x_1) = 2\cos(\frac{10}{x_1})$. (A) Illustration of the geodesic distance $d_{geo}(m, m')$ between two points $m, m' \in \mathcal{M}_0$. (B) Evaluation of the WEG kernel for $\mathcal{M}_0$ (top) and a linear DB (below). The gray region highlights the set $\{x' : k(x, x') \ge 0.9\}$ for a given $x$ (red). This region increases as the local DB becomes more linear. (C) During WEG approximation, we calculate Euclidean distances between $x, x'$ (red, green) and DB samples $m_1, \dots, m_J \in \mathcal{M}_0$ (blue). When appropriately normalized (Eq. (6)), these act as weights for the elements of the EG kernel.
4.2 Weighting Decision Boundary Samples
Let $p(m)$ denote a distribution with support defined over $\mathcal{M}_F$ such that we can draw DB samples $m_1, \dots, m_J \sim p(m)$ using a DB sampling algorithm (see Sec. 4.4). We weight $p(m)$ according to the $\ell_2$ norm between $m$ and a fixed data sample $x$ to create a weighted distribution $q(m \mid x, \rho)$:

$$q(m \mid x, \rho) \propto \exp\left[-\rho \|x - m\|_2^2\right] p(m) \tag{6}$$

where $\rho$ is a hyperparameter that controls the sensitivity of the weighting. We can then define the kernel function $k_{WEG}(x, x')$ by taking the expected value over the weighted distributions:
$$k_{WEG}(x, x') = \int\!\!\int k_{EG}(m, m')\, q(m \mid x, \rho)\, q(m' \mid x', \rho)\, dm\, dm' \tag{7}$$

$$= \frac{1}{Z_m Z_{m'}} \int\!\!\int \exp\left[-\lambda\, d_{geo}(m, m')\right] \exp\left[-\rho\left(\|x - m\|_2^2 + \|x' - m'\|_2^2\right)\right] p(m)\, p(m')\, dm\, dm' \tag{8}$$
where $Z_m$ and $Z_{m'}$ are the normalizing constants for $q(m \mid x, \rho)$ and $q(m' \mid x', \rho)$, respectively. Eq. (8) is an example of a marginalized kernel (Tsuda et al., 2002): a kernel defined by the expected value of observed samples $x, x'$ over latent variables $m, m'$. Given that the underlying EG kernel is positive definite, it follows that the WEG kernel is a valid kernel.
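The double integral in Eq. (8) is generally intractable, but it can be approximated with the same boundary samples $m_1, \dots, m_J \sim p(m)$ by self-normalizing the weights $\exp[-\rho\|x - m_j\|_2^2]$, so that the normalizing constants $Z_m, Z_{m'}$ cancel (this is the weighting depicted in Fig. 3C). Below is a minimal Monte Carlo sketch under those assumptions, reusing the EG kernel matrix from the previous snippet; it is an illustration rather than the authors' exact estimator.

```python
import numpy as np

def weg_kernel(x, x_prime, boundary_samples, K_eg, rho=1.0):
    """Monte Carlo approximation of the WEG kernel (Eqs. 6-8).

    x, x_prime       : (D,) inputs to compare.
    boundary_samples : (J, D) DB samples m_1, ..., m_J drawn from p(m).
    K_eg             : (J, J) EG kernel matrix over the boundary samples.
    rho              : weighting hyperparameter from Eq. (6).
    """
    sq_x = np.sum((boundary_samples - x) ** 2, axis=1)
    sq_xp = np.sum((boundary_samples - x_prime) ** 2, axis=1)
    # Self-normalized weights approximating q(m | x, rho) and q(m' | x', rho);
    # subtracting the minimum before exponentiating avoids underflow and
    # cancels in the normalization.
    w_x = np.exp(-rho * (sq_x - sq_x.min()))
    w_xp = np.exp(-rho * (sq_xp - sq_xp.min()))
    w_x /= w_x.sum()
    w_xp /= w_xp.sum()
    # E_{m ~ q(.|x), m' ~ q(.|x')}[ k_EG(m, m') ]
    return float(w_x @ K_eg @ w_xp)
```

Once the EG kernel matrix is precomputed, evaluating the WEG kernel for many pairs reduces to inexpensive weighted matrix products.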
With the WEG kernel, we can calculate a similarity between $x, x' \in \mathcal{X}$ that decreases as the complexity of the DB segments between the two points increases. In Fig. 3B we evaluate the WEG kernel similarity on a nonlinear and a linear DB. We observe that the WEG similarity reflects the complexity of the DB; as the decision boundary becomes more linear in a local region, the similarity between neighboring points increases. To evaluate the WEG kernel theoretically, we consider two properties. Theorem 1 establishes that the EG kernel is a special case of the WEG kernel when $x, x' \in \mathcal{X} \cap \mathcal{M}_F$.
Theorem 1. Given two points $x, x' \in \mathcal{X} \cap \mathcal{M}_F$, $\lim_{\rho \to \infty} k_{WEG}(x, x') = k_{EG}(x, x')$.
Proof details are shown in App. C.1. Intuitively, as $\rho$ increases, the part of the manifold distribution closest to the points $x, x'$ becomes weighted increasingly heavily. In the limit, the weighting concentrates entirely on $x, x'$ themselves, which recovers the EG kernel. We therefore see that the WEG kernel is a generalization of the EG kernel with a weighting controlled by $\rho$.
Theorem 2 establishes the inverse relationship between DB complexity and WEG similarity. Given a classifier with a piecewise linear DB, we show that this DB represents a local maximum with respect to WEG kernel similarity; i.e., as we perturb the DB to be nonlinear, the kernel similarity decreases. We first define perturbations on the DB. Note that $\text{int}(S)$ indicates the interior of a set $S$ and $\text{id}$ indicates the identity mapping.
Definition 1 (Manifold Perturbation). Let $\{U_\alpha\}_{\alpha \in I}$ be the charts of an atlas for a manifold $\mathcal{P} \subset \mathbb{R}^D$, where $I$ is a set of indices. Let $\mathcal{P}$ and $\widetilde{\mathcal{P}}$ be differentiable manifolds embedded in $\mathbb{R}^D$, where $\mathcal{P}$ is a piecewise linear manifold. Let $R: \mathcal{P} \to \widetilde{\mathcal{P}}$ be a diffeomorphism. We say $\widetilde{\mathcal{P}}$ is a perturbation of $\mathcal{P}$ on the $i$-th chart if $R$ satisfies the following two conditions: (1) there exists a compact subset $K_i \subset U_i$ such that $R|_{\mathcal{P} \setminus \text{int}(K_i)} = \text{id}|_{\mathcal{P} \setminus \text{int}(K_i)}$ and $R|_{\text{int}(K_i)} \neq \text{id}|_{\text{int}(K_i)}$; (2) there exists a linear homeomorphism between an open subset $\widetilde{U}_i \subseteq U_i$ containing $K_i$ and $\mathbb{R}^{d-1}$.
Theorem 2. Let $\mathcal{P}$ be a $(d-1)$-dimensional piecewise linear manifold embedded in $\mathbb{R}^D$. Let $\widetilde{\mathcal{P}}$ be a perturbation of $\mathcal{P}$, and define $\tilde{k}(x, x')$ and $k(x, x')$ as the