Towards Formal XAI: Formally Approximate
Minimal Explanations of Neural Networks
Shahaf Bassan and Guy Katz
The Hebrew University of Jerusalem, Jerusalem, Israel
{shahaf.bassan, g.katz}@mail.huji.ac.il
Abstract. With the rapid growth of machine learning, deep neural net-
works (DNNs) are now being used in numerous domains. Unfortunately,
DNNs are “black-boxes”, and cannot be interpreted by humans, which is
a substantial concern in safety-critical systems. To mitigate this issue, re-
searchers have begun working on explainable AI (XAI) methods, which
can identify a subset of input features that are the cause of a DNN’s
decision for a given input. Most existing techniques are heuristic, and
cannot guarantee the correctness of the explanation provided. In con-
trast, recent and exciting attempts have shown that formal methods can
be used to generate provably correct explanations. Although these meth-
ods are sound, the computational complexity of the underlying verifica-
tion problem limits their scalability; and the explanations they produce
might sometimes be overly complex. Here, we propose a novel approach
to tackle these limitations. We (i) suggest an efficient, verification-based
method for finding minimal explanations, which constitute a provable
approximation of the global, minimum explanation; (ii) show how DNN
verification can assist in calculating lower and upper bounds on the op-
timal explanation; (iii) propose heuristics that significantly improve the
scalability of the verification process; and (iv) suggest the use of bundles,
which allows us to arrive at more succinct and interpretable explanations.
Our evaluation shows that our approach significantly outperforms state-
of-the-art techniques, and produces explanations that are more useful to
humans. We thus regard this work as a step toward leveraging verification
technology in producing DNNs that are more reliable and comprehensi-
ble.
1 Introduction
Machine learning (ML) is a rapidly growing field with a wide range of applica-
tions, including safety-critical, high-risk systems in the fields of health care [19],
aviation [39] and autonomous driving [12]. Despite their success, ML models,
and especially deep neural networks (DNNs), remain “black-boxes” — they are
incomprehensible to humans and are prone to unexpected behaviour and errors.
This issue can result in major catastrophes [13, 74], and also in poor decision-
making due to brittleness or bias [8, 25].
In order to render DNNs more comprehensible to humans, researchers have
been working on explainable AI (XAI), where we seek to construct models for
explaining and interpreting the decisions of DNNs [51, 56–58]. Work to date has
focused on heuristic approaches, which provide explanations, but do not provide
guarantees about the correctness or succinctness of these explanations [14,33,45].
Although these approaches are an important step, their limitations might result
in skewed results, possibly failing to meet the regulatory guidelines of institu-
tions and organizations such as the European Union, the US government, and
the OECD [52]. Thus, producing DNN explanations that are provably accurate
remains of utmost importance.
More recently, the formal verification community has proposed approaches
for providing formal and rigorous explanations for DNN decision making [28,32,
52, 60]. Many of these approaches rely on the recent and rapid developments in
DNN verification [1, 9, 10, 40]. These approaches typically produce an abductive
explanation (also known as a prime implicant, or PI-explanation) [32, 59, 60]:
a minimum subset of input features, which by themselves already determine
the classification produced by the DNN, regardless of any other input features.
These explanations afford formal guarantees, and can be computed via DNN
verification [32].
Abductive explanations are highly useful, but there are two major difficul-
ties in computing them. First, there is the issue of scalability: computing locally
minimal explanations might require a polynomial number of costly invocations
of the underlying DNN verifier, and computing a globally minimal explanation
is even more challenging [?,32,49]. The second difficulty is that users may some-
times prefer “high-level” explanations, not based solely on input features, as
these may be easier to grasp and interpret compared to “low-level”, complex,
feature-based explanations.
To tackle the first difficulty, we propose here new approaches for more effi-
ciently producing verification-based abductive explanations. More concretely, we
propose a method for provably approximating minimum explanations, allowing
stakeholders to use slightly larger explanations that can be discovered much more
quickly. To accomplish this, we leverage the recently discovered dual relationship
between explanations and contrastive examples [31]; and also take advantage of
the sensitivity of DNNs to small adversarial perturbations [65], to compute both
lower and upper bounds for the minimum explanation. In addition, we propose
novel heuristics for significantly expediting the underlying verification process.
In addressing the second difficulty, i.e. the interpretability limitations of “low-
level” explanations, we propose to construct explanations in terms of bundles,
which are sets of related features. We empirically show that using our method
to produce bundle explanations can significantly improve the interpretability of
the results, and even the scalability of the approach, while still maintaining the
soundness of the resulting explanations.
To summarize, our contributions include the following: (i) We are the first
to suggest a method that formally produces sound and minimal abductive ex-
planations that provably approximate the global-minimum explanation. (ii) Our
three suggested novel heuristics expedite the search for minimal abductive ex-
planations, significantly outperforming the state of the art. (iii) We suggest a
novel approach for using bundles to efficiently produce sound and provable ex-
planations that are more interpretable and succinct.
For evaluation purposes, we implemented our approach as a proof-of-concept
tool. Although our method can be applied to any ML model, we focused here
on DNNs, where the verification process is known to be NP-complete [40], and
the scalable generation of explanations is known to be challenging [32, 59]. We
used our tool to test the approach on DNNs trained for digit and clothing classi-
fication, and also compared it to state-of-the-art approaches [32,33]. Our results
indicate that our approach was successful in quickly producing meaningful ex-
planations, often running 40% faster than existing tools. We believe that these
promising results showcase the potential of this line of work.
The rest of the paper is organized as follows. Sec. 2 contains background on
DNNs and their verification, as well as on formal, minimal explanations. Sec. 3
covers the main method for calculating approximations of minimum explana-
tions, and Sec. 4 covers methods for improving the efficiency of calculating these
approximations. Sec. 5 covers the use of bundles in constructing “high-level”,
provable explanations. Next, we present our evaluation in Sec. 6. Related work
is covered in Sec. 7, and we conclude in Sec. 8.
2 Background
DNNs. A deep neural network (DNN) [47] is a directed graph composed of
layers of nodes, commonly called neurons. In feed-forward NNs the data flows
from the first (input) layer, through intermediate (hidden) layers, and onto an
output layer. A DNN’s output is calculated by assigning values to its input
neurons, and then iteratively calculating the values of neurons in subsequent
layers. In the case of classification, which is the focus of this paper, each output
neuron corresponds to a specific class, and the input is assigned the class whose
output neuron has the highest value.
Fig. 1: A simple DNN.
Fig. 1 depicts a simple, feed-forward DNN. The input layer includes three neurons, followed by a weighted-sum layer, which calculates an affine transformation of the values from the input layer. Given the input V_1 = [1, 1, 1]^T, the second layer computes the values V_2 = [6, 9, 11]^T. Next comes a ReLU layer, which computes the function ReLU(x) = max(0, x) for each neuron in the preceding layer, resulting in V_3 = [6, 9, 11]^T. The final (output) layer then computes an affine transformation, resulting in V_4 = [15, 4]^T. This indicates that input V_1 = [1, 1, 1]^T is classified as the category corresponding to the first output neuron, which is assigned the greater value.
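To make this layer-by-layer computation concrete, the following Python sketch evaluates a small feed-forward ReLU network of the same 3-input, 2-output shape. The weights and biases below are hypothetical stand-ins rather than the parameters of Fig. 1, so the concrete values differ from those quoted above; the mechanics (an affine transformation per layer, a ReLU activation, and an arg-max over the output layer) are the same.

```python
import numpy as np

def forward(x, weights, biases):
    """Evaluate a feed-forward network: affine layers, ReLU after all but the last."""
    v = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        v = W @ v + b                     # affine (weighted-sum) layer
        if i < len(weights) - 1:
            v = np.maximum(v, 0.0)        # ReLU layer
    return v

# Hypothetical stand-in parameters (not the actual weights shown in Fig. 1).
weights = [np.array([[1.0, 1.0, 0.0],     # 3 input neurons -> 2 hidden neurons
                     [0.0, 0.0, 1.0]]),
           np.array([[3.0, 3.0],          # 2 hidden neurons -> 2 output neurons
                     [0.0, 0.0]])]
biases = [np.array([-1.0, 0.0]), np.array([0.0, 1.0])]

out = forward([1, 1, 1], weights, biases)
print(out, "-> classified as output neuron", int(np.argmax(out)))
```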
DNN Verification. A DNN verification query is a tuple ⟨P, N, Q⟩, where N is a DNN that maps an input vector x to an output vector y = N(x), P is a predicate on x, and Q is a predicate on y. A DNN verifier needs to decide whether there exists an input x₀ that satisfies P(x₀) ∧ Q(N(x₀)) (the SAT case) or not (the UNSAT case). Typically, P and Q are expressed in the logic of real arithmetic [50]. The DNN verification problem is known to be NP-complete [40].
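For intuition only, when the input space is small and discrete (as in the running example below, where each input neuron is restricted to {0, 1}), such a query can be decided by naive enumeration. The sketch below does exactly that; it conveys the semantics of ⟨P, N, Q⟩ but is no substitute for a real DNN verifier, and the representation of P and Q as Python predicates, as well as the stand-in network (the hypothetical one from the previous sketch, written inline), are our own choices for illustration.

```python
from itertools import product

def verify(P, N, Q, domains):
    """Decide a query <P, N, Q> by enumerating a finite input space.

    domains[i] lists the values feature i may take. Returns ("SAT", x) for a
    witness x satisfying P(x) and Q(N(x)), or ("UNSAT", None) if none exists.
    """
    for x in product(*domains):
        if P(x) and Q(N(x)):
            return "SAT", x
    return "UNSAT", None

# The hypothetical stand-in network from the previous sketch, written inline.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

# Query: is there any input whose second output score is at least the first?
result = verify(P=lambda x: True,
                N=N,
                Q=lambda y: y[1] >= y[0],
                domains=[(0, 1)] * 3)
print(result)    # ("SAT", (0, 0, 0)) for this stand-in network
```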
Formal Explanations. We focus here on explanations for classification problems, where a model is trained to predict a label for each given input. A classification problem is a tuple ⟨F, D, K, N⟩ where (i) F = {1, ..., m} denotes the features; (ii) D = {D_1, D_2, ..., D_m} denotes the domains of each of the features, i.e. the possible values that each feature can take. The entire feature (input) space is hence 𝔽 = D_1 × D_2 × ... × D_m; (iii) K = {c_1, c_2, ..., c_n} is a set of classes, i.e. the possible labels; and (iv) N : 𝔽 → K is a (non-constant) classification function (in our case, a neural network). A classification instance is a pair (v, c), where v ∈ 𝔽, c ∈ K, and c = N(v). In other words, v is mapped by the neural network N to class c.
Looking at (v, c), we often wish to know why v was classified as c. Informally, an explanation is a subset of features E ⊆ F such that fixing these features to the values assigned to them in v already determines that the input will be classified as c, regardless of the remaining features F ∖ E. In other words, even if the features outside the explanation are assigned arbitrary values, the classification remains the same. More formally, given an input v = (v_1, ..., v_m) ∈ 𝔽 with classification N(v) = c, an explanation (sometimes referred to as an abductive explanation, or an AXP) is a subset of the features E ⊆ F such that:

∀x ∈ 𝔽. [ ⋀_{i ∈ E} (x_i = v_i) → (N(x) = c) ]    (1)
We continue with the running example from Fig. 1. For simplicity, we assume that each input neuron can only be assigned the values 0 or 1. It can be observed that for input V_1 = [1, 1, 1]^T, the set {v_1^1, v_1^2} is an explanation; indeed, once the first two entries of V_1 are set to 1, the classification remains the same for any value of the third entry (see Fig. 2). We can prove this by encoding a verification query ⟨P, N, Q⟩ = ⟨E = v, N, Q_¬c⟩, where E is the candidate explanation, E = v means that we restrict the features in E to their values in v, and Q_¬c asserts that the classification is not c. An UNSAT result for this query indicates that E is an explanation for the instance (v, c).
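This encoding can be phrased directly in terms of the naive verify sketch shown earlier: P fixes the features of E to their values in v, and Q_¬c asserts that the winning class is not c. The sketch below uses a hypothetical stand-in network, chosen so that its behaviour on {0, 1}^3 reproduces the explanation structure described for the running example (its weights are not those of Fig. 1), and confirms that fixing the first two features of V_1 = [1, 1, 1]^T yields UNSAT, i.e., that {v_1^1, v_1^2} is an explanation.

```python
from itertools import product

# Stand-in for the Fig. 1 network (hypothetical weights): 3 binary inputs, 2 output scores.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def verify(P, N, Q, domains):
    """Naive enumeration-based check of <P, N, Q> (see the earlier sketch)."""
    for x in product(*domains):
        if P(x) and Q(N(x)):
            return "SAT", x
    return "UNSAT", None

v, c = (1, 1, 1), 0                  # the running example: V_1 is classified as class 0
domains = [(0, 1)] * 3
E = {0, 1}                           # candidate explanation: the first two features

P = lambda x: all(x[i] == v[i] for i in E)                        # "E = v"
Q_not_c = lambda y: max(range(len(y)), key=lambda k: y[k]) != c   # "classified as not c"

print(verify(P, N, Q_not_c, domains))   # ("UNSAT", None) -> E is an explanation for (v, c)
```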
Clearly, the set of all features constitutes a trivial explanation. However, we are interested in smaller explanation subsets, which can provide useful information regarding the decision of the classifier. More precisely, we search for minimal explanations and minimum explanations. A subset E ⊆ F is a minimal explanation (also referred to as a local-minimal explanation, or a subset-minimal explanation) of instance (v, c) if it is an explanation that ceases to be an explanation when even a single feature is removed from it:

(∀x ∈ 𝔽. [ ⋀_{i ∈ E} (x_i = v_i) → (N(x) = c) ]) ∧ (∀j ∈ E. [ ∃y ∈ 𝔽. [ ⋀_{i ∈ E∖{j}} (y_i = v_i) ∧ (N(y) ≠ c) ] ])    (2)

Fig. 3 demonstrates that {v_1^1, v_1^2} is a minimal explanation in our running example: removing either of its features allows misclassification.
Fig. 2: {v_1^1, v_1^2} is an explanation for input V_1 = [1, 1, 1]^T.
Fig. 3: {v_1^1, v_1^2} is a minimal explanation for input V_1 = [1, 1, 1]^T.
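A subset-minimal explanation such as this one can be computed by the classical linear elimination pass: start from the full feature set and drop any feature whose removal still leaves an explanation, where each check corresponds to one invocation of the underlying verifier. The sketch below runs this procedure on the same hypothetical stand-in network used in the earlier sketches, so the concrete numbers are illustrative only; the result depends on the elimination order (this order returns {v_1^3}, while the reverse order would return {v_1^1, v_1^2}).

```python
from itertools import product

# The hypothetical stand-in network used in the earlier sketches.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def is_explanation(E, v, c, N, domains):
    """Eq. (1), checked by enumeration (one verifier call in practice)."""
    return all(max(range(2), key=lambda k: N(x)[k]) == c
               for x in product(*domains)
               if all(x[i] == v[i] for i in E))

def minimal_explanation(v, c, N, domains):
    """Shrink the full feature set one feature at a time (subset-minimal, not minimum)."""
    E = set(range(len(v)))
    for j in list(E):
        if is_explanation(E - {j}, v, c, N, domains):
            E.remove(j)              # feature j is redundant given the remaining ones
    return E

v, c, domains = (1, 1, 1), 0, [(0, 1)] * 3
print(minimal_explanation(v, c, N, domains))   # {2}: the third feature alone, i.e. {v_1^3}
```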
A minimum explanation (sometimes referred to as a cardinal minimal explanation or a PI-explanation) is defined as a minimal explanation of minimum size; i.e., if E is a minimum explanation, then there does not exist a minimal explanation E′ such that |E′| < |E|. Fig. 4 demonstrates that {v_1^3} is a minimum explanation for our running example.
Fig. 4: {v_1^3} is a minimum explanation for input V_1 = [1, 1, 1]^T.
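Computing a minimum explanation is much harder than the linear pass above; a naive way to see why is the brute-force search below, which tries candidate subsets in order of increasing size over the same hypothetical stand-in network. Real approaches are far more sophisticated, but the cost of this computation is precisely what motivates the provable approximations proposed in this paper.

```python
from itertools import combinations, product

# The hypothetical stand-in network used in the earlier sketches.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def is_explanation(E, v, c, N, domains):
    return all(max(range(2), key=lambda k: N(x)[k]) == c
               for x in product(*domains)
               if all(x[i] == v[i] for i in E))

def minimum_explanation(v, c, N, domains):
    """Smallest explanation, found by trying subsets of increasing size.
    Exponentially many candidates; feasible only for tiny feature sets."""
    for size in range(len(v) + 1):
        for E in combinations(range(len(v)), size):
            if is_explanation(set(E), v, c, N, domains):
                return set(E)

v, c, domains = (1, 1, 1), 0, [(0, 1)] * 3
print(minimum_explanation(v, c, N, domains))   # {2}, i.e. {v_1^3}, as in Fig. 4
```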
Contrastive Example. A subset of features C ⊆ F is called a contrastive example or a contrastive explanation (CXP) if altering the features in C is sufficient to cause the misclassification of a given classification instance (v, c):

∃x ∈ 𝔽. [ ⋀_{i ∈ F∖C} (x_i = v_i) ∧ (N(x) ≠ c) ]    (3)
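Checking whether a given C is a contrastive example is therefore a satisfiability query: keep the features outside C at their values in v and search for an assignment to the features of C that changes the classification. A sketch over the same hypothetical stand-in network:

```python
from itertools import product

# The hypothetical stand-in network used in the earlier sketches.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def is_contrastive(C, v, c, N, domains):
    """Eq. (3): can altering only the features in C flip the classification?"""
    for x in product(*domains):
        if all(x[i] == v[i] for i in range(len(v)) if i not in C):
            if max(range(2), key=lambda k: N(x)[k]) != c:
                return True, x            # witness input that is classified differently
    return False, None

v, c, domains = (1, 1, 1), 0, [(0, 1)] * 3
print(is_contrastive({0, 2}, v, c, N, domains))   # (True, (0, 1, 0)): altering features 1 and 3 flips the class
print(is_contrastive({2}, v, c, N, domains))      # (False, None): the third feature alone cannot
```

For this stand-in network, the subset-minimal contrastive examples are {v_1^1, v_1^3} and {v_1^2, v_1^3}, and their minimum hitting set is exactly the minimum explanation {v_1^3}; this illustrates the duality between explanations and contrastive examples [31] mentioned in the introduction.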