Towards Formal XAI: Formally Approximate
Minimal Explanations of Neural Networks
Shahaf Bassan and Guy Katz
The Hebrew University of Jerusalem, Jerusalem, Israel
{shahaf.bassan, g.katz}@mail.huji.ac.il
Abstract. With the rapid growth of machine learning, deep neural net-
works (DNNs) are now being used in numerous domains. Unfortunately,
DNNs are “black-boxes”, and cannot be interpreted by humans, which is
a substantial concern in safety-critical systems. To mitigate this issue, re-
searchers have begun working on explainable AI (XAI) methods, which
can identify a subset of input features that are the cause of a DNN’s
decision for a given input. Most existing techniques are heuristic, and
cannot guarantee the correctness of the explanation provided. In con-
trast, recent and exciting attempts have shown that formal methods can
be used to generate provably correct explanations. Although these meth-
ods are sound, the computational complexity of the underlying verifica-
tion problem limits their scalability; and the explanations they produce
might sometimes be overly complex. Here, we propose a novel approach
to tackle these limitations. We (i) suggest an efficient, verification-based
method for finding minimal explanations, which constitute a provable
approximation of the global, minimum explanation; (ii) show how DNN
verification can assist in calculating lower and upper bounds on the op-
timal explanation; (iii) propose heuristics that significantly improve the
scalability of the verification process; and (iv) suggest the use of bundles,
which allows us to arrive at more succinct and interpretable explanations.
Our evaluation shows that our approach significantly outperforms state-
of-the-art techniques, and produces explanations that are more useful to
humans. We thus regard this work as a step toward leveraging verification
technology in producing DNNs that are more reliable and comprehensi-
ble.
1 Introduction
Machine learning (ML) is a rapidly growing field with a wide range of applica-
tions, including safety-critical, high-risk systems in the fields of health care [19],
aviation [39] and autonomous driving [12]. Despite their success, ML models,
and especially deep neural networks (DNNs), remain “black-boxes” — they are
incomprehensible to humans and are prone to unexpected behaviour and errors.
This issue can result in major catastrophes [13, 74], and also in poor decision-
making due to brittleness or bias [8, 25].
In order to render DNNs more comprehensible to humans, researchers have
been working on explainable AI (XAI), where we seek to construct models for
explaining and interpreting the decisions of DNNs [51, 56–58]. Work to date has
focused on heuristic approaches, which provide explanations, but do not provide
guarantees about the correctness or succinctness of these explanations [14,33,45].
Although these approaches are an important step, their limitations might result
in skewed results, possibly failing to meet the regulatory guidelines of institu-
tions and organizations such as the European Union, the US government, and
the OECD [52]. Thus, producing DNN explanations that are provably accurate
remains of utmost importance.
More recently, the formal verification community has proposed approaches
for providing formal and rigorous explanations for DNN decision making [28,32,
52, 60]. Many of these approaches rely on the recent and rapid developments in
DNN verification [1, 9, 10, 40]. These approaches typically produce an abductive
explanation (also known as a prime implicant, or PI-explanation) [32, 59, 60]:
a minimum subset of input features, which by themselves already determine
the classification produced by the DNN, regardless of any other input features.
These explanations afford formal guarantees, and can be computed via DNN
verification [32].
Abductive explanations are highly useful, but there are two major difficul-
ties in computing them. First, there is the issue of scalability: computing locally
minimal explanations might require a polynomial number of costly invocations
of the underlying DNN verifier, and computing a globally minimal explanation
is even more challenging [?,32,49]. The second difficulty is that users may some-
times prefer “high-level” explanations, not based solely on input features, as
these may be easier to grasp and interpret compared to “low-level”, complex,
feature-based explanations.
To tackle the first difficulty, we propose here new approaches for more effi-
ciently producing verification-based abductive explanations. More concretely, we
propose a method for provably approximating minimum explanations, allowing
stakeholders to use slightly larger explanations that can be discovered much more
quickly. To accomplish this, we leverage the recently discovered dual relationship
between explanations and contrastive examples [31]; and also take advantage of
the sensitivity of DNNs to small adversarial perturbations [65], to compute both
lower and upper bounds for the minimum explanation. In addition, we propose
novel heuristics for significantly expediting the underlying verification process.
In addressing the second difficulty, i.e. the interpretability limitations of “low-
level” explanations, we propose to construct explanations in terms of bundles,
which are sets of related features. We empirically show that using our method
to produce bundle explanations can significantly improve the interpretability of
the results, and even the scalability of the approach, while still maintaining the
soundness of the resulting explanations.
To summarize, our contributions include the following: (i) We are the first
to suggest a method that formally produces sound and minimal abductive ex-
planations that provably approximate the global-minimum explanation. (ii) Our
three suggested novel heuristics expedite the search for minimal abductive ex-
planations, significantly outperforming the state of the art. (iii) We suggest a
novel approach for using bundles to efficiently produce sound and provable ex-
planations that are more interpretable and succinct.
For evaluation purposes, we implemented our approach as a proof-of-concept
tool. Although our method can be applied to any ML model, we focused here
on DNNs, where the verification process is known to be NP-complete [40], and
the scalable generation of explanations is known to be challenging [32, 59]. We
used our tool to test the approach on DNNs trained for digit and clothing classi-
fication, and also compared it to state-of-the-art approaches [32,33]. Our results
indicate that our approach was successful in quickly producing meaningful ex-
planations, often running 40% faster than existing tools. We believe that these
promising results showcase the potential of this line of work.
The rest of the paper is organized as follows. Sec. 2 contains background on
DNNs and their verification, as well as on formal, minimal explanations. Sec. 3
covers the main method for calculating approximations of minimum explana-
tions, and Sec. 4 covers methods for improving the efficiency of calculating these
approximations. Sec. 5 covers the use of bundles in constructing “high-level”,
provable explanations. Next, we present our evaluation in Sec. 6. Related work
is covered in Sec. 7, and we conclude in Sec. 8.
2 Background
DNNs. A deep neural network (DNN) [47] is a directed graph composed of
layers of nodes, commonly called neurons. In feed-forward NNs the data flows
from the first (input) layer, through intermediate (hidden) layers, and onto an
output layer. A DNN’s output is calculated by assigning values to its input
neurons, and then iteratively calculating the values of neurons in subsequent
layers. In the case of classification, which is the focus of this paper, each output
neuron corresponds to a specific class, and the input is assigned the class whose
output neuron has the highest value.
Fig. 1: A simple DNN.
Fig. 1 depicts a simple, feed-forward DNN. The input layer includes three neurons, followed by a weighted-sum layer, which calculates an affine transformation of the values from the input layer. Given the input V_1 = [1, 1, 1]^T, the second layer computes the values V_2 = [6, 9, 11]^T. Next comes a ReLU layer, which computes the function ReLU(x) = max(0, x) for each neuron in the preceding layer, resulting in V_3 = [6, 9, 11]^T. The final (output) layer then computes an affine transformation, resulting in V_4 = [15, 4]^T. This indicates that input V_1 = [1, 1, 1]^T is classified as the category corresponding to the first output neuron, which is assigned the greater value.
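To make this layer-by-layer computation concrete, the following Python sketch evaluates a small feed-forward ReLU network of the same 3-input, 2-output shape. The weights and biases below are hypothetical stand-ins rather than the parameters of Fig. 1, so the concrete values differ from those quoted above; the mechanics (an affine transformation per layer, a ReLU activation, and an arg-max over the output layer) are the same.

```python
import numpy as np

def forward(x, weights, biases):
    """Evaluate a feed-forward network: affine layers, ReLU after all but the last."""
    v = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        v = W @ v + b                     # affine (weighted-sum) layer
        if i < len(weights) - 1:
            v = np.maximum(v, 0.0)        # ReLU layer
    return v

# Hypothetical stand-in parameters (not the actual weights shown in Fig. 1).
weights = [np.array([[1.0, 1.0, 0.0],     # 3 input neurons -> 2 hidden neurons
                     [0.0, 0.0, 1.0]]),
           np.array([[3.0, 3.0],          # 2 hidden neurons -> 2 output neurons
                     [0.0, 0.0]])]
biases = [np.array([-1.0, 0.0]), np.array([0.0, 1.0])]

out = forward([1, 1, 1], weights, biases)
print(out, "-> classified as output neuron", int(np.argmax(out)))
```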
DNN Verification. A DNN verification query is a tuple ⟨P, N, Q⟩, where N is a DNN that maps an input vector x to an output vector y = N(x), P is a predicate on x, and Q is a predicate on y. A DNN verifier needs to decide whether there exists an input x₀ that satisfies P(x₀) ∧ Q(N(x₀)) (the SAT case) or not (the UNSAT case). Typically, P and Q are expressed in the logic of real arithmetic [50]. The DNN verification problem is known to be NP-complete [40].
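For intuition only, when the input space is small and discrete (as in the running example below, where each input neuron is restricted to {0, 1}), such a query can be decided by naive enumeration. The sketch below does exactly that; it conveys the semantics of ⟨P, N, Q⟩ but is no substitute for a real DNN verifier, and the representation of P and Q as Python predicates, as well as the stand-in network (the hypothetical one from the previous sketch, written inline), are our own choices for illustration.

```python
from itertools import product

def verify(P, N, Q, domains):
    """Decide a query <P, N, Q> by enumerating a finite input space.

    domains[i] lists the values feature i may take. Returns ("SAT", x) for a
    witness x satisfying P(x) and Q(N(x)), or ("UNSAT", None) if none exists.
    """
    for x in product(*domains):
        if P(x) and Q(N(x)):
            return "SAT", x
    return "UNSAT", None

# The hypothetical stand-in network from the previous sketch, written inline.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

# Query: is there any input whose second output score is at least the first?
result = verify(P=lambda x: True,
                N=N,
                Q=lambda y: y[1] >= y[0],
                domains=[(0, 1)] * 3)
print(result)    # ("SAT", (0, 0, 0)) for this stand-in network
```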
Formal Explanations. We focus here on explanations for classification problems, where a model is trained to predict a label for each given input. A classification problem is a tuple ⟨F, D, K, N⟩ where (i) F = {1, ..., m} denotes the features; (ii) D = {D_1, D_2, ..., D_m} denotes the domains of each of the features, i.e. the possible values that each feature can take. The entire feature (input) space is hence 𝔽 = D_1 × D_2 × ... × D_m; (iii) K = {c_1, c_2, ..., c_n} is a set of classes, i.e. the possible labels; and (iv) N : 𝔽 → K is a (non-constant) classification function (in our case, a neural network). A classification instance is a pair (v, c), where v ∈ 𝔽, c ∈ K, and c = N(v). In other words, v is mapped by the neural network N to class c.
Looking at (v, c), we often wish to know why v was classified as c. Informally, an explanation is a subset of features E ⊆ F such that fixing these features to the values assigned to them in v already determines that the input will be classified as c, regardless of the remaining features F ∖ E. In other words, even if the features outside the explanation are assigned arbitrary values, the classification remains the same. More formally, given an input v = (v_1, ..., v_m) ∈ 𝔽 with classification N(v) = c, an explanation (sometimes referred to as an abductive explanation, or an AXP) is a subset of the features E ⊆ F such that:

∀x ∈ 𝔽. [ ⋀_{i ∈ E} (x_i = v_i) → (N(x) = c) ]    (1)
We continue with the running example from Fig. 1. For simplicity, we assume that each input neuron can only be assigned the values 0 or 1. It can be observed that for input V_1 = [1, 1, 1]^T, the set {v_1^1, v_1^2} is an explanation; indeed, once the first two entries of V_1 are set to 1, the classification remains the same for any value of the third entry (see Fig. 2). We can prove this by encoding a verification query ⟨P, N, Q⟩ = ⟨E = v, N, Q_¬c⟩, where E is the candidate explanation, E = v means that we restrict the features in E to their values in v, and Q_¬c asserts that the classification is not c. An UNSAT result for this query indicates that E is an explanation for the instance (v, c).
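This encoding can be phrased directly in terms of the naive verify sketch shown earlier: P fixes the features of E to their values in v, and Q_¬c asserts that the winning class is not c. The sketch below uses a hypothetical stand-in network, chosen so that its behaviour on {0, 1}^3 reproduces the explanation structure described for the running example (its weights are not those of Fig. 1), and confirms that fixing the first two features of V_1 = [1, 1, 1]^T yields UNSAT, i.e., that {v_1^1, v_1^2} is an explanation.

```python
from itertools import product

# Stand-in for the Fig. 1 network (hypothetical weights): 3 binary inputs, 2 output scores.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def verify(P, N, Q, domains):
    """Naive enumeration-based check of <P, N, Q> (see the earlier sketch)."""
    for x in product(*domains):
        if P(x) and Q(N(x)):
            return "SAT", x
    return "UNSAT", None

v, c = (1, 1, 1), 0                  # the running example: V_1 is classified as class 0
domains = [(0, 1)] * 3
E = {0, 1}                           # candidate explanation: the first two features

P = lambda x: all(x[i] == v[i] for i in E)                        # "E = v"
Q_not_c = lambda y: max(range(len(y)), key=lambda k: y[k]) != c   # "classified as not c"

print(verify(P, N, Q_not_c, domains))   # ("UNSAT", None) -> E is an explanation for (v, c)
```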
Clearly, the set of all features constitutes a trivial explanation. However, we are interested in smaller explanation subsets, which can provide useful information regarding the decision of the classifier. More precisely, we search for minimal explanations and minimum explanations. A subset E ⊆ F is a minimal explanation (also referred to as a local-minimal explanation, or a subset-minimal explanation) of instance (v, c) if it is an explanation that ceases to be an explanation when even a single feature is removed from it:

(∀x ∈ 𝔽. [ ⋀_{i ∈ E} (x_i = v_i) → (N(x) = c) ]) ∧ (∀j ∈ E. [ ∃y ∈ 𝔽. [ ⋀_{i ∈ E∖{j}} (y_i = v_i) ∧ (N(y) ≠ c) ] ])    (2)

Fig. 3 demonstrates that {v_1^1, v_1^2} is a minimal explanation in our running example: removing either of its features allows misclassification.
Fig. 2: {v_1^1, v_1^2} is an explanation for input V_1 = [1, 1, 1]^T.
Fig. 3: {v_1^1, v_1^2} is a minimal explanation for input V_1 = [1, 1, 1]^T.
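A subset-minimal explanation such as this one can be computed by the classical linear elimination pass: start from the full feature set and drop any feature whose removal still leaves an explanation, where each check corresponds to one invocation of the underlying verifier. The sketch below runs this procedure on the same hypothetical stand-in network used in the earlier sketches, so the concrete numbers are illustrative only; the result depends on the elimination order (this order returns {v_1^3}, while the reverse order would return {v_1^1, v_1^2}).

```python
from itertools import product

# The hypothetical stand-in network used in the earlier sketches.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def is_explanation(E, v, c, N, domains):
    """Eq. (1), checked by enumeration (one verifier call in practice)."""
    return all(max(range(2), key=lambda k: N(x)[k]) == c
               for x in product(*domains)
               if all(x[i] == v[i] for i in E))

def minimal_explanation(v, c, N, domains):
    """Shrink the full feature set one feature at a time (subset-minimal, not minimum)."""
    E = set(range(len(v)))
    for j in list(E):
        if is_explanation(E - {j}, v, c, N, domains):
            E.remove(j)              # feature j is redundant given the remaining ones
    return E

v, c, domains = (1, 1, 1), 0, [(0, 1)] * 3
print(minimal_explanation(v, c, N, domains))   # {2}: the third feature alone, i.e. {v_1^3}
```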
A minimum explanation (sometimes referred to as a cardinal minimal explanation or a PI-explanation) is defined as a minimal explanation of minimum size; i.e., if E is a minimum explanation, then there does not exist a minimal explanation E′ such that |E′| < |E|. Fig. 4 demonstrates that {v_1^3} is a minimum explanation for our running example.
Fig. 4: {v_1^3} is a minimum explanation for input V_1 = [1, 1, 1]^T.
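Computing a minimum explanation is much harder than the linear pass above; a naive way to see why is the brute-force search below, which tries candidate subsets in order of increasing size over the same hypothetical stand-in network. Real approaches are far more sophisticated, but the cost of this computation is precisely what motivates the provable approximations proposed in this paper.

```python
from itertools import combinations, product

# The hypothetical stand-in network used in the earlier sketches.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def is_explanation(E, v, c, N, domains):
    return all(max(range(2), key=lambda k: N(x)[k]) == c
               for x in product(*domains)
               if all(x[i] == v[i] for i in E))

def minimum_explanation(v, c, N, domains):
    """Smallest explanation, found by trying subsets of increasing size.
    Exponentially many candidates; feasible only for tiny feature sets."""
    for size in range(len(v) + 1):
        for E in combinations(range(len(v)), size):
            if is_explanation(set(E), v, c, N, domains):
                return set(E)

v, c, domains = (1, 1, 1), 0, [(0, 1)] * 3
print(minimum_explanation(v, c, N, domains))   # {2}, i.e. {v_1^3}, as in Fig. 4
```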
Contrastive Example. A subset of features C ⊆ F is called a contrastive example or a contrastive explanation (CXP) if altering the features in C is sufficient to cause the misclassification of a given classification instance (v, c):

∃x ∈ 𝔽. [ ⋀_{i ∈ F∖C} (x_i = v_i) ∧ (N(x) ≠ c) ]    (3)
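Checking whether a given C is a contrastive example is therefore a satisfiability query: keep the features outside C at their values in v and search for an assignment to the features of C that changes the classification. A sketch over the same hypothetical stand-in network:

```python
from itertools import product

# The hypothetical stand-in network used in the earlier sketches.
N = lambda x: (3 * max(0, x[0] + x[1] - 1) + 3 * max(0, x[2]), 1)

def is_contrastive(C, v, c, N, domains):
    """Eq. (3): can altering only the features in C flip the classification?"""
    for x in product(*domains):
        if all(x[i] == v[i] for i in range(len(v)) if i not in C):
            if max(range(2), key=lambda k: N(x)[k]) != c:
                return True, x            # witness input that is classified differently
    return False, None

v, c, domains = (1, 1, 1), 0, [(0, 1)] * 3
print(is_contrastive({0, 2}, v, c, N, domains))   # (True, (0, 1, 0)): altering features 1 and 3 flips the class
print(is_contrastive({2}, v, c, N, domains))      # (False, None): the third feature alone cannot
```

For this stand-in network, the subset-minimal contrastive examples are {v_1^1, v_1^3} and {v_1^2, v_1^3}, and their minimum hitting set is exactly the minimum explanation {v_1^3}; this illustrates the duality between explanations and contrastive examples [31] mentioned in the introduction.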