a neural network can be found for either fully connected or convolutional neural networks, which may include skip layers and normalizations as well. Besides the interpretability aspect, we show that the induced tree is also advantageous over the corresponding neural network in terms of computational complexity, at the expense of increased storage memory.
While writing this paper, we noticed that the following works overlap with ours [28], [3], [15], [21], especially for feedforward ReLU networks. We extend the findings in these works to any activation function and also to recurrent neural networks.
2. Decision Tree Analysis of Neural Networks
The derivations in this section will first be made for feedforward neural networks with piece-wise linear activation functions such as ReLU, Leaky ReLU, etc. Next, the analysis will be extended to any neural network with any activation function.
2.1. Fully Connected Networks
Let $W_i$ be the weight matrix of a network's $i$th layer. Let $\sigma$ be any piece-wise linear activation function, and let $x_0$ be the input to the neural network. Then, the output and an intermediate feature of a feedforward neural network can be represented as in Eq. 1.
$$NN(x_0) = W_{n-1}^T \sigma(W_{n-2}^T \sigma(\dots W_1^T \sigma(W_0^T x_0)))$$
$$x_i = \sigma(W_{i-1}^T \sigma(\dots W_1^T \sigma(W_0^T x_0))) \tag{1}$$
Note that in Eq. 1, we omit any final activation (e.g. softmax) and we ignore the bias term, as it can simply be included by concatenating a $1$ value to each $x_i$. The activation function $\sigma$ acts as an element-wise scalar multiplication, hence the following can be written.
$$W_i^T \sigma(W_{i-1}^T x_{i-1}) = W_i^T \left(a_{i-1} \odot (W_{i-1}^T x_{i-1})\right) \tag{2}$$
In Eq. 2, $a_{i-1}$ is a vector indicating the slopes of the activations in the corresponding linear regions that $W_{i-1}^T x_{i-1}$ falls into, and $\odot$ denotes element-wise multiplication. Note that $a_{i-1}$ can directly be interpreted as a categorization result, since it consists of indicators (slopes) of the linear regions of the activation function. Eq. 2 can be re-organized as follows.
$$W_i^T \sigma(W_{i-1}^T x_{i-1}) = (W_i \odot a_{i-1})^T\, W_{i-1}^T x_{i-1} \tag{3}$$
In Eq. 3, we use $\odot$ as a column-wise element-wise multiplication on $W_i$. This corresponds to element-wise multiplication by a matrix obtained by repeating the column vector $a_{i-1}$ to match the size of $W_i$. Using Eq. 3, Eq. 1 can be rewritten as follows.
$$NN(x_0) = (W_{n-1} \odot a_{n-2})^T (W_{n-2} \odot a_{n-3})^T \dots (W_1 \odot a_0)^T\, W_0^T x_0 \tag{4}$$
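As a quick sanity check on Eq. 4, the sketch below compares a plain ReLU forward pass with the gated-product form for a small random fully connected network. It is only an illustration: the layer sizes and the names `W`, `x0`, `a`, `pre_acts`, `out_forward` are arbitrary choices, not part of the paper; for ReLU, the slope vector $a_i$ is simply the 0/1 indicator of positive pre-activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer fully connected ReLU network (biases omitted, as in Eq. 1).
W = [rng.standard_normal((4, 5)),   # W_0
     rng.standard_normal((5, 3)),   # W_1
     rng.standard_normal((3, 2))]   # W_2
x0 = rng.standard_normal(4)

# Standard forward pass: NN(x0) = W_2^T ReLU(W_1^T ReLU(W_0^T x0)).
z, pre_acts = x0, []
for Wi in W[:-1]:
    pre_acts.append(Wi.T @ z)
    z = np.maximum(pre_acts[-1], 0.0)              # ReLU
out_forward = W[-1].T @ z

# Gated form of Eq. 4: fold the ReLU slopes a_i (0/1 here) into the next layer's
# weights, i.e. apply (W_{i+1} ⊙ a_i)^T instead of W_{i+1}^T after the activation.
a = [(p > 0).astype(x0.dtype) for p in pre_acts]
out_gated = W[0].T @ x0
for Wi, ai in zip(W[1:], a):
    out_gated = (Wi * ai[:, None]).T @ out_gated   # column-wise element-wise product

print(np.allclose(out_forward, out_gated))         # expected: True
```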
From Eq. 4, one can define an effective weight matrix $\hat{W}_i^T$ of a layer $i$ to be applied directly on the input $x_0$ as follows:
$${}^{c_{i-1}}\hat{W}_i^T = (W_i \odot a_{i-1})^T \dots (W_1 \odot a_0)^T\, W_0^T$$
$${}^{c_{i-1}}\hat{W}_i^T\, x_0 = W_i^T x_i \tag{5}$$
In Eq. 5, the categorization vector until layer $i$ is defined as follows: $c_{i-1} = a_0 \,\|\, a_1 \,\|\, \dots \,\|\, a_{i-1}$, where $\|$ is the concatenation operator.
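A minimal sketch of Eq. 5, reusing the hypothetical `W`, `a`, `x0`, `pre_acts` and `out_forward` names from the previous snippet, might build the effective matrices explicitly; the point it illustrates is that each effective matrix is fully determined by the earlier categorization vectors $a_0, \dots, a_{i-1}$ and acts directly on $x_0$.

```python
def effective_matrices(W, a):
    """Effective weight matrices of Eq. 5 for one input's categorizations a_0..a_{n-2}.

    Given layer weights W_0..W_{n-1} and the 0/1 ReLU slope vectors a_0..a_{n-2}
    observed for some input, returns matrices such that W_hat[i] @ x0 equals the
    pre-activation W_i^T x_i of layer i for any input with the same categorizations.
    """
    W_hat, acc = [W[0].T], W[0].T                  # layer 0 sees the raw input
    for Wi, ai in zip(W[1:], a):
        acc = (Wi * ai[:, None]).T @ acc           # fold (W_i ⊙ a_{i-1})^T into the product
        W_hat.append(acc)
    return W_hat

# Check against the forward pass above (numpy imported there): every layer's
# pre-activation, and the final output, is recovered by applying the effective
# matrix directly to x0.
W_hat = effective_matrices(W, a)
targets = pre_acts + [out_forward]
print(all(np.allclose(Wh @ x0, t) for Wh, t in zip(W_hat, targets)))   # expected: True
```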
One can directly observe from Eq. 5 that the effective matrix of layer $i$ depends only on the categorization vectors from the previous layers. This indicates that, in each layer, a new effective filter (to be applied directly on the network input) is selected based on the previous categorizations/decisions. This directly shows that a fully connected neural network can be represented as a single decision tree, where the effective matrices act as categorization rules. In each layer $i$, the response of the effective matrix ${}^{c_{i-1}}\hat{W}_i^T$ is categorized into an $a_i$ vector, and based on this categorization result, the next layer's effective matrix ${}^{c_i}\hat{W}_{i+1}^T$ is determined. A layer $i$ is thus represented as a $k^{m_i}$-way categorization, where $m_i$ is the number of filters in layer $i$ and $k$ is the total number of linear regions in the activation. The categorization in a layer $i$ can thus be represented by a tree of depth $m_i$, where a node at any depth is branched into $k$ categorizations.
In order to better illustrate the equivalent decision tree of a neural network, in Algorithm 1 we rewrite Eq. 5 for the entire network as an algorithm. For the sake of simplicity and without loss of generality, we provide the algorithm for the ReLU activation function, where $a \in \{0, 1\}$. It can be clearly observed that lines 5-9 in Algorithm 1 correspond to a node in the decision tree, where a simple yes/no decision is made.
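Since Algorithm 1 is not reproduced in this excerpt, the sketch below gives one possible reading of the same node-level decisions for the ReLU case: the sign of each filter response is a yes/no decision, and the accumulated decisions select the effective matrix used by the next layer. The function name and structure are illustrative assumptions, not the paper's exact pseudocode; it reuses the toy `W`, `x0` and `out_forward` from the earlier snippets.

```python
def tree_style_inference(W, x0):
    """Run a ReLU network as a chain of yes/no decisions (a ∈ {0, 1}).

    Returns the network output and the concatenated decision (categorization)
    vector identifying which leaf of the equivalent tree the input reaches.
    """
    acc = W[0].T                                   # effective matrix of layer 0
    decisions = []
    for Wi in W[1:]:
        response = acc @ x0                        # response of the current effective matrix
        ai = (response > 0).astype(x0.dtype)       # one yes/no decision per filter
        decisions.append(ai)
        acc = (Wi * ai[:, None]).T @ acc           # decisions pick the next effective matrix
    return acc @ x0, np.concatenate(decisions)

y, path = tree_style_inference(W, x0)
print(np.allclose(y, out_forward), len(path))      # expected: True, len(path) = sum of m_i
```

For the toy weights above, `len(path)` equals $d = \sum_{i=0}^{n-2} m_i$, and the 0/1 pattern in `path` identifies one of the $2^d$ leaves discussed below.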
The decision tree equivalent of a neural network can thus be constructed as in Algorithm 2. Using this algorithm, we share a tree representation obtained for a neural network with three layers, having 2, 1 and 1 filters in layers 1, 2 and 3, respectively. The network has ReLU activations between layers and no activation after the last layer. It can be observed from Algorithm 2 and Fig. 1 that the depth of a NN-equivalent tree is $d = \sum_{i=0}^{n-2} m_i$, and the total number of categories in the last branch is $2^d$. At first glance, the number of categories seems huge. For example, if the first layer of a neural network contains 64 filters, there would exist at least $2^{64}$ branches in a tree, which is already intractable. But, there