Neural Networks are Decision Trees
Caglar Aytekin
AI Lead
AAC Technologies
caglaraytekin@aactechnologies.com, cagosmail@gmail.com
Abstract
In this manuscript, we show that any neural network with any activation function can be represented as a decision tree. The representation is an equivalence, not an approximation, thus keeping the accuracy of the neural network exactly as is. We believe that this work provides a better understanding of neural networks and paves the way to tackling their black-box nature. We share equivalent trees of some neural networks and show that, besides providing interpretability, the tree representation can also achieve some computational advantages for small networks. The analysis holds both for fully connected and convolutional networks, which may or may not also include skip connections and/or normalizations.
1. Introduction
Despite the immense success of neural networks over the past decade, the black-box nature of their predictions prevents their wider and more reliable adoption in many industries, such as health and security. This fact has led researchers to investigate ways to explain neural network decisions. The efforts in explaining neural network decisions can be categorized into several approaches: saliency maps, approximation by interpretable methods, and joint models.
Saliency maps highlight the regions of the input that a neural network relies on most when making a prediction. An earlier work [20] takes the gradient of the neural network output with respect to the input in order to visualize an input-specific linearization of the entire network. Another work [26] uses a deconvnet to go back to features from decisions. The saliency maps obtained via these methods are often noisy and prevent a clear understanding of the decisions made. Another track of methods [29], [18], [4], [6] makes use of the derivative of a neural network output with respect to an activation, usually the one right before the fully connected layers. The saliency maps obtained by this track, and by some other works [27], [11], [5], are clearer in the sense of highlighting areas related to the predicted class. Although useful for purposes such as checking whether the support area for a decision is sound, these methods still lack a detailed logical reasoning of why such a decision is made.
Conversion between neural networks and interpretable-by-design models, such as decision trees, has been a topic of interest. In [8], a method was devised to initialize neural networks with decision trees. [9], [19], [25] also provide neural network equivalents of decision trees. The neural networks in these works have specific architectures, thus the conversion does not generalize to arbitrary models. In [24], neural networks were trained in such a way that their decision boundaries can be approximated by trees. This work does not provide a correspondence between neural networks and decision trees, and merely uses the latter as a regularization. In [7], a neural network was used to train a decision tree. Such tree distillation is an approximation of a neural network and not a direct conversion, and thus performs poorly on the tasks that the neural network was trained on.
Joint neural network and decision tree models [12], [16], [13], [14], [17], [2], [10], [22] generally use deep learning to assist trees, or design a neural network structure that resembles a tree. A recent work [23] replaces the final fully connected layer of a neural network with a decision tree. Since the backbone features are still those of a neural network, explanation is sought by giving humans a means to validate a decision as good or bad, rather than a complete logical reasoning of the decision.
In this paper, we show that any neural network with any activations has a directly equivalent decision tree representation. Thus, the induced tree output is exactly the same as that of the neural network, and the tree representation does not limit or require altering the neural architecture in any way. We believe that the decision tree equivalence provides a better understanding of neural networks and paves the way to tackling their black-box nature, e.g. via analyzing the category that a test sample belongs to, which can be extracted from the node rules by which the sample is categorized. We show that the decision tree equivalent of
a neural network can be found for either fully connected or convolutional neural networks, which may include skip connections and normalizations as well. Besides the interpretability aspect, we show that the induced tree is also advantageous over the corresponding neural network in terms of computational complexity, at the expense of increased storage memory.
Upon writing this paper, we noticed the following works that overlap with ours [28], [3], [15], [21], especially for feed-forward ReLU networks. We extend the findings in these works to any activation function and also to recurrent neural networks.
2. Decision Tree Analysis of Neural Networks
The derivations in this section will first be made for feed-forward neural networks with piece-wise linear activation functions such as ReLU and Leaky ReLU. Next, the analysis will be extended to any neural network with any activation function.
2.1. Fully Connected Networks
Let $W_i$ be the weight matrix of a network's $i$th layer, let $\sigma$ be any piece-wise linear activation function, and let $x_0$ be the input to the neural network. Then, the output and an intermediate feature of a feed-forward neural network can be represented as in Eq. 1.

$$\mathrm{NN}(x_0) = W_{n-1}^T \sigma(W_{n-2}^T \sigma(\dots W_1^T \sigma(W_0^T x_0)))$$
$$x_i = \sigma(W_{i-1}^T \sigma(\dots W_1^T \sigma(W_0^T x_0))) \quad (1)$$
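As a concrete illustration of Eq. 1, the following is a minimal sketch (ours, not from the paper; NumPy, with arbitrary weight shapes and biases omitted) of a three-layer feed-forward ReLU network evaluated in exactly this nested form.

```python
import numpy as np

# Minimal sketch of Eq. 1: NN(x0) = W2^T sigma(W1^T sigma(W0^T x0)) for n = 3.
# Weight shapes/values are arbitrary illustrative assumptions; biases are omitted.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((4, 3))   # input dim 4 -> 3 filters
W1 = rng.standard_normal((3, 2))   # 3 filters  -> 2 filters
W2 = rng.standard_normal((2, 1))   # 2 filters  -> 1 output (no activation after it)

relu = lambda z: np.maximum(z, 0.0)

x0 = rng.standard_normal(4)
x1 = relu(W0.T @ x0)               # intermediate feature x1
x2 = relu(W1.T @ x1)               # intermediate feature x2
out = W2.T @ x2                    # NN(x0)
print(out)
```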
Note that in Eq. 1, we omit any final activation (e.g. softmax) and we ignore the bias term, as it can simply be included by concatenating a $1$ value to each $x_i$. The activation function $\sigma$ acts as an element-wise scalar multiplication, hence the following can be written.

$$W_i^T \sigma(W_{i-1}^T x_{i-1}) = W_i^T \left(a_{i-1} \odot (W_{i-1}^T x_{i-1})\right) \quad (2)$$
In Eq. 2, $a_{i-1}$ is a vector indicating the slopes of the activation function in the linear regions that $W_{i-1}^T x_{i-1}$ falls into, and $\odot$ denotes element-wise multiplication. Note that $a_{i-1}$ can directly be interpreted as a categorization result, since it contains the indicators (slopes) of the linear regions of the activation function. Eq. 2 can be re-organized as follows.

$$W_i^T \sigma(W_{i-1}^T x_{i-1}) = (W_i \odot a_{i-1})^T W_{i-1}^T x_{i-1} \quad (3)$$
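A quick numerical check of Eqs. 2 and 3 for ReLU (a small self-contained sketch of ours; the slope vector $a_0$ is simply the 0/1 indicator of positive pre-activations, and the weights are arbitrary):

```python
import numpy as np

# Sketch check of Eq. 2 and Eq. 3 with ReLU, whose slopes a are in {0, 1}.
# Shapes and values are arbitrary; only the identities matter.
rng = np.random.default_rng(0)
W0, W1 = rng.standard_normal((4, 3)), rng.standard_normal((3, 2))
x0 = rng.standard_normal(4)

z0 = W0.T @ x0                       # pre-activation W0^T x0
a0 = (z0 > 0).astype(z0.dtype)       # slope of the linear region each unit falls into

lhs = W1.T @ np.maximum(z0, 0.0)     # W1^T sigma(W0^T x0)
eq2 = W1.T @ (a0 * z0)               # W1^T (a0 ⊙ (W0^T x0))   -- Eq. 2
eq3 = (W1 * a0[:, None]).T @ z0      # (W1 ⊙ a0)^T W0^T x0     -- Eq. 3
assert np.allclose(lhs, eq2) and np.allclose(lhs, eq3)
```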
In Eq. 3, $\odot$ is used as a column-wise element-wise multiplication on $W_i$. This corresponds to element-wise multiplication by a matrix obtained by repeating the column vector $a_{i-1}$ to match the size of $W_i$. Using Eq. 3, Eq. 1 can be rewritten as follows.

$$\mathrm{NN}(x_0) = (W_{n-1} \odot a_{n-2})^T (W_{n-2} \odot a_{n-3})^T \dots (W_1 \odot a_0)^T W_0^T x_0 \quad (4)$$
From Eq. 4, one can define an effective weight matrix $^{c_{i-1}}\hat{W}_i^T$ of a layer $i$, to be applied directly on the input $x_0$, as follows:

$$^{c_{i-1}}\hat{W}_i^T = (W_i \odot a_{i-1})^T \dots (W_1 \odot a_0)^T W_0^T$$
$$^{c_{i-1}}\hat{W}_i^T x_0 = W_i^T x_i \quad (5)$$

In Eq. 5, the categorization vector up to layer $i$ is defined as $c_{i-1} = a_0 \,\Vert\, a_1 \,\Vert \dots \Vert\, a_{i-1}$, where $\Vert$ is the concatenation operator.
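To make Eqs. 4 and 5 concrete, here is a small self-contained sketch of ours (ReLU, arbitrary weights) that accumulates the effective matrices layer by layer and checks that applying the final one directly to $x_0$ reproduces the ordinary forward-pass output:

```python
import numpy as np

# Sketch of Eq. 4/5: build the effective matrices of a ReLU network and apply them
# directly to the input x0. Weights are arbitrary illustrative assumptions.
rng = np.random.default_rng(1)
Ws = [rng.standard_normal(s) for s in [(4, 3), (3, 2), (2, 1)]]   # 3-layer toy net
x0 = rng.standard_normal(4)
relu = lambda z: np.maximum(z, 0.0)

W_eff = Ws[0]                            # effective matrix of layer 0 is W0 itself
for i in range(1, len(Ws)):
    z = W_eff.T @ x0                     # = W_{i-1}^T x_{i-1} by Eq. 5
    a = (z > 0).astype(z.dtype)          # categorization vector a_{i-1} (ReLU slopes)
    W_eff = W_eff @ (Ws[i] * a[:, None]) # append (W_i ⊙ a_{i-1}) per Eq. 5

out_tree = W_eff.T @ x0                  # effective matrix applied directly to the input
out_nn = Ws[2].T @ relu(Ws[1].T @ relu(Ws[0].T @ x0))   # ordinary forward pass
assert np.allclose(out_tree, out_nn)
```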
One can directly observe from Eq. 5 that the effective matrix of layer $i$ depends only on the categorization vectors from the previous layers. This indicates that, in each layer, a new effective filter is selected, to be applied on the network input, based on the previous categorizations/decisions. This directly shows that a fully connected neural network can be represented as a single decision tree, where the effective matrices act as categorization rules. In each layer $i$, the response of the effective matrix $^{c_{i-1}}\hat{W}_i^T$ is categorized into the vector $a_i$, and based on this categorization result, the next layer's effective matrix $^{c_i}\hat{W}_{i+1}^T$ is determined. A layer $i$ can thus be represented as a $k^{m_i}$-way categorization, where $m_i$ is the number of filters in layer $i$ and $k$ is the total number of linear regions of the activation function; for ReLU, for example, $k = 2$, so a layer with $m_i$ filters induces a $2^{m_i}$-way categorization. The categorization in a layer $i$ can thus be represented by a tree of depth $m_i$, where a node at any depth is branched into $k$ categorizations.
In order to better illustrate the equivalent decision tree of a neural network, in Algorithm 1 we rewrite Eq. 5 for the entire network as an algorithm. For the sake of simplicity and without loss of generality, we provide the algorithm with the ReLU activation function, where $a \in \{0, 1\}$. It can be clearly observed that lines 5-9 in Algorithm 1 correspond to a node in the decision tree, where a simple yes/no decision is made.
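Algorithm 1 itself is not reproduced in this excerpt; the following is a minimal sketch of ours of the per-node yes/no decisions it describes for a ReLU network: every filter of every hidden layer contributes one node that tests the sign of an effective-filter response, and the recorded bits identify the leaf (category) and hence the final linear rule applied to the input.

```python
import numpy as np

# Sketch (ours, not the paper's Algorithm 1) of the decision-tree view of a ReLU
# network: one yes/no node per filter; the bit path selects the next effective matrix.
rng = np.random.default_rng(2)
Ws = [rng.standard_normal(s) for s in [(4, 3), (3, 2), (2, 1)]]   # m_0 = 3, m_1 = 2
x0 = rng.standard_normal(4)

W_eff = Ws[0]
path = []                                  # concatenated categorization vector c
for i in range(1, len(Ws)):
    z = W_eff.T @ x0                       # responses of the current effective filters
    a = (z > 0).astype(float)              # one yes/no decision per filter
    path.extend(int(b) for b in a)
    W_eff = W_eff @ (Ws[i] * a[:, None])   # pick the next effective matrix from the path

print("leaf id bits:", path)               # depth d = m_0 + m_1 = 5 bits here
print("leaf output :", W_eff.T @ x0)       # the linear rule applied at this leaf
```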
The decision tree equivalent of a neural network can thus be constructed as in Algorithm 2. Using this algorithm, we share a tree representation obtained for a neural network with three layers, having 2, 1 and 1 filters in layers 1, 2 and 3, respectively. The network has ReLU activations between layers and no activation after the last layer. It can be observed from Algorithm 2 and Fig. 1 that the depth of a NN-equivalent tree is $d = \sum_{i=0}^{n-2} m_i$, and the total number of categories in the last branch is $2^d$. At first glance, the number of categories seems huge. For example, if the first layer of a neural network contains 64 filters, there would already exist at least $2^{64}$ branches in the tree, which is intractable. But, there