To tree or not to tree? Assessing the impact of
smoothing the decision boundaries
Anthea Mérida, Argyris Kalogeratos, and Mathilde Mougeot
Centre Borelli, ENS Paris-Saclay, Université Paris-Saclay, France
name.surname@ens-paris-saclay.fr
Abstract. When analyzing a dataset, it can be useful to assess how smooth the decision boundaries need to be for a model to better fit the data. This paper addresses this question by proposing to quantify how much the 'rigid' decision boundaries, produced by an algorithm that naturally finds such solutions, should be relaxed to obtain a performance improvement. The approach we propose starts with the rigid decision boundaries of a seed Decision Tree (seed DT), which is used to initialize a Neural DT (NDT). The initial boundaries are challenged by relaxing them progressively through training the NDT. During this process, we measure the NDT's performance and its decision agreement with its seed DT. We show how these two measures can help the user figure out how expressive their model should be, before exploring it further via model selection. The validity of our approach is demonstrated with experiments on simulated and benchmark datasets.
Keywords: Decision trees, neural decision trees, neural networks, model
family selection, model selection, interpretability, data exploration.
1 Introduction
During the exploratory phase of data analysis, and before choosing a model to fit to the data, it is interesting to know whether the data can be sufficiently summarized with decision boundaries composed of a set of 'hard' or 'rigid' rules, which can be interpreted by humans. Rejecting this assumption would mean that the decision boundaries need a certain degree of smoothness in order to better capture the structure of the dataset. Typically, one would simply compare members of different model families and select one through a procedure such as cross-validation (CV). In machine learning, where it is more usual to focus on the predictive capacity of models, CV is a widespread procedure for both model and algorithm selection [6]. Indeed, CV is easy to implement and simplifies the comparison of different models based on the variability of a chosen performance metric.
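As an illustration of such a comparison, a rigid and a smooth model family can be contrasted under CV with scikit-learn (the dataset and the two model families below are illustrative choices, not the ones used in this paper):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A toy dataset whose two classes are separated by a curved boundary.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Compare a rigid (axis-aligned) and a smooth model family via 5-fold CV.
for model in (DecisionTreeClassifier(max_depth=4, random_state=0),
              LogisticRegression()):
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{type(model).__name__}: "
          f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

Looking at both the mean and the spread of the fold scores is what allows the variability-based comparison mentioned above.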
From a higher-level point of view, automated machine learning (auto-ML) or meta-learning methods can be used to first decide which type of algorithm can be suitable for a dataset, and then to train the final model on it. Indeed, packages such as Auto-WEKA [8] and Auto-sklearn [7] provide tools to automate the
arXiv:2210.03672v1 [cs.LG] 7 Oct 2022
[Figure 1 diagram: the input Data is used to train the seed DT, whose structure is translated to initialize an NDT(θ); training the NDT finds θ*, while Performance(θ) and Agreement(θ) are measured along the way; the outputs are θ*, Performance(θ*), Agreement(θ*), and the curves of Performance(θ) and Agreement(θ) over θ.]
Fig. 1: Outline of the proposed method.
algorithm and model selection process. They apply Bayesian optimization and
meta-learning procedures [7,13] to select the most appropriate algorithm and its
parametrization according to a metric and within a user-defined budget. Other methods aim to use data characterizations to obtain insights into what kind of data mining algorithm is suitable for a dataset. These characterizations can be statistical and information-theoretic measures employed as input, with the aim of learning their association with the algorithms' performance on the data. Users can interpret these methods, e.g. through decision rules produced by the C5.0 algorithm [1], or a self-organizing map that clusters various datasets according to their characteristics [14]. Other, more complex, data characterizations (or meta-features) have been proposed to describe the problem at hand [9,12]. Closer to our work, the approach of [11] extracts meta-features from a dataset's induced decision tree, which attempt to capture the learning complexity of the dataset. These methods, however, require a database of use-cases (datasets and their associated preferred algorithms) whose clustering would provide general guidance on model selection, and might need to be retrained when new use-cases are added.
The aforementioned existing methods aim mostly at deciding among candidate models, and eventually at training a well-performing final model. In this sense, they do not provide the user with direct insights regarding the complexity of the underlying structure of the data itself, which is something generally less studied in the literature, and is exactly the main focus of this work. Specifically, we propose an exploratory procedure to help the user assess the expressive power needed to produce efficient classification boundaries for a given dataset. This procedure is meant to be followed to better understand the dataset, prior to selecting the set of models to be further explored through model selection techniques. The procedure can be directly applied to a dataset and requires neither prior knowledge about the input data nor processing of external data.
More specifically, our idea for assessing the expressive power needed for a dataset is to challenge the decision boundaries produced by a rigid trained model. This is achieved by progressively relaxing its decision boundaries, and evaluating in a controlled way how flexible these need to become in order to fit the data better. To realize this idea, we use a typical Decision Tree (DT) for the initial decision boundaries, as it is a simple, interpretable, and naturally rigid model. The proposed procedure is outlined in Fig. 1: it starts with the decision boundaries produced by a reference DT trained on the input dataset, also called the seed DT. The seed DT initializes a Neural Decision Tree (NDT) [3], which inherits the DT's decision boundaries. By definition, an NDT is a special type of Neural Network that can be initialized by a DT, and in which the smoothness of the activation functions can be controlled. What we put forward is the idea that, by training an NDT, it becomes possible to measure two things: its 'departure' from the seed DT in terms of disagreement at the decision level, and the evolution of any performance metric as a function of the allowed smoothness of the decision boundaries. We show with experiments on real and synthetic data that the indicators provided by our data exploration procedure are meaningful for the classification task, and we illustrate with examples how users can interpret them in practice.
2 Background
In this section, we present the tools we use to build our procedure. First, the
core of the proposed method is the gradual relaxation of the decision boundaries
produced by a rigid model. The algorithm to be used to perform this relaxation
needs to offer the possibility of controlling the expressive power of the final
model by tuning a small number of parameters. We propose this to be done
using a Neural Decision Tree [3], which will be presented in Subsec. A. Once a
model with more flexible decision boundaries is obtained, we can measure its
‘departure’ from the initial rigid one. Subsec. B describes metrics to evaluate the
difference between two models.
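For concreteness, the simplest decision-level measure of this kind, the fraction of points on which two fitted classifiers make the same prediction, can be sketched as follows (here an off-the-shelf MLP stands in for the more flexible model; it is not the NDT used in our procedure):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def agreement(model_a, model_b, X):
    """Fraction of points on which the two models make the same decision."""
    return float(np.mean(model_a.predict(X) == model_b.predict(X)))

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
rigid = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
smooth = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                       random_state=0).fit(X, y)

print(f"decision agreement: {agreement(rigid, smooth, X):.3f}")
```

A value close to 1 means the relaxed model has barely departed from the rigid one at the decision level, even if its boundaries are geometrically different.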
A. Neural Decision Trees (NDTs). An NDT is a neural network (NN) whose
architecture and weights initialization are obtained directly from an input DT
[3,2,10], which we call here ‘seed DT’. The NDT variant we use is the one from
[3], which we extend to classification tasks. The hyperparameters of this NDT
type allow us to control the smoothness of its activation functions, which in fact
is a proxy for controlling the smoothness of the decision boundaries.
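The effect of such a hyperparameter can be pictured with a scaled sigmoid-shaped activation (tanh is used here purely for illustration; the exact activation family and scaling of [3] may differ): as the scale gamma grows, the smooth unit converges to the hard indicator of a DT split.

```python
import numpy as np

def smooth_split(x, threshold, gamma):
    """Smooth surrogate of the hard split rule 1{x > threshold}.

    gamma controls the rigidity: a small gamma gives a gently sloped
    decision boundary, while gamma -> infinity recovers the hard step
    of a DT split.
    """
    return 0.5 * (1.0 + np.tanh(gamma * (x - threshold)))

x = np.linspace(-1.0, 1.0, 5)
for gamma in (1.0, 10.0, 100.0):
    print(f"gamma={gamma:>5}: {np.round(smooth_split(x, 0.0, gamma), 3)}")
```

With gamma = 100 the outputs are essentially 0 or 1 away from the threshold, i.e. the rigid rule; with gamma = 1 the transition is gradual, which is what allows gradient-based training to reshape the boundary.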
An important NDT feature is that there is no need for the user to search
for the right network architecture for each dataset, i.e. the number of layers
or the number of neural units in each layer. Its generally shallow architecture
may be too restrictive for complex problems, but this feature is seen as an
advantage for our purpose. An NDT is always formed by four layers: an input
layer, two hidden layers, and an output layer. The connections between the
layers encode the information extracted from the seed DT. For a dataset with
d features and a seed DT with K leaves, we get the following architecture and
weights initialization:
Input layer. As usual, it has d neurons corresponding to the data features.
First hidden layer. It consists of K−1 neurons, each one representing a split node of the seed DT. A split condition of a node refers to a feature and a threshold on its value.
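A minimal sketch of this first-layer initialization, using a scikit-learn tree as the seed DT, could look as follows (the exact weight and bias scaling used in [3] may differ; only the structure of the construction is illustrated here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
dt = DecisionTreeClassifier(max_leaf_nodes=5, random_state=0).fit(X, y)

t = dt.tree_
split_ids = np.where(t.children_left != -1)[0]  # internal (split) nodes
d = X.shape[1]

# One neuron per split node: the weight row selects the split feature and
# the bias shifts by the threshold, so that  W1 @ x + b1 > 0  <=>  x[f] > thr.
W1 = np.zeros((len(split_ids), d))
b1 = np.zeros(len(split_ids))
for i, node in enumerate(split_ids):
    W1[i, t.feature[node]] = 1.0
    b1[i] = -t.threshold[node]

gamma = 10.0  # smoothness hyperparameter: larger means closer to hard splits
H1 = np.tanh(gamma * (X @ W1.T + b1))  # smooth indicator of each split
```

With this initialization, the sign of each first-layer activation reproduces the corresponding split decision of the seed DT, while gamma (and subsequent training) controls how smoothly that decision degrades near the threshold.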