To tree or not to tree? Assessing the impact of
smoothing the decision boundaries
Anthea Mérida, Argyris Kalogeratos, and Mathilde Mougeot
Centre Borelli, ENS Paris-Saclay, Université Paris-Saclay, France
name.surname@ens-paris-saclay.fr
Abstract. When analyzing a dataset, it can be useful to assess how smooth the decision boundaries need to be for a model to better fit the data. This paper addresses this question by proposing to quantify how much the 'rigid' decision boundaries, produced by an algorithm that naturally finds such solutions, should be relaxed to obtain a performance improvement. The approach we propose starts with the rigid decision boundaries of a seed Decision Tree (seed DT), which is used to initialize a Neural DT (NDT). The initial boundaries are challenged by relaxing them progressively through training the NDT. During this process, we measure the NDT's performance and its decision agreement with its seed DT. We show how these two measures can help the user figure out how expressive their model should be, before exploring it further via model selection. The validity of our approach is demonstrated with experiments on simulated and benchmark datasets.
Keywords: Decision trees, neural decision trees, neural networks, model
family selection, model selection, interpretability, data exploration.
1 Introduction
During the exploratory phase of data analysis, and before choosing a model to fit to the data, it is interesting to know whether the data can be sufficiently summarized with decision boundaries composed of a set of 'hard' or 'rigid' rules, which can be interpreted by humans. Rejecting this assumption would mean that the decision boundaries need a certain degree of smoothness in order to better capture the structure of the dataset. Typically, one would simply compare members of different model families and select one through a procedure such as cross-validation (CV). In machine learning, where it is more usual to focus on the predictive capacity of models, CV is a widespread procedure for both model and algorithm selection [6]. Indeed, CV is easy to implement and simplifies the comparison of different models based on the variability of a chosen performance metric.
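As an illustration of such a comparison, a rigid and a smooth model family can be contrasted under CV with scikit-learn (the dataset and the two model families below are illustrative choices, not the ones used in this paper):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A toy dataset whose two classes are separated by a curved boundary.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Compare a rigid (axis-aligned) and a smooth model family via 5-fold CV.
for model in (DecisionTreeClassifier(max_depth=4, random_state=0),
              LogisticRegression()):
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{type(model).__name__}: "
          f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

Looking at both the mean and the spread of the fold scores is what allows the variability-based comparison mentioned above.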
From a higher-level point of view, automated machine learning (auto-ML) or meta-learning methods can be used to first decide which type of algorithm can be suitable for a dataset, and then to train the final model on it. Indeed, packages such as Auto-WEKA [8] and Auto-sklearn [7] provide tools to automate the
arXiv:2210.03672v1 [cs.LG] 7 Oct 2022
[Figure 1 diagram: the input Data is used to train the seed DT, whose structure is translated to initialize an NDT(θ); training the NDT finds θ*, while Performance(θ) and Agreement(θ) are measured along the way; the outputs are θ*, Performance(θ*), Agreement(θ*), and the curves of Performance(θ) and Agreement(θ) over θ.]
Fig. 1: Outline of the proposed method.
algorithm and model selection process. They apply Bayesian optimization and
meta-learning procedures [7,13] to select the most appropriate algorithm and its
parametrization according to a metric and within a user-defined budget. Other methods aim to use data characterizations to obtain insights into what kind of data mining algorithm is suitable for a dataset. These characterizations can be statistical and information-theoretic measures employed as input, with the aim of learning their association with the algorithms' performance on the data. Users can interpret these methods, e.g. through decision rules produced by the C5.0 algorithm [1], or a self-organizing map that clusters various datasets according to their characteristics [14]. Other, more complex, data characterizations (or meta-features) have been proposed to describe the problem at hand [9,12]. Closer to our work, the approach of [11] extracts meta-features from a dataset's induced decision tree, which attempt to capture the learning complexity of the dataset. These methods, however, require a database of use-cases (datasets and their associated preferred algorithms) whose clustering would provide general guidance on model selection, and might need to be retrained when new use-cases are added.
The aforementioned existing methods aim mostly at deciding among candidate models, and eventually at training a well-performing final model. In this sense, they do not provide the user with direct insights regarding the complexity of the underlying structure of the data itself, which is something generally less studied in the literature, and is exactly the main focus of this work. Specifically, we propose an exploratory procedure to help the user assess the expressive power needed to produce efficient classification boundaries for a given dataset. This procedure is meant to be followed to better understand the dataset, prior to selecting the set of models to be further explored through model selection techniques. The procedure can be directly applied to a dataset and requires neither prior knowledge about the input data nor processing of external data.
More specifically, our idea for assessing the expressive power needed for a dataset is to challenge the decision boundaries produced by a rigid trained model. This is achieved by progressively relaxing its decision boundaries, and evaluating in a controlled way how flexible these need to become in order to fit the data better. To realize this idea, we use a typical Decision Tree (DT) for the initial decision boundaries, as it is a simple, interpretable, and naturally rigid model. The proposed procedure is outlined in Fig. 1: it starts with the decision boundaries produced by a reference DT trained on the input dataset, also called the seed DT. The seed DT initializes a Neural Decision Tree (NDT) [3], which inherits the DT's decision boundaries. By definition, an NDT is a special type of Neural Network that can be initialized by a DT, and in which the smoothness of the activation functions can be controlled. What we put forward is the idea that, by training an NDT, it becomes possible to measure two things: its 'departure' from the seed DT in terms of disagreement at the decision level, and the evolution of any performance metric as a function of the allowed smoothness of the decision boundaries. We show with experiments on real and synthetic data that the indicators provided by our data exploration procedure are meaningful for the classification task, and we illustrate with examples how users can interpret them in practice.
2 Background
In this section, we present the tools we use to build our procedure. First, the
core of the proposed method is the gradual relaxation of the decision boundaries
produced by a rigid model. The algorithm to be used to perform this relaxation
needs to offer the possibility of controlling the expressive power of the final
model by tuning a small number of parameters. We propose this to be done
using a Neural Decision Tree [3], which will be presented in Subsec. A. Once a
model with more flexible decision boundaries is obtained, we can measure its
‘departure’ from the initial rigid one. Subsec. B describes metrics to evaluate the
difference between two models.
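For concreteness, the simplest decision-level measure of this kind, the fraction of points on which two fitted classifiers make the same prediction, can be sketched as follows (here an off-the-shelf MLP stands in for the more flexible model; it is not the NDT used in our procedure):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def agreement(model_a, model_b, X):
    """Fraction of points on which the two models make the same decision."""
    return float(np.mean(model_a.predict(X) == model_b.predict(X)))

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
rigid = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
smooth = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                       random_state=0).fit(X, y)

print(f"decision agreement: {agreement(rigid, smooth, X):.3f}")
```

A value close to 1 means the relaxed model has barely departed from the rigid one at the decision level, even if its boundaries are geometrically different.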
A. Neural Decision Trees (NDTs). An NDT is a neural network (NN) whose
architecture and weights initialization are obtained directly from an input DT
[3,2,10], which we call here ‘seed DT’. The NDT variant we use is the one from
[3], which we extend to classification tasks. The hyperparameters of this NDT
type allow us to control the smoothness of its activation functions, which in fact
is a proxy for controlling the smoothness of the decision boundaries.
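The effect of such a hyperparameter can be pictured with a scaled sigmoid-shaped activation (tanh is used here purely for illustration; the exact activation family and scaling of [3] may differ): as the scale gamma grows, the smooth unit converges to the hard indicator of a DT split.

```python
import numpy as np

def smooth_split(x, threshold, gamma):
    """Smooth surrogate of the hard split rule 1{x > threshold}.

    gamma controls the rigidity: a small gamma gives a gently sloped
    decision boundary, while gamma -> infinity recovers the hard step
    of a DT split.
    """
    return 0.5 * (1.0 + np.tanh(gamma * (x - threshold)))

x = np.linspace(-1.0, 1.0, 5)
for gamma in (1.0, 10.0, 100.0):
    print(f"gamma={gamma:>5}: {np.round(smooth_split(x, 0.0, gamma), 3)}")
```

With gamma = 100 the outputs are essentially 0 or 1 away from the threshold, i.e. the rigid rule; with gamma = 1 the transition is gradual, which is what allows gradient-based training to reshape the boundary.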
An important NDT feature is that there is no need for the user to search
for the right network architecture for each dataset, i.e. the number of layers
or the number of neural units in each layer. Its generally shallow architecture
may be too restrictive for complex problems, but this feature is seen as an
advantage for our purpose. An NDT is always formed by four layers: an input
layer, two hidden layers, and an output layer. The connections between the
layers encode the information extracted from the seed DT. For a dataset with
d features and a seed DT with K leaves, we get the following architecture and
weights initialization:
Input layer. As usual, it has d neurons corresponding to the data features.
First hidden layer. It consists of K−1 neurons, each one representing a split node of the seed DT. A split condition of a node refers to a feature and a threshold on its value.
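A minimal sketch of this first-layer initialization, using a scikit-learn tree as the seed DT, could look as follows (the exact weight and bias scaling used in [3] may differ; only the structure of the construction is illustrated here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
dt = DecisionTreeClassifier(max_leaf_nodes=5, random_state=0).fit(X, y)

t = dt.tree_
split_ids = np.where(t.children_left != -1)[0]  # internal (split) nodes
d = X.shape[1]

# One neuron per split node: the weight row selects the split feature and
# the bias shifts by the threshold, so that  W1 @ x + b1 > 0  <=>  x[f] > thr.
W1 = np.zeros((len(split_ids), d))
b1 = np.zeros(len(split_ids))
for i, node in enumerate(split_ids):
    W1[i, t.feature[node]] = 1.0
    b1[i] = -t.threshold[node]

gamma = 10.0  # smoothness hyperparameter: larger means closer to hard splits
H1 = np.tanh(gamma * (X @ W1.T + b1))  # smooth indicator of each split
```

With this initialization, the sign of each first-layer activation reproduces the corresponding split decision of the seed DT, while gamma (and subsequent training) controls how smoothly that decision degrades near the threshold.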