Supervised Contrastive Learning with Tree-Structured
Parzen Estimator Bayesian Optimization for
Imbalanced Tabular Data
Shuting Tao, Peng Peng, Qi Li, Hongwei Wang
Zhejiang University - University of Illinois at Urbana-Champaign Institute, Zhejiang University, 718 East Haizhou Road, Haining, 314400, China
Abstract
Class imbalance has a detrimental effect on the predictive performance of most supervised learning algorithms, as the imbalanced distribution can lead to a bias favoring the majority class. To solve this problem, we propose a Supervised Contrastive Learning (SCL) method with the Tree-structured Parzen Estimator (TPE) technique for imbalanced tabular datasets. Contrastive learning (CL) can extract the information hidden in data even without labels and has shown some potential for imbalanced learning tasks. SCL further incorporates label information on top of CL, which also compensates for the insufficient data augmentation techniques available for tabular data. Therefore, in this work, we propose to use SCL to learn a discriminative representation of imbalanced tabular data. Additionally, the hyper-parameter temperature τ of SCL has a decisive influence on performance and is difficult to tune. We introduce TPE, a well-known Bayesian optimization technique, to automatically select the best τ. Experiments are conducted on both binary and multi-class imbalanced tabular datasets. As the results show, TPE outperforms three other hyper-parameter optimization (HPO) methods, namely grid search, random search, and the genetic algorithm. More importantly, the proposed SCL-TPE method achieves much-improved performance compared with state-of-the-art methods.
Corresponding author
Email addresses: 12121105@zju.edu.cn (Shuting Tao),
pengp17@mails.tsinghua.edu.cn (Peng Peng), liqi177@zju.edu.cn (Qi Li),
hongweiwang@intl.zju.edu.cn (Hongwei Wang)
Keywords: Imbalanced learning, Supervised contrastive learning,
Tree-Structured Parzen Estimator, Representation learning, Deep learning
1. Introduction
With excellent performance on uniformly distributed data, supervised learning has become the most popular approach to data classification. However, an uneven distribution of data, i.e., class imbalance, is very common in datasets collected from real-world scenarios, and it inevitably undermines the effectiveness of supervised algorithms. Class imbalance makes it intractable for supervised models to correctly represent the distribution characteristics of skewed data, and thus results in very low prediction accuracy for the minority classes. A well-known example is the mammography dataset [1], in which positive samples account for only 2.3% of the total. Although the prediction accuracy on the positive class is crucial in this case, traditional supervised classifiers tend to predict that all samples are negative.
Improving the classification accuracy of both the majority and minority classes has therefore become a great challenge. To address it, numerous solutions have been put forward, and they can be broadly divided into three categories: data preprocessing [2, 3, 4, 5], feature learning [6, 7], and classifier design [8, 9, 10]. In the solutions based on data preprocessing, scholars attempt to rebalance the data distribution through data sampling. In terms of classifier design, there are two kinds of methods. Algorithm-level methods modify algorithms to improve the low accuracy of the minority class; the most popular one is cost-sensitive learning (CSL), which uses a weighted cost for different classes (a toy illustration follows this paragraph). Model-level methods combine the classification results of multiple base models, as in ensemble learning.
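To make the cost-sensitive idea concrete, the following is a minimal sketch (our own illustration, not code from the paper) of a class-weighted loss in PyTorch, using inverse-frequency weights for a 2.3% positive rate as in the mammography example above:

```python
import torch
import torch.nn as nn

# Toy illustration of cost-sensitive learning: weight the loss so that
# misclassifying the rare class costs more. With a 2.3% positive rate,
# inverse-frequency weights for [majority, minority] are roughly:
class_weights = torch.tensor([1 / 0.977, 1 / 0.023])
criterion = nn.CrossEntropyLoss(weight=class_weights)  # minority errors cost ~42x more
```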
However, the methods mentioned above have some inherent drawbacks. For example, data-level methods may lose useful information [11] or overestimate the minority data [12]. For CSL, it is difficult to set the misclassification costs, which in most cases are unknown from the data and cannot be given by experts [13]. These issues have motivated researchers to develop strategies based on feature learning; existing methods consider using autoencoders to learn imbalanced data features [7]. In this paper, we propose to use supervised contrastive learning (SCL) [14] to extract features from imbalanced tabular datasets.
Contrastive learning (CL), a kind of self-supervised learning (SSL) [15], has shown the ability to represent hidden features without relying on data labels in the image domain [16, 17]. CL aims to pull an anchor and a "positive" sample together in the embedding space and to push the anchor far away from "negative" samples. Here the "positive" sample refers to data augmented from the anchor, while "negative" samples are randomly chosen from the mini-batch. It is worth noting that the success achieved by CL in feature learning for images is closely related to data augmentation techniques such as rotation [18], colorization [19], and jigsaw puzzle solving [20]. Most of these techniques are not applicable to general tabular data because they rely heavily on the unique structure of the domain datasets. This poses a major difficulty in using CL for imbalanced tabular learning.
In this work, we fill this gap by adopting SCL to learn the representation of imbalanced tabular data. SCL considers many positives per anchor rather than only a single positive. These positives are selected from samples belonging to the same class as the anchor, instead of from data augmentations of the anchor. Embeddings of the same class are thus pulled closer together than those from other classes. The use of label information can alleviate the lack of data augmentation strategies for tabular data, so the success of the contrastive loss in the image domain can be extended to the tabular domain with a few domain-independent augmentations such as Gaussian blur. A minimal sketch of the supervised contrastive loss is given below.
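The following PyTorch snippet is a hedged sketch of the SupCon loss of Khosla et al. [14] as we understand it; the function and variable names are ours, and it assumes the encoder outputs one embedding per sample:

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, tau=0.1):
    """Minimal sketch of the supervised contrastive (SupCon) loss [14].

    features: (N, D) embeddings from the encoder, one per sample.
    labels:   (N,) integer class labels.
    tau:      temperature; smaller values penalize hard negatives more.
    """
    z = F.normalize(features, dim=1)                   # project onto the unit sphere
    sim = z @ z.T / tau                                # pairwise similarities scaled by tau
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))    # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives = other samples in the batch with the same label as the anchor
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)  # keep only positive pairs
    # average the log-probability over every positive of each anchor
    loss = -pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()
```

In a training loop one would call something like `supcon_loss(encoder(x), y, tau=0.07)` on each mini-batch during the representation learning stage; note that the temperature τ divides the similarities before the softmax, which is where the sensitivity discussed next comes from.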
Furthermore, SCL requires fixing the hyper-parameter temperature τ before model training; τ is crucial because it controls the strength of the penalties on negative samples. As shown in [21], a good choice of τ can significantly improve the quality of the feature representation; that is, a good selection of τ can make SCL achieve better performance in imbalanced learning. However, hyper-parameter tuning is often challenging and time-consuming, and current studies pay little attention to the details of tuning τ. In this paper, we demonstrate that the setting of τ substantially influences SCL's performance, as the toy example below illustrates. We further develop a flexible approach that enables hyper-parameter optimization (HPO) to be conducted automatically.
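As a quick illustration (our own toy numbers, not an experiment from the paper), the snippet below shows how τ re-weights the effective penalty across the negatives of one anchor: a small τ concentrates the softmax mass on the hardest, most similar negative, while a large τ spreads the penalty almost uniformly:

```python
import torch

# Hypothetical cosine similarities between one anchor and four negatives.
sims = torch.tensor([0.9, 0.5, 0.3, 0.1])
for tau in (0.05, 0.1, 0.5, 1.0):
    weights = torch.softmax(sims / tau, dim=0)  # relative penalty on each negative
    print(f"tau={tau:4.2f} -> {weights.numpy().round(3)}")
```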
Classic HPO methods include grid search (GS), random search (RS), the genetic algorithm (GA), and Bayesian optimization (BO). GS defines a search space as a grid of hyper-parameter values and assesses every position in the grid. RS defines a search space as a bounded domain of hyper-parameter values and randomly samples points in that domain. GA is based on concepts from biological evolution: it maintains a set of candidate solutions that evolves toward better results [22]. Compared with the uninformed searches GS and RS, BO takes the previously explored information into account at each step, which reduces the search space and improves search efficiency. Compared with GA, BO requires fewer computational resources: GA must train the model under multiple hyper-parameter settings to advance from one generation to the next, whereas BO trains a single model and updates the posterior information, shortening the training time. In general, BO has two implementations: the Gaussian Process (GP) and the Tree-structured Parzen Estimator (TPE) [23]. TPE has been shown to be superior to GP, since TPE models the previously explored observations more accurately than GP does [23]. Therefore, we choose TPE to select the best τ for the SCL model in our work (a minimal tuning sketch is given after the contribution list below); further empirical evidence for this choice is presented in Section 4. More specifically, the main contributions of this paper are listed below:
• SCL is proposed to learn an embedding space in which samples of the same class stay close to each other while samples belonging to different classes are far apart. For imbalanced tabular datasets, we believe that SCL will outperform traditional supervised methods: in addition to employing the label information, SCL better captures data features by learning the intrinsic properties of the data itself through the contrastive loss. Therefore, SCL does not suffer a significant performance drop from the "label bias" caused by imbalanced data.

• TPE is used for the first time to automatically select the best hyper-parameter temperature τ for SCL. In this paper, we demonstrate that τ is critical to SCL's performance, and TPE is shown to produce better results than other hyper-parameter optimization algorithms.

• Extensive experiments are conducted to demonstrate the effectiveness of our method. We compare SCL-TPE's performance with ten competitive data sampling methods on fifteen imbalanced tabular datasets covering binary and multi-class tasks. We further carry out an ablation study to analyze the performance contribution of each component of SCL-TPE.
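As referenced above, the following is a hedged sketch of how TPE could select τ using the hyperopt library, which implements the TPE of Bergstra et al. [23]. The helper `train_and_evaluate` and the search range are our assumptions, not the paper's exact setup; here the helper is replaced by a toy objective so the sketch runs end to end:

```python
from hyperopt import fmin, tpe, hp, Trials

def train_and_evaluate(tau):
    # Hypothetical stand-in for training the SCL encoder with temperature
    # `tau` and returning a validation score (e.g. G-mean); replaced here
    # by a toy function peaked near tau = 0.1 so the sketch is runnable.
    return -(tau - 0.1) ** 2

def objective(tau):
    return -train_and_evaluate(tau)  # hyperopt minimizes, so negate the score

trials = Trials()
best = fmin(
    fn=objective,
    space=hp.loguniform('tau', -4, 0),  # tau roughly in [0.018, 1.0] (our choice)
    algo=tpe.suggest,                   # Tree-structured Parzen Estimator [23]
    max_evals=50,
    trials=trials,
)
print('best temperature:', best['tau'])
```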
The rest of this article is organized as follows. Section 2 briefly reviews previous research on the imbalanced learning problem and describes SCL and TPE Bayesian optimization as the theoretical foundation of the proposed SCL-TPE method. Section 3 presents the proposed method in detail. Section 4 evaluates the proposed method through experiments on several highly imbalanced datasets. Finally, the main conclusions of this work are drawn and discussed in Section 5.
2. Related work and background theory
Figure 1: The three categories of methods for imbalanced learning.
2.1. Methods for imbalanced learning
This subsection briefly reviews related work on imbalanced learning methods. As shown in Fig. 1, countermeasures for mitigating class imbalance can be divided into three categories: methods based on data preprocessing, methods based on feature learning, and methods based on classifier design.