Supervised Contrastive Learning with Tree-Structured
Parzen Estimator Bayesian Optimization for
Imbalanced Tabular Data
Shuting Tao, Peng Peng, Qi Li, Hongwei Wang
Zhejiang University - University of Illinois at Urbana-Champaign Institute, Zhejiang University, 718 East Haizhou Road, Haining, 314400, China
Abstract
Class imbalance has a detrimental effect on the predictive performance of most supervised learning algorithms, as the imbalanced distribution can lead to a bias favoring the majority class. To solve this problem, we propose a Supervised Contrastive Learning (SCL) method with the Tree-structured Parzen Estimator (TPE) technique for imbalanced tabular datasets. Contrastive learning (CL) can extract the information hidden in data even without labels and has shown some potential for imbalanced learning tasks. SCL further incorporates label information on top of CL, which also compensates for the insufficient data augmentation techniques available for tabular data. Therefore, in this work, we propose to use SCL to learn a discriminative representation of imbalanced tabular data. Additionally, the hyper-parameter temperature τ of SCL has a decisive influence on performance and is difficult to tune. We introduce TPE, a well-known Bayesian optimization technique, to automatically select the best τ. Experiments are conducted on both binary and multi-class imbalanced tabular datasets. As the results show, TPE outperforms three other hyper-parameter optimization (HPO) methods, namely grid search, random search, and the genetic algorithm. More importantly, the proposed SCL-TPE method achieves much-improved performance compared with state-of-the-art methods.
Corresponding author
Email addresses: 12121105@zju.edu.cn (Shuting Tao),
pengp17@mails.tsinghua.edu.cn (Peng Peng), liqi177@zju.edu.cn (Qi Li),
hongweiwang@intl.zju.edu.cn (Hongwei Wang)
Keywords: Imbalanced learning, Supervised contrastive learning,
Tree-Structured Parzen Estimator, Representation learning, Deep learning
1. Introduction
With excellent performance on uniformly distributed data, supervised learning has become the most popular approach to data classification. However, an uneven distribution of data, i.e., class imbalance, is very common in datasets collected from real-world scenarios, and it inevitably undermines the effectiveness of supervised algorithms. Class imbalance makes it intractable for supervised models to correctly represent the distribution characteristics of skewed data, and thus results in very low prediction accuracy for the minority classes. A well-known example is the mammography dataset [1], in which positive samples account for only 2.3% of the total. Although the prediction accuracy on the positive class is crucial in this case, traditional supervised classifiers tend to predict that all samples are negative.
Improving the classification accuracy of both the majority and minority classes has therefore become a great challenge. To address it, numerous solutions have been put forward, and they can be broadly divided into three categories: data preprocessing [2, 3, 4, 5], feature learning [6, 7], and classifier design [8, 9, 10]. In the solutions based on data preprocessing, scholars attempt to rebalance the data distribution through data sampling. In terms of classifier design, there are two kinds of methods. Algorithm-level methods modify algorithms to improve the low accuracy of the minority class; the most popular one is cost-sensitive learning (CSL), which uses a weighted cost for different classes (a toy illustration follows this paragraph). Model-level methods combine the classification results of multiple base models, as in ensemble learning.
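To make the cost-sensitive idea concrete, the following is a minimal sketch (our own illustration, not code from the paper) of a class-weighted loss in PyTorch, using inverse-frequency weights for a 2.3% positive rate as in the mammography example above:

```python
import torch
import torch.nn as nn

# Toy illustration of cost-sensitive learning: weight the loss so that
# misclassifying the rare class costs more. With a 2.3% positive rate,
# inverse-frequency weights for [majority, minority] are roughly:
class_weights = torch.tensor([1 / 0.977, 1 / 0.023])
criterion = nn.CrossEntropyLoss(weight=class_weights)  # minority errors cost ~42x more
```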
However, the methods mentioned above have some inherent drawbacks. For example, data-level methods may lose useful information [11] or overestimate the minority data [12]. For CSL, it is difficult to set the misclassification costs, which in most cases are unknown from the data and cannot be given by experts [13]. These issues have motivated researchers to develop strategies based on feature learning; existing methods consider using autoencoders to learn imbalanced data features [7]. In this paper, we propose to use supervised contrastive learning (SCL) [14] to extract features from imbalanced tabular datasets.
Contrastive learning (CL), a kind of self-supervised learning (SSL) [15], has shown the ability to represent hidden features without relying on data labels in the image domain [16, 17]. CL aims to pull an anchor and a "positive" sample together in the embedding space and to push the anchor far away from "negative" samples. Here the "positive" sample refers to data augmented from the anchor, while "negative" samples are randomly chosen from the mini-batch. It is worth noting that the success achieved by CL in feature learning for images is closely related to data augmentation techniques such as rotation [18], colorization [19], and jigsaw puzzle solving [20]. Most of these techniques are not applicable to general tabular data because they rely heavily on the unique structure of the domain datasets. This poses a major difficulty in using CL for imbalanced tabular learning.
In this work, we fill this gap by adopting SCL to learn the representation of imbalanced tabular data. SCL considers many positives per anchor rather than only a single positive. These positives are selected from samples belonging to the same class as the anchor, instead of from data augmentations of the anchor. Embeddings of the same class are thus pulled closer together than those from other classes. The use of label information can alleviate the lack of data augmentation strategies for tabular data, so the success of the contrastive loss in the image domain can be extended to the tabular domain with a few domain-independent augmentations such as Gaussian blur. A minimal sketch of the supervised contrastive loss is given below.
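The following PyTorch snippet is a hedged sketch of the SupCon loss of Khosla et al. [14] as we understand it; the function and variable names are ours, and it assumes the encoder outputs one embedding per sample:

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, tau=0.1):
    """Minimal sketch of the supervised contrastive (SupCon) loss [14].

    features: (N, D) embeddings from the encoder, one per sample.
    labels:   (N,) integer class labels.
    tau:      temperature; smaller values penalize hard negatives more.
    """
    z = F.normalize(features, dim=1)                   # project onto the unit sphere
    sim = z @ z.T / tau                                # pairwise similarities scaled by tau
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))    # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives = other samples in the batch with the same label as the anchor
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)  # keep only positive pairs
    # average the log-probability over every positive of each anchor
    loss = -pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()
```

In a training loop one would call something like `supcon_loss(encoder(x), y, tau=0.07)` on each mini-batch during the representation learning stage; note that the temperature τ divides the similarities before the softmax, which is where the sensitivity discussed next comes from.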
Furthermore, SCL requires fixing the hyper-parameter temperature τ before model training; τ is crucial because it controls the strength of the penalties on negative samples. As shown in [21], a good choice of τ can significantly improve the quality of the feature representation; that is, a good selection of τ can make SCL achieve better performance in imbalanced learning. However, hyper-parameter tuning is often challenging and time-consuming, and current studies pay little attention to the details of tuning τ. In this paper, we demonstrate that the setting of τ substantially influences SCL's performance, as the toy example below illustrates. We further develop a flexible approach that enables hyper-parameter optimization (HPO) to be conducted automatically.
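As a quick illustration (our own toy numbers, not an experiment from the paper), the snippet below shows how τ re-weights the effective penalty across the negatives of one anchor: a small τ concentrates the softmax mass on the hardest, most similar negative, while a large τ spreads the penalty almost uniformly:

```python
import torch

# Hypothetical cosine similarities between one anchor and four negatives.
sims = torch.tensor([0.9, 0.5, 0.3, 0.1])
for tau in (0.05, 0.1, 0.5, 1.0):
    weights = torch.softmax(sims / tau, dim=0)  # relative penalty on each negative
    print(f"tau={tau:4.2f} -> {weights.numpy().round(3)}")
```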
Classic HPO methods include grid search (GS), random search (RS), the genetic algorithm (GA), and Bayesian optimization (BO). GS defines a search space as a grid of hyper-parameter values and assesses every position in the grid. RS defines a search space as a bounded domain of hyper-parameter values and randomly samples points in that domain. GA is based on concepts from biological evolution: it maintains a set of candidate solutions that evolves toward better results [22]. Compared with the uninformed searches GS and RS, BO takes the previously explored information into account at each step, which reduces the search space and improves search efficiency. Compared with GA, BO requires fewer computational resources: GA must train the model under multiple hyper-parameter settings to advance from one generation to the next, whereas BO trains a single model and updates the posterior information, shortening the training time. In general, BO has two implementations: the Gaussian Process (GP) and the Tree-structured Parzen Estimator (TPE) [23]. TPE has been shown to be superior to GP, since TPE models the previously explored observations more accurately than GP does [23]. Therefore, we choose TPE to select the best τ for the SCL model in our work (a minimal tuning sketch is given after the contribution list below); further empirical evidence for this choice is presented in Section 4. More specifically, the main contributions of this paper are listed below:
• SCL is proposed to learn an embedding space in which samples of the same class stay close to each other while samples belonging to different classes are far apart. For imbalanced tabular datasets, we believe that SCL will outperform traditional supervised methods: in addition to employing the label information, SCL better captures data features by learning the intrinsic properties of the data itself through the contrastive loss. Therefore, SCL does not suffer a significant performance drop from the "label bias" caused by imbalanced data.

• TPE is used for the first time to automatically select the best hyper-parameter temperature τ for SCL. In this paper, we demonstrate that τ is critical to SCL's performance, and TPE is shown to produce better results than other hyper-parameter optimization algorithms.

• Extensive experiments are conducted to demonstrate the effectiveness of our method. We compare SCL-TPE's performance with ten competitive data sampling methods on fifteen imbalanced tabular datasets covering binary and multi-class tasks. We further carry out an ablation study to analyze the performance contribution of each component of SCL-TPE.
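As referenced above, the following is a hedged sketch of how TPE could select τ using the hyperopt library, which implements the TPE of Bergstra et al. [23]. The helper `train_and_evaluate` and the search range are our assumptions, not the paper's exact setup; here the helper is replaced by a toy objective so the sketch runs end to end:

```python
from hyperopt import fmin, tpe, hp, Trials

def train_and_evaluate(tau):
    # Hypothetical stand-in for training the SCL encoder with temperature
    # `tau` and returning a validation score (e.g. G-mean); replaced here
    # by a toy function peaked near tau = 0.1 so the sketch is runnable.
    return -(tau - 0.1) ** 2

def objective(tau):
    return -train_and_evaluate(tau)  # hyperopt minimizes, so negate the score

trials = Trials()
best = fmin(
    fn=objective,
    space=hp.loguniform('tau', -4, 0),  # tau roughly in [0.018, 1.0] (our choice)
    algo=tpe.suggest,                   # Tree-structured Parzen Estimator [23]
    max_evals=50,
    trials=trials,
)
print('best temperature:', best['tau'])
```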
The rest of this article is organized as follows. Section 2 briefly reviews previous research on the imbalanced learning problem and describes SCL and TPE Bayesian optimization as the theoretical foundation of the proposed SCL-TPE method. Section 3 presents the proposed method in detail. Section 4 evaluates the proposed method through experiments on several highly imbalanced datasets. Finally, the main conclusions of this work are drawn and discussed in Section 5.
2. Related work and background theory
Figure 1: The three categories of methods for imbalanced learning.
2.1. Methods for imbalanced learning
This subsection briefly reviews related work on imbalanced learning methods. As shown in Fig. 1, countermeasures for mitigating class imbalance can be divided into three categories: methods based on data preprocessing, methods based on feature learning, and methods based on classifier design.