
TABMIXER: EXCAVATING LABEL DISTRIBUTION LEARNING WITH SMALL-SCALE
FEATURES
Weiyi Cong, Zhuoran Zheng and Xiuyi Jia∗
CSE, Nanjing University of Science and Technology
∗Corresponding author.
ABSTRACT
Label distribution learning (LDL) differs from multi-label
learning in that it represents the polysemy of instances by
transforming binary label values into descriptive degrees.
Unfortunately, the feature space of a label distribution dataset
is affected by human factors and by the inductive bias of the
feature extractor, which introduces uncertainty into the feature
space. In particular, on datasets with small-scale feature spaces
(feature space dimension ≈ label space dimension), existing
LDL algorithms do not perform well. To address this issue,
we seek to model the feature space with uncertainty-aware
augmentation to alleviate this problem in LDL tasks. Specifically,
we start by augmenting each feature value in a sample's feature
vector into a vector sampled from a Gaussian distribution, where
the variance parameter of the Gaussian is learned by a
sub-network and the mean parameter is given by the feature value
itself. Each feature vector is thereby augmented into a matrix,
which is fed into a mixer with local attention (TabMixer) to
extract latent features. Finally, the latent features are
squeezed by a squeeze network to yield an accurate label
distribution. Extensive experiments verify that our proposed
algorithm is competitive with other LDL algorithms on several
benchmarks.
Index Terms—Label distribution learning, uncertainty
augmentation, Gaussian distribution function, TabMixer.
1. INTRODUCTION
Label distribution learning (LDL) [1] is an important machine
learning paradigm that leverages a function to map a single
instance to a set of labels, where labels are represented as
descriptive degrees that sum to 1. Unlike the multi-label
learning paradigm, LDL conveys richer semantic content when
characterizing an instance's emotions [2,3] and estimating a
learning task's uncertainty [4–6].
Although several classical LDL algorithms [1,7–16] have been
proposed to model the mapping from the feature space to the
label space, these algorithms usually favor an accurate and
ample feature space. Briefly, these algorithms conduct a process
of condensing the representation space rather than augmenting
it. Here, we define a feature space whose dimension ≈ the label
space dimension as a small-scale feature space. One piece of
evidence is that, across many studies, almost all of the
proposed LDL algorithms report weak performance on benchmark
datasets with a large number of labels. From this, we draw two
questions: 1) For the label space, is it difficult for a
comparatively small amount of feature information to provide an
algorithm with effective features for regressing an accurate
label distribution? 2) For the feature space, do human factors
and the uncertainty of the feature extractor cause the low
quality of the feature space? Unfortunately, we cannot inspect
the existing LDL datasets because the details of their feature
processing are opaque. Furthermore, boosting the feature
dimension and inferring the uncertainty of the feature space by
tapping into expert knowledge is costly. To solve these two
problems, we propose a feature augmentation technique with
uncertainty awareness, built on TabMixer (Tabular MLP-Mixer), to
learn LDL datasets with small-scale features. Note that our
network treats all tabular data uniformly and does not
distinguish between logical and continuous values.
Overall, our approach consists of the following stages. First,
to augment the feature space, an MLP-based sub-network (Learner)
is created to learn the variance of a Gaussian function. The
Learner takes the raw feature vector as input and assigns a
unique variance value to each feature value in the raw feature
vector. Combining the above, we can design a Gaussian function
for each element of the raw feature space by taking the feature
value as the mean, and then use Gaussian sampling to obtain a
vector that replaces that element (the random seed is fixed
during the model training phase). At this point, our input has
evolved from 1D to 2D and can be loosely regarded as a grayscale
map.
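As a minimal sketch of this stage, assuming a PyTorch
implementation: the hidden width, the Softplus used to keep
variances positive, and the augmentation length k below are
illustrative assumptions, not the authors' exact settings.

import torch
import torch.nn as nn

class Learner(nn.Module):
    """MLP sub-network that predicts one variance per raw feature value."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, in_dim),
            nn.Softplus(),  # assumption: keep predicted variances positive
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)  # (batch, d) variances, one per feature value

def augment(x: torch.Tensor, var: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Replace each scalar feature by k Gaussian samples with
    mean = the feature value and std = sqrt(learned variance).
    Input x, var: (batch, d) -> output: (batch, d, k), a 2D "grayscale map".
    """
    eps = torch.randn(*x.shape, k, device=x.device)  # seed fixed during training
    return x.unsqueeze(-1) + var.sqrt().unsqueeze(-1) * eps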
Subsequently, the augmented feature information is fed into
TabMixer, where each linear-layer shortcut in TabMixer is a
convolution operator that captures the local characteristics of
the features.
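Under our reading of this description, one TabMixer block might
look as follows: an MLP-Mixer-style block whose identity
shortcuts are replaced by depthwise 1D convolutions acting as
local attention. The kernel size and expansion factor are
assumptions.

class MixerLayer(nn.Module):
    """Mixing MLP over the last axis, with a convolutional shortcut."""
    def __init__(self, dim: int, expansion: int = 4, kernel: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )
        # local-attention-like shortcut: depthwise conv along the other axis
        self.shortcut = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim); the conv expects (batch, dim, length)
        return self.mlp(self.norm(x)) + self.shortcut(x.transpose(1, 2)).transpose(1, 2)

class TabMixerBlock(nn.Module):
    """Mix across the d raw features, then across the k Gaussian samples."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.feature_mix = MixerLayer(d)
        self.sample_mix = MixerLayer(k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d, k) from the augmentation stage
        x = self.feature_mix(x.transpose(1, 2)).transpose(1, 2)
        return self.sample_mix(x)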
Finally, the output feature map is squeezed by a squeeze network
to obtain an accurate label distribution, where the output layer
of the network uses a softmax. The network is trained using only
an L1 loss and a KL-divergence loss.
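A minimal sketch of the squeeze network and training loss
follows; the mean pooling and the equal weighting of the two
loss terms are assumptions, since the paper only states that L1
and KL divergence are used.

import torch.nn.functional as F

class SqueezeHead(nn.Module):
    """Squeeze the (batch, d, k) feature map down to a label distribution."""
    def __init__(self, d: int, num_labels: int):
        super().__init__()
        self.fc = nn.Linear(d, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.mean(dim=-1)  # assumption: pool away the augmentation axis
        return F.softmax(self.fc(x), dim=-1)  # valid label distribution

def ldl_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 + KL divergence between predicted and ground-truth distributions."""
    l1 = F.l1_loss(pred, target)
    kl = F.kl_div(pred.clamp_min(1e-8).log(), target, reduction="batchmean")
    return l1 + kl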
We use two standard benchmarks and a synthetic benchmark to
evaluate our approach alongside other comparative algorithms,
and the experimental results verify that the proposed algorithm
is competitive.
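Putting the sketches above together, a hypothetical end-to-end
forward pass might look as follows; all sizes (batch, feature
dimension d, augmentation length k, number of blocks) are
illustrative.

# x: a batch of 32 samples with d = 10 raw features;
# y: ground-truth label distributions over 8 labels.
x, y = torch.randn(32, 10), torch.softmax(torch.randn(32, 8), dim=-1)

learner, head = Learner(in_dim=10), SqueezeHead(d=10, num_labels=8)
mixer = nn.Sequential(*[TabMixerBlock(d=10, k=16) for _ in range(4)])

feat = augment(x, learner(x), k=16)  # (32, 10, 16) uncertainty-augmented map
pred = head(mixer(feat))             # (32, 8) predicted label distribution
loss = ldl_loss(pred, y)
loss.backward()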