TABMIXER: EXCAVATING LABEL DISTRIBUTION LEARNING WITH SMALL-SCALE FEATURES
Weiyi Cong, Zhuoran Zheng and Xiuyi Jia
CSE, Nanjing University of Science and Technology
ABSTRACT
Label distribution learning (LDL) differs from multi-label learning in that it aims to represent the polysemy of instances by transforming single-label values into descriptive degrees. Unfortunately, the feature space of a label distribution dataset is affected by human factors and by the inductive bias of the feature extractor, causing uncertainty in the feature space. In particular, for datasets with small-scale feature spaces (the feature space dimension ≤ the label space dimension), existing LDL algorithms do not perform well. To address this issue, we seek to model the uncertainty of the feature space via augmentation to alleviate this problem in LDL tasks. Specifically, we start by augmenting each feature value in the feature vector of a sample into a vector (sampled from a Gaussian distribution function), where the variance parameter of the Gaussian distribution function is learned by a sub-network and the mean parameter is filled by the feature value itself. Then, each feature vector is augmented into a matrix, which is fed into a mixer with local attention (TabMixer) to extract the latent feature. Finally, the latent feature is squeezed by a squeezed network to yield an accurate label distribution. Extensive experiments verify that our proposed algorithm is competitive with other LDL algorithms on several benchmarks.
Index Terms— Label distribution learning, uncertainty augmentation, Gaussian distribution function, TabMixer.
1. INTRODUCTION
During the development of machine learning, label distribution learning (LDL) [1] has emerged as an important paradigm that leverages a function to map a single instance to a set of labels (labels are represented in the form of descriptive degrees, and the sum of the descriptive degrees is 1). Unlike the multi-label learning paradigm, LDL conveys richer semantic content in terms of characterizing an instance's emotions [2, 3] and estimating a learning task's uncertainty [4–6].
Although several classical LDL algorithms [1, 7–16] have been proposed to tackle the task of mapping the feature space into the label space, these algorithms usually favor an accurate and ample feature space. Briefly, these algorithms expect to conduct a process of condensing the representation space rather
than augmenting it. Here, we define a feature space whose dimension is ≤ the label space dimension as a small-scale feature space. One piece of evidence is that almost all the proposed LDL algorithms report weak performance on benchmark datasets with a large number of labels in many studies. So far, we raise two questions about this: 1) For the label space, is it difficult for the comparatively small amount of feature information to provide the algorithm with effective features for regressing an accurate label distribution? 2) For the feature space, are there human factors and uncertainty in the feature extractor that cause the low quality of the feature space? Unfortunately, we cannot parse the existing LDL datasets because the details of their feature processing are a black box. Further, boosting the feature dimension and inferring the uncertainty of the feature space by tapping into expert knowledge is costly. To solve the above two problems, we propose a feature augmentation technique with uncertainty awareness, enforced on TabMixer (Tabular MLP-Mixer), to learn LDL datasets with small-scale features. Note that our network treats tabular data uniformly and does not distinguish between logical and continuous values.
Overall, our approach comprises the following learning steps. First, to augment the feature space, an MLP-based sub-network (Learner) is created to learn the variance of a Gaussian function. The Learner takes the raw feature vector as input and assigns a unique variance value to each of the feature values in the raw feature vector. Combining the above, we can design a Gaussian function for each element in the raw feature space by taking the feature value as the mean and using Gaussian sampling to obtain a vector that replaces that element (the random seed is fixed during the model training phase). At this point, our input pattern has evolved from 1D to 2D and can be regarded as a pseudo grayscale map. Subsequently, the augmented feature information is fed into TabMixer, where the shortcut of each linear layer in TabMixer is a convolution operator that captures the local characteristics of the features. Finally, the output feature map is squeezed by the squeezed network to obtain an accurate label distribution, where the output of the network passes through a softmax. The network is trained using only L1 and K-L divergence loss functions. We use two standard benchmarks and a synthetic benchmark to evaluate our approach and other comparative algorithms, and the experimental results verify that the proposed algorithm remains robust under fully supervised and noisy conditions.
[Fig. 1 diagram: input features → Learner → Gaussian sampling → enhanced feature space → TabMixer (attention, linear, MLP, and LMResidual blocks) → softmax → prediction, trained against the target with L1 and KL losses.]
Fig. 1. Our architecture. Our algorithm aims to regress the label distribution of a sample using TabMixer. There are two key components: one augments the feature space by modeling uncertainty, and the other obtains an accurate label distribution by mixed learning; in addition, randomness is also considered.
Furthermore, since there are random operations in the network, we consider a pre-training scheme to eliminate this random consistency. This paper makes two key contributions: i) we propose a novel one-stop feature augmentation-learning solution executed on LDL datasets with small-scale features; ii) we develop a deep network (TabMixer) that takes both local and global information into account, along with a new synthetic dataset.
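As a concrete illustration of the training objective described above, the following is a minimal sketch of a combined L1 and K-L divergence loss; the weighting `alpha` between the two terms is our assumption, since no relative weighting is specified here:

```python
import torch
import torch.nn.functional as F

def ldl_loss(pred, target, alpha=1.0):
    """Combined L1 + KL loss for label distribution learning (sketch).

    pred, target: (batch, num_labels) label distributions whose rows sum to 1.
    alpha is an assumed weighting between the two terms.
    """
    l1 = F.l1_loss(pred, target)
    # KL(target || pred); F.kl_div expects log-probabilities as its first argument
    kl = F.kl_div(torch.log(pred + 1e-12), target, reduction="batchmean")
    return l1 + alpha * kl
```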
2. RELATED WORK
Label distribution learning. Geng et al. [1] pioneered a new machine learning paradigm, LDL, which conveys richer semantics by converting labels into descriptive degrees. Subsequently, numerous studies [1, 4–16] have been conducted on LDL tasks, involving both applications and pure theory. One of these papers [4] is particularly interested in modeling the uncertainty of label distribution values via deep networks. Inspired by this, we model the uncertainty of small-scale feature spaces to offer richer material for downstream models.
Tabular Learning. Recently, extensive work [17–21] has been proposed for modeling tabular datasets. However, these methods usually rely on prior knowledge of the characteristics of the table's attributes. Inspired by TransTab [21], we seek to use an MLP to model tabular datasets globally. Furthermore, to enhance the modeling capability of the whole model, an inductive bias based on convolutional operators is also fused into the network. The architecture of the whole model owes much to MLP-Mixer [22].
3. PROPOSED METHOD
The architecture of our approach is shown in Fig. 1. Our approach can be described as a two-stage tactic in an end-to-end manner. The first stage is feature augmentation with uncertainty awareness, which aims at re-representing the input information by embedding prior knowledge. The purpose of the second stage is to learn the label distribution with the help of TabMixer in the new feature space. Furthermore, we introduce the training strategy (loss functions) of the model and a regularization scheme (elimination of random consistency) at the end of this section.
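To make the second stage more concrete, below is a minimal sketch of a mixer block in which the shortcut of each linear (mixing) layer is a convolution operator, as described in Section 1. The kernel size, widths, activation, and block layout are our assumptions, and the local attention component of TabMixer is omitted; this is an illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class MixerBlockWithConvShortcut(nn.Module):
    """One mixer block whose residual shortcuts are small convolutions (sketch).

    The input is the (n x n) augmented feature map produced by the first stage.
    Kernel size 3 and the single-channel layout are assumptions.
    """
    def __init__(self, n):
        super().__init__()
        self.row_mlp = nn.Sequential(
            nn.LayerNorm(n), nn.Linear(n, n), nn.GELU(), nn.Linear(n, n))
        self.col_mlp = nn.Sequential(
            nn.LayerNorm(n), nn.Linear(n, n), nn.GELU(), nn.Linear(n, n))
        # convolutional shortcuts that inject a local inductive bias
        self.sc1 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.sc2 = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def _conv(self, conv, x):  # (batch, n, n) -> (batch, n, n)
        return conv(x.unsqueeze(1)).squeeze(1)

    def forward(self, x):
        # mix along one axis, with a convolutional (rather than identity) shortcut
        x = self._conv(self.sc1, x) + self.row_mlp(x.transpose(1, 2)).transpose(1, 2)
        # mix along the other axis
        x = self._conv(self.sc2, x) + self.col_mlp(x)
        return x
```

Stacking several such blocks and squeezing the output would then feed the softmax prediction head described above.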
Feature augmentation with uncertainty awareness. Given an input feature space X ∈ R^{m×n} (m is the number of instances and n is the dimension of features), we assume the existence of Gaussian noise N in this space [23]. In other words, when we augment a single feature value, we must consider that the source of this value may be a Gaussian distribution. The Gaussian function has two key parameters (μ and σ), which can be formalized as:

N = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \sim X.
So far, our feature augmentation method with uncertainty is based on this a priori assumption to provide more material for the downstream network. The following describes the pipeline for this method.
For a single sample V, we develop a Learner to adapt a variance σ_i to each element V_i of this sample. The Learner consists of three linear layers and three activation layers, each of which uses the ReLU operator except for the last layer, which uses sigmoid; the dimensionality of the output layer is the same as that of the input layer. Next, a sampling action is conducted, for which we construct a Gaussian distribution function N_i for each element V_i. We construct the two parameters of the Gaussian function N_i using the learned variance σ_i and the feature value V_i of the sample, respectively. Then, sampling operations are executed on these Gaussian functions; the number of sampling points is consistent with the dimensionality of the samples.
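Under the assumptions that the Learner's hidden width is a free choice and that the learned σ_i acts as the standard deviation in reparameterized sampling, a minimal sketch of this augmentation pipeline might look as follows:

```python
import torch
import torch.nn as nn

class Learner(nn.Module):
    """Assigns a variance sigma_i to each feature value V_i (sketch).

    Three linear layers; ReLU after the first two, sigmoid after the last,
    with output dimensionality equal to the input (the hidden width is assumed).
    """
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features), nn.Sigmoid(),
        )

    def forward(self, x):          # x: (batch, n)
        return self.net(x)         # per-feature sigma in (0, 1)

def augment(x, learner, generator=None):
    """Expands each feature value into an n-dim vector via Gaussian sampling.

    Each V_i becomes the mean of a Gaussian with learned sigma_i; the number of
    sampling points equals the feature dimensionality n, so a 1D sample of shape
    (n,) grows into a 2D (n x n) pseudo grayscale map.
    """
    sigma = learner(x)                                        # (batch, n)
    n = x.shape[1]
    # fixed generator (seed) during training, mirroring the paper's fixed seed
    eps = torch.randn(x.shape[0], n, n, generator=generator)
    return x.unsqueeze(-1) + sigma.unsqueeze(-1) * eps        # (batch, n, n)
```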