Improving Imbalanced Text Classification with
Dynamic Curriculum Learning
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Ping An Technology (Shenzhen) Co., Ltd.
Corresponding author: Jianzong Wang, jzwang@188.com.
Abstract—Recent advances in pre-trained language models have improved performance on text classification tasks. However, little attention has been paid to the priority scheduling of samples during training. Humans acquire knowledge gradually, from easy to complex concepts, and the difficulty of the same material can also vary significantly across learning stages. Inspired by these insights, we propose a novel self-paced dynamic curriculum learning (SPDCL) method for imbalanced text classification, which evaluates sample difficulty by both linguistic characteristics and model capacity. Moreover, rather than using static curriculum learning as in existing research, SPDCL reorders and resamples the training data by this difficulty criterion at an adaptive easy-to-hard pace. Extensive experiments on several classification tasks show the effectiveness of the SPDCL strategy, especially on imbalanced datasets.
Index Terms—efficient curriculum learning, imbalanced text classification, self-paced learning, data augmentation, nuclear-norm
I. INTRODUCTION
Recent attention has been devoted to various training tasks for pre-training large neural networks, deep model architectures, and model compression [1]-[5]. Imbalanced training data is common in real applications and poses challenges for text classifiers. In this work, we study how to leverage curriculum learning (CL) to tackle imbalanced text classification.
CL [6], which mimics the human learning process from easy to hard concepts, can improve the generalization ability and convergence rate of deep neural networks. However, such strategies have been largely ignored for imbalanced text classification tasks because common difficulty metrics, such as entropy, are static and fixed: they ignore the fact that the difficulty of a data sample varies over the course of training. The sample importance weights should therefore be adaptive rather than fixed. To address this challenge, we propose the SPDCL framework, which uses both the nuclear-norm and model capacity to evaluate the difficulty of training samples. The nuclear-norm, the sum of a matrix's singular values, is commonly used to constrain a matrix to be low-rank; from linear algebra, a low-rank matrix can still carry a large amount of the data's information and can be used to restore data and extract features. In our experiments, the nuclear-norm of a training sample first decreases and then increases until it fluctuates stably as the model moves from the pre-training to the fine-tuning stage.
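As a concrete reference, the nuclear-norm of a token representation matrix can be computed directly from its singular values. A minimal sketch in PyTorch (the function name and the use of `torch.linalg` are our own illustration, not the paper's code):

```python
import torch

def nuclear_norm(token_matrix: torch.Tensor) -> torch.Tensor:
    """Nuclear-norm of a (seq_len, hidden_size) representation matrix:
    the sum of its singular values."""
    return torch.linalg.svdvals(token_matrix).sum()

# Equivalently: torch.linalg.matrix_norm(token_matrix, ord='nuc')
```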
That is to say, for each sample, fine-tuning is a process of first removing noise and then learning the deeper semantic features of the specific scene data. For samples whose noise points are far from normal values, or whose feature points are well distinguished, the nuclear-norm changes more drastically; otherwise, it changes more smoothly. In particular, simple examples are easier to recognize even under slight changes of sentence features at different training stages, so their nuclear-norms change more drastically.
For each training epoch, we dynamically use the corresponding nuclear-norm values to calculate a difficulty score for each example in the training set. In each training phase, we calculate the change in the nuclear-norm of each sample based on its relative position at the current moment and assign the sample an updated difficulty level. After reordering the samples from simple to complex in each round, we resample the data so that the model learns simple samples first and then faces gradually increasing difficulty.
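A minimal sketch of this reorder-and-resample step, under our reading of the description above (the helper names, the use of the absolute nuclear-norm change as the easiness signal, and the linear pacing function are assumptions, not the paper's exact specification):

```python
import numpy as np

def reorder_by_difficulty(prev_norms: np.ndarray, curr_norms: np.ndarray) -> np.ndarray:
    """Rank samples easy-to-hard. Simple samples show more drastic
    nuclear-norm changes, so a larger absolute change is read as easier."""
    delta = np.abs(curr_norms - prev_norms)
    return np.argsort(-delta)  # largest change (easiest) first

def easy_to_hard_subset(order: np.ndarray, epoch: int, num_epochs: int) -> np.ndarray:
    """Resample: at epoch t, keep the easiest fraction of the data,
    growing linearly until the full set is used (one plausible pacing)."""
    frac = min(1.0, (epoch + 1) / num_epochs)
    return order[: max(1, int(frac * len(order)))]
```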
Our contributions can be summarized as follows: 1) We propose an adaptive sentence difficulty criterion that combines linguistically motivated features with learning- and task-dependent features. 2) We propose a novel dynamic CL strategy that re-calculates the difficulty criterion and resamples the data at each epoch. 3) We demonstrate the effectiveness of SPDCL when fine-tuning language models on a series of text classification tasks, including multi-class classification, multi-label classification, and text matching.
II. METHODOLOGY
To elaborate our method, we use the BERT-Base model [7] as an example; the same method is also compatible with many other encoders, such as RoBERTa [8]. Following BERT, the output vector of the [CLS] token can be used for classification and other tasks, since it represents the core features of the whole sentence. We therefore simply add a linear layer on the [CLS] sentence representation and fine-tune the entire model with our CL strategy. Before each epoch, however, we calculate the difficulty metric over all token representations, rather than only the [CLS] token, to measure the difficulty score of each input sample.
Formally, when we input the text sequence {[CLS], x_1, ..., x_m, [SEP]} or {[CLS], x_{11}, ..., x_{1m}, [SEP], x_{21}, ..., x_{2n}, [SEP]} into the model, we obtain the output token matrix {h_CLS, h_{x_1}, ..., h_{x_m}, h_SEP} or {h_CLS, h_{x_{11}}, ..., h_{x_{1m}}, h_SEP, h_{x_{21}}, ..., h_{x_{2n}}, h_SEP}. Then, for each sample, we compute the nuclear-norm of the whole token-level representation matrix at each training epoch, which represents its current information capacity.
TABLE I
THE RELATIONSHIP BETWEEN TEXT LENGTH AND NUCLEAR-NORM OF SENTENCES IN THE TRAIN DATA OF THE AAPD DATASET, OBTAINED FROM THE PRE-TRAINED MODEL BEFORE FINE-TUNING. SENTENCE BINS ARE SORTED BY SENTENCE DIFFICULTY BASED ON LINGUISTIC CHARACTERISTICS. IT IS OBVIOUS THAT LONGER SENTENCES, WHICH CONTAIN MORE LEXICAL-SEMANTIC AND SYNTACTIC INFORMATION, TEND TO HAVE LARGER NUCLEAR-NORM VALUES BEFORE FINE-TUNING.

Bins             1    2    3     4     5     6
Avg text length  440  702  881   1052  1260  1594
Before fine-tuning, the text representation obtained from the pre-trained model contains some redundant information, especially on domain-specific datasets. In the fine-tuning phase, the model learns better text representations for the specific task: it first removes redundant features to extract the critical information and then learns finer-grained features. Indeed, in our experiments we also observe that the nuclear-norm of samples first decreases and then gradually increases. Simple samples vary more dramatically, while for complex samples it is harder to learn fine-grained, high-dimensional characteristics. We calculate the difference between the current and previous nuclear-norm for each sample, and sort the samples in an easy-to-difficult order at each epoch under our curriculum learning strategy.
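Putting the pieces together, the per-sample score can be obtained by running the current encoder over each input and taking the nuclear-norm of the full token matrix. A hedged sketch using the Hugging Face `transformers` API (the checkpoint name and function name are illustrative assumptions):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def sample_nuclear_norm(text: str) -> float:
    """Nuclear-norm of the whole token-level representation matrix
    (all tokens, not only [CLS]) for one training sample."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = encoder(**enc).last_hidden_state[0]   # (seq_len, hidden_size)
    return torch.linalg.svdvals(hidden).sum().item()
```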
Finally, we take the final-layer embedding of the first token, [CLS], as the representation of the whole input sequence. For sequence classification and text matching tasks, we feed this representation through the linear layer and a softmax activation to predict the category label. For multi-label classification tasks, we take a standard approach in order to verify the generality of the CL strategy.
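A small sketch of such a classification head; the single-label branch (linear layer plus softmax) follows the description above, while the per-label sigmoid for the multi-label branch is our assumption about what the "standard approach" looks like:

```python
import torch.nn as nn

class ClsHead(nn.Module):
    """Classification head over the final-layer [CLS] embedding."""
    def __init__(self, hidden_size: int, num_labels: int, multi_label: bool = False):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_labels)
        self.multi_label = multi_label

    def forward(self, cls_embedding):
        logits = self.linear(cls_embedding)
        if self.multi_label:
            return logits.sigmoid()       # independent per-label probabilities
        return logits.softmax(dim=-1)     # mutually exclusive class distribution
```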
A. Self-paced Dynamic Curriculum Learning
In this section, we introduce our training strategy, Self-Paced Dynamic Curriculum Learning (SPDCL), whose process is shown in Figure 1; more details are given as pseudo-code in Algorithm 1. In general, curriculum learning contains two key modules: a difficulty criterion and a curriculum arrangement. In our method, we split the difficulty criterion into two parts, a linguistic difficulty criterion used before training and a model-capacity-based criterion used during training, since the curriculum arrangement should change along with the difficulty criterion. At first, the difficulty criterion relies more on linguistic characteristics, because the model has not yet seen the downstream task data and its feedback is not reliable. Later, once formal training begins, the current state of the model itself should play the more important role in the difficulty criterion. Moreover, the difficulty criteria in different time periods are not independent; they are always associated with each other.
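This excerpt does not reproduce Algorithm 1 in full, so the following is only a plausible reconstruction of the overall loop, reusing the hypothetical helpers sketched earlier (`sample_nuclear_norm`, `reorder_by_difficulty`, `easy_to_hard_subset`) plus an assumed `train_one_epoch` fine-tuning step:

```python
import numpy as np

def spdcl_train(texts, labels, num_epochs: int):
    """SPDCL sketch: epoch 0 orders data by the linguistic criterion
    (nuclear-norms under the pre-trained encoder); later epochs re-order
    by the change in nuclear-norm under the current model state."""
    prev_norms = np.array([sample_nuclear_norm(t) for t in texts])
    order = np.argsort(prev_norms)                    # smaller norm = easier first
    for epoch in range(num_epochs):
        subset = easy_to_hard_subset(order, epoch, num_epochs)
        train_one_epoch(texts, labels, subset)        # assumed standard fine-tuning step
        curr_norms = np.array([sample_nuclear_norm(t) for t in texts])
        order = reorder_by_difficulty(prev_norms, curr_norms)
        prev_norms = curr_norms
```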
[Figure 1 diagram: training data is first ordered by linguistic characteristics using SVD on the pre-trained model's representations; at each subsequent stage (BERT model 1-4), SVD is applied again, samples are re-ordered by the absolute value of the nuclear-norm difference based on relative positions, and the data is resampled.]
Fig. 1. Structure of SPDCL. It contains two key modules, difficulty criterion and curriculum arrangement. In our method, we divide the difficulty criterion into two modules: a linguistic difficulty criterion before training and a difficulty criterion based on model capacity while training.
B. Linguistic difficulty criterion
Existing sentence difficulty metrics can be divided into two parts: heuristic metrics, such as sentence length and word frequency, and model-based metrics for the specific task, such as loss, accuracy, precision, and F1 score [9], [10]. However, these methods separate the linguistic characteristics of a textual sample from its model-based characteristics, although both are vital at different stages; for example, in the early stage the teacher can rely more on linguistic characteristics. Regarding linguistic characteristics, most NLP systems exploit distributed word embeddings, which capture the syntactic features of a word in both the direction and the norm of the word vector. However, the norm of the sentence matrix is rarely considered or explored when computing the difficulty metric of a textual sample. Traditional word-based difficulty metrics have no particularly good way to directly obtain a difficulty metric for sentence-level representations: they generally average or sum the norms of all words, which is not equivalent to a metric computed on the modeled sentence representation itself. Taking into account the polysemy and position of words in the sentence, as well as deeper syntactic structure and semantic relationships, we propose a nuclear-norm based sentence difficulty criterion.
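To make the distinction concrete, the following toy comparison (our own illustration, not from the paper) shows that averaging per-word vector norms and taking the nuclear-norm of the stacked sentence matrix are genuinely different quantities; the nuclear-norm depends on how the token vectors relate to each other through the singular-value spectrum, not just on their individual magnitudes:

```python
import torch

emb = torch.randn(12, 768)                        # toy token-embedding matrix for one sentence

word_based = emb.norm(dim=-1).mean()              # average of per-word vector norms
sentence_based = torch.linalg.svdvals(emb).sum()  # nuclear-norm of the sentence matrix
# Replacing all rows with copies of one vector leaves the mean word norm
# unchanged but collapses the matrix to rank 1, sharply reducing the
# nuclear-norm -- so the two metrics respond very differently.
```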
The nuclear-norm is the sum of the singular values of a matrix, a classic measure of the amount of information it carries, and it plays a vital role in natural language processing tasks [11]. In our previous experiments, we also found a consistent phenomenon: the nuclear-norm of a sentence is related to its length, as shown in Table I. Specifically, longer sentences tend to have larger nuclear-norm values. At the initial learning stage, when the model weights come from the pre-trained model, which has not yet seen the training data of the downstream task, the linguistic characteristics of a textual sample itself determine how hard it is for the downstream task. In our view, it is easier to train on and predict sentences with smaller nuclear-norms, as those sentences contain less information and have simpler syntactic structures. Compared with sentences with smaller nuclear-norms, harder sentences usually come with larger nuclear-norms: they require recognizing the semantics of their component parts, and it is also necessary to identify the syntactic