SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training

Hui Chen    Wei Han    Soujanya Poria
Singapore University of Technology and Design
{hui_chen, wei_han}@mymail.sutd.edu.sg
sporia@sutd.edu.sg
Abstract

Self-training methods have been explored in recent years and have exhibited great performance in improving semi-supervised learning. This work presents a Simple instance-Adaptive self-Training method (SAT) for semi-supervised text classification. SAT first generates two augmented views for each unlabeled example and then trains a meta-learner to automatically identify the relative strength of augmentations based on the similarity between the original view and the augmented views. The weakly-augmented view is fed to the model to produce a pseudo-label, and the strongly-augmented view is used to train the model to predict the same pseudo-label. We conducted extensive experiments and analyses on three text classification datasets and found that, with varying sizes of labeled training data, SAT consistently shows competitive performance compared to existing semi-supervised learning methods. Our code can be found at https://github.com/declare-lab/SAT.git.
1 Introduction

Pretrained language models have achieved extremely good performance in a wide range of natural language understanding tasks (Devlin et al., 2019). However, such performance often depends strongly on large-scale, high-quality supervision. Since labeled linguistic data requires large amounts of time, money, and expertise to obtain, improving models' performance in few-shot scenarios (i.e., where there are only a few training examples per class) has become a challenging research topic.

Semi-supervised learning in NLP has received increasing attention for improving performance in few-shot scenarios, where both labeled and unlabeled data are utilized (Berthelot et al., 2019b; Sohn et al., 2020; Li et al., 2021). Recently, several self-training methods have been explored to extract task-specific information from unlabeled data. UDA (Xie et al., 2020) applied data augmentations to unlabeled data and proposed an unsupervised consistency loss that minimizes the divergence between different unlabeled augmented views. To give self-training more supervision, MixText (Chen et al., 2020a; Berthelot et al., 2019b) employed Mixup (Zhang et al., 2018; Chen et al., 2022) to learn an intermediate representation of labeled and unlabeled data. Both UDA and MixText utilized consistency regularization and confirmed that such regularization exhibits outstanding performance in semi-supervised learning. To simplify the consistency regularization process, FixMatch (Sohn et al., 2020) classified two unlabeled augmented views into a weak view and a strong view, and then minimized the divergence between the probability distribution of the strong view and the pseudo-label of the weak view. However, in NLP it is hard to distinguish the relative strength of augmented text by observation, and randomly assigning an augmentation strength limits the performance of FixMatch on text.

To tackle this problem in FixMatch, our paper introduces an instance-adaptive self-training method, SAT, in which we propose two criteria, one based on a classifier and the other on a scorer, to automatically identify the relative strength of augmentations on text. Our main contributions are:

• First, we apply popular data augmentation techniques to generate different views of unlabeled data and design two novel criteria to calculate the similarity between the original view and the augmented view of unlabeled data in FixMatch, boosting its performance on text.

• We then conduct empirical experiments and analyses on three few-shot text classification datasets. Experimental results confirm the efficacy of our SAT method.
2 Method

2.1 Problem Setting

In this work, we learn a model to map an input $x \in \mathcal{X}$ onto a label $y \in \mathcal{Y}$ in text classification tasks. In semi-supervised learning, we use both labeled and unlabeled examples during training. Let $\mathcal{X} = \{(x_b, y_b) : b \in (1, \ldots, B)\}$ be a batch of $B$ labeled examples, where $x_b$ are the training examples and $y_b$ are their labels. Let $\mathcal{U} = \{u_b : b \in (1, \ldots, \mu B)\}$ be a batch of $\mu B$ unlabeled examples, where $\mu$ is a hyperparameter that determines the relative sizes of $\mathcal{X}$ and $\mathcal{U}$.
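For instance, with $B = 8$ and $\mu = 3$, each training step pairs 8 labeled examples with $\mu B = 24$ unlabeled examples. The short sketch below only illustrates this batch ratio with dummy tensors and hypothetical loader names; it is not taken from the released code.

# Illustration of the labeled/unlabeled batch ratio mu: each step pairs
# B labeled examples with mu * B unlabeled examples (dummy data, PyTorch).
import torch
from torch.utils.data import DataLoader, TensorDataset

B, mu = 8, 3
labeled = TensorDataset(torch.randn(800, 16), torch.randint(0, 4, (800,)))
unlabeled = TensorDataset(torch.randn(8000, 16))

labeled_loader = DataLoader(labeled, batch_size=B, shuffle=True)
unlabeled_loader = DataLoader(unlabeled, batch_size=mu * B, shuffle=True)

(xb, yb), (ub,) = next(iter(labeled_loader)), next(iter(unlabeled_loader))
print(xb.shape, ub.shape)  # torch.Size([8, 16]) torch.Size([24, 16])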
2.2 SAT

The entire process of SAT is illustrated in Algorithm 1. Similar to common semi-supervised learning methods, our approach consists of a supervised part and an unsupervised part. The supervised part minimizes the cross-entropy loss between the labeled data and their targets. The unsupervised part first generates two augmented views of each unlabeled example, then applies an augmentation choice network to determine their relative augmentation strength, and finally calculates a consistency loss between the probability distribution of the strongly-augmented view and the pseudo-label of the weakly-augmented view. Since the relative augmentation strength in SAT has no direct correlation with the augmentation techniques themselves, our semi-supervised learning process is more adaptive to the training data than FixMatch.
The augmentation choice network is trained on the labeled data, and we design two criteria to train it: (1) one based on a classifier and (2) the other based on a scorer. Lines 2 to 7 of Algorithm 1 show how we train the augmentation choice network. For each labeled example, we first calculate the similarity between the original data and each of its augmented variants, and then rank the augmented samples according to the similarity scores. In our classifier-based criterion, we employ a cross-entropy loss to measure the distance, while in our scorer-based criterion, we calculate the cosine similarity. Afterward, we define the variant with the higher similarity score as the weakly-augmented sample and use it to train the augmentation choice network. For our classifier-based method, we apply a cross-entropy loss as the training objective. For our scorer-based method, we use a contrastive loss (Chen et al., 2020b) to update the network. Finally, the trained augmentation choice network is used to automatically identify the augmentation strength of unlabeled data. A minimal sketch of these two criteria is given below.
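As a concrete illustration, the sketch below shows one plausible implementation of the two criteria, under the assumption that the classifier-based criterion scores an augmented view by how well a classifier still predicts the gold label on it, and the scorer-based criterion by the cosine similarity between representations of the original and augmented views. The module names and toy features are placeholders, not the released SAT code.

# Sketch of the two ranking criteria for the augmented views of a labeled example.
# Assumption: the classifier criterion rewards views on which the gold label is
# still predicted well (lower cross-entropy = weaker augmentation); the scorer
# criterion rewards high cosine similarity to the original representation.
import torch
import torch.nn.functional as F

def classifier_criterion(classifier, aug_feats, label):
    # Negated cross-entropy, so a higher score means "more similar to the original".
    return -F.cross_entropy(classifier(aug_feats), label.unsqueeze(0))

def scorer_criterion(encoder, orig_feats, aug_feats):
    # Cosine similarity between representations of the original and the augmented view.
    return F.cosine_similarity(encoder(orig_feats), encoder(aug_feats), dim=-1)

def rank_views(score_view1, score_view2):
    # The view with the higher similarity score is treated as the weak augmentation.
    return ("view1=weak", "view2=strong") if score_view1 >= score_view2 else ("view1=strong", "view2=weak")

# Toy usage with dummy 16-dimensional features and 3 classes.
classifier, encoder = torch.nn.Linear(16, 3), torch.nn.Identity()
orig = torch.randn(1, 16)
aug1 = orig + 0.01 * torch.randn(1, 16)   # mild perturbation: should rank as weak
aug2 = torch.randn(1, 16)                 # unrelated features: should rank as strong
label = torch.tensor(1)

s1 = scorer_criterion(encoder, orig, aug1).item()
s2 = scorer_criterion(encoder, orig, aug2).item()
print(rank_views(s1, s2))                 # expected: view1 ranked as the weak view
c1 = classifier_criterion(classifier, aug1, label).item()
c2 = classifier_criterion(classifier, aug2, label).item()
print(rank_views(c1, c2))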
Algorithm 1: SAT: Simple Instance-Adaptive Self-Training

Input: $D_{\text{train}} = \{\mathcal{X}, \mathcal{U}\}$ where $\mathcal{X} = \{(x_b, y_b) : b \in (1, \ldots, B)\}$ and $\mathcal{U} = \{u_b : b \in (1, \ldots, \mu B)\}$; augmentation methods $\alpha_1$, $\alpha_2$; main network $f(\cdot\,; \theta)$ with parameters $\theta$ and its probability distribution $p$; augmentation choice network $G(\cdot\,; \theta_G)$ with parameters $\theta_G$; criteria $\mathcal{C}$, $\Gamma$; cross-entropy loss $H$; unlabeled loss weight $\lambda_u$; confidence threshold $\tau$; learning rates $\beta$, $\eta$
Output: Updated network weights $\theta$

// Calculate supervised loss
1:  $l_s = \frac{1}{B} \sum_{b=1}^{B} H(y_b, p(y \mid x_b))$
2:  for $(x_b, y_b) \in \mathcal{X}$ do
3:      $i_1^b, i_2^b = \mathcal{C}(\alpha_1(x_b), x_b, y_b), \mathcal{C}(\alpha_2(x_b), x_b, y_b)$
4:      $i_w^b, i_s^b = \text{Descending}(i_1^b, i_2^b)$
5:  end
// Update the augmentation choice network
6:  $l_{\text{aug\_choice}} = \frac{1}{B} \sum_{b=1}^{B} \Gamma(x_b, \alpha_1(x_b), \alpha_2(x_b), i_w^b)$
7:  $\theta_G = \theta_G - \beta \nabla l_{\text{aug\_choice}}$
8:  for each $u_b \in \mathcal{U}$ do
9:      $\hat{i}_w^b, \hat{i}_s^b = G(u_b, \alpha_1(u_b), \alpha_2(u_b); \theta_G)$
10: end
// Calculate unsupervised loss
11: $l_u = \frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}\{\max(p(y \mid \alpha_{\hat{i}_w^b}(u_b))) > \tau\} \, H(\operatorname{argmax}(p(y \mid \alpha_{\hat{i}_w^b}(u_b))), p(y \mid \alpha_{\hat{i}_s^b}(u_b)))$
// Total loss: add up supervised loss and unsupervised loss
12: $l_{\text{total}} = l_s + \lambda_u l_u$
// Update the main network
13: $\theta = \theta - \eta \nabla l_{\text{total}}$
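To make the control flow of Algorithm 1 concrete, the following is a minimal, self-contained PyTorch sketch of one SAT training step. The text encoder, the two augmentation functions, and the augmentation choice network $G$ are replaced by stand-ins (random feature vectors and a random weak/strong assignment), so the snippet only illustrates the loss structure, not the released implementation.

# Minimal sketch of one SAT training step (cf. Algorithm 1), with dummy features.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, feat_dim = 4, 32
B, mu = 8, 3                          # labeled batch size and unlabeled ratio (Sec. 2.1)
lambda_u, tau, eta = 1.0, 0.95, 1e-3  # unlabeled loss weight, confidence threshold, lr

model = torch.nn.Linear(feat_dim, num_classes)  # stand-in for the main network f(.; theta)
opt = torch.optim.SGD(model.parameters(), lr=eta)

# Dummy labeled batch (x, y) and two augmented views of the unlabeled batch.
x, y = torch.randn(B, feat_dim), torch.randint(0, num_classes, (B,))
u_view1, u_view2 = torch.randn(mu * B, feat_dim), torch.randn(mu * B, feat_dim)

# Stand-in for the augmentation choice network G: which view is "weak" per example.
weak_is_view1 = torch.rand(mu * B) > 0.5

# Supervised loss (Alg. 1, line 1).
loss_s = F.cross_entropy(model(x), y)

# Unsupervised consistency loss (Alg. 1, lines 8-11).
weak = torch.where(weak_is_view1.unsqueeze(1), u_view1, u_view2)
strong = torch.where(weak_is_view1.unsqueeze(1), u_view2, u_view1)
with torch.no_grad():                            # pseudo-labels come from the weak view
    probs_w = F.softmax(model(weak), dim=-1)
    conf, pseudo = probs_w.max(dim=-1)
    mask = (conf > tau).float()                  # keep only confident pseudo-labels
loss_u = (F.cross_entropy(model(strong), pseudo, reduction="none") * mask).mean()

# Total loss and main-network update (Alg. 1, lines 12-13).
loss = loss_s + lambda_u * loss_u
opt.zero_grad()
loss.backward()
opt.step()
print(f"supervised={loss_s.item():.3f}  unsupervised={loss_u.item():.3f}")

In the full method, the weak/strong assignment above would come from the trained augmentation choice network (line 9 of Algorithm 1) rather than from a coin flip.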
3 Experimental Setup

We conducted empirical experiments to compare our approach with several existing semi-supervised learning methods on a variety of text classification benchmark datasets.