Uncertainty Sentence Sampling by Virtual Adversarial Perturbation
Hanshan Zhang and Zhen Zhang and Hongfei Jiang and Yang Song
Zuoyebang Education Technology (Beijing) Co., Ltd
{zhanghanshan,zhangzhen,jianghongfei,songyang}@zuoyebang.com
Abstract
Active learning for sentence understanding attempts to reduce the annotation cost by identifying the most informative examples. Common methods for active learning use either uncertainty or diversity sampling in the pool-based scenario. In this work, to incorporate both predictive uncertainty and sample diversity, we propose Virtual Adversarial Perturbation for Active Learning (VAPAL), an uncertainty-diversity combination framework, using virtual adversarial perturbation (Miyato et al., 2019) as the model uncertainty representation. VAPAL consistently performs as well as or better than strong baselines on four sentence understanding datasets: AGNEWS, IMDB, PUBMED, and SST-2, offering a potential option for active learning on sentence understanding tasks.
1 Introduction
In recent years, deep neural networks have achieved significant success in natural language processing (Yang et al., 2019; Devlin et al., 2019; Raffel et al., 2020; He et al., 2020). These neural models usually require a large amount of labeled training data. Active learning is an effective way to reduce both computational costs and human labor by selecting the most critical examples to label.
Uncertainty sampling and diversity sampling are the two approaches most often used in active learning. Uncertainty sampling (Lewis and Gale, 1994) selects difficult examples based on the model's confidence score. In a batch setting, however, the sampled data points tend to be near-identical (Ash et al., 2019), which suggests that diversity should be taken into account in addition to uncertainty. A naive combination of uncertainty and diversity, though, hurts test accuracy (Hsu and Lin, 2015). Ash et al. (2019) present a practical framework, BADGE, which combines uncertainty and diversity: it measures uncertainty by gradient embedding and achieves diversity by clustering.
However, BADGE relies on model confidence scores, which require a warm-started model to be calibrated: correctness likelihoods do not increase consistently with higher confidence scores. To avoid the warm-start requirement, Yuan et al. (2020) present ALPS, a cold-start approach that uses a self-supervised loss (the masked language model loss) as the sentence representation. Nevertheless, the MLM loss amounts to a language model perplexity, not a direct measurement related to the downstream task.
From another point of view, deep learning models are vulnerable to adversarial examples (Goodfellow et al., 2014; Kurakin et al., 2016), indicating that uncertainty measured from model confidence scores is overconfident. Virtual adversarial training (Miyato et al., 2015, 2016, 2019) modifies inputs with special perturbations, virtual adversarial perturbations, which change the output distribution of the model in the most significant way in the sense of KL-divergence. The approach has been validated in an industry-scale semi-supervised learning setting (Chen et al., 2021).
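To make this concrete, below is a minimal PyTorch sketch of the power-iteration approximation of the virtual adversarial perturbation from Miyato et al. (2019). It assumes a classifier that consumes embedding-space inputs (for text, perturbations are applied to embeddings, not raw tokens); the function name and the xi, epsilon, and n_iters defaults are illustrative assumptions, not the configuration used in this paper.

```python
import torch
import torch.nn.functional as F

def virtual_adversarial_perturbation(model, x, xi=1e-6, epsilon=1.0, n_iters=1):
    """Approximate the perturbation that changes the model's output
    distribution the most in the KL sense (Miyato et al., 2019).

    x: input embeddings, shape (batch, ...); model(x) returns logits.
    No labels are needed: the model's own prediction is the target.
    """
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)  # reference distribution

    # Start from a random direction and refine it by power iteration.
    d = torch.randn_like(x)
    for _ in range(n_iters):
        d = xi * F.normalize(d.view(d.size(0), -1), dim=1).view_as(d)
        d.requires_grad_()
        p_hat = F.log_softmax(model(x + d), dim=-1)
        kl = F.kl_div(p_hat, p, reduction="batchmean")
        (grad,) = torch.autograd.grad(kl, d)
        d = grad.detach()

    # Scale the final direction onto the epsilon ball.
    return epsilon * F.normalize(d.view(d.size(0), -1), dim=1).view_as(d)
```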
In this work we propose VAPAL (Virtual Adversarial Perturbation for Active Learning). VAPAL computes a perturbation for each data point in the unlabeled pool to measure model uncertainty, and then, with the perturbations acquired, clusters the data points to achieve diversity, as BADGE and ALPS do. Since virtual adversarial perturbations can be calculated without label information, VAPAL has an advantage over BADGE in that it does not require a warm start. Unlike ALPS, our method does not rely on a particular self-supervised loss; in other words, VAPAL can be applied to any differentiable model.
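As one plausible sketch of the acquisition step, assume each pool example's perturbation has been flattened into a fixed-length vector (e.g., with the function sketched above). A k-means++ seeding over these vectors, the same seeding BADGE applies to its gradient embeddings, then picks a batch that is both high-magnitude and mutually distant; the helper below is hypothetical, not the paper's released code.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

def select_batch(perturbations, k, seed=0):
    """Pick a diverse, high-uncertainty batch from the unlabeled pool.

    perturbations: (n_pool, d) array; each row is a flattened virtual
    adversarial perturbation serving as an uncertainty representation.
    Returns the indices of k pool points chosen by k-means++ seeding,
    which favors vectors that are far from one another.
    """
    _, indices = kmeans_plusplus(perturbations, n_clusters=k, random_state=seed)
    return indices
```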
We use four datasets (AGNEWS, IMDB,
PUBMED, and SST-2) to evaluate VAPAL through
two tasks, namely sentiment analysis and topic classification. The baselines cover uncertainty sampling, diversity sampling, and two state-of-the-art hybrid active learning methods (BADGE and ALPS).
Our main contributions are as follows:

• We use virtual adversarial perturbation to measure model uncertainty in sentence understanding tasks for the first time. Local smoothness is treated as model uncertainty, which relies less on poorly calibrated model confidence scores.

• We present VAPAL (Virtual Adversarial Perturbation for Active Learning), a framework that combines uncertainty and diversity.

• We show that VAPAL performs as well as or better than the baselines on four tasks. Our perturbation-based representation can successfully replace the gradient-based representation of BADGE and, unlike the masked language model loss used in ALPS, does not rely on a specific self-supervised loss.
2 Related Work
To reduce labeling cost, active learning seeks to select the most informative data points from the unlabeled pool for human annotation. The learner model is then trained on the newly labeled data, and the process repeats. Prior active learning sampling methods primarily focus on either uncertainty or diversity. Uncertainty sampling methods, the most popular and widely used strategies, select difficult examples to label (Lewis and Gale, 1994; Joshi et al., 2009; Houlsby et al., 2011). Diversity sampling selects a subset of data points that effectively represents the distribution of the whole pool (Geifman and El-Yaniv, 2017; Sener and Savarese, 2017; Gissin and Shalev-Shwartz, 2019). A successful active learning method requires both aspects, but the exact implementation is still open for discussion.
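The pool-based loop just described can be summarized in a few lines; acquire, oracle_label, and train below are placeholder callables standing in for the sampling strategy, the human annotator, and model fitting, not an API from any of the cited works.

```python
def active_learning_loop(pool, model, acquire, oracle_label, train,
                         rounds=10, batch_size=100):
    """Generic pool-based active learning: each round, score the
    unlabeled pool, send the most informative points to an annotator,
    and retrain the learner on the grown labeled set."""
    labeled = []
    unlabeled = list(range(len(pool)))
    for _ in range(rounds):
        chosen = acquire(model, [pool[i] for i in unlabeled], batch_size)
        picked = [unlabeled[j] for j in chosen]   # map back to pool indices
        labeled += [(pool[i], oracle_label(pool[i])) for i in picked]
        unlabeled = [i for i in unlabeled if i not in set(picked)]
        model = train(model, labeled)             # retrain on all labels so far
    return model
```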
Recently, hybrid approaches that combine uncertainty and diversity sampling have been proposed. Naive combination frameworks are shown to hurt test accuracy and to depend on hyperparameters (Hsu and Lin, 2015). Aiming for more sophisticated combinations, Ash et al. (2019) propose Batch Active Learning by Diverse Gradient Embeddings (BADGE), and Yuan et al. (2020) propose Active Learning by Processing Surprisal (ALPS). Both follow the same framework: first build uncertainty representations for the unlabeled data, then cluster them for diversity. BADGE measures data uncertainty as the gradient magnitude with respect to the parameters of the final (output) layer and uses this gradient embedding as the data representation. However, according to Yuan et al. (2020), BADGE has two main issues: reliance on warm-starting and computational inefficiency. ALPS instead builds data embeddings from a self-supervised loss, the masked language model loss (Yuan et al., 2020). Nevertheless, the MLM loss is an indirect proxy for model uncertainty in downstream classification tasks, and ALPS may only work with language models pre-trained with MLM.
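For contrast with the perturbation-based representation proposed here, the BADGE embedding has a simple closed form under the usual linear-output-layer assumption: the gradient of the cross-entropy loss with respect to the final-layer weights, computed against the model's own argmax prediction. The helper below is a sketch under that assumption, taking precomputed penultimate features and logits.

```python
import torch
import torch.nn.functional as F

def badge_gradient_embedding(hidden, logits):
    """BADGE-style gradient embedding (Ash et al., 2019).

    hidden: (batch, d) penultimate features; logits: (batch, c).
    For a linear output layer, the gradient of cross-entropy w.r.t.
    the weights is (p - e_yhat) outer h, using the argmax prediction
    as a pseudo-label. Returns (batch, c * d) embeddings whose norm
    tracks predictive uncertainty.
    """
    probs = F.softmax(logits, dim=-1)
    pseudo = F.one_hot(logits.argmax(dim=-1), num_classes=logits.size(-1)).float()
    return ((probs - pseudo).unsqueeze(-1) * hidden.unsqueeze(1)).reshape(hidden.size(0), -1)
```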
What else can serve as a model uncertainty representation and be efficiently combined with diversity sampling? Virtual adversarial perturbation from virtual adversarial training (Miyato et al., 2019) is a promising option. Deep learning methods often face over-fitting in model generalization, especially when the training set is relatively small. In adversarial training, adversarial attacks are used to approximate the smallest perturbation that pushes a given latent state across the decision boundary (Goodfellow et al., 2014; Kurakin et al., 2016); such perturbations have proven to be an important proxy for assessing model robustness. Moreover, because virtual adversarial training (VAT) does not require true label information, it makes full use of the unlabeled data when labels are scarce. VAT can be seen as a regularization method based on the Local Distributional Smoothness (LDS) loss, where LDS is defined as a negative measure of the smoothness of the model distribution under local perturbations around input data points, in the sense of KL-divergence (Miyato et al., 2019).
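In symbols, writing $\hat{\theta}$ for the current parameter estimate, Miyato et al. (2019) define the perturbation and LDS as
\[
r_{\mathrm{vadv}} = \operatorname*{arg\,max}_{\|r\|_2 \le \epsilon} D_{\mathrm{KL}}\big(p(\cdot \mid x, \hat{\theta}) \,\big\|\, p(\cdot \mid x + r, \hat{\theta})\big),
\]
\[
\mathrm{LDS}(x, \theta) = -\, D_{\mathrm{KL}}\big(p(\cdot \mid x, \hat{\theta}) \,\big\|\, p(\cdot \mid x + r_{\mathrm{vadv}}, \theta)\big),
\]
so a large KL divergence under a small worst-case perturbation signals a locally non-smooth, uncertain region of the input space.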
The virtual adversarial perturbation can be crafted without label information, which helps alleviate the warm-starting issue BADGE faces. Yu and Pao (2020) coarsely rank grouped examples of model predictions by LDS score. Our method is inspired by the same line of research (Miyato et al., 2019) but differs from it in several respects. Our method projects data into a model-smoothness representation space rather than reducing each example to a coarse scalar score, and is therefore more effective. We introduce virtual adversarial perturbations as sentence representations in which model uncertainty is inherently expressed. Furthermore, we consider both uncertainty and diversity rather than uncertainty alone.