two tasks, namely sentiment analysis and topic
classification. Baselines cover uncertainty sampling, diversity sampling, and two state-of-the-art hybrid active learning methods (BADGE, ALPS).
Our main contributions are as follows:
•
We use Virtual Adversarial Perturbation to measure model uncertainty in sentence understanding tasks for the first time. Local smoothness is treated as model uncertainty, which relies less on poorly calibrated model confidence scores.
•
We present VAPAL (Virtual Adversarial Perturbation for Active Learning), which combines uncertainty and diversity within a single framework.
•
We show that VAPAL performs on par with or better than the baselines on four tasks. Our method can successfully replace the gradient-based representation used in BADGE. Furthermore, it does not rely on a specific self-supervised loss, unlike the Masked Language Model loss used in ALPS.
2 Related Work
To reduce the cost of labeling, active learning seeks to select the most informative data points from the unlabeled pool for human annotation. The learner model is then trained on the newly labeled data, and the process repeats. Prior active learning sampling methods primarily focus on uncertainty or diversity. Uncertainty sampling methods are the most popular and widely used strategies; they select difficult examples to label (Lewis and Gale, 1994; Joshi et al., 2009; Houlsby et al., 2011). Diversity sampling selects a subset of data points that effectively represents the distribution of the whole pool (Geifman and El-Yaniv, 2017; Sener and Savarese, 2017; Gissin and Shalev-Shwartz, 2019).
A successful active learning method requires the
incorporation of both aspects, but the exact imple-
mentation is still open for discussion.
Recently, hybrid approaches that combine uncertainty and diversity sampling have also been proposed. Naive combination frameworks have been shown to hurt test accuracy and to depend on hyperparameters (Hsu and Lin, 2015). Aiming for more sophisticated combination frameworks, Ash et al. (2019) propose Batch Active Learning By Diverse Gradient Embeddings (BADGE), and Yuan et al. (2020) propose Active Learning by Processing Surprisal (ALPS). Both follow the same framework: first build an uncertainty representation for each unlabeled example, then cluster these representations for diversity. BADGE measures data uncertainty as the gradient magnitude with respect to the parameters of the final (output) layer and forms a gradient-embedding-based data representation (sketched below). However, according to Yuan et al. (2020), BADGE has two main issues: reliance on warm-starting and computational inefficiency.
ALPS builds data embeddings from a self-supervised loss, the Masked Language Model (MLM) loss (Yuan et al., 2020). Nevertheless, the MLM loss is only an indirect proxy for model uncertainty on downstream classification tasks, and ALPS may only work with language models pre-trained with an MLM objective.
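To make the gradient-embedding idea concrete, the following is a minimal sketch of a BADGE-style embedding as we read Ash et al. (2019); it is not their released implementation, and the function and variable names are hypothetical. For a linear output layer, the gradient of the cross-entropy loss at the model's own predicted (pseudo) label with respect to the output-layer weights reduces to the outer product of (softmax probabilities minus the one-hot pseudo label) and the penultimate-layer features.

import torch
import torch.nn.functional as F

def badge_gradient_embedding(penultimate, logits):
    # penultimate: [batch, hidden] features feeding the final linear layer.
    # logits:      [batch, num_classes] outputs of that layer.
    probs = F.softmax(logits, dim=-1)
    pseudo = probs.argmax(dim=-1)  # the model's own predicted (pseudo) label
    one_hot = F.one_hot(pseudo, num_classes=logits.size(-1)).float()
    # Cross-entropy gradient w.r.t. the output-layer weights:
    # outer product of (probs - one_hot) and the penultimate features.
    grad = (probs - one_hot).unsqueeze(-1) * penultimate.unsqueeze(1)
    return grad.flatten(start_dim=1)  # [batch, num_classes * hidden]

# Toy usage with random features and 3-class logits.
h = torch.randn(4, 768)
logits = torch.randn(4, 3)
print(badge_gradient_embedding(h, logits).shape)  # torch.Size([4, 2304])

A diverse batch is then chosen by running k-MEANS++ seeding over these embeddings, so that selected examples have both large gradient norms (uncertainty) and dissimilar gradient directions (diversity).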
What else can serve as a model uncertainty representation and be combined efficiently with diversity sampling? Virtual adversarial perturbation from virtual adversarial training (Miyato et al., 2019) is a promising option. Deep learning methods are prone to over-fitting, especially when the training set is relatively small. In adversarial training, adversarial attacks are used to approximate the smallest perturbation of a given latent state that crosses the decision boundary (Goodfellow et al., 2014; Kurakin et al., 2016); such perturbations have proven to be an important proxy for assessing model robustness. Moreover, in active learning the labeled data is scarce, and virtual adversarial training (VAT) does not require true label information, thus making full use of the unlabeled data. VAT can be seen as a regularization method based on a Local Distributional Smoothness (LDS) loss, where LDS is defined as a negative measure of the smoothness of the model's output distribution under local perturbations around input data points, in the sense of KL-divergence (Miyato et al., 2019).
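Concretely, writing p(· | x, θ) for the model's output distribution, the objective can be written as follows; this is our paraphrase of the definition in Miyato et al. (2019), with θ̂ denoting the current parameter estimate treated as a constant and ε bounding the perturbation norm:

\mathrm{LDS}(x;\theta) = -\,D_{\mathrm{KL}}\big(p(\cdot \mid x, \hat{\theta}) \,\|\, p(\cdot \mid x + r_{\mathrm{vadv}}, \theta)\big),
\qquad
r_{\mathrm{vadv}} = \operatorname*{arg\,max}_{\|r\|_2 \le \epsilon} D_{\mathrm{KL}}\big(p(\cdot \mid x, \hat{\theta}) \,\|\, p(\cdot \mid x + r, \theta)\big).

In practice, Miyato et al. (2019) approximate r_vadv without any label information by a single power-iteration step on the gradient of this KL-divergence, starting from a random direction.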
The virtual adversarial perturbation can be crafted without label information, which can help alleviate the warm-starting issue that BADGE faces. Yu and Pao (2020) coarsely rank grouped examples by the LDS scores of their model predictions. Our method is inspired by the same line of research (Miyato et al., 2019) but differs from it in several respects. Our method aims
to project data into a model smoothness representation space rather than reducing it to a single rough scalar score, so it is more effective. We introduce the virtual adversarial
perturbation as sentence representations by which
model uncertainty is inherently expressed. Further-
more, we consider both uncertainty and diversity