Discriminative Language Model as Semantic Consistency Scorer for Prompt-based Few-Shot Text Classification
Zhipeng Xie and Yahe Li
School of Computer Science
Fudan University, Shanghai, China
xiezp@fudan.edu.cn
October 25, 2022
Abstract
This paper proposes a novel prompt-based finetuning method (called DLM-SCS) for few-shot text classification by utilizing the discriminative language model ELECTRA, which is pretrained to distinguish whether a token is original or generated. The underlying idea is that the prompt instantiated with the true label should have a higher semantic consistency score than the prompts instantiated with false labels. Since a prompt usually consists of several components (or parts), its semantic consistency can be decomposed accordingly. The semantic consistency of each component is then computed by making use of the pretrained ELECTRA model, without introducing extra parameters. Extensive experiments show that our model outperforms several state-of-the-art prompt-based few-shot methods.
1 Introduction
Nowadays, with the upsurge of interest in a wide range of pretrained language models, the pretraining-finetuning paradigm [9, 4] has become a de facto standard for various downstream NLU and NLG tasks. Different language models usually have different scopes of application. Auto-regressive language models (ARLMs) such as GPT-3 [1] and Ernie-3 [15] predict the next token based on all the previous ones, usually in left-to-right order. Since they are trained to encode a unidirectional context, they are not effective at downstream NLU tasks, which often require bidirectional context information. In addition, these models are large and costly to finetune, or even not publicly available, which makes them impossible to use in the pretraining-finetuning paradigm. Masked language models (MLMs) such as BERT [3] and RoBERTa [7] mask some tokens in the input and are trained to reconstruct the original tokens based on their bidirectional surrounding context, which is often preferable for NLU tasks such as text classification but not applicable to NLG.
The conventional finetuning method for a downstream text classification task builds a classification head with additional parameters, trained from scratch, on top of the special [CLS] token, and fine-tunes the whole model. Such models work well with abundant training examples in rich-data regimes, but are cornered in the few-shot scenario, not to mention the zero-shot one, because of the gap between pretraining and the downstream task.
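For concreteness, the following is a minimal sketch of this conventional head-on-top-of-[CLS] setup using the Hugging Face transformers library; the checkpoint name, the binary label scheme, and the example sentence are illustrative assumptions, not choices made in this paper.

```python
# Conventional finetuning: a randomly initialized classification head sits on
# top of the [CLS] representation and the whole model is tuned end to end.
# Minimal sketch; "bert-base-uncased" and the 2-class setting are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # new head, initialized from scratch

batch = tokenizer(["a gripping and well-acted film ."],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1])  # 1 = positive (hypothetical label scheme)

outputs = model(**batch, labels=labels)  # cross-entropy loss over the [CLS] head
outputs.loss.backward()                  # gradients flow through the full model
```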
Initiated by the in-context learning of the GPT series [9, 10, 1], the prompt-based method was first developed for zero-shot learning, and then studied by PET and iPET [13] for finetuning. Since then, prompt-based learning methods have become increasingly popular and have been proven to work effectively under the few-shot or even zero-shot setting. To bridge the gap between the downstream task and the pretraining task, these methods transform downstream tasks into the same (or a similar) form as the pretraining tasks solved during the original LM training, with the help of textual prompts. Most existing prompt-based methods use generative prompts that contain answer slots for various pretrained language models to fill in [6]. For downstream text classification, most works target pretrained masked language models (MLMs) by formulating the downstream task as a masked language modeling task [13, 14, 5]. A template converts the original input example $x_{in}$ into a textual string (called a prompt) $\tilde{x}$ that contains an unfilled [MASK] slot. A verbalizer is used to represent each class with a label word from the vocabulary. The model makes the prediction according to the probabilities of filling the [MASK] token with the label words. Such a prompt is called a generative prompt: it usually contains an unfilled [MASK] as the answer slot, and the pretrained masked language model is finetuned to generate the correct label word to fill this answer slot. A simple prompt-based framework that treats the MLM as a masked token predictor for text classification is illustrated in Figure 1(b).
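As an illustration of this generative-prompt formulation, the sketch below scores label words at the [MASK] position with a pretrained MLM; the template "It was [MASK] .", the verbalizer great/terrible, and the checkpoint are assumptions for illustration rather than the paper's exact configuration.

```python
# Prompt-based classification with an MLM as masked token predictor.
# Minimal sketch; template, label words and checkpoint are assumed here.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

x_in = "a gripping and well-acted film ."
prompt = f"{x_in} It was {tokenizer.mask_token} ."           # generative prompt
label_words = {"positive": "great", "negative": "terrible"}  # verbalizer

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]             # vocab distribution at [MASK]

scores = {label: logits[tokenizer.convert_tokens_to_ids(word)].item()
          for label, word in label_words.items()}
prediction = max(scores, key=scores.get)                     # highest label-word logit wins
```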
Until very recently, two prompt-based finetuning methods [17, 16] were proposed to exploit the pretrained ELECTRA model [2], which is a discriminative language model (DLM). In contrast to generative prompts, they use prompts that contain no answer slot, which we call "discriminative prompts"; these can be seen as unmasked prompts in which a label word fills the [MASK] slot of the corresponding generative prompt. The pretrained ELECTRA model is then applied to these discriminative prompts and tells us which label word is the original token (i.e., not a replaced token). However, these methods confine themselves to the label word(s) in the discriminative prompts and expect the discriminative model to identify the semantic inconsistency incurred by the incorrect label words. This limited evidence is far from all that can be obtained from the discriminative language model, and some available evidence is missed (please refer to Section 3.2 for a simple motivating example).
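A minimal sketch of this label-word-only scoring with ELECTRA's replaced-token-detection head is given below; the template, label words, and checkpoint are assumptions for illustration and do not reproduce the cited methods' exact implementations.

```python
# Scoring only the label-word position with ELECTRA's discriminative head.
# Minimal sketch; template, label words and checkpoint are assumed here.
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")

x_in = "a gripping and well-acted film ."
label_words = {"positive": "great", "negative": "terrible"}

scores = {}
for label, word in label_words.items():
    prompt = f"{x_in} It was {word} ."            # discriminative prompt: no [MASK]
    inputs = tokenizer(prompt, return_tensors="pt")
    word_id = tokenizer.convert_tokens_to_ids(word)
    pos = (inputs.input_ids[0] == word_id).nonzero()[-1, 0]  # label-word position
    with torch.no_grad():
        logits = model(**inputs).logits[0]        # one logit per token; >0 means "replaced"
    scores[label] = -logits[pos].item()           # lower replaced-logit = more original

prediction = max(scores, key=scores.get)
```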
The work done in this paper follows this thread of prompting a discriminative language model for few-shot text classification. The basic idea is that the DLM head can detect the discrepancy between the input and the label word. Given an input example (a sentence or a sentence pair) and its true label, the DLM head is expected to assign low scores (or logits) to the salient tokens in the input example and to the true label word; if a false label is given instead, it is desirable that the DLM head assign high scores to both the false label word and the salient tokens in the input example.
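To make this intuition concrete, the sketch below aggregates ELECTRA's per-token original-versus-replaced probabilities over the whole instantiated prompt and predicts the label whose prompt is most consistent; the uniform averaging over all tokens, the template, and the label words are simplifying assumptions and not the component-wise decomposition proposed later in this paper.

```python
# Aggregating per-token evidence into a prompt-level consistency score.
# Minimal sketch of the underlying intuition; not the paper's exact method.
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")

def consistency_score(prompt: str) -> float:
    """Mean probability that each token is original (not replaced)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        replaced_logits = model(**inputs).logits[0]     # one logit per token
    p_original = 1.0 - torch.sigmoid(replaced_logits)   # high = consistent token
    mask = inputs.attention_mask[0].bool()
    return p_original[mask].mean().item()

x_in = "a gripping and well-acted film ."
scores = {word: consistency_score(f"{x_in} It was {word} .")
          for word in ("great", "terrible")}            # assumed label words
prediction = max(scores, key=scores.get)                # most consistent prompt wins
```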
To squeeze the most out of a pretrained language model such that it works
best on a downstream few-shot learning task, three prerequisites are considered in
designing a prompt-based method of finetuning a pretrained discriminative language
model:
Prerequisite 1 (Task Compatibility): As stated by most prompt tuning methods, the downstream task should be transformed into the same (or a highly similar) form as the pretraining task, such that no (or few) additional parameters need to be introduced.
Prerequisite 2 (Input Compatibility): The prompt template should be in the same form as the training data of the pretrained language model, such that the discriminative prompts are as natural as possible. As a consequence, the pretrained language model can process them well and easily, without having to be tuned too far away.
Prerequisite 3 (Evidence/Information Abundance): Last but not least, the method should try its best to obtain and aggregate as much evidence and/or information as possible for decision making. By the nature of few-shot learning, predictions are unstable and have high variance, so aggregating more evidence helps reduce this variance.
The contribution of this paper is threefold: (1) We propose a novel framework DLM-SCS¹ for few-shot text classification, which uses the pretrained discriminative language model ELECTRA as a semantic consistency scorer. (2) We design a method to measure the semantic consistency of a subsequence in the input prompt on the basis of the discriminative head of ELECTRA, which by itself can only measure the semantic inconsistency of each single token, and then use it to instantiate the framework into a concrete prompt-based finetuning model (also called DLM-SCS). (3) The proposed method achieves state-of-the-art performance on a variety of downstream sentence classification and sentence-pair classification tasks.

¹ We will release all the source code and related resources once the paper is accepted or published.

Figure 1: A schematic illustration of (a) our proposed DLM-SCS (Discriminative Language Model as Semantic Consistency Scorer), compared with (b) a traditional prompt-based model that uses a masked language model as a masked token predictor.
2 Related Work
2.1 Prompting MLM for Text Classification
Existing prompt-based learning methods for text classification usually reformulate the downstream text classification task into a cloze question, and then finetune a pretrained masked language model to generate the most likely label word in the unfilled [MASK] position of the generative prompt [13]. A lot of research effort has been devoted to the automatic construction of prompt templates and label words. Schick et al. [12] and Schick and Schütze [13] studied the automatic identification of label words. Gao et al. [5] made use of the pretrained seq2seq model T5 [11] to generate template tokens in the template search process. Motivated by the idea of in-context learning from the GPT series [1], Gao et al. [5] used a single unmasked example prompt (called a demonstration) as additional context, which can boost the performance of prompt-based few-shot text classification. Park et al. [8] made two extensions: multiple demonstrations and a soft demonstration memory. In addition, Zhang et al. [18] proposed the DART method, which optimizes a differentiable prompt template and label words by error backpropagation.