Discriminative Language Model as Semantic Consistency Scorer for Prompt-based Few-Shot Text Classification
Zhipeng Xie and Yahe Li
School of Computer Science
Fudan University, Shanghai, China
xiezp@fudan.edu.cn
October 25, 2022
Abstract
This paper proposes a novel prompt-based finetuning method (called DLM-SCS) for few-shot text classification by utilizing the discriminative language model ELECTRA, which is pretrained to distinguish whether a token is original or generated. The underlying idea is that the prompt instantiated with the true label should have a higher semantic consistency score than the prompts instantiated with false labels. Since a prompt usually consists of several components (or parts), its semantic consistency can be decomposed accordingly. The semantic consistency of each component is then computed by making use of the pretrained ELECTRA model, without introducing extra parameters. Extensive experiments show that our model outperforms several state-of-the-art prompt-based few-shot methods.
1 Introduction
Nowadays, with the upsurge of interest in a wide range of pretrained language models, the pretraining-finetuning paradigm [9, 4] has become a de facto standard for various downstream NLU and NLG tasks. Different language models usually have different scopes of application. Auto-regressive language models (ARLMs) such as GPT-3 [1] and Ernie-3 [15] predict the next token based on all the previous ones, usually in left-to-right order. Since they are trained to encode a unidirectional context, they are not effective at downstream NLU tasks, which often require bidirectional context information. In addition, these models are large and costly to finetune, or even not publicly available, which makes them impossible to use in the pretraining-finetuning paradigm. Masked language models (MLMs) such as BERT [3] and RoBERTa [7] mask some tokens in the input and are trained to reconstruct the original tokens based on their bidirectional surrounding context, which is often preferable for NLU tasks such as text classification but not applicable to NLG.
The conventional finetuning method for a downstream text classification task builds a classification head with additional parameters, trained from scratch, on top of the special [CLS] token, and fine-tunes the whole model. Such models work well with abundant training examples in rich-data regimes, but are cornered in the few-shot scenario, not to mention the zero-shot one, because of the gap between pretraining and the downstream task.
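For concreteness, the following is a minimal sketch of this conventional head-on-top-of-[CLS] setup using the Hugging Face transformers library; the checkpoint name, the binary label scheme, and the example sentence are illustrative assumptions, not choices made in this paper.

```python
# Conventional finetuning: a randomly initialized classification head sits on
# top of the [CLS] representation and the whole model is tuned end to end.
# Minimal sketch; "bert-base-uncased" and the 2-class setting are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # new head, initialized from scratch

batch = tokenizer(["a gripping and well-acted film ."],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1])  # 1 = positive (hypothetical label scheme)

outputs = model(**batch, labels=labels)  # cross-entropy loss over the [CLS] head
outputs.loss.backward()                  # gradients flow through the full model
```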
Initiated by the in-context learning of the GPT series [9, 10, 1], the prompt-based method was first developed for zero-shot learning, and then studied by PET and iPET [13] for finetuning. Since then, prompt-based learning methods have become increasingly popular and have been proven to work effectively under the few-shot or even zero-shot setting. To bridge the gap between the downstream task and the pretraining task, these methods transform downstream tasks into the same (or a similar) form as the pretraining tasks solved during the original LM training, with the help of textual prompts. Most existing prompt-based methods use generative prompts that contain answer slots for various pretrained language models to fill in [6]. For downstream text classification, most works target pretrained masked language models (MLMs) by formulating the downstream task as a masked language modeling task [13, 14, 5]. A template converts the original input example $x_{in}$ into a textual string (called a prompt) $\tilde{x}$ that contains an unfilled [MASK] slot. A verbalizer is used to represent each class with a label word from the vocabulary. The model makes the prediction according to the probabilities of filling the [MASK] token with the label words. Such a prompt is called a generative prompt: it usually contains an unfilled [MASK] as the answer slot, and the pretrained masked language model is finetuned to generate the correct label word to fill this answer slot. A simple prompt-based framework that treats the MLM as a masked token predictor for text classification is illustrated in Figure 1(b).
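As an illustration of this generative-prompt formulation, the sketch below scores label words at the [MASK] position with a pretrained MLM; the template "It was [MASK] .", the verbalizer great/terrible, and the checkpoint are assumptions for illustration rather than the paper's exact configuration.

```python
# Prompt-based classification with an MLM as masked token predictor.
# Minimal sketch; template, label words and checkpoint are assumed here.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

x_in = "a gripping and well-acted film ."
prompt = f"{x_in} It was {tokenizer.mask_token} ."           # generative prompt
label_words = {"positive": "great", "negative": "terrible"}  # verbalizer

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]             # vocab distribution at [MASK]

scores = {label: logits[tokenizer.convert_tokens_to_ids(word)].item()
          for label, word in label_words.items()}
prediction = max(scores, key=scores.get)                     # highest label-word logit wins
```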
Until very recently, two prompt-based finetuning methods [17, 16] were proposed to exploit the pretrained ELECTRA model [2], which is a discriminative language model (DLM). In contrast to generative prompts, they use prompts that contain no answer slot, which we call "discriminative prompts"; these can be seen as unmasked prompts in which a label word fills the [MASK] slot of the corresponding generative prompt. The pretrained ELECTRA model is then applied to these discriminative prompts and tells us which label word is the original token (i.e., not a replaced token). However, these methods confine themselves to the label word(s) in the discriminative prompts and expect the discriminative model to identify the semantic inconsistency incurred by the incorrect label words. This limited evidence is far from all that can be obtained from the discriminative language model, and some available evidence is missed (please refer to Section 3.2 for a simple motivating example).
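A minimal sketch of this label-word-only scoring with ELECTRA's replaced-token-detection head is given below; the template, label words, and checkpoint are assumptions for illustration and do not reproduce the cited methods' exact implementations.

```python
# Scoring only the label-word position with ELECTRA's discriminative head.
# Minimal sketch; template, label words and checkpoint are assumed here.
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")

x_in = "a gripping and well-acted film ."
label_words = {"positive": "great", "negative": "terrible"}

scores = {}
for label, word in label_words.items():
    prompt = f"{x_in} It was {word} ."            # discriminative prompt: no [MASK]
    inputs = tokenizer(prompt, return_tensors="pt")
    word_id = tokenizer.convert_tokens_to_ids(word)
    pos = (inputs.input_ids[0] == word_id).nonzero()[-1, 0]  # label-word position
    with torch.no_grad():
        logits = model(**inputs).logits[0]        # one logit per token; >0 means "replaced"
    scores[label] = -logits[pos].item()           # lower replaced-logit = more original

prediction = max(scores, key=scores.get)
```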
The work done in this paper follows this thread of prompting a discriminative language model for few-shot text classification. The basic idea is that the DLM head can detect the discrepancy between the input and the label word. Given an input example (a sentence or a sentence pair) and its true label, the DLM head is expected to assign low scores (or logits) to the salient tokens in the input example and to the true label word; if a false label is given instead, it is desirable that the DLM head assign high scores to both the false label word and the salient tokens in the input example.
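To make this intuition concrete, the sketch below aggregates ELECTRA's per-token original-versus-replaced probabilities over the whole instantiated prompt and predicts the label whose prompt is most consistent; the uniform averaging over all tokens, the template, and the label words are simplifying assumptions and not the component-wise decomposition proposed later in this paper.

```python
# Aggregating per-token evidence into a prompt-level consistency score.
# Minimal sketch of the underlying intuition; not the paper's exact method.
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")

def consistency_score(prompt: str) -> float:
    """Mean probability that each token is original (not replaced)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        replaced_logits = model(**inputs).logits[0]     # one logit per token
    p_original = 1.0 - torch.sigmoid(replaced_logits)   # high = consistent token
    mask = inputs.attention_mask[0].bool()
    return p_original[mask].mean().item()

x_in = "a gripping and well-acted film ."
scores = {word: consistency_score(f"{x_in} It was {word} .")
          for word in ("great", "terrible")}            # assumed label words
prediction = max(scores, key=scores.get)                # most consistent prompt wins
```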
To squeeze the most out of a pretrained language model such that it works
best on a downstream few-shot learning task, three prerequisites are considered in
designing a prompt-based method of finetuning a pretrained discriminative language
model:
Prerequisite 1 (Task Compatibility): As stated by most prompt tuning methods, the downstream task should be transformed into the same (or a highly similar) form as the pretraining task, such that no (or few) additional parameters need to be introduced.
Prerequisite 2 (Input Compatibility): The prompt template should be in the same form as the training data of the pretrained language model, such that the discriminative prompts are as natural as possible. As a consequence, the pretrained language model can process them well and easily, without having to be tuned too far away.
Prerequisite 3 (Evidence/Information Abundance): Last but not least, the method should try its best to obtain and aggregate as much evidence and/or information as possible for decision making. By the nature of few-shot learning, predictions are unstable and have high variance, so aggregating more evidence helps reduce this variance.
The contribution of this paper is threefold: (1) We propose a novel framework DLM-SCS¹ for few-shot text classification, which uses the pretrained discriminative language model ELECTRA as a semantic consistency scorer. (2) We design a method to measure the semantic consistency of a subsequence in the input prompt on the basis of the discriminative head of ELECTRA, which by itself can only measure the semantic inconsistency of each single token, and then use it to instantiate the framework into a concrete prompt-based finetuning model (also called DLM-SCS). (3) The proposed method achieves state-of-the-art performance on a variety of downstream sentence classification and sentence-pair classification tasks.

¹ We will release all the source code and related resources once the paper is accepted or published.

Figure 1: A schematic illustration of (a) our proposed DLM-SCS (Discriminative Language Model as Semantic Consistency Scorer), compared with (b) a traditional prompt-based model that uses a masked language model as a masked token predictor.
2 Related Work
2.1 Prompting MLM for Text Classification
Existing prompt-based learning methods for text classification usually reformulate the downstream text classification task into a cloze question, and then finetune a pretrained masked language model to generate the most likely label word in the unfilled [MASK] position of the generative prompt [13]. A lot of research effort has been devoted to the automatic construction of prompt templates and label words. Schick et al. [12] and Schick and Schütze [13] studied the automatic identification of label words. Gao et al. [5] made use of the pretrained seq2seq model T5 [11] to generate template tokens in the template search process. Motivated by the idea of in-context learning from the GPT series [1], Gao et al. [5] used a single unmasked example prompt (called a demonstration) as additional context, which can boost the performance of prompt-based few-shot text classification. Park et al. [8] made two extensions: multiple demonstrations and a soft demonstration memory. In addition, Zhang et al. [18] proposed the DART method, which optimizes a differentiable prompt template and label words by error backpropagation.