Multilingual Relation Classification via Efficient and Effective Prompting
Yuxuan Chen David Harbecke Leonhard Hennig
German Research Center for Artificial Intelligence (DFKI)
Speech and Language Technology Lab
{yuxuan.chen, david.harbecke, leonhard.hennig}@dfki.de
Abstract
Prompting pre-trained language models has achieved impressive performance on various NLP tasks, especially in low-data regimes. Despite the success of prompting in monolingual settings, applying prompt-based methods in multilingual scenarios has been limited to a narrow set of tasks, due to the high cost of handcrafting multilingual prompts. In this paper, we present the first work on prompt-based multilingual relation classification (RC), introducing an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels. We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages, prompt variants, and English-task training in cross-lingual settings. We find that in both fully supervised and few-shot scenarios, our prompt method beats competitive baselines: fine-tuning XLM-R_EM and null prompts. It also outperforms the random baseline by a large margin in zero-shot experiments. Our method requires little in-language knowledge and can be used as a strong baseline for similar multilingual classification tasks.
1 Introduction
Relation classification (RC) is a crucial task in information extraction (IE), aiming to identify the relation between entities in a text (Alt et al., 2019). Extending RC to multilingual settings has recently received increased interest (Zou et al., 2018; Kolluru et al., 2022), but the majority of prior work still focuses on English (Baldini Soares et al., 2019; Lyu and Chen, 2021). A main bottleneck for multilingual RC is the lack of supervised resources comparable in size to large English datasets (Riedel et al., 2010; Zhang et al., 2017). The SMiLER dataset (Seganti et al., 2021) provides a starting point to test fully supervised and more efficient approaches, due to the different resource availability for different languages.

Previous studies have shown the promising performance of prompting PLMs compared to data-hungry fine-tuning, especially in low-resource scenarios (Gao et al., 2021; Le Scao and Rush, 2021; Lu et al., 2022). Multilingual pre-trained language models (Conneau et al., 2020; Xue et al., 2021) further enable multiple languages to be represented in a shared semantic space, thus making prompting in multilingual scenarios feasible. However, the study of prompting for multilingual tasks so far remains limited to a small range of tasks such as text classification (Winata et al., 2021) and natural language inference (Lin et al., 2022). To our knowledge, the effectiveness of prompt-based methods for multilingual RC is still unexplored.

To address this gap, we pose two research questions for multilingual RC with prompts:

RQ1. What is the most effective way to prompt? We investigate whether prompting should be done in English or the target language, and whether to use soft prompt tokens.

RQ2. How well do prompts perform in different data regimes and languages? We investigate the effectiveness of our prompting approach in three scenarios: fully supervised, few-shot and zero-shot. We explore to what extent the results are related to the available language resources.

We present an efficient and effective prompt method for multilingual RC (see Figure 1) that derives prompts from relation triples (see Section 3.1). The derived prompts include the original sentence and entities, and are to be completed with the relation label. We evaluate the prompts in three variants, two of which require no translation, and one of which requires minimal translation, i.e., of the relation labels only. We find that our method outperforms fine-tuning and a strong task-agnostic prompt baseline in fully supervised and few-shot scenarios, especially for relatively low-resource languages.
Figure 1: Overview of our approach. Given a plain text $x$ containing head entity $e_h$ and tail entity $e_t$ from language $L$, we first apply the template $T(x) =$ "$x.\; e_h\; \_\_\_\_\; e_t$" and yield the prompt input with a blank. Then the PLM aims to fill in the relation at the blank. In code-switch prompting, the target sequence is the English relation verbalization. In in-language prompting, the target is the relation name translated into $L$.
Our method also improves over the random baseline in zero-shot settings, and achieves promising cross-lingual performance. The main contributions of this work hence are:
• We propose a simple but efficient prompt method for multilingual RC, which is, to the best of our knowledge, the first work to apply prompt-based methods to multilingual RC (Section 3).

• We evaluate our method on the largest multilingual RC dataset, SMiLER (Seganti et al., 2021), and compare it with strong baselines in all three scenarios. We also investigate the effects of different prompt variants, including the insertion of soft tokens, the prompt language, and the word order of prompting (Sections 4 & 5).
2 Preliminaries
We first give a formal definition of the relation classification task, and then introduce the fine-tuning and prompting paradigms used to perform RC.

2.1 Relation Classification Task Definition

Relation classification is the task of classifying the relationship, such as date_of_birth, founded_by or parents, between pairs of entities in a given context. Formally, given a relation set $\mathcal{R}$ and a text $x = [x_1, x_2, \ldots, x_n]$ (where $x_1, \ldots, x_n$ are tokens) with two disjoint spans $e_h$ and $e_t$ denoting the head and tail entity, RC aims to predict the relation $r \in \mathcal{R}$ between $e_h$ and $e_t$, or to give a no_relation prediction if no relation in $\mathcal{R}$ holds. RC is a multilingual task if the token sequences come from different languages.
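For concreteness, a single RC instance can be represented by a minimal data structure such as the following sketch (the field names are illustrative, not taken from the paper):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RCExample:
    tokens: List[str]   # the text x = [x1, ..., xn]
    head: str           # head entity span e_h
    tail: str           # tail entity span e_t
    relation: str       # gold relation r from the relation set R

example = RCExample(
    tokens="Goethe schrieb die Tragödie Faust .".split(),
    head="Faust",
    tail="Goethe",
    relation="has-author",
)
```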
2.2 Fine-tuning for Relation Classification
In fine-tuning, a task-specific linear classifier is added on top of the PLM. Fine-tuning hence introduces a different scenario from pre-training, since language model (LM) pre-training is usually formalized as a cloze-style task to predict target tokens at [MASK] (Devlin et al., 2019; Liu et al., 2019) or a corrupted span (Raffel et al., 2020; Lewis et al., 2020). For the RC task, the classifier aims to predict the target class $r$ at [CLS] or at the entity spans denoted by MARKER (Baldini Soares et al., 2019).
2.3 Prompting for Relation Classification
Prompting is proposed to bridge the gap between pre-training and fine-tuning (Liu et al., 2022; Gu et al., 2022). The essence of prompting is to reformulate the downstream task as an LM pre-training task, such as masked language modeling (MLM), by appending extra text to the original text according to a task-specific template $T(\cdot)$, and to apply the same training objective during task-specific training. For the RC task, to identify the relation between "Angela Merkel" and "Joachim Sauer" in the text "Angela Merkel's current husband is quantum chemist Joachim Sauer," an intuitive template for prompting can be "The relation between Angela Merkel and Joachim Sauer is [MASK]," and the LM is supposed to assign a higher likelihood to the term couple than to, e.g., friends or colleagues at [MASK].
| Prompt | Input | Target | Example input | Example target |
|---|---|---|---|---|
| null prompts | x. ____ | φ_EN(r) | Goethe schrieb Faust. ____ | has author |
| CS | x. e_h ____ e_t | φ_EN(r) | Goethe schrieb Faust. Faust ____ Goethe | has author |
| SP | x. [v1] e_h [v2] ____ [v3] e_t | φ_EN(r) | Goethe schrieb Faust. [v1] Faust [v2] ____ [v3] Goethe | has author |
| IL | x. e_h ____ e_t | φ_L(r) | Goethe schrieb Faust. Faust ____ Goethe | hat Autor |

Table 1: Overview of the prompts, including null prompts (baseline) and ours with its variants. For each prompt or variant, we list (1) the prompt input and the target, and (2) an example based on the German plain text "Goethe schrieb Faust." [v_i]: learnable soft tokens. φ_EN(r): the original (English) relation verbalization. φ_L(r): the relation verbalization translated into the target language L.
This "fill-in-the-blank" paradigm is well aligned with the pre-training scenario, and enables prompting to better elicit pre-trained knowledge from the PLMs (Petroni et al., 2019).
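As a toy illustration of this cloze scoring, one could compare the likelihoods a masked LM assigns to candidate words at the mask; the model choice and the fill-mask pipeline below are assumptions for demonstration only, not the method of this paper (which prompts an encoder-decoder PLM, see Section 3.4):

```python
from transformers import pipeline

# xlm-roberta-base uses "<mask>" as its mask token.
unmasker = pipeline("fill-mask", model="xlm-roberta-base")
prompt = ("Angela Merkel's current husband is quantum chemist Joachim Sauer. "
          "The relation between Angela Merkel and Joachim Sauer is <mask>.")

# Restrict predictions to a few candidate fillers and compare their scores.
for pred in unmasker(prompt, targets=["couple", "friends", "colleagues"]):
    print(pred["token_str"], pred["score"])
```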
3 Methods
We now present our method, as shown in Figure 1. We introduce its template and verbalizer, and propose several variants of the prompt. Lastly, we explain the training and inference process.
3.1 Template
For prompting (Liu et al., 2022), a prompt often consists of a template $T(\cdot)$ and a verbalizer $V$. Given a plain text $x$, the template $T$ adds task-related instruction to $x$ to yield the prompt input

$$x_{\mathrm{prompt}} = T(x). \quad (1)$$

Following Chen et al. (2022) and Han et al. (2021), we treat relations as predicates and use the cloze "$e_h$ {relation} $e_t$" for the LM to fill in. Our template is formulated as

$$T(x) := x.\; e_h\; \_\_\_\_\; e_t. \quad (2)$$

In the template $T(x)$, $x$ is the original text, and the two entities $e_h$ and $e_t$ come from $x$. Therefore, our template does not introduce extra tokens and thus involves no translation at all.
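As a sketch, the template can be implemented as a single string operation; rendering the blank as mT5's first sentinel token <extra_id_0> is an implementation assumption, chosen to match the span-corruption pre-training of encoder-decoder PLMs (see Section 3.4):

```python
def apply_template(x: str, e_h: str, e_t: str) -> str:
    """T(x) = "x. e_h ____ e_t" (Equation 2); the blank is a model-specific
    mask token, assumed here to be mT5's sentinel <extra_id_0>."""
    return f"{x} {e_h} <extra_id_0> {e_t}"

# apply_template("Goethe schrieb die Tragödie Faust.", "Faust", "Goethe")
# -> "Goethe schrieb die Tragödie Faust. Faust <extra_id_0> Goethe"
```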
3.2 Verbalizer
After being prompted with $x_{\mathrm{prompt}}$, the PLM $M$ predicts the masked text $y$ at the blank. To complete an NLP classification task, a verbalizer $\phi$ is required to bridge the set of labels $\mathcal{Y}$ and the set of predicted texts (verbalizations $\mathcal{V}$). For the simplicity of our prompt, we use the one-to-one verbalizer:

$$\phi: \mathcal{Y} \to \mathcal{V},\quad r \mapsto \phi(r), \quad (3)$$

where $r$ is a relation, and $\phi(r)$ is the simple verbalization of $r$. $\phi(\cdot)$ normally only involves splitting $r$ by "-" or "_" and replacing abbreviations such as org with organization. E.g., the relation org-has-member corresponds to the verbalization "organization has member". The prediction is then formalized as

$$p(r \mid x) \propto p(y = \phi(r) \mid x_{\mathrm{prompt}}; \theta_M), \quad (4)$$

where $\theta_M$ denotes the parameters of model $M$. $p(r \mid x)$ is normalized by the sum of likelihoods over all relations.
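A sketch of such a verbalizer follows; the abbreviation table is illustrative, as the text above only names the org expansion explicitly:

```python
# Assumed abbreviation expansions; only "org" is given in the text above.
ABBREVIATIONS = {"org": "organization", "loc": "location", "per": "person"}

def verbalize(relation: str) -> str:
    """One-to-one verbalizer phi(r): split r on "-"/"_", expand abbreviations."""
    words = relation.replace("-", "_").split("_")
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

assert verbalize("org-has-member") == "organization has member"
```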
3.3 Variants
To find the optimal way to prompt, we investigate three variants as follows.

Hard prompt vs. soft prompt (SP)

Hard prompts (a.k.a. discrete prompts) (Liu et al., 2022) are entirely formulated in natural language. Soft prompts (a.k.a. continuous prompts) consist of learnable tokens (Lester et al., 2021) that are not contained in the PLM vocabulary. Following Han et al. (2021), we insert soft tokens before the entities and the blank, as shown for SP in Table 1.
Code-switch (CS) vs. in-language (IL)

Relation labels are in English across almost all RC datasets. Given a text with a blank from a non-English input language $L$, the recovered text is code-mixed after being completed with an English verbalization, corresponding to code-switch prompting. It may be more natural for the PLM to fill in the blank in language $L$. Inspired by Lin et al. (2022) and Zhao and Schütze (2021), we machine-translate the English verbalizers into the other languages.¹ Table 1 visualizes both code-switch (CS) and in-language (IL) prompting. For English, CS- and IL-prompting are equivalent, since $L$ is English itself.

¹ See Appendix B for more examples of translated verbalizations. To translate the verbalizer of the SMiLER dataset, we use DeepL by default and Google Translate when the target language is not supported by DeepL (in the case of AR, FA, KO and UK).
| Task | Dataset | #Class | Verbalizations | Mean # tokens | Std. |
|---|---|---|---|---|---|
| LA | CoLA (Warstadt et al., 2019) | 2 | correct, incorrect (Gao et al., 2021) | 1 | 0 |
| NER | CoNLL03 (Tjong Kim Sang and De Meulder, 2003) | 5 | location, person, not an, ... (Cui et al., 2021) | 1.2 | 0.4 |
| NLI | MNLI (Williams et al., 2018) | 3 | yes, no, maybe (Fu et al., 2022) | 1 | 0 |
| NLI | XNLI (Conneau et al., 2018) | 3 | yes, no, maybe; Evet, ... (Zhao and Schütze, 2021) | 1 | 0 |
| PI | PAWS-X (Yang et al., 2019) | 2 | yes, no (Qi et al., 2022) | 1 | 0 |
| TC | MARC (Keung et al., 2020) | 2 | good, {average, bad} (Huang et al., 2022) | 1 | 0 |
| RC | TACRED (Zhang et al., 2017) | 42 | founded by, city of birth, country of death, ... | 3.23 | 1.99 |
| RC | SemEval (Hendrickx et al., 2010) | 10 | cause effect, entity origin, product producer, ... | 2.50 | 0.81 |
| RC | NYT (Riedel et al., 2010) | 24 | ethnicity, major shareholder of, religion, ... | 2.10 | 1.01 |
| RC | SCIERC (Luan et al., 2018) | 6 | conjunction, feature of, part of, used for, ... | 2.17 | 0.69 |
| RC | SMiLER (EN) (Seganti et al., 2021) | 36 | birth place, starring, won award, ... | 2.58 | 0.68 |
| RC | SMiLER (ALL) (Seganti et al., 2021) | 36 | hat Genre, chef d'organisation, del país, ... | 3.66 | 1.44 |

Table 2: Statistics of the verbalization lengths across several classification tasks. The lengths for non-RC tasks depend on the tokenizers of the respective PLMs in the cited work. The lengths for RC tasks are based on the mT5_BASE tokenizer. Mean and std. show that the label space of the RC task is more complex than that of most few-class classification tasks. The verbalizations of the RC datasets are listed in Appendix B. For SemEval, the two possible directions of a relation are combined. For NYT, we use the version from Zeng et al. (2018). For SMiLER, "EN" is the English split; "ALL" contains all data from 14 languages.
Word order of prompting

For the RC task, head-relation-tail triples involve three elements. Therefore, deriving natural language prompts from them requires deciding where to put the predicate (relation). In the case of SOV languages, filling in a relation that occurs between $e_h$ and $e_t$ seems less intuitive. Therefore, to investigate whether the word order of prompting affects prediction accuracy, we swap the entities and the blank in the SVO-template "$x.\; e_h\; \_\_\_\_\; e_t$" and obtain "$x.\; e_h\; e_t\; \_\_\_\_$" as the SOV-template.
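The input side of all three variants can be summarized in one helper, sketched below; the soft tokens [v1]-[v3] would in practice be added to the tokenizer as new special tokens with learnable embeddings, and CS vs. IL differ only in the target sequence, not in the prompt input:

```python
def build_prompt(x: str, e_h: str, e_t: str,
                 soft: bool = False, order: str = "svo",
                 blank: str = "<extra_id_0>") -> str:
    """Builds the prompt input for the variants in Table 1 (sketch)."""
    if soft:                  # SP: soft tokens before entities and blank
        return f"{x} [v1] {e_h} [v2] {blank} [v3] {e_t}"
    if order == "sov":        # SOV word order: entities first, relation last
        return f"{x} {e_h} {e_t} {blank}"
    return f"{x} {e_h} {blank} {e_t}"  # default SVO template (CS and IL)
```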
3.4 Training and Inference
The training and inference setups depend on the employed model. Prompting autoencoding language models requires the verbalizations to be of fixed length, since the number of masks, which equals the verbalization length, is unknown during inference. Encoder-decoders can by nature handle verbalizations of varying length (Han et al., 2022; Du et al., 2022). Han et al. (2021) adjust all the verbalizations in TACRED to a length of 3, to enable prompting with RoBERTa for RC. We argue that for multilingual RC, this fix is largely infeasible, because: (1) in the case of in-language prompting on SMiLER, the standard deviation of the verbalization lengths increases from 0.68 to 1.44 after translation (see Table 2), surpassing that of most of the listed monolingual RC datasets (SemEval, NYT and SCIERC) and making it harder to unify the lengths; (2) adjusting the translated verbalizations requires manual effort per target language, making it much more expensive than adjusting only English verbalizations. Therefore, we suggest using an encoder-decoder PLM for prompting (Song et al., 2022).
Training objective
For an encoder-decoder PLM $M$, given the prompt input $T(x)$ and the target sequence $\phi(r)$ (i.e., the label verbalization), we denote the output sequence as $y$. The probability of an exact-match decoding is calculated as follows:

$$\prod_{t=1}^{|\phi(r)|} P_\theta\left(y_t = \phi_t(r) \mid y_{<t}, T(x)\right), \quad (5)$$

where $y_t$ and $\phi_t(r)$ denote the $t$-th token of $y$ and $\phi(r)$, respectively. $y_{<t}$ denotes the already decoded sequence on the left. $\theta$ represents the set of all learnable parameters, including those of the PLM, $\theta_M$, and those of the soft tokens, $\theta_{sp}$, in the case of the "soft prompt" variant. Hence, the final objective over the training set $\mathcal{X}$ is to minimize the negative log-likelihood:

$$\operatorname*{argmin}_{\theta}\; -\frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \sum_{t=1}^{|\phi(r)|} \log P_\theta\left(y_t = \phi_t(r) \mid y_{<t}, T(x)\right). \quad (6)$$
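With an encoder-decoder PLM, this objective coincides with the standard sequence-to-sequence cross-entropy. A minimal sketch with mT5 (whose tokenizer Table 2 references) is given below; this is an assumed implementation, not the authors' code, and the learning rate is a placeholder:

```python
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # assumed value

# One training example: prompt input T(x) and target verbalization phi(r).
prompt = "Goethe schrieb die Tragödie Faust. Faust <extra_id_0> Goethe"
target = "has author"  # phi_EN(r) for CS prompting; "hat Autor" for IL

inputs = tok(prompt, return_tensors="pt")
labels = tok(target, return_tensors="pt").input_ids
# The built-in loss is the negative log-likelihood of Equation 6,
# averaged over the target tokens.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```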
Inference
We collect the output logits of the decoder, $\mathbf{L} \in \mathbb{R}^{L \times |V|}$, where $|V|$ is the vocabulary size of $M$, and $L$ is the maximum decoding length. For each relation $r \in \mathcal{R}$, its score is given by (Han et al., 2022):

$$\mathrm{score}_\theta(r) := \frac{1}{|\phi(r)|} \sum_{t=1}^{|\phi(r)|} P_\theta\left(y_t = \phi_t(r)\right). \quad (7)$$
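Equation 7 can be realized by teacher-forcing each candidate verbalization through the decoder and averaging the per-token probabilities, as in the following sketch (an assumed implementation; the final prediction is the argmax over all relations, normalized as described for Equation 4):

```python
import torch

@torch.no_grad()
def score_relation(model, tok, prompt: str, verbalization: str) -> float:
    """Sketch of score_theta(r): mean probability of the tokens of phi(r)
    under teacher forcing (Equation 7)."""
    inputs = tok(prompt, return_tensors="pt")
    labels = tok(verbalization, return_tensors="pt").input_ids
    logits = model(**inputs, labels=labels).logits           # (1, |phi(r)|, |V|)
    probs = logits.softmax(dim=-1)[0]                        # (|phi(r)|, |V|)
    token_probs = probs.gather(-1, labels[0].unsqueeze(-1))  # P(y_t = phi_t(r))
    return token_probs.mean().item()

# Prediction: argmax over r in R of score_relation(model, tok, prompt, phi(r)).
```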