Multilingual Relation Classiﬁcation via Efﬁcient and Effective Prompting Yuxuan Chen David Harbecke Leonhard Hennig German Research Center for Artiﬁcial Intelligence DFKI

2025-05-02 1 0 2.71MB 17 页 10玖币

侵权投诉

Multilingual Relation Classiﬁcation via Efﬁcient and Effective Prompting

Yuxuan Chen David Harbecke Leonhard Hennig

German Research Center for Artiﬁcial Intelligence (DFKI)

Speech and Language Technology Lab

{yuxuan.chen, david.harbecke, leonhard.hennig}@dfki.de

Abstract

Prompting pre-trained language models has

achieved impressive performance on various

NLP tasks, especially in low data regimes. De-

spite the success of prompting in monolingual

settings, applying prompt-based methods in

multilingual scenarios has been limited to a

narrow set of tasks, due to the high cost of

handcrafting multilingual prompts. In this pa-

per, we present the ﬁrst work on prompt-based

multilingual relation classiﬁcation (RC), by in-

troducing an efﬁcient and effective method

that constructs prompts from relation triples

and involves only minimal translation for the

class labels. We evaluate its performance in

fully supervised, few-shot and zero-shot sce-

narios, and analyze its effectiveness across

14 languages, prompt variants, and English-

task training in cross-lingual settings. We

ﬁnd that in both fully supervised and few-shot

scenarios, our prompt method beats compet-

itive baselines: ﬁne-tuning XLM-REM and

null prompts. It also outperforms the ran-

dom baseline by a large margin in zero-shot

experiments. Our method requires little in-

language knowledge and can be used as a

strong baseline for similar multilingual classi-

ﬁcation tasks.

1 Introduction

Relation classiﬁcation (RC) is a crucial task in in-

formation extraction (IE), aiming to identify the

relation between entities in a text (Alt et al.,2019).

Extending RC to multilingual settings has recently

received increased interest (Zou et al.,2018;Kol-

luru et al.,2022), but the majority of prior work still

focuses on English (Baldini Soares et al.,2019;Lyu

and Chen,2021). A main bottleneck for multilin-

gual RC is the lack of supervised resources, compa-

rable in size to large English datasets (Riedel et al.,

2010;Zhang et al.,2017). The SMiLER dataset

(Seganti et al.,2021) provides a starting point to

test fully supervised and more efﬁcient approaches

due to different resource availability for different

languages.

Previous studies have shown the promising per-

formance of prompting PLMs compared to the data-

hungry ﬁne-tuning, especially in low-resource sce-

narios (Gao et al.,2021;Le Scao and Rush,2021;

Lu et al.,2022). Multilingual pre-trained language

models (Conneau et al.,2020;Xue et al.,2021)

further enable multiple languages to be represented

in a shared semantic space, thus making prompting

in multilingual scenarios feasible. However, the

study of prompting for multilingual tasks so far

remains limited to a small range of tasks such as

text classiﬁcation (Winata et al.,2021) and natural

language inference (Lin et al.,2022). To our knowl-

edge, the effectiveness of prompt-based methods

for multilingual RC is still unexplored.

To analyse this gap, we pose two research ques-

tions for multilingual RC with prompts:

RQ1.

What is the most effective way to prompt?

We investigate whether prompting should be done

in English or the target language and whether to

use soft prompt tokens.

RQ2.

How well do prompts perform in different

data regimes and languages? We investigate the

effectiveness of our prompting approach in three

scenarios: fully supervised, few-shot and zero-shot.

We explore to what extent the results are related to

the available language resources.

We present an efﬁcient and effective prompt

method for multilingual RC (see Figure 1) that

derives prompts from relation triplets (see Sec-

tion 3.1). The derived prompts include the original

sentence and entities and are supposed to be ﬁlled

with the relation label. We evaluate the prompts

with three variants, two of which require no transla-

tion, and one of which requires minimal translation,

i.e., of the relation labels only. We ﬁnd that our

method outperforms ﬁne-tuning and a strong task-

agnostic prompt baseline in fully supervised and

few-shot scenarios, especially for relatively low-

arXiv:2210.13838v2 [cs.CL] 26 Oct 2022

Titanic is directed by

Cameron .

Titanic is directed by Cameron .

Titanic _______ Cameron .

Goethe schrieb die

Tragödie Faust .

Goethe schrieb die Tragödie

Faust . Faust _______ Goethe .

乔布斯是苹果公司的联合

创始人之一.

乔布斯是苹果公司的联合创始人之

一. 乔布斯 _______苹果公司 .

Encoder-

Decoder

……

film has director

has author

hat Autor

founded

创立了

Prompt input Plain text Pre-trained

Language Model Answer

Relation verbalizations

Template: . ____

: head entity, tail entity

:code-switch (CS) prompting

: in-language (IL) prompting

Translate

- film has director

- has author

- founded

- …

- Film hat Regisseur

- hat Autor

- gründete

- …

-导演是

-作者是

-创立了

- …

…

Figure 1: Overview of our approach. Given a plain text xcontaining head entity ehand tail entity etfrom language

L, we ﬁrst apply the template T(x) = “x.eh____ et” and yield the prompt input with a blank. Then the PLM

aims to ﬁll in the relation at the blank. In code-switch prompting, the target sequence is the English relation

verbalization. In in-language prompting, the target is the relation name translated into L.

resource languages. Our method also improves

over the random baseline in zero-shot settings, and

achieves promising cross-lingual performance. The

main contributions of this work hence are:

•

We propose a simple but efﬁcient prompt

method for multilingual RC, which is, to the

best of our knowledge, the ﬁrst work to ap-

ply prompt-based methods to multilingual RC

(Section 3).

•

We evaluate our method on the largest multi-

lingual RC dataset, SMiLER (Seganti et al.,

2021), and compare our method with strong

baselines in all three scenarios. We also inves-

tigate the effects of different prompt variants,

including insertion of soft tokens, prompt lan-

guage, and the word order of prompting (Sec-

tions 4&5).

2 Preliminaries

We ﬁrst give a formal deﬁnition of the relation

classiﬁcation task, and then introduce ﬁne-tuning

and prompting paradigms to perform RC.

2.1 Relation Classiﬁcation Task Deﬁnition

Relation classiﬁcation is the task of classifying the

relationship such as date_of_birth,founded_by or

parents between pairs of entities in a given context.

Formally, given a relation set

and a text

x= [x1, x2, . . . , xn]

(where

x1,· · · , xn

are to-

kens) with two disjoint spans

and

denot-

ing the head and tail entity, RC aims to predict

the relation

r∈ R

between

and

, or give a

no_relation prediction if no relation in

holds.

RC is a multilingual task if the token sequences

come from different languages.

2.2 Fine-tuning for Relation Classiﬁcation

In ﬁne-tuning, a task-speciﬁc linear classiﬁer is

added on top of the PLM. Fine-tuning hence intro-

duces a different scenario from pre-training, since

language model (LM) pre-training is usually for-

malized as a cloze-style task to predict target tokens

[MASK]

(Devlin et al.,2019;Liu et al.,2019) or

a corrupted span (Raffel et al.,2020;Lewis et al.,

2020). For the RC task, the classiﬁer aims to pre-

dict the target class

[CLS]

or at the entity spans

denoted by MARKER (Baldini Soares et al.,2019).

2.3 Prompting for Relation Classiﬁcation

Prompting is proposed to bridge the gap between

pre-training and ﬁne-tuning (Liu et al.,2022;Gu

et al.,2022). The essence of prompting is, by ap-

pending extra text to the original text according

to a task-speciﬁc template

T(·)

, to reformulate the

downstream task to an LM pre-training task such

as masked language modeling (MLM), and apply

the same training objective during the task-speciﬁc

training. For the RC task, to identify the relation

between “Angela Merkel” and “Joachim Sauer” in

the text “Angela Merkel’s current husband is quan-

tum chemist Joachim Sauer,” an intuitive template

for prompting can be “The relation between An-

gela Merkel and Joachim Sauer is

[MASK]

,” and the

LM is supposed to assign a higher likelihood to the

term couple than to e.g. friends or colleagues at

Prompt input Target Example

Input Target

null prompts x.____ φEN (r)Goethe schrieb Faust. ____ has author

CS x.eh____ etφEN (r)Goethe schrieb Faust.Faust ____ Goethe has author

SP x.[v1]eh[v2]____ [v3]etφEN (r)Goethe schrieb Faust.[v1]Faust [v2]____ [v3]Goethe has author

IL x.eh____ etφL(r)Goethe schrieb Faust.Faust ____ Goethe hat Autor

Table 1: Overview of the prompts, including null prompts (baseline), and ours with its variants. For each prompt

or its variant, we list (1) the prompt input and the target; (2) an example based on the plain text in German “Goethe

schrieb Faust.” [vi]: learnable soft tokens. φE N (r): the original (English) relation verbalization. φL(r): the

translated relation verbalization into the target language L.

[MASK]

. This “ﬁll-in the blank” paradigm is well

aligned with the pre-training scenario, and enables

prompting to better coax the PLMs for pre-trained

knowledge (Petroni et al.,2019).

3 Methods

We now present our method, as shown in Figure 1.

We introduce its template and verbalizer, and pro-

pose several variants of the prompt. Lastly, we

explain the training and inference process.

3.1 Template

For prompting (Liu et al.,2022), a prompt often

consists of a template

T(·)

and a verbalizer

Given a plain text

, the template

adds task-

related instruction to

to yield the prompt input

xprompt =T(x).(1)

Following Chen et al. (2022) and Han et al.

(2021), we treat relations as predicates and use

the cloze “

eh{relation}et

” for the LM to ﬁll in.

Our template is formulated as

T(x) := “x.eh____ et”.(2)

In the template

T(x)

is the original text and the

two entities

and

come from

. Therefore,

our template does not introduce extra tokens, thus

involves no translation at all.

3.2 Verbalizer

After being prompted by

xprompt

, the PLM

predicts the masked text

at the blank. To com-

plete an NLP classiﬁcation task, a verbalizer

required to bridge the set of labels

and the set of

predicted texts (verbalizations

. For the simplic-

ity of our prompt, we use the one-to-one verbalizer:

φ:Y → V, r 7→ φ(r),(3)

where

is a relation, and

φ(r)

is the simple ver-

balization of

φ(·)

normally only involves split-

ting

by “-” or “_” and replacing abbreviations

such as org with organization. E.g., the relation

org-has-member corresponds to the verbalization

“organization has member”. Then the prediction is

formalized as

p(r|x)∝p(y=φ(r)|xprompt;θM),(4)

where

θM

denotes the parameters of model

p(r|x)

is normalized by the likelihood sum over

all relations.

3.3 Variants

To ﬁnd the optimal way to prompt, we investigate

three variants as follows.

Hard prompt vs soft prompt (SP)

Hard

prompts (a.k.a. discrete prompts) (Liu et al.,2022)

are entirely formulated in natural language. Soft

prompts (a.k.a. continuous prompts) consist of

learnable tokens (Lester et al.,2021) that are not

contained in the PLM vocabulary. Following Han

et al. (2021), we insert soft tokens before entities

and blanks as shown for SP in Table 1.

Code-switch (CS) vs in-language (IL)

Rela-

tion labels are in English across almost all RC

datasets. Given a text from a non-English input

with a blank, the recovered text is code-mixed

after being completed with an English verbaliza-

tion, corresponding to code-switch prompting. It

is probably more reasonable for the PLM to ﬁll

in the blank in language

. Inspired by Lin et al.

(2022) and Zhao and Schütze (2021), we machine-

translate the English verbalizers into the other lan-

guages.

Table 1visualizes both code-switch (CS)

See Appendix Bfor more examples of translated verbal-

izations. To translate the verbalizer of the SMiLER dataset,

we use DeepL by default and Google Translate when the target

language is not supported by DeepL (in case of AR, FA, KO

and UK).

Task Dataset #Class Verbalizations # Token in Verb.

Mean Std.

LA CoLA (Warstadt et al.,2019) 2 correct, incorrect. (Gao et al.,2021) 1 0

NER CoNLL03 (Tjong Kim Sang and De Meulder,2003) 5 location, person, not an, ... (Cui et al.,2021) 1.2 0.4

NLI MNLI (Williams et al.,2018) 3 yes, no, maybe. (Fu et al.,2022) 1 0

NLI XNLI (Conneau et al.,2018) 3 yes, no, maybe; Evet, ... (Zhao and Schütze,2021) 1 0

PI PAWS-X (Yang et al.,2019) 2 yes, no. (Qi et al.,2022) 1 0

TC MARC (Keung et al.,2020) 2 good, {average, bad}. (Huang et al.,2022) 1 0

TACRED (Zhang et al.,2017) 42 founded by, city of birth, country of death, ... 3.23 1.99

SemEval (Hendrickx et al.,2010) 10 cause effect, entity origin, product producer, ... 2.50 0.81

NYT (Riedel et al.,2010) 24 ethnicity, major shareholder of, religion, ... 2.10 1.01

SCIERC (Luan et al.,2018) 6 conjuction, feature of, part of, used for, ... 2.17 0.69

SMiLER (EN) (Seganti et al.,2021) 36 birth place, starring, won award, ... 2.58 0.68

SMiLER (ALL) (Seganti et al.,2021) 36 hat Genre, chef d’organisation, del país, ... 3.66 1.44

Table 2: Statistics of the lengths of the verbalizations over several classiﬁcation tasks. The lengths for non-RC

tasks depend on the tokenizers from the respective PLMs in the cited work. The lengths for RC tasks are based

on the mT5BASE tokenizer. Mean and std. show that the label space of the RC task is more complex than most

few-class classiﬁcation tasks. The verbalizations of RC datasets are listed in Appendix B. For SemEval, the two

possible directions of a relation are combined. For NYT, we use the version from Zeng et al. (2018). For SMiLER,

"EN" is the English split; "ALL" contains all data from 14 languages.

and in-language (IL) prompting. For English, CS-

and IL- prompting are equivalent, since

is En-

glish itself.

Word order of prompting

For the RC task,

head-relation-tail triples involve three elements.

Therefore, deriving natural language prompts from

them requires handling where to put the predicate

(relation). In the case of SOV languages, ﬁlling

in a relation that occurs between

and

seems

less intuitive. Therefore, to investigate if the word

order of prompting affects prediction accuracy,

we swap the entities and the blank in the SVO-

template “

x.eh____ et

” and get “

x.ehet____

”

as the SOV-template.

3.4 Training and Inference

The training and inference setups depend on the em-

ployed model. Prompting autoencoding language

models requires the verbalizations to be of ﬁxed

length, since the length of masks, which is identical

with verbalization length, is unknown during infer-

ence. Encoder-decoders can handle verbalizations

of varying length by nature (Han et al.,2022;Du

et al.,2022). Han et al. (2021) adjust all the ver-

balizations in TACRED to a length of 3, to enable

prompting with RoBERTa for RC. We argue that

for multilingual RC, this ﬁx is largely infeasible,

because: (1) in case of in-language prompting on

SMiLER, the variance of the length of the verbal-

izations increases from 0.68 to 1.44 after translation

(see Table 2), and surpasses most of listed mono-

lingual RC datasets (SemEval, NYT and SCIERC),

making it harder to unify the length; (2) manually

adjusting the translated prompts requires manual

effort per target language, making it much more ex-

pensive than adjusting only English verbalizations.

Therefore, we suggest using an encoder-decoder

PLM for prompting (Song et al.,2022).

Training objective

For an encoder-decoder

PLM

, given the prompt input

T(x)

and the

target sequence

φ(r)

(i.e. label verbalization), we

denote the output sequence as

. The probability of

an exact-match decoding is calculated as follows:

|φ(r)|

t=1

Pθ(yt=φt(r)|y<t, T (x)) ,(5)

where

φt(r)

denote the

-th token of

and

φ(r)

respectively.

y<t

denotes the decoded sequence

on the left.

represents the set of all the learn-

able parameters, including those of the PLM

θM

and those of the soft tokens

θsp

in case of vari-

ant “soft prompt”. Hence, the ﬁnal objective over

the training set

is to minimize the negative log-

likelihood:

argmin

−1

|X | X

x∈X

|φ(r)|

t=1

log Pθ(yt=φt(r)|y<t, T (x)) .

(6)

Inference

We collect the output logits of the

decoder,

L∈R|V|×L

, where

|V|

is the vocabulary

size of

, and

is the maximum decode length.

For each relation

r∈ R

, its score is given by (Han

et al.,2022):

scoreθ(r) := 1

|φ(r)|

t=1

Pθ(yt=φt(r)),(7)

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

MultilingualRelationClassicationviaEfcientandEffectivePromptingYuxuanChenDavidHarbeckeLeonhardHennigGermanResearchCenterforArticialIntelligence(DFKI)SpeechandLanguageTechnologyLab{yuxuan.chen,david.harbecke,leonhard.hennig}@dfki.deAbstractPromptingpre-trainedlanguagemodelshasachievedimpressiveper...

展开>> 收起<<

Multilingual Relation Classiﬁcation via Efﬁcient and Effective Prompting Yuxuan Chen David Harbecke Leonhard Hennig German Research Center for Artiﬁcial Intelligence DFKI.pdf

共17页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Multilingual Relation Classiﬁcation via Efﬁcient and Effective Prompting Yuxuan Chen David Harbecke Leonhard Hennig German Research Center for Artiﬁcial Intelligence DFKI

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: