Adapters for Enhanced Modeling of Multilingual Knowledge and Text
Yifan Hou1, Wenxiang Jiao2, Meizhen Liu3, Carl Allen1, Zhaopeng Tu2, Mrinmaya Sachan1
1ETH Zürich, 2Tencent AI Lab, 3Shandong University
1{yifan.hou, carl.allen, mrinmaya.sachan}@inf.ethz.ch
2{joelwxjiao, zptu}@tencent.com, 3meizhen.liu@mail.sdu.edu.cn
Abstract
Large language models appear to learn facts from the large text corpora they are trained on. Such facts are encoded implicitly within their many parameters, making it difficult to verify or manipulate what knowledge has been learned. Language models have recently been extended to multilingual language models (MLLMs), enabling knowledge to be learned across hundreds of languages. Meanwhile, knowledge graphs contain facts in an explicit triple format, which require careful and costly curation and are only available in a few high-resource languages, restricting their research and application. To address these issues, we propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages, including low-resource ones. Specifically, we introduce a lightweight adapter set to enhance MLLMs with cross-lingual entity alignment and facts from MLKGs for many languages. Experiments on common benchmarks show that such enhancement benefits both MLLMs and MLKGs, achieving: (1) comparable or improved performance for knowledge graph completion and entity alignment relative to baselines, especially for low-resource languages (for which knowledge graphs are unavailable); and (2) improved MLLM performance on language understanding tasks that require multilingual factual knowledge; all while maintaining performance on other general language tasks.1

1 Our code, models, and data (e.g., integration corpus and extended datasets) are available at https://github.com/yifan-h/Multilingual_Space.
1 Introduction
Knowledge graphs serve as a source of explicit factual information for various NLP tasks. However, language models (Devlin et al., 2019; Brown et al., 2020), which capture implicit knowledge from vast text corpora, are already being used in knowledge-intensive tasks. Recently, language models have been successfully extended to multilingual language models (MLLMs) that integrate information sourced across hundreds of languages (Devlin et al., 2019; Conneau and Lample, 2019; Conneau et al., 2020). However, as with most neural networks, the information is encoded in a diffused and opaque manner that is difficult to interpret, verify or utilize (AlKhamissi et al., 2022).

Figure 1: Combining MLLMs and MLKGs benefits both: MLKGs suffer from incompleteness and are limited to few languages, which MLLMs can supplement. MLLMs lack entity alignment and firm facts, which MLKGs can provide.
Meanwhile, multilingual knowledge graphs (MLKGs) require careful curation of explicit facts and annotation of entities that occur across languages (cross-lingual entity alignment), making knowledge graphs expensive and time-consuming to extend to new languages and restricting knowledge graph research to a few high-resource languages. Further, open-source MLKGs such as WordNet (Bond and Foster, 2013) and Wikidata (Vrandečić and Krötzsch, 2014) suffer from incompleteness, as many true facts (or triples) and entity alignments are missing (Chen et al., 2017, 2020).
In this work, we propose to overcome the above limitations of each knowledge source by integrating MLKGs into MLLMs (as shown in Figure 1), to enable (i) the transfer of MLKG knowledge from high-resource languages to low-resource languages; and (ii) explicit knowledge of MLKGs to supplement MLLMs for knowledge-intensive language tasks, one of the key challenges in MLLMs (AlKhamissi et al., 2022).
While this idea seems intuitive, there is no easy way to incorporate the explicit knowledge of MLKGs into the parametrically stored information of MLLMs. Existing knowledge integration methods utilize language models and knowledge graphs in two ways: (1) training knowledge graph embeddings individually and combining the embeddings corresponding to linked entities in sentences with the language model representations (e.g., KnowBERT (Peters et al., 2019) and ERNIE (Zhang et al., 2019)); or (2) absorbing the knowledge in knowledge graphs into the language model's parameters via joint training (e.g., K-BERT (Liu et al., 2020) and K-Adapter (Wang et al., 2021)).

The first method requires embedding knowledge graph entities and accurately extracting entities in sentences across hundreds of languages, which is highly challenging. The second method typically suffers from the curse of multilinguality (Conneau et al., 2020; Doddapaneni et al., 2021; Jiao et al., 2022) and catastrophic forgetting (Kirkpatrick et al., 2016) due to limited model capacity. Most importantly, both methods integrate knowledge implicitly, such that it is difficult to access and extend to low-resource languages (AlKhamissi et al., 2022). Furthermore, both methods require large sets of aligned sentences and knowledge triples, which are costly to gather and accurately annotate across hundreds of languages.
To address the above issues, we first collect and clean multilingual data from Wikidata2 and Wikipedia3 for the enhancement, where rich factual knowledge and cross-lingual alignments are available. Then, we propose to enhance MLLMs with the MLKG information by using a set of adapters (Houlsby et al., 2019), which are lightweight, collectively adding only around 0.5% extra parameters to the MLLM. Each adapter integrates information from either MLKG Triples (i.e., facts) or cross-lingual Entity alignments, and is trained on either Phrase- or Sentence-level data. Each of the resulting four adapters (EP/TP/ES/TS) is trained individually to learn information supplemental to that already learned by the MLLM. Adapter outputs are combined by a fusion mechanism (Pfeiffer et al., 2021). The training objectives are similar to those used for MLKG embedding (Chen et al., 2017) rather than masked language modeling, which makes them more efficient on large corpora.

2 https://www.wikidata.org/wiki/Wikidata:Main_Page
3 https://en.wikipedia.org/wiki/Main_Page
We conduct experiments on various downstream tasks to demonstrate the effectiveness of our approach. For MLKG tasks, following the data collection methods of two existing benchmarks (Chen et al., 2020, 2017), we extended them from 2-5 languages to 22 languages, including two rare languages.4 Results show that our method obtains comparable performance to existing state-of-the-art baselines on the knowledge graph completion benchmark, and significantly better performance on the entity alignment benchmark. More importantly, we can perform these knowledge graph tasks in low-resource languages for which no knowledge graph exists, and achieve results comparable to those for high-resource languages. Improvements over baseline MLLMs are significant. The results demonstrate that our proposed method integrates the explicit knowledge from MLKGs into MLLMs such that it can be used across many languages. Our method also improves existing MLLMs noticeably on knowledge-intensive language tasks, such as cross-lingual relation classification, whilst maintaining performance on general language tasks such as named entity recognition (NER) and question answering (QA).

4 The extended datasets as well as the KI corpus are published with our code implementation.
2 Multilingual Knowledge Integration
In this paper, we fuse knowledge from an MLKG into an MLLM. Following previous works (Wang et al., 2021; Liu et al., 2021), we make use of an entity-tagged corpus of text (called a knowledge integration corpus) for knowledge integration. We formally introduce these concepts below.
MLLM. A multilingual LM can be thought of as an encoder that can represent text in any language $l$ in a set of languages $\mathcal{L}$. Let $\mathcal{V}$ denote the shared vocabulary over all languages, and let $t^l \in \mathcal{V}$ denote a token in language $l$. A sentence $s^l$ in language $l$ can be denoted as a sequence of tokens: $s^l = (t^l_1, t^l_2, \ldots)$. The output representations of the MLLM for $s^l$ can be denoted by a sequence of vectors: $\mathrm{LM}(s^l) = (\mathbf{h}_1, \mathbf{h}_2, \ldots)$. These vectors correspond to representations for each token in the sentence, one representation per input token. Various tokenization schemes such as WordPiece or BPE might be considered here. We use the average of the token representations as the representation of the sentence: $\overline{\mathrm{LM}}(s^l) = \mathrm{mean}(\mathbf{h}_1, \mathbf{h}_2, \ldots)$. Similarly, for a phrase $s^l_{ij}$ (starting from the $i$-th token and ending in the $j$-th token of the sentence), we can obtain its contextualized representation as $\overline{\mathrm{LM}}(s^l_{ij}) = \mathrm{mean}(\mathbf{h}_i, \mathbf{h}_{i+1}, \ldots, \mathbf{h}_j)$.
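To make the pooling concrete, the following is a minimal sketch (our own illustration, not the authors' released code) of how such mean-pooled sentence and phrase representations can be obtained from an off-the-shelf multilingual encoder; the choice of `bert-base-multilingual-cased` is an assumption, and the span indices refer to positions in the tokenized sequence.

```python
# A minimal sketch (not the authors' code): mean-pooled sentence and phrase
# representations from a generic multilingual encoder.
import torch
from transformers import AutoModel, AutoTokenizer

NAME = "bert-base-multilingual-cased"   # assumed encoder; the paper's MLLM may differ
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModel.from_pretrained(NAME)

def sentence_repr(sentence: str) -> torch.Tensor:
    """mean(h_1, h_2, ...): average of all token representations."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state      # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)             # (dim,)

def phrase_repr(sentence: str, i: int, j: int) -> torch.Tensor:
    """mean(h_i, ..., h_j): average over a token span (inclusive)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state
    return hidden[0, i : j + 1].mean(dim=0)          # (dim,)
```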
MLKG. A multilingual knowledge graph is a graph with entities and knowledge triples in each language $l \in \mathcal{L}$. Let $\mathcal{E}$ denote the set of entities and $\mathcal{T}$ denote the set of knowledge triples. In an MLKG, each entity indexed $i$ might appear in several languages. Let $e^l_i$ denote the entity label of the $i$-th entity in language $l$. Furthermore, we denote a knowledge triple in the MLKG as $(e^l_i, r^{l''}_k, e^{l'}_j) \in \mathcal{T}$, where $r^{l''}_k$ is the $k$-th relation. Note that since entities (as well as relations) may appear in various languages under different labels, knowledge triples can be defined across languages.
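For illustration only, the per-language entity labels and cross-language triples described above can be represented with plain dictionaries; the identifiers and labels below simply mirror the running example of Figure 1 and are not a sample of the actual MLKG used in the paper.

```python
# Toy MLKG fragment (illustrative only): per-language entity labels, and a
# triple whose elements carry labels from different languages.
entity_labels = {
    "Q72": {"en": "Zurich", "it": "Zurigo", "zh": "苏黎世"},
    "Q39": {"en": "Switzerland", "it": "Svizzera", "zh": "瑞士"},
}
relation_labels = {
    "P131": {"en": "is located in", "it": "si trova a"},
}

# (e_i^l, r_k^{l''}, e_j^{l'}): the three language codes need not coincide.
triples = [
    (("Q72", "en"), ("P131", "it"), ("Q39", "it")),   # Zurich -- si trova a --> Svizzera
]
```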
Knowledge Integration Corpus. For knowledge integration, besides the MLKG, we make use of a corpus of text $\mathcal{C}$ (as shown in the right part of Figure 2). The corpus $\mathcal{C}$ comprises two kinds of texts. First, we have a set of texts $\mathcal{C}_1$ for cross-lingual entity alignment, which comprises sentences with mentions of entities in the MLKG. For example, in Figure 2, given the sentence "De Botton spent his early years in Zurich", we have the aligned entity Zurich and its cross-lingual labels. The second set of texts, $\mathcal{C}_2$, is for knowledge triples and comprises sentences aligned with knowledge triples in the MLKG. For example, in Figure 2, given the sentence "Zurich is the largest city in Switzerland", we have its aligned knowledge triple (Zurich, is located in, Switzerland).
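A hypothetical slice of such a corpus might look like the following; the field names are ours, and the released corpus format may differ.

```python
# Hypothetical examples of the two corpus parts (field names are ours).

# C1: a sentence with a tagged entity mention and its cross-lingual labels.
c1_sentence = "De Botton spent his early years in Zurich."
start = c1_sentence.index("Zurich")
c1_example = {
    "sentence": c1_sentence,
    "mention_span": (start, start + len("Zurich")),    # character offsets
    "entity_labels": {"en": "Zurich", "it": "Zurigo", "zh": "苏黎世"},
}

# C2: a sentence aligned with a knowledge triple.
c2_example = {
    "sentence": "Zurich is the largest city in Switzerland.",
    "triple": ("Zurich", "is located in", "Switzerland"),
}
```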
3 Adapters and Adapter Fusion

In this section, we first describe how we incorporate adapters into language models and how they can be used to enhance them with different sources of knowledge from knowledge graphs.
Adapter. Adapters have become a popular choice for parameter-efficient finetuning of language models on downstream tasks (Houlsby et al., 2019) due to their flexibility, effectiveness, low cost and scalability (Pfeiffer et al., 2021). Adapters are new modules that are added between layers of language models5, the parameters of which are updated only during finetuning while the language model parameters are frozen. An adapter is a bottleneck layer composed of two feed-forward layers with one non-linear activation function. For $\mathbf{h}^m$, the hidden representation of token $t^l_i$ at layer $m$, the adapter acts as

$$A(\mathbf{h}^m) = W_{up} \cdot \sigma(W_{down} \cdot \mathbf{h}^m + b_{down}) + b_{up}. \qquad (1)$$

Here, $W_{down}$ and $W_{up}$ are weight matrices, which map the hidden representations to the low-dimensional space and then map them back; $b_{down}$ and $b_{up}$ are bias parameters, and $\sigma$ is a nonlinear activation function.

5 Where to insert adapters is flexible, but a common choice is after the feedforward layer of a transformer layer.
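Equation (1) describes a standard bottleneck module. A minimal PyTorch sketch (ours; the hidden and bottleneck sizes and the choice of activation are assumptions) is:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Eq. (1): A(h) = W_up * sigma(W_down * h + b_down) + b_up."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # W_down, b_down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # W_up, b_up
        self.act = nn.GELU()                                # sigma (assumed choice)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(h)))
```

During knowledge integration, only parameters of this kind (and the fusion layer below) would receive gradients while the MLLM weights stay frozen; the small bottleneck dimension is what keeps the extra parameter count low.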
Adapter Fusion. We follow the architecture of Pfeiffer et al. (2021), but instead of using adapters for finetuning, we use them to enhance MLLMs with knowledge. Our approach is similar to Wang et al. (2021), but our adapters supplement and augment the existing implicit knowledge of MLLMs (encoding it into the explicit geometric properties of hidden representations), and our approach is more lightweight, with only c. 0.5% additional parameters (cf. >10% in Wang et al. (2021)).

As shown in Figure 2 (left), still considering the $m$-th layer, the output representations of the feedforward layer (denoted $\mathbf{h}^m$ as in Eq. 1) are input to the adapters. A fusion layer aggregates all adapter outputs $A_n(\mathbf{h}^m)$ (where $n \in \{1, \ldots, N\}$ indexes each adapter) and the un-adapted representations with a multiplicative attention mechanism:

$$A_{fusion}(\mathbf{h}^m) = \sum_{n=0}^{N} a^m_n \cdot V^m \cdot A_n(\mathbf{h}^m), \qquad a^m_n = \mathrm{softmax}\big(\mathbf{h}^m Q^m \odot A_n(\mathbf{h}^m) K^m\big).$$

Here, $A_0(\cdot)$ is the identity function; $Q^m$, $K^m$, $V^m$ are parameters in the multiplicative attention mechanism; and $\odot$ is the Hadamard product.
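Read literally, the fusion computes an attention distribution over the identity output and the $N$ adapter outputs. One possible PyTorch rendering (ours; the exact parameterization in the paper or in AdapterFusion may differ, and the Hadamard product in the score is summed here, i.e. used as a dot-product score) is:

```python
import torch
import torch.nn as nn

class AdapterFusionLayer(nn.Module):
    """Sketch of the fusion layer: attention over N adapter outputs plus the
    un-adapted representation (n = 0), following one reading of the equation."""

    def __init__(self, adapters: nn.ModuleList, hidden_dim: int = 768):
        super().__init__()
        self.adapters = adapters                              # e.g. [EP, TP, ES, TS]
        self.Q = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.K = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.V = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Candidates: identity (A_0) plus each adapter's output.
        cands = [h] + [adapter(h) for adapter in self.adapters]
        stacked = torch.stack(cands, dim=-2)                  # (..., N+1, dim)
        query = self.Q(h).unsqueeze(-2)                       # (..., 1, dim)
        keys = self.K(stacked)                                # (..., N+1, dim)
        scores = (query * keys).sum(dim=-1)                   # (..., N+1)
        attn = torch.softmax(scores, dim=-1).unsqueeze(-1)    # (..., N+1, 1)
        return (attn * self.V(stacked)).sum(dim=-2)           # (..., dim)
```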
The additional knowledge to be learned by the adapters comes from knowledge Triples and Entity alignments, each provided in both Phrase and Sentence format (hence $N = 2 \times 2 = 4$). As shown in Figure 2 (center), for a given entity in two languages $l$ and $l'$, Adapter-EP learns to align the two (multilingual) representations of $e^l_i$ and $e^{l'}_i$, e.g., Zurich is aligned with Zurigo.
Adapter-TP learns knowledge triples, e.g., predicting Switzerland given the entity and relation (Zurich, is located in, ·). Besides these non-contextualized settings, entities within context can also be considered (i.e., sentences from the knowledge integration corpus). Thus, Adapter-ES and Adapter-TS have similar objectives but use contextualized representations from input sentences.

Figure 2: The architecture of MLLMs with adapters and their roles. We enhance multilingual and factual knowledge at the phrase and sentence levels using different knowledge integration corpora.
4 Knowledgeable Adapters

Next, we design objectives with corresponding knowledge integration datasets to train a set of adapters. Similar to MLKG embedding (Chen et al., 2017), we aim to encode knowledge into the geometric properties of the adapted MLLM representations, i.e., the MLLM and adapters collectively act as an MLKG embedding model. Specifically, we use cosine distance within the contrastive learning loss of InfoNCE (van den Oord et al., 2018):

$$\mathcal{I}_{NCE}(\mathbf{x}, \mathbf{x}') = \log \frac{\cos(\mathbf{x}, \mathbf{x}')}{\sum_{\mathbf{x}'' \in X} \cos(\mathbf{x}, \mathbf{x}'')},$$

where $X$ is a batch that includes the positive sample $\mathbf{x}'$ and a number of negative samples.6

6 We use in-batch negative sampling, where entities (with labels in any language) in the batch are randomly selected.
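For reference, a common in-batch implementation of this kind of objective looks as follows; note that this sketch uses the standard exponentiated, temperature-scaled InfoNCE form rather than the raw cosine ratio written above, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE over cosine similarities.

    anchor, positive: (batch, dim). Row i of `positive` is the positive
    sample for row i of `anchor`; all other rows act as in-batch negatives.
    """
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    sims = a @ p.t() / temperature          # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(sims, targets)   # -log softmax of the diagonal entries
```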
Adapter-EP. We use Wikidata (Vrandečić and Krötzsch, 2014) to enhance MLLMs with the knowledge of cross-lingual entity alignments. Inspired by the idea that languages are aligned implicitly in a universal space in MLLMs (Wu and Dredze, 2019; Wei et al., 2021), we train the aligned entities to have closer representations. Denoting the MLLM with this adapter as $\mathrm{LM}(\cdot)$, the objective used to train EP is:

$$\mathcal{L}_{EP} = \sum_{(e^l_i, e^{l'}_i) \in \mathcal{E}} \mathcal{I}_{NCE}\big(\overline{\mathrm{LM}}(e^l_i), \overline{\mathrm{LM}}(e^{l'}_i)\big),$$

where $\overline{\mathrm{LM}}(\cdot)$ means we take the mean of the token representations as the entity representation vector.
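Putting the pieces together, one EP training step might look like the sketch below, where `mllm_with_adapter` stands for a hypothetical MLLM with the EP adapter as its only trainable component (an HF-style interface is assumed) and `info_nce` is the loss sketched above.

```python
import torch

def masked_mean(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean over real (non-padding) token positions."""
    mask = attention_mask.unsqueeze(-1).type_as(hidden)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

def ep_step(mllm_with_adapter, tokenizer, entity_pairs, optimizer):
    """entity_pairs: list of (label in language l, label in language l') strings."""
    left = tokenizer([a for a, _ in entity_pairs], padding=True, return_tensors="pt")
    right = tokenizer([b for _, b in entity_pairs], padding=True, return_tensors="pt")
    h_l = masked_mean(mllm_with_adapter(**left).last_hidden_state, left["attention_mask"])
    h_r = masked_mean(mllm_with_adapter(**right).last_hidden_state, right["attention_mask"])
    loss = info_nce(h_l, h_r)       # other entities in the batch act as negatives
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                # only adapter parameters are trainable
    return loss.item()
```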
Adapter-TP. We train this adapter using the knowledge triples in Wikidata. Inspired by previous knowledge graph embedding algorithms (e.g., Bordes et al., 2013), for a given fact triple, we train the (adapted) object entity embedding to be close to the (adapted) joint embedding of the subject entity and relation. The objective used to train TP is quite different from existing masked language modeling-based ones:

$$\mathcal{L}_{TP} = \sum_{(e^l_i, r^{l''}_k, e^{l'}_j) \in \mathcal{T}} \mathcal{I}_{NCE}\big(\overline{\mathrm{LM}}([e^l_i; r^{l''}_k]), \overline{\mathrm{LM}}(e^{l'}_j)\big),$$

where $[\,;\,]$ denotes text concatenation. Note that we apply code-switching (Liu et al., 2021), and thus entities and relations can be in different languages. This is helpful for capturing knowledge triples for low-resource languages.
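A TP step can be sketched analogously: the subject and relation labels are concatenated as text, with languages sampled independently per element as a simple stand-in for code-switching (the actual procedure of Liu et al. (2021) may differ). The helpers `info_nce` and `masked_mean` are the ones from the earlier sketches, and the data layout follows the toy MLKG fragment above.

```python
import random

def tp_step(mllm_with_adapter, tokenizer, triples, labels, optimizer):
    """triples: list of (subject_id, relation_id, object_id);
    labels: dict id -> {language_code: surface form}."""
    heads, tails = [], []
    for s, r, o in triples:
        ls, lr, lo = (random.choice(list(labels[x])) for x in (s, r, o))
        heads.append(f"{labels[s][ls]} {labels[r][lr]}")    # [e_i ; r_k] as text
        tails.append(labels[o][lo])
    enc_h = tokenizer(heads, padding=True, return_tensors="pt")
    enc_t = tokenizer(tails, padding=True, return_tensors="pt")
    h = masked_mean(mllm_with_adapter(**enc_h).last_hidden_state, enc_h["attention_mask"])
    t = masked_mean(mllm_with_adapter(**enc_t).last_hidden_state, enc_t["attention_mask"])
    loss = info_nce(h, t)           # object embeddings of other triples act as negatives
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```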
Adapter-ES. Entity alignment can also be applied to contextualized embeddings produced by the MLLM when entities are input within natural language sentences. For this purpose, we use summaries taken from multilingual Wikipedia. Specifically, we first align each entity in Wikidata with its Wikipedia title, and extract the sentences in its summary that contain the entity label. As described earlier, we denote this corpus as $\mathcal{C}_1$. Thus, similar to Adapter-EP, we train ES by aligning the contextualized entity representations of cross-lingually aligned entities.