
limitations of each knowledge source by integrating MLKGs into MLLMs (as shown in Figure 1), to enable (i) the transfer of MLKG knowledge from high-resource languages to low-resource languages; and (ii) explicit knowledge of MLKGs to supplement MLLMs for knowledge-intensive language tasks, one of the key challenges in MLLMs (AlKhamissi et al., 2022).
While this idea seems intuitive, there is no easy way to incorporate the explicit knowledge of MLKGs into the parametrically stored information of MLLMs. Existing knowledge integration methods utilize language models and knowledge graphs in two ways: (1) training knowledge graph embeddings individually and combining the embeddings corresponding to linked entities in sentences with the language model representations (e.g., KnowBERT (Peters et al., 2019) and ERNIE (Zhang et al., 2019)); or (2) absorbing the knowledge in knowledge graphs into the language model's parameters via joint training (e.g., K-BERT (Liu et al., 2020) and K-Adapter (Wang et al., 2021)).
The first method requires embedding knowledge graph entities and accurately extracting entities in sentences across hundreds of languages, which is highly challenging. The second method typically suffers from the curse of multilinguality (Conneau et al., 2020; Doddapaneni et al., 2021; Jiao et al., 2022) and catastrophic forgetting (Kirkpatrick et al., 2016) due to limited model capacity. Most importantly, both methods integrate knowledge implicitly, so that it is difficult to access and to extend to low-resource languages (AlKhamissi et al., 2022). Furthermore, both methods require large sets of aligned sentences and knowledge triples, which are costly to gather and accurately annotate across hundreds of languages.
To address the above issues, we first collect and clean multilingual data from Wikidata^2 and Wikipedia^3 for the enhancement, since rich factual knowledge and cross-lingual alignments are available there. Then, we propose to enhance MLLMs with the MLKG information by using a set of adapters (Houlsby et al., 2019), which are lightweight, collectively adding only around 0.5% extra parameters to the MLLM. Each adapter integrates information from either MLKG Triples (i.e., facts) or cross-lingual Entity alignments, and is trained on either Phrase- or Sentence-level data.

^2 https://www.wikidata.org/wiki/Wikidata:Main_Page
^3 https://en.wikipedia.org/wiki/Main_Page
Each of the resulting four adapters (EP/TP/ES/TS) is trained individually to learn information supplemental to that already learned by the MLLM. Adapter outputs are combined by a fusion mechanism (Pfeiffer et al., 2021). The training objectives are similar to those for MLKG embedding (Chen et al., 2017) rather than masked language modeling, which makes them more efficient on large corpora.
We conduct experiments on various downstream tasks to demonstrate the effectiveness of our approach. For MLKG tasks, following the data collection methods of two existing benchmarks (Chen et al., 2020, 2017), we extend them from 2-5 languages to 22 languages, including two rare languages.^4 Results show that our method obtains performance comparable to existing state-of-the-art baselines on the knowledge graph completion benchmark, and significantly better performance on the entity alignment benchmark. More importantly, we can perform these knowledge graph tasks in low-resource languages for which no knowledge graph exists, achieving results comparable to those in high-resource languages. Improvements over baseline MLLMs are significant. These results demonstrate that our proposed method integrates the explicit knowledge of MLKGs into MLLMs in a form that can be used across many languages. Our method also noticeably improves existing MLLMs on knowledge-intensive language tasks, such as cross-lingual relation classification, whilst maintaining performance on general language tasks such as named entity recognition (NER) and question answering (QA).

^4 The extended datasets as well as the KI corpus are published with our code implementation.
2 Multilingual Knowledge Integration
In this paper, we fuse knowledge from an MLKG into an MLLM. Following previous works (Wang et al., 2021; Liu et al., 2021), we make use of an entity-tagged corpus of text (called a knowledge integration corpus) for knowledge integration. We formally introduce these concepts below.
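Before doing so, a concrete (purely hypothetical) picture of one entry of such an entity-tagged corpus may help; the field names and schema below are assumptions for illustration, not the released data format.

# Hypothetical knowledge-integration corpus entry; the schema is an
# assumption for illustration, not the released format.
ki_example = {
    "lang": "de",
    "sentence": "Marie Curie wurde in Warschau geboren.",
    "entities": [
        {"mention": "Marie Curie", "span": (0, 11), "wikidata_id": "Q7186"},
        {"mention": "Warschau", "span": (21, 29), "wikidata_id": "Q270"},
    ],
    # Cross-lingual alignment: an English sentence expressing the same fact.
    "aligned_sentence": {"lang": "en", "text": "Marie Curie was born in Warsaw."},
}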
MLLM. A multilingual LM can be thought of as an encoder that can represent text in any language $l$ in a set of languages $\mathcal{L}$. Let $\mathcal{V}$ denote the shared vocabulary over all languages. Let $t^l \in \mathcal{V}$ denote a token in language $l$. A sentence $s^l$ in a language $l$ can be denoted as a sequence of tokens: $s^l = (t^l_1, t^l_2, \ldots)$. The output representations of the MLLM for $s^l$ can be denoted by a sequence of