
the-blank” task. This new formulation helps trigger pretrained LMs to produce general knowledge about the downstream tasks. It also implies that KG completion can serve as a valuable knowledge benchmark for pretrained LMs, in addition to benchmarks such as LAMA (Petroni et al., 2019) and KILT (Petroni et al., 2021).
• We introduce a parameter-lite encoder that adapts general model knowledge to different KG completion tasks. This encoder contains a few parameters for providing additional context and calibrating biased knowledge according to the task. The module is applicable to other deep LMs.
• We obtain state-of-the-art or competitive performance on five KG completion datasets spanning two tasks: link prediction and triplet classification. We achieve this by learning only 1% of the parameters compared to fully finetuned approaches. In addition, compared to task-specific KG completion models, PALT remains competitive with a unified architecture for all tasks.
2 PALT
We propose parameter-lite transfer learning, called PALT, as an alternative to finetuning for knowledge graph (KG) completion. Unlike finetuning, which modifies all the language model (LM) parameters and stores a new copy of them for each task, our method is lightweight: it keeps the original LM parameters frozen and tunes only a small number of newly added parameters. The intuition is that LMs have already stored factual knowledge during pretraining, and we only need to properly elicit the relevant knowledge for downstream tasks without much modification to the original LMs. To do so, PALT first casts KG completion into a “fill-in-the-blank” task (Sec. 2.1) and then introduces a parameter-lite encoder consisting of a few trainable parameters, while the parameters of the original network remain fixed (Sec. 2.2). The overall architecture of PALT is shown in Figure 1.
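To make this setup concrete, the following is a minimal sketch of the frozen-LM training regime, assuming a Hugging Face bert-base-cased checkpoint; the plain linear layer is only a hypothetical stand-in for the parameter-lite encoder described in Sec. 2.2.

import torch
from transformers import BertModel

# Load a pretrained LM and freeze all of its original parameters.
bert = BertModel.from_pretrained("bert-base-cased")
for p in bert.parameters():
    p.requires_grad = False

# A small trainable module; a linear layer is used here only as a
# hypothetical stand-in for the parameter-lite encoder of Sec. 2.2.
parameter_lite_encoder = torch.nn.Linear(bert.config.hidden_size, bert.config.hidden_size)

# Only the newly added parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(parameter_lite_encoder.parameters(), lr=1e-4)

n_frozen = sum(p.numel() for p in bert.parameters())
n_tuned = sum(p.numel() for p in parameter_lite_encoder.parameters())
print(f"trainable / frozen parameters: {n_tuned} / {n_frozen}")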
2.1 Knowledge Graph Completion as
Fill-in-the-Blank
We reformulate KG completion as a fill-in-the-
blank task. The basic idea of this task formulation
is that pretrained LMs are able to answer questions
formatted as cloze-style statements, and that having a proper context helps trigger LMs to produce general knowledge for the task of interest. For example, the KG completion task aims to predict the missing entity in a fact (Chaplin, profession, __), which is closely related to a cloze statement. We therefore frame KG completion as “fill-in-the-blank” cloze statements. In this case, “Chaplin is a” provides the proper context for LMs to elicit the correct answer “screenwriter”, which is generally relevant to the task.

Figure 1: Summary of our approach PALT. Compared to finetuning, PALT is a parameter-lite alternative for transferring the knowledge that pretrained language models have about knowledge graph completion. Our approach first casts knowledge graph completion into a fill-in-the-blank task. This formulation enables pretrained language models to produce general knowledge for knowledge graph completion. By introducing a few trainable parameters via a parameter-lite encoder (in the dashed box), PALT further adapts the general knowledge in language models to different knowledge graph completion tasks without modifying the original language model parameters (in grey).
In more detail, a fact takes the form (head, relation, tail), or (h, r, t) for short, and the LM needs to predict a missing entity. A typical KG completion task provides a partial fact (h, r, __) and a set of candidate answers for the missing entity. To perform this task, at test time we convert (h, r, t′) into a cloze statement, where t′ denotes an answer candidate for filling the blank. For example, given the partial fact (Chaplin, profession, __), the LM needs to fill in the blank of the cloze statement “Chaplin is a __”, which we provide as the model input. In our case, given the candidate answer (Chaplin, profession, screenwriter) (e.g., “screenwriter” is one of the candidates), the corresponding cloze statement becomes “[CLS] Chaplin is a [SEP] screenwriter [SEP]” (Figure 1). We use this statement as the input to a pretrained LM.
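As a minimal sketch of this conversion, assuming a Hugging Face bert-base-cased tokenizer and model, the two-segment encoding below reproduces the cloze statement above; the linear scorer at the end is a hypothetical stand-in, since how candidates are actually scored is described later.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased")

# Partial fact (Chaplin, profession, __) with one candidate tail entity.
head_text, relation_text, candidate_tail = "Chaplin", "is a", "screenwriter"

# Encoding "<head> <relation>" and the candidate as a sentence pair yields
# "[CLS] Chaplin is a [SEP] screenwriter [SEP]".
inputs = tokenizer(f"{head_text} {relation_text}", candidate_tail, return_tensors="pt")
print(tokenizer.decode(inputs["input_ids"][0]))

# Feed the cloze statement to the frozen LM and take the [CLS] representation;
# a linear layer stands in (hypothetically) for scoring the candidate.
with torch.no_grad():
    cls_vector = bert(**inputs).last_hidden_state[:, 0]
scorer = torch.nn.Linear(bert.config.hidden_size, 1)
print(scorer(cls_vector))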
[CLS] and [SEP] are special tokens of the pretrained LMs, e.g., BERT. “Chaplin” is the head entity name or description. “is a” is the relation name or description. “screenwriter” is the candidate tail entity name or