PALT: Parameter-Lite Transfer of Language Models for
Knowledge Graph Completion
Jianhao Shen1, Chenguang Wang2†, Ye Yuan1, Jiawei Han3,
Heng Ji3, Koushik Sen4, Ming Zhang1†, Dawn Song4†
1Peking University, 2Washington University in St. Louis,
3University of Illinois at Urbana-Champaign, 4UC Berkeley
{jhshen,yuanye_pku,mzhang_cs}@pku.edu.cn,chenguangwang@wustl.edu,
{hanj,hengji}@illinois.edu,{ksen,dawnsong}@berkeley.edu
Abstract
This paper presents a parameter-lite transfer
learning approach of pretrained language mod-
els (LM) for knowledge graph (KG) comple-
tion. Instead of finetuning, which modifies
all LM parameters, we only tune a few new
parameters while keeping the original LM pa-
rameters fixed. We establish this via reformu-
lating KG completion as a “fill-in-the-blank”
task, and introducing a parameter-lite encoder
on top of the original LMs. We show that,
by tuning far fewer parameters than finetuning,
LMs transfer non-trivially to most tasks and
reach competitiveness with prior state-of-the-
art approaches. For instance, we outperform
the full finetuning approaches on a KG com-
pletion benchmark by tuning only 1% of the
parameters.1
1 Introduction
Pretrained language models (LM) such as BERT
and GPT-3 have enabled downstream transfer (De-
vlin et al.,2019;Brown et al.,2020). Recent stud-
ies (Petroni et al.,2019;Jiang et al.,2020;He et al.,
2021) show that the implicit knowledge learned
during pretraining is the key to success. Among
different transfer learning techniques (Shin et al.,
2020;Liu et al.,2021a,b;Houlsby et al.,2019;
Devlin et al.,2019), finetuning is the de facto
paradigm to adapt the knowledge to downstream
NLP tasks. Knowledge graph (KG) completion
is a typical knowledge-intensive application. For
example, given a fact (Chaplin, profession, __)
missing an entity, it aims to predict the correct en-
tity “screenwriter”. This task provides a natural
testbed to evaluate the knowledge transfer ability
of different transfer learning approaches.
†Corresponding authors.
1The code and datasets are available at https://github.com/yuanyehome/PALT.
Finetuning (Yao et al., 2019; Shen et al., 2022) has been recently adopted to advance the KG completion performance. However, it presents two fun-
damental limitations. First, finetuning is computa-
tionally inefficient, requiring updating all param-
eters of the pretrained LMs. This ends up with
an entirely new model for each KG completion
task. For example, storing a full copy of pretrained
BERT-Large (340M parameters) for each task is
non-trivial, not to mention the billion parameter
LMs. Second, the finetuning approaches often
rely on task-specific architectures for various KG
completion tasks. For instance, KG-BERT (Yao
et al.,2019) designs different model architectures
to adapt a pretrained BERT to different tasks. This
restricts its usability in more downstream tasks.
In this work, we enable parameter-lite transfer of
the pretrained LMs to knowledge-intensive tasks,
with a focus on KG completion. As an alternative
to finetuning, our method, namely PALT, tunes
no existing LM parameters. We establish this by
casting the KG completion into a “fill-in-the-blank”
task. This formulation enables eliciting general
knowledge about KG completion from pretrained
LMs. By introducing a parameter-lite encoder con-
sisting of a few trainable parameters, we efficiently
adapt the general model knowledge to downstream
tasks. The parameters of the original LM network
remain fixed during the adaptation process for dif-
ferent KG completion tasks. In contrast to fine-
tuning which modifies all LM parameters, PALT
is lightweight. Instead of designing task-specific
model architectures, PALT stays with the same
model architecture for all KG completion tasks that
we evaluate.
The contributions are as follows:
• We propose parameter-lite transfer learning for pretrained LMs to adapt their knowledge to KG completion. The results are potentially valuable for a broad range of knowledge-intensive NLP applications.
• We reformulate KG completion as a “fill-in-
the-blank” task. This new formulation helps
trigger pretrained LMs to produce general
knowledge about the downstream tasks. The
new formulation implies that KG completion can serve as a valuable knowledge
benchmark for pretrained LMs, in addition
to benchmarks such as LAMA (Petroni et al.,
2019) and KILT (Petroni et al.,2021).
• We introduce a parameter-lite encoder to spec-
ify general model knowledge to different KG
completion tasks. This encoder contains a few
parameters for providing additional context
and calibrating biased knowledge according
to the task. The module is applicable to other
deep LMs.
• We obtain state-of-the-art or competitive per-
formance on five KG completion datasets
spanning two tasks: link prediction and triplet
classification. We achieve this via learning
only 1% of the parameters compared to the
full finetuning approaches. In addition, com-
pared to task-specific KG completion models,
PALT reaches competitiveness with a unified
architecture for all tasks.
2 PALT
We propose parameter-lite transfer learning, called
PALT, as an alternative to finetuning for knowl-
edge graph (KG) completion. Instead of finetuning
which modifies all the language model (LM) pa-
rameters and stores a new copy for each task, this
method is lightweight for KG completion, which
keeps original LM parameters frozen, but only
tunes a small number of newly added parameters.
The intuition is that LMs have stored factual knowl-
edge during the pretraining, and we need to prop-
erly elicit the relevant knowledge for downstream
tasks without much modification to the original
LMs. To do so, PALT first casts KG completion
into a “fill-in-the-blank” task (Sec. 2.1), and then
introduces a parameter-lite encoder consisting of
a few trainable parameters, while parameters of
the original network remain fixed (Sec. 2.2). The
overall architecture of PALT is shown in Figure 1.
2.1 Knowledge Graph Completion as
Fill-in-the-Blank
We reformulate KG completion as a fill-in-the-
blank task. The basic idea of this task formulation
is that pretrained LMs are able to answer questions
[Figure 1 diagram: a frozen BERT (Transformer layers) with the parameter-lite encoder, consisting of the Knowledge Prompt Encoder applied to the fill-in-the-blank input “[CLS] Chaplin is a [SEP] screenwriter [SEP]” with prompt tokens P0 ... Pn, and the Knowledge Calibration Encoder inserted between Transformer layers; the output is Positive, with answer “screenwriter”.]
Figure 1: Summary of our approach PALT. Compared
to finetuning, PALT is a parameter-lite alternative to
transfer the knowledge that pretrained language mod-
els know about knowledge graph completion. Our ap-
proach first casts knowledge graph completion into a
fill-in-the-blank task. This formulation enables pre-
trained language models to produce general knowledge
for knowledge graph completion. By introducing a
few trainable parameters via a parameter-lite encoder
(in the dashed box), PALT further adapts the general
knowledge in language models to different knowledge
graph completion tasks without modifying the original
language model parameters (in grey).
formatted in cloze-style statements, and having a
proper context helps to trigger LMs to produce
general knowledge for the task of interest. For ex-
ample, the KG completion task aims to predict the
missing entity in a fact (Chaplin, profession, __),
which is closely related to a cloze statement. We
therefore frame the KG completion as “fill-in-the-
blank” cloze statements. In this case, “Chaplin is
a” provides the proper context for LMs to elicit
the correct answer “screenwriter” that is generally
relevant to the task.
In more detail, a fact is in the form of (head,
relation, tail) or in short (h, r, t). The LM needs
to predict a missing entity. A typical KG com-
pletion task provides a partial fact (h, r, __) and
a set of candidate answers for the missing entity.
To perform this task, at test time, we convert (h,
r, t′) into a cloze statement, where t′ indicates an
answer candidate for filling the blank. For exam-
ple, given a partial fact (Chaplin, profession, __),
an LM needs to fill in the blank of the cloze state-
ment “Chaplin is a __”, which is provided as the model
input. In our case, when a candidate answer (Chaplin, profession, screenwriter) is given (e.g., “screenwriter” is one of the candidates), the corresponding cloze statement turns into “[CLS] Chaplin is a [SEP] screenwriter [SEP]” (Figure 1). We use this
statement as an input to a pretrained LM.
[CLS] and [SEP] are special tokens of the pretrained LMs, e.g., BERT. “Chaplin” is the head entity name or description. “is a” is the relation name or description. “screenwriter” is the candidate tail entity name or
description. Sec 3.1 includes resources for obtain-
ing the entity or relation descriptions.
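As an illustration of this conversion, the sketch below builds the sentence-pair cloze input with the Hugging Face tokenizer, which inserts the [CLS] and [SEP] tokens automatically. This is not the released PALT code, and the entity/relation strings are our own example rather than benchmark data.

```python
# Minimal sketch (not the released PALT code) of the fill-in-the-blank conversion.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def fact_to_cloze(head_text: str, relation_text: str, tail_text: str):
    """Turn a candidate fact (h, r, t') into BERT's sentence-pair format:
    "[CLS] {head} {relation} [SEP] {tail} [SEP]"."""
    first = f"{head_text} {relation_text}"  # cloze-style question, e.g. "Chaplin is a"
    second = tail_text                      # answer candidate, e.g. "screenwriter"
    # The tokenizer adds [CLS] and [SEP] automatically for a sentence pair.
    return tokenizer(first, second, return_tensors="pt")

encoded = fact_to_cloze("Chaplin", "is a", "screenwriter")
print(tokenizer.decode(encoded["input_ids"][0]))
# -> "[CLS] chaplin is a [SEP] screenwriter [SEP]" (lowercased by the uncased model)
```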
2.2 Parameter-Lite Encoder
While the new formulation helps pretrained LMs to
provide general knowledge about the tasks, down-
stream tasks often rely on task-specific or domain-
specific knowledge. To adapt the general knowl-
edge in pretrained LMs to various KG completion
tasks, we introduce a parameter-lite encoder includ-
ing two groups of parameters: (i) a prompt encoder
serving as the additional task-specific context in
the cloze statement, and (ii) contextual calibration
encoders aiming to mitigate model’s bias towards
general answers. The encoder is added on top of
the original LM network whose parameters remain
frozen during tuning.
Knowledge Prompt Encoder
Beyond general
context from the task formulation, we believe that
task-specific context helps better recall the knowl-
edge of interest in pretrained LMs. For example,
if we want the LM to produce the correct answer
“screenwriter” for “Chaplin is a __”, a task-specific
prefix such as “profession” in the context will help.
The LM will then assign a higher probability to
“screenwriter” as the correct answer. In other words,
we want to find a task-specific context that better
steers the LM to produce task-specific predictions.
Intuitively, the task-specific tokens influence the
encoding of the context, thus impacting the an-
swer predictions. However, it is non-trivial to find
such task-specific tokens. For example, manually
writing these tokens is not only time-consuming, but it is also unclear whether they are optimal for our task.
Therefore, we design a learnable and continuous
prompt encoder.
Specifically, we use “virtual” prompt tokens as
continuous word embeddings. As shown in Fig-
ure 1, we append these prompt tokens to differ-
ent positions in the context. The embeddings of
prompt tokens are randomly initialized and are up-
dated during training. To allow more flexibility
in context learning, we add a linear layer with a
skip connection on top of the embedding layer to
project the original token embeddings to another
subspace. This projection enables learning a more
tailored task-specific context that better aligns with
LM’s knowledge. The knowledge prompt encoder
is defined in Eq. 1.
$e'_i = W_p e_i + b_p + e_i \qquad (1)$

where $e'_i$ denotes the virtual token embedding, $e_i$ denotes the input token embedding, and $W_p$ and $b_p$ are the tunable weight and bias of the prompt encoder. The knowledge prompt encoder provides task-specific context for KG completion as it is tuned on task-specific training data.
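To make Eq. 1 concrete, here is a minimal PyTorch sketch of the knowledge prompt encoder applied to the frozen LM's token embeddings. The class and parameter names (e.g., num_prompt_tokens), the initialization scale, and the exact positions where the virtual prompt tokens are inserted are our own simplifying assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class KnowledgePromptEncoder(nn.Module):
    """Sketch of Eq. 1, e'_i = W_p e_i + b_p + e_i, with trainable virtual prompt tokens."""

    def __init__(self, hidden_size: int, num_prompt_tokens: int):
        super().__init__()
        # Linear projection with a skip connection over the frozen LM's token embeddings.
        self.proj = nn.Linear(hidden_size, hidden_size)
        # Randomly initialized "virtual" prompt embeddings, updated during training.
        self.prompt_embeddings = nn.Parameter(0.02 * torch.randn(num_prompt_tokens, hidden_size))

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden) from the frozen embedding layer.
        projected = self.proj(token_embeddings) + token_embeddings  # Eq. 1
        batch_size = token_embeddings.size(0)
        prompts = self.prompt_embeddings.unsqueeze(0).expand(batch_size, -1, -1)
        # For simplicity the prompt tokens are appended at the end of the sequence;
        # Figure 1 places them at several positions in the context.
        return torch.cat([projected, prompts], dim=1)
```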
Knowledge Calibration Encoder
Another
main pitfall of pretrained LMs is that they tend
to be biased towards common answers in their
pretraining corpus. For example, the model
prefers “United States” over “Georgia” for the
birth place of a person, which is suboptimal
for KG completion. We view this as a
shift between the pretraining distribution and the
distribution of downstream tasks.
We counteract such biases by calibrating the out-
put distribution of pretrained LMs. Concretely, we
introduce task-specific calibration parameters be-
tween Transformer layers of LMs (Figure 1) to
gradually align the pretraining distribution with
the downstream distribution. We choose a linear
encoder with a skip connection to capture the dis-
tribution shifts, as shown in Eq. 2.
$h'_i = W_c h_i + b_c + h_i \qquad (2)$

where $h'_i$ is the calibrated hidden state, $h_i$ is the hidden state of a Transformer layer, and $W_c$ and $b_c$ are the tunable weight and bias of the knowledge calibration encoder.
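Eq. 2 can likewise be sketched as a small residual linear module applied to the hidden states of a frozen Transformer layer. The zero initialization below is our own choice for the sketch, not a detail from the paper.

```python
import torch
import torch.nn as nn

class KnowledgeCalibrationEncoder(nn.Module):
    """Sketch of Eq. 2, h'_i = W_c h_i + b_c + h_i, applied to Transformer hidden states."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.calibrate = nn.Linear(hidden_size, hidden_size)
        # Zero initialization (our assumption) makes h' = h at the start of training,
        # so calibration begins from the unmodified pretrained distribution.
        nn.init.zeros_(self.calibrate.weight)
        nn.init.zeros_(self.calibrate.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) output of a frozen Transformer layer.
        return self.calibrate(hidden_states) + hidden_states  # Eq. 2
```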
Training and Inference
We keep all LM param-
eters fixed and only tune the parameters in the
parameter-lite encoder. After formatting the KG
completion tasks following our formulation, a can-
didate fact is in the standard sentence pair format of
BERT. For example, the candidate (Chaplin, profes-
sion, screenwriter) is formulated as “[CLS] Chaplin is a [SEP] screenwriter [SEP]”. “Chaplin is
a” is the first sentence as the cloze-style question,
while the second sentence is “screenwriter” imply-
ing an answer candidate. The LM then decides whether
the second sentence is a correct answer to the ques-
tion or not. This naturally aligns with the next
sentence prediction (NSP) task of BERT, which
outputs a positive label if the answer is correct; oth-
erwise negative. Therefore, we directly utilize the
next sentence prediction to perform KG completion
thanks to our formulation.
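As a rough sketch of the inference step, the snippet below scores a candidate fact with BERT's next-sentence-prediction head while keeping all original LM parameters frozen. It omits the parameter-lite encoder and the training loop, so it illustrates the formulation rather than the authors' implementation. Candidates can then be ranked by this score for link prediction, or thresholded for triplet classification.

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Freeze every original LM parameter; in PALT only the newly added
# parameter-lite encoder (omitted here) would remain trainable.
for param in model.parameters():
    param.requires_grad = False

def score_candidate(cloze_question: str, candidate_answer: str) -> float:
    """Probability that the candidate correctly fills the blank, read off the NSP head."""
    inputs = tokenizer(cloze_question, candidate_answer, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2): index 0 = "is next" (positive)
    return torch.softmax(logits, dim=-1)[0, 0].item()

print(score_candidate("Chaplin is a", "screenwriter"))
```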
The training objective is to decide whether the
second sentence is the correct next sentence to the
first sentence. The small number of tunable param-