PALT: Parameter-Lite Transfer of Language Models for
Knowledge Graph Completion
Jianhao Shen1, Chenguang Wang2†, Ye Yuan1, Jiawei Han3,
Heng Ji3, Koushik Sen4, Ming Zhang1†, Dawn Song4†
1Peking University, 2Washington University in St. Louis,
3University of Illinois at Urbana-Champaign, 4UC Berkeley
{jhshen,yuanye_pku,mzhang_cs}@pku.edu.cn,chenguangwang@wustl.edu,
{hanj,hengji}@illinois.edu,{ksen,dawnsong}@berkeley.edu
Abstract
This paper presents a parameter-lite transfer
learning approach of pretrained language mod-
els (LM) for knowledge graph (KG) comple-
tion. Instead of finetuning, which modifies
all LM parameters, we only tune a few new
parameters while keeping the original LM pa-
rameters fixed. We establish this via reformu-
lating KG completion as a “fill-in-the-blank”
task, and introducing a parameter-lite encoder
on top of the original LMs. We show that,
by tuning far fewer parameters than finetuning,
LMs transfer non-trivially to most tasks and
reach competitiveness with prior state-of-the-
art approaches. For instance, we outperform
the full finetuning approaches on a KG com-
pletion benchmark by tuning only 1% of the
parameters.1
1 Introduction
Pretrained language models (LM) such as BERT
and GPT-3 have enabled downstream transfer (De-
vlin et al.,2019;Brown et al.,2020). Recent stud-
ies (Petroni et al.,2019;Jiang et al.,2020;He et al.,
2021) show that the implicit knowledge learned
during pretraining is the key to success. Among
different transfer learning techniques (Shin et al.,
2020;Liu et al.,2021a,b;Houlsby et al.,2019;
Devlin et al.,2019), finetuning is the de facto
paradigm to adapt the knowledge to downstream
NLP tasks. Knowledge graph (KG) completion
is a typical knowledge-intensive application. For
example, given a fact (Chaplin, profession, __)
missing an entity, it aims to predict the correct en-
tity “screenwriter”. This task provides a natural
testbed to evaluate the knowledge transfer ability
of different transfer learning approaches.
†Corresponding authors.
1The code and datasets are available at https://github.com/yuanyehome/PALT.
Finetuning (Yao et al., 2019; Shen et al., 2022) has been recently adopted to advance the KG completion performance. However, it presents two fun-
damental limitations. First, finetuning is computa-
tionally inefficient, requiring updating all param-
eters of the pretrained LMs. This ends up with
an entirely new model for each KG completion
task. For example, storing a full copy of pretrained
BERT-Large (340M parameters) for each task is
non-trivial, not to mention the billion parameter
LMs. Second, the finetuning approaches often
rely on task-specific architectures for various KG
completion tasks. For instance, KG-BERT (Yao
et al.,2019) designs different model architectures
to adapt a pretrained BERT to different tasks. This
restricts its usability in more downstream tasks.
In this work, we enable parameter-lite transfer of
the pretrained LMs to knowledge-intensive tasks,
with a focus on KG completion. As an alternative
to finetuning, our method, namely PALT, tunes
no existing LM parameters. We establish this by
casting the KG completion into a “fill-in-the-blank”
task. This formulation enables eliciting general
knowledge about KG completion from pretrained
LMs. By introducing a parameter-lite encoder con-
sisting of a few trainable parameters, we efficiently
adapt the general model knowledge to downstream
tasks. The parameters of the original LM network
remain fixed during the adaptation process for dif-
ferent KG completion tasks. In contrast to fine-
tuning which modifies all LM parameters, PALT
is lightweight. Instead of designing task-specific
model architectures, PALT stays with the same
model architecture for all KG completion tasks that
we evaluate.
The contributions are as follows:
• We propose parameter-lite transfer learning for pretrained LMs to adapt their knowledge to KG completion. The results are potentially valuable for a broad range of knowledge-intensive NLP applications.
• We reformulate KG completion as a “fill-in-
the-blank” task. This new formulation helps
trigger pretrained LMs to produce general
knowledge about the downstream tasks. The
new formulation implies that KG completion can serve as a valuable knowledge
benchmark for pretrained LMs, in addition
to benchmarks such as LAMA (Petroni et al.,
2019) and KILT (Petroni et al.,2021).
• We introduce a parameter-lite encoder to spec-
ify general model knowledge to different KG
completion tasks. This encoder contains a few
parameters for providing additional context
and calibrating biased knowledge according
to the task. The module is applicable to other
deep LMs.
• We obtain state-of-the-art or competitive per-
formance on five KG completion datasets
spanning two tasks: link prediction and triplet
classification. We achieve this via learning
only 1% of the parameters compared to the
full finetuning approaches. In addition, com-
pared to task-specific KG completion models,
PALT reaches competitiveness with a unified
architecture for all tasks.
2 PALT
We propose parameter-lite transfer learning, called
PALT, as an alternative to finetuning for knowl-
edge graph (KG) completion. Instead of finetuning
which modifies all the language model (LM) pa-
rameters and stores a new copy for each task, this
method is lightweight for KG completion, which
keeps original LM parameters frozen, but only
tunes a small number of newly added parameters.
The intuition is that LMs have stored factual knowl-
edge during the pretraining, and we need to prop-
erly elicit the relevant knowledge for downstream
tasks without much modification to the original
LMs. To do so, PALT first casts KG completion
into a “fill-in-the-blank” task (Sec. 2.1), and then
introduces a parameter-lite encoder consisting of
a few trainable parameters, while parameters of
the original network remain fixed (Sec. 2.2). The
overall architecture of PALT is shown in Figure 1.
2.1 Knowledge Graph Completion as
Fill-in-the-Blank
We reformulate KG completion as a fill-in-the-
blank task. The basic idea of this task formulation
is that pretrained LMs are able to answer questions
[Figure 1 diagram: a frozen BERT (Transformer layers) with the parameter-lite encoder, consisting of the Knowledge Prompt Encoder applied to the fill-in-the-blank input “[CLS] Chaplin is a [SEP] screenwriter [SEP]” with prompt tokens P0 ... Pn, and the Knowledge Calibration Encoder inserted between Transformer layers; the output is Positive, with answer “screenwriter”.]
Figure 1: Summary of our approach PALT. Compared
to finetuning, PALT is a parameter-lite alternative to
transfer the knowledge that pretrained language mod-
els know about knowledge graph completion. Our ap-
proach first casts knowledge graph completion into a
fill-in-the-blank task. This formulation enables pre-
trained language models to produce general knowledge
for knowledge graph completion. By introducing a
few trainable parameters via a parameter-lite encoder
(in the dashed box), PALT further adapts the general
knowledge in language models to different knowledge
graph completion tasks without modifying the original
language model parameters (in grey).
formatted in cloze-style statements, and having a
proper context helps to trigger LMs to produce
general knowledge for the task of interest. For ex-
ample, the KG completion task aims to predict the
missing entity in a fact (Chaplin, profession, __),
which is closely related to a cloze statement. We
therefore frame the KG completion as “fill-in-the-
blank” cloze statements. In this case, “Chaplin is
a” provides the proper context for LMs to elicit
the correct answer “screenwriter” that is generally
relevant to the task.
In more detail, a fact is in the form of (head,
relation, tail) or in short (h, r, t). The LM needs
to predict a missing entity. A typical KG com-
pletion task provides a partial fact (h, r, __) and
a set of candidate answers for the missing entity.
To perform this task, at test time, we convert (h,
r, t′) into a cloze statement, where t′ indicates an
answer candidate for filling the blank. For exam-
ple, given a partial fact (Chaplin, profession, __),
an LM needs to fill in the blank of the cloze state-
ment “Chaplin is a __”, which is provided as the model
input. In our case, when a candidate answer (Chaplin, profession, screenwriter) is given (e.g., “screenwriter” is one of the candidates), the corresponding cloze statement turns into “[CLS] Chaplin is a [SEP] screenwriter [SEP]” (Figure 1). We use this
statement as an input to a pretrained LM.
[CLS] and [SEP] are special tokens of the pretrained LMs, e.g., BERT. “Chaplin” is the head entity name or description. “is a” is the relation name or description. “screenwriter” is the candidate tail entity name or
description. Sec 3.1 includes resources for obtain-
ing the entity or relation descriptions.
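As an illustration of this conversion, the sketch below builds the sentence-pair cloze input with the Hugging Face tokenizer, which inserts the [CLS] and [SEP] tokens automatically. This is not the released PALT code, and the entity/relation strings are our own example rather than benchmark data.

```python
# Minimal sketch (not the released PALT code) of the fill-in-the-blank conversion.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def fact_to_cloze(head_text: str, relation_text: str, tail_text: str):
    """Turn a candidate fact (h, r, t') into BERT's sentence-pair format:
    "[CLS] {head} {relation} [SEP] {tail} [SEP]"."""
    first = f"{head_text} {relation_text}"  # cloze-style question, e.g. "Chaplin is a"
    second = tail_text                      # answer candidate, e.g. "screenwriter"
    # The tokenizer adds [CLS] and [SEP] automatically for a sentence pair.
    return tokenizer(first, second, return_tensors="pt")

encoded = fact_to_cloze("Chaplin", "is a", "screenwriter")
print(tokenizer.decode(encoded["input_ids"][0]))
# -> "[CLS] chaplin is a [SEP] screenwriter [SEP]" (lowercased by the uncased model)
```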
2.2 Parameter-Lite Encoder
While the new formulation helps pretrained LMs to
provide general knowledge about the tasks, down-
stream tasks often rely on task-specific or domain-
specific knowledge. To adapt the general knowl-
edge in pretrained LMs to various KG completion
tasks, we introduce a parameter-lite encoder includ-
ing two groups of parameters: (i) a prompt encoder
serving as the additional task-specific context in
the cloze statement, and (ii) contextual calibration
encoders aiming to mitigate model’s bias towards
general answers. The encoder is added on top of
the original LM network whose parameters remain
frozen during tuning.
Knowledge Prompt Encoder
Beyond general
context from the task formulation, we believe that
task-specific context helps better recall the knowl-
edge of interest in pretrained LMs. For example,
if we want the LM to produce the correct answer
“screenwriter” for “Chaplin is a __”, a task-specific
prefix such as “profession” in the context will help.
The LM will then assign a higher probability to
“screenwriter” as the correct answer. In other words,
we want to find a task-specific context that better
steers the LM to produce task-specific predictions.
Intuitively, the task-specific tokens influence the
encoding of the context, thus impacting the an-
swer predictions. However, it is non-trivial to find
such task-specific tokens. For example, manually
writing these tokens is not only time-consuming, but it is also unclear whether they are optimal for our task.
Therefore, we design a learnable and continuous
prompt encoder.
Specifically, we use “virtual” prompt tokens as
continuous word embeddings. As shown in Fig-
ure 1, we append these prompt tokens to differ-
ent positions in the context. The embeddings of
prompt tokens are randomly initialized and are up-
dated during training. To allow more flexibility
in context learning, we add a linear layer with a
skip connection on top of the embedding layer to
project the original token embeddings to another
subspace. This projection enables learning a more
tailored task-specific context that better aligns with
LM’s knowledge. The knowledge prompt encoder
is defined in Eq. 1.
$e'_i = W_p e_i + b_p + e_i \qquad (1)$

where $e'_i$ denotes the virtual token embedding, $e_i$ denotes the input token embedding, and $W_p$ and $b_p$ are the tunable weight and bias of the prompt encoder. The knowledge prompt encoder provides task-specific context for KG completion as it is tuned on task-specific training data.
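To make Eq. 1 concrete, here is a minimal PyTorch sketch of the knowledge prompt encoder applied to the frozen LM's token embeddings. The class and parameter names (e.g., num_prompt_tokens), the initialization scale, and the exact positions where the virtual prompt tokens are inserted are our own simplifying assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class KnowledgePromptEncoder(nn.Module):
    """Sketch of Eq. 1, e'_i = W_p e_i + b_p + e_i, with trainable virtual prompt tokens."""

    def __init__(self, hidden_size: int, num_prompt_tokens: int):
        super().__init__()
        # Linear projection with a skip connection over the frozen LM's token embeddings.
        self.proj = nn.Linear(hidden_size, hidden_size)
        # Randomly initialized "virtual" prompt embeddings, updated during training.
        self.prompt_embeddings = nn.Parameter(0.02 * torch.randn(num_prompt_tokens, hidden_size))

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden) from the frozen embedding layer.
        projected = self.proj(token_embeddings) + token_embeddings  # Eq. 1
        batch_size = token_embeddings.size(0)
        prompts = self.prompt_embeddings.unsqueeze(0).expand(batch_size, -1, -1)
        # For simplicity the prompt tokens are appended at the end of the sequence;
        # Figure 1 places them at several positions in the context.
        return torch.cat([projected, prompts], dim=1)
```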
Knowledge Calibration Encoder
Another
main pitfall of pretrained LMs is that they tend
to be biased towards common answers in their
pretraining corpus. For example, the model
prefers “United States” over “Georgia” for the
birth place of a person, which is suboptimal
for KG completion. We view this as a
shift between the pretraining distribution and the
distribution of downstream tasks.
We counteract such biases by calibrating the out-
put distribution of pretrained LMs. Concretely, we
introduce task-specific calibration parameters be-
tween Transformer layers of LMs (Figure 1) to
gradually align the pretraining distribution with
the downstream distribution. We choose a linear
encoder with a skip connection to capture the dis-
tribution shifts, as shown in Eq. 2.
$h'_i = W_c h_i + b_c + h_i \qquad (2)$

where $h'_i$ is the calibrated hidden state, $h_i$ is the hidden state of a Transformer layer, and $W_c$ and $b_c$ are the tunable weight and bias of the knowledge calibration encoder.
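Eq. 2 can likewise be sketched as a small residual linear module applied to the hidden states of a frozen Transformer layer. The zero initialization below is our own choice for the sketch, not a detail from the paper.

```python
import torch
import torch.nn as nn

class KnowledgeCalibrationEncoder(nn.Module):
    """Sketch of Eq. 2, h'_i = W_c h_i + b_c + h_i, applied to Transformer hidden states."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.calibrate = nn.Linear(hidden_size, hidden_size)
        # Zero initialization (our assumption) makes h' = h at the start of training,
        # so calibration begins from the unmodified pretrained distribution.
        nn.init.zeros_(self.calibrate.weight)
        nn.init.zeros_(self.calibrate.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) output of a frozen Transformer layer.
        return self.calibrate(hidden_states) + hidden_states  # Eq. 2
```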
Training and Inference
We keep all LM param-
eters fixed and only tune the parameters in the
parameter-lite encoder. After formatting the KG
completion tasks following our formulation, a can-
didate fact is in the standard sentence pair format of
BERT. For example, the candidate (Chaplin, profes-
sion, screenwriter) is formulated as “[CLS] Chaplin is a [SEP] screenwriter [SEP]”. “Chaplin is
a” is the first sentence as the cloze-style question,
while the second sentence is “screenwriter” imply-
ing an answer candidate. The LM then decides whether
the second sentence is a correct answer to the ques-
tion or not. This naturally aligns with the next
sentence prediction (NSP) task of BERT, which
outputs a positive label if the answer is correct; oth-
erwise negative. Therefore, we directly utilize the
next sentence prediction to perform KG completion
thanks to our formulation.
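As a rough sketch of the inference step, the snippet below scores a candidate fact with BERT's next-sentence-prediction head while keeping all original LM parameters frozen. It omits the parameter-lite encoder and the training loop, so it illustrates the formulation rather than the authors' implementation. Candidates can then be ranked by this score for link prediction, or thresholded for triplet classification.

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Freeze every original LM parameter; in PALT only the newly added
# parameter-lite encoder (omitted here) would remain trainable.
for param in model.parameters():
    param.requires_grad = False

def score_candidate(cloze_question: str, candidate_answer: str) -> float:
    """Probability that the candidate correctly fills the blank, read off the NSP head."""
    inputs = tokenizer(cloze_question, candidate_answer, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2): index 0 = "is next" (positive)
    return torch.softmax(logits, dim=-1)[0, 0].item()

print(score_candidate("Chaplin is a", "screenwriter"))
```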
The training objective is to decide whether the
second sentence is the correct next sentence to the
first sentence. The small number of tunable param-