In other words, it is hard for PLMs to estimate the similarity between a linearized relational expression and an entity, e.g., comparing “X inverse sibling of” with “Z”.
This phenomenon reveals at least two problems. First, different semantics need different ways to express inversion. Motivated by recent prompt-tuning studies [6, 13], we generate rule-based prompts to improve the fluency of the relational expression under different types of syntax. Using syntax as the principle to find prompts not only improves interpretability but also reduces the difficulty of the prompt search, as there are not many grammar rules in English. Specifically, we finetune PLMs on the part-of-speech tagging task to encode grammar and text into the same semantic space. Using the embeddings of PLMs, as shown in Figure 2, we train a multi-layer perceptron to predict syntactic patterns in a smooth low-dimensional space. A Gaussian mixture model then indexes a pair of reversible prompts for forward and backward link prediction. We combine the prompts and unfinished edges into relational expressions to better finetune bi-encoders for inductive link prediction. The parameters of the MLP and bi-encoders are updated with the expectation-maximization (EM) algorithm [14].
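To make the prompt-selection step concrete, the sketch below shows one possible way to couple the MLP, the Gaussian mixture model, and the reversible prompt templates; the class name, the placeholder templates, and the use of scikit-learn and PyTorch are our illustrative assumptions rather than the paper's implementation, and the M-step that updates the MLP and bi-encoders is omitted.

```python
# Hypothetical sketch of syntactic prompt indexing; names and templates are
# illustrative, not the authors' actual code.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class SyntaxMLP(nn.Module):
    """Projects PLM embeddings of an unfinished edge into a smooth
    low-dimensional space where syntactic patterns cluster."""
    def __init__(self, in_dim=768, hidden=256, out_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)

# One reversible prompt pair (forward, backward) per mixture component;
# these templates are placeholders.
PROMPT_PAIRS = [
    ("{h} {r} [MASK]", "[MASK] {r_inv} {h}"),
    ("{h} is the {r} of [MASK]", "[MASK] has {h} as its {r}"),
]

mlp = SyntaxMLP(out_dim=8)
gmm = GaussianMixture(n_components=len(PROMPT_PAIRS))

def e_step(edge_embeddings: torch.Tensor):
    """E-step: cluster unfinished edges in the low-dimensional syntactic space
    and assign each edge to a prompt pair."""
    z = mlp(edge_embeddings).detach().numpy()
    gmm.fit(z)
    return gmm.predict(z)  # index into PROMPT_PAIRS for every edge

def verbalize(head: str, rel: str, rel_inv: str, component: int):
    """Turn an unfinished edge into forward and backward relational expressions."""
    fwd, bwd = PROMPT_PAIRS[component]
    return fwd.format(h=head, r=rel), bwd.format(h=head, r=rel, r_inv=rel_inv)
```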
Another problem is that recent contrastive representation learning severely degrades symmetric relation modelling. Symmetry has deeply rooted foundations in neural computing [38]. To improve expressiveness on symmetry, we introduce bidirectional linking between relational bi-encoders (Figure 3), referred to as Bi-Link. Given relational expressions, Bi-Link understands a triple from both directions, achieving better comprehension of symmetric relations such as “sibling”. Interestingly, the bidirectional linking learnt in training accommodates flexible self-ensembling options for test-time augmentation. Bi-Link outperforms recent baselines in our experiments on transductive and inductive link prediction tasks.
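As a rough illustration of how bidirectional linking enables test-time self-ensembling, the snippet below scores a query from both directions with a single encoder and averages the two score vectors; the `encode` callable and the plain average are assumptions made for exposition rather than the exact Bi-Link architecture.

```python
# Illustrative bidirectional scoring with a shared text encoder; the templates
# and the averaging scheme are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def score(query_vec: torch.Tensor, entity_vecs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between one relational query and all candidate entities."""
    return F.cosine_similarity(query_vec.unsqueeze(0), entity_vecs, dim=-1)

def bidirectional_scores(encode, head, rel, rel_inv, entity_texts):
    """Score (head, rel, ?) from both directions and self-ensemble at test time.
    `encode` maps a text to a fixed-size embedding, e.g. a finetuned PLM."""
    entity_vecs = torch.stack([encode(e) for e in entity_texts])
    fwd = score(encode(f"{head} {rel} [MASK]"), entity_vecs)      # forward link
    bwd = score(encode(f"[MASK] {rel_inv} {head}"), entity_vecs)  # backward link
    return 0.5 * (fwd + bwd)  # simple average; other ensembling choices exist
```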
Bi-Link can be applied to knowledge-intensive tasks with minimal modification. With rich lexical and syntactical semantics, the entity linking task ranks all candidate documents to predict a link from a named entity mention in context to its referent entity document. Recently, Zeshel [16] has greatly supported zero-shot evaluation of PLMs with an entity linking corpus beyond the factual knowledge bases intensively used for pretraining. However, Zeshel combines the tasks of zero-shot learning and unsupervised domain adaptation, which in practice complicates error analysis across different models and training methods. In this work, we create an in-domain zero-shot entity linking benchmark, Zeshel-Ind, with data from Fandom. To adapt our model to zero-shot entity linking, we use shared soft prompts for mention spans instead of prompts on both sides. Besides recent contrastive techniques used in SimCSE [7] and SimKGC [33], we opt to share negative candidates retrieved by BM25 [24] among in-batch samples. Our methods achieve competitive results on both Zeshel-Ind and Zeshel. Our experiments also show interesting behaviour of contrastive learning under domain shift.
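Sharing retrieved negatives across a batch can be sketched as follows; the `rank_bm25` library and the InfoNCE-style loss are stand-ins chosen for illustration, and the paper's actual retrieval and loss details may differ.

```python
# Minimal sketch of sharing BM25-retrieved hard negatives among in-batch
# samples; the library choice and loss form are illustrative assumptions.
import torch
import torch.nn.functional as F
from rank_bm25 import BM25Okapi

def bm25_negatives(corpus_docs, query, k=4):
    """Retrieve top-k hard negative documents for one mention query."""
    bm25 = BM25Okapi([doc.split() for doc in corpus_docs])
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(corpus_docs)), key=lambda i: -scores[i])[:k]
    return [corpus_docs[i] for i in top]

def shared_negative_loss(mention_vecs, entity_vecs, shared_neg_vecs, tau=0.05):
    """InfoNCE-style loss: mention i contrasts its gold entity (row i of
    entity_vecs) against all in-batch entities plus negatives shared by
    every sample in the batch."""
    candidates = torch.cat([entity_vecs, shared_neg_vecs], dim=0)  # (B + N, d)
    logits = mention_vecs @ candidates.t() / tau                   # (B, B + N)
    labels = torch.arange(mention_vecs.size(0))                    # gold on the diagonal
    return F.cross_entropy(logits, labels)
```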
In summary, our major contributions are threefold: 1.) We propose a probabilistic syntactic prompt method to verbalize unfinished edges into natural relational expressions with a generalizable lightweight model. 2.) We design symmetric relational encoders, Bi-Link, for text-based knowledge graph link prediction, and adapt them to entity linking tasks. 3.) We build a new open-source benchmark, Zeshel-Ind, as a fully-inductive reflection of Zeshel for in-domain zero-shot performance evaluation. We extensively validate Bi-Link against several recent baselines on Zeshel and Zeshel-Ind.
2 RELATED WORK
Contrastive knowledge representation.
Inspired by the NCE [8] principle, CPC [22] and SimCLR [3] are particularly popular contrastive learning paradigms that learn robust representations from noisy negative samples. For semantic textual similarity (STS) tasks, SimCSE [7] significantly simplifies previous contrastive sentence embedding methods using bi-transformers. PromptBERT [10] further improves the results with template denoising. However, empirical studies [32] show that SimCSE is not helpful for entity linking, and using bi-transformers to contrast document representations remains an ongoing research direction. Finding proper negative samples is crucial for contrastive learning [33]. GPL [32] automatically finds high-quality negative samples with a pipeline that includes T5 [23], a dense retriever [24], and a cross-encoder [9]. In this work, we generalize a lightweight syntactic prompt generator learnt on a subgraph to a large knowledge graph. The prompts improve the quality of negative samples by transforming unfinished edges into approximate relational expressions.
Retrieval with pretrained transformers.
Bi-encoders independently map queries and documents to a shared semantic space to efficiently compute their similarity scores. By contrast, cross-encoders [37] broadcast a query and concatenate it with all possible documents, predicting relevance scores with cross-attention between the query and the documents. Previous work [9] has shown that cross-encoders can produce more robust representations and achieve better results, but their heavy computational overhead sharply increases inference time. To address this problem, ColBERT [12] and TwinBERT [18] concurrently propose hybrid networks consisting of bi-encoders and cross-attention layers. As a result, these works have facilitated real-world search engines by significantly reducing the computational burden while retaining performance. On the flip side, the late-stage interaction requires additional training data and strategies. Our work differs from previous work in that our prompt-based bi-encoders remain useful when queries are fragmented words.
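To make the trade-off explicit, the toy functions below contrast the two retrieval families; the encoders are abstracted into callables and nothing here reproduces ColBERT's or TwinBERT's actual late interaction.

```python
# Toy comparison of bi-encoder and cross-encoder scoring; `encode` and
# `score_pair` stand for arbitrary pretrained models and are assumptions.
import torch

def bi_encoder_scores(encode, query: str, docs: list[str]) -> torch.Tensor:
    """Encode query and documents independently; similarity is a dot product,
    so document vectors can be precomputed and indexed offline."""
    q = encode(query)                           # (d,)
    D = torch.stack([encode(d) for d in docs])  # (n, d)
    return D @ q                                # (n,)

def cross_encoder_scores(score_pair, query: str, docs: list[str]) -> torch.Tensor:
    """Concatenate the query with every document and run one forward pass per
    pair: cross-attention is more expressive, but cost grows with candidates."""
    return torch.tensor([score_pair(f"{query} [SEP] {d}") for d in docs])
```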
3 METHODS
Given a directed knowledge graph $\mathcal{G} = (V, E, R)$ with $|V|$ entities, $|E|$ observed edges, and $|R|$ relation types, our transductive link prediction task is to infer a missing edge $e$ described as $(h, r, ?)$ or its inverse version $e_{\mathrm{inv}}$ described as $(?, r, t)$. As a logical operator, a triple $(h, r, t)$ consists of a head entity, a relation, and a tail entity. In this work, we present Bi-Link, a symmetric framework that learns knowledge representations from entity descriptions and reversible relational text. As in the example shown in Figure 2, we enrich $\mathcal{G}$ with inverse relational textual information phrased by probabilistic rule-based prompt expressions. We then encode the relational information with a siamese network, Bi-Link (Figure 3). The grammatical module and the Bi-Link network are updated with the EM algorithm. We use this general framework for transductive link prediction, fully-inductive link prediction [28], and inductive named entity linking.