Bi-Link: Bridging Inductive Link Predictions from Text via
Contrastive Learning of Transformers and Prompts
Bohua Peng
Modelbest
London, UK
bp1119@ic.ac.uk
Shihao Liang
Tsinghua University
Beijing, China
shihaoliang0828@163.com
Mobarakol Islam
University College London
London, UK
mobarakol.islam@ucl.ac.uk
ABSTRACT
Inductive knowledge graph completion requires models to comprehend the underlying semantics and logic patterns of relations. With the advance of pretrained language models, recent research has designed transformers for link prediction tasks. However, empirical studies show that linearizing triples affects the learning of relational patterns, such as inversion and symmetry. In this paper, we propose Bi-Link, a contrastive learning framework with probabilistic syntax prompts for link predictions. Using the grammatical knowledge of BERT, we efficiently search for relational prompts according to learnt syntactical patterns that generalize to large knowledge graphs. To better express symmetric relations, we design a symmetric link prediction model, establishing bidirectional linking between forward prediction and backward prediction. This bidirectional linking accommodates flexible self-ensemble strategies at test time. In our experiments, Bi-Link outperforms recent baselines on link prediction datasets (WN18RR, FB15K-237, and Wikidata5M). Furthermore, we construct Zeshel-Ind, an in-domain inductive entity linking environment, to evaluate Bi-Link. The experimental results demonstrate that our method yields robust representations which can generalize under domain shift. Our code and dataset are publicly available at https://anonymous.4open.science/r/Bi-Link-2277/.
CCS CONCEPTS
Information systems → Information retrieval.
KEYWORDS
Knowledge graph completion, entity linking, entity description,
PLMs, contrastive learning, prompt tuning.
1 INTRODUCTION
Knowledge graphs are structured fact databases representing entities as nodes and relations as edges. With open-ended incoming data, automatically completing knowledge graphs is a billion-dollar problem for knowledge-intensive tasks, such as question answering [19] and dialogue systems [25]. As a particularly popular paradigm, TransE [2] initially proposes an additive model for knowledge representations. Despite its simplicity, TransE has modelled relational patterns such as inversion and composition. ComplEx [30] represents symmetric relations with a real embedding space and an imaginary embedding space. With a multiplicative model, RotatE [27] has extended expressiveness to most general patterns, including antisymmetry and reflexiveness.
However, link prediction in the real world is an inductive learning process, as shown in Figure 1, where models are not only required to understand logical patterns but also to reason on unseen entities.
[Figure 1 depicts two subgraphs built from entity descriptions: (a) a training subgraph with entities such as Louis Pasteur, Fermentation Theory, France, and Notre-Dame de Paris, connected by relations including "published by", "citizenship", "country", and "place of burial"; (b) an inductive test subgraph with disjoint entities such as Charles Darwin, Evolutionary Theory, the United Kingdom, and Westminster Abbey, where the "citizenship" edge is to be predicted.]
Figure 1: A toy example of inductive link predictions in knowledge graphs. The training subgraph and test subgraph have mutual relations but disjoint entities. Entity descriptions can help structural generalization from entities of the training subgraph (purple) to unseen entities of the test subgraph (blue).
To perform well in the inductive setting, models should command the relational semantics of knowledge graphs, i.e., the logical rules among the relations. For example, an intelligent model should have the induction ability to put entities in logical frames, as follows:
$\exists Y.\, (X, \mathit{son\_of}, Y) \wedge (Z, \mathit{daughter\_of}, Y) \rightarrow (X, \mathit{sibling\_of}, Z)$    (1)
DKRL [36], a pioneering inductive link prediction method, proposes to ground these logical rules from entity descriptions. KEPLER [34] incorporates the logic of triples and the semantics of entity descriptions encoded with pre-trained language models (PLMs). Its loss function is a linear combination of TransE's loss and a masked-language modelling loss. Unfortunately, the method converges unexpectedly slowly and is sensitive to noise in entity descriptions. With highly efficient bi-encoders, SimKGC [33] effectively compares knowledge representations using a normalized contrastive loss [3]. With very fast convergence, this simple method completes link prediction on FB15k-237 within 10 minutes, whereas KEPLER needs 20 hours.
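For concreteness, the following is a minimal sketch of the in-batch normalized contrastive objective that this family of bi-encoder methods optimizes; the encoder outputs, embedding dimension, and temperature value are illustrative assumptions rather than the exact SimKGC or Bi-Link configuration.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(hr_emb: torch.Tensor, t_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Normalized in-batch contrastive loss between (head, relation) and tail embeddings.

    hr_emb, t_emb: [batch, dim] outputs of two (possibly weight-shared) text encoders.
    The i-th tail is the positive for the i-th query; all other in-batch tails act as negatives.
    """
    hr = F.normalize(hr_emb, dim=-1)
    t = F.normalize(t_emb, dim=-1)
    logits = hr @ t.T / temperature                 # [batch, batch] cosine similarities
    labels = torch.arange(hr.size(0), device=hr.device)
    return F.cross_entropy(logits, labels)
```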
However, empirical studies show that the recent contrastive link prediction method heavily relies on semantic matching and ignores the underlying graphical structure. We hypothesize that flawed initialization is the culprit of poor contrastive representation learning.
In other words, it is hard for PLMs to estimate the similarity between a linearized relational expression and an entity, e.g., comparing "X inverse sibling of" with "Z".
This phenomenon reveals at least two problems. First, different semantics need different ways to express inversion. Motivated by recent prompt-tuning studies [6, 13], we generate rule-based prompts to improve the fluency of the relational expression under different types of syntax. Using syntax as the principle to find prompts not only improves interpretability but also reduces the difficulty of the prompt search, as there are not too many grammar rules in English. Specifically, we finetune PLMs on the part-of-speech tagging task to encode grammar and text into the same semantic space. Using the embeddings of PLMs, as shown in Figure 2, we train a multi-layer perceptron to predict the syntactical patterns in a smooth low-dimensional space. Then a Gaussian mixture model indexes a pair of reversible prompts for forward and backward link predictions. We combine the prompts and unfinished edges into relational expressions to better finetune bi-encoders for inductive link predictions. The parameters of the MLP and bi-encoders are updated with the expectation maximization (EM) algorithm [14].
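As a rough illustration of this pipeline, the sketch below projects PLM relation embeddings into a low-dimensional syntax space with an MLP and lets a Gaussian mixture assign each relation to a pair of reversible prompts; the prompt table, dimensions, and class names are hypothetical placeholders, not the components released with the paper.

```python
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

# Hypothetical table of reversible prompt pairs, one per syntactic cluster.
PROMPT_PAIRS = [
    ("[X] {rel} [T]", "[T] is the {rel} of [X]"),   # noun-phrase style relations
    ("[X] {rel}s [T]", "[T] is {rel}ed by [X]"),    # verbal style relations
]

class SyntaxProjector(nn.Module):
    """MLP that maps PLM relation embeddings into a smooth low-dimensional syntax space."""
    def __init__(self, plm_dim: int = 768, syntax_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(plm_dim, 128), nn.ReLU(),
                                 nn.Linear(128, syntax_dim))

    def forward(self, rel_emb: torch.Tensor) -> torch.Tensor:
        return self.net(rel_emb)

def index_prompts(rel_emb: torch.Tensor, projector: SyntaxProjector):
    """Fit a Gaussian mixture over projected relations and pick one prompt pair per relation."""
    z = projector(rel_emb).detach().cpu().numpy()
    gmm = GaussianMixture(n_components=len(PROMPT_PAIRS), random_state=0).fit(z)
    return [PROMPT_PAIRS[c] for c in gmm.predict(z)]
```

In an EM-style loop, the mixture assignment would play the role of the E-step, while the MLP and bi-encoders are updated in the M-step.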
Another problem is that recent contrastive representation learning devastatingly affects symmetric relation modelling. Symmetry has deeply rooted foundations in neural computing [38]. To improve expressiveness on symmetry, we introduce bidirectional linking between relational bi-encoders (Figure 3), briefly known as Bi-Link. Given relational expressions, Bi-Link understands a triple from both directions, gaining a better comprehension of symmetric relations like "sibling". Interestingly, the bidirectional linking learnt in training accommodates flexible self-ensembling options for test-time augmentation. Bi-Link outperforms recent baselines in our experiments on transductive and inductive link prediction tasks.
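The bidirectional scoring and test-time self-ensembling can be pictured with the following sketch, where a forward query (h, r, ?) and a backward query (?, r, t) are scored separately and averaged; the `enc` text-encoder interface and the ensemble weight are assumptions made only for illustration.

```python
import torch
import torch.nn.functional as F

def bidirectional_scores(enc, head_text: str, rel_fwd: str, rel_bwd: str,
                         tail_texts: list, alpha: float = 0.5) -> torch.Tensor:
    """Score candidate tails from both directions and self-ensemble at test time.

    enc(text) -> [dim] embedding is an assumed encoder interface (e.g. a BERT bi-encoder).
    rel_fwd / rel_bwd are the forward and backward relational prompt expressions.
    """
    # Forward direction: (h, r, ?) as the query, candidates as tails.
    q_fwd = F.normalize(enc(head_text + " " + rel_fwd), dim=-1)
    cand = F.normalize(torch.stack([enc(t) for t in tail_texts]), dim=-1)
    fwd = cand @ q_fwd

    # Backward direction: each candidate forms the inverse query (?, r, t) matched against the head.
    h_emb = F.normalize(enc(head_text), dim=-1)
    q_bwd = F.normalize(torch.stack([enc(t + " " + rel_bwd) for t in tail_texts]), dim=-1)
    bwd = q_bwd @ h_emb

    return alpha * fwd + (1 - alpha) * bwd  # weighted self-ensemble of the two directions
```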
Bi-Link can be applied to knowledge-intensive tasks with minimal modification. With rich lexical and syntactical semantics, the entity linking task ranks all candidate documents to predict a link from a named entity mention in context to its referent entity document. Recently, Zeshel [16] has greatly supported the zero-shot evaluation of PLMs with an entity linking corpus beyond the factual knowledge bases intensively used for pretraining. However, Zeshel combines the tasks of zero-shot learning and unsupervised domain adaptation. Technically, this has caused a de facto pain for error analysis of different models and training methods. In this work, we create an in-domain zero-shot entity linking benchmark, Zeshel-Ind, with data from Fandom. To adapt our model to zero-shot entity linking, we use shared soft prompts for mention spans instead of prompts on both sides. Besides recent contrastive techniques used in SimCSE [7] and SimKGC [33], we opt to share negative candidates retrieved by BM25 [24] among in-batch samples. Our methods have achieved competitive results on both Zeshel-Ind and Zeshel. Our experiments also show interesting behaviour of contrastive learning under domain shift.
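A minimal sketch of the shared-negative construction is given below; the `bm25_top_k` retrieval helper is an assumed interface standing in for a BM25 index over entity descriptions, and the pooling logic is simplified relative to the full training setup.

```python
def build_shared_candidates(batch_mentions, gold_entities, bm25_top_k, k: int = 8):
    """Assemble one candidate pool shared by all samples in a batch.

    bm25_top_k(mention_text, k) -> list of entity ids is an assumed retrieval helper.
    Every mention is contrasted against the union of in-batch gold entities and the
    BM25-retrieved hard negatives of all samples.
    """
    pool = []
    for mention, gold in zip(batch_mentions, gold_entities):
        pool.append(gold)                                   # in-batch positives double as negatives
        pool.extend(e for e in bm25_top_k(mention, k) if e != gold)

    # Deduplicate while preserving order, then locate each sample's gold label in the pool.
    seen, shared = set(), []
    for e in pool:
        if e not in seen:
            seen.add(e)
            shared.append(e)
    labels = [shared.index(g) for g in gold_entities]
    return shared, labels
```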
In summary, our major contributions are threefold: 1) We propose a probabilistic syntactic prompt method to verbalize the unfinished edge into natural relational expressions with a generalizable lightweight model. 2) We design symmetric relational encoders, Bi-Link, for text-based knowledge graph link predictions, and adapt them to entity linking tasks. 3) We build a new open-source benchmark, Zeshel-Ind, as a fully-inductive reflection of Zeshel for in-domain zero-shot performance evaluation. We extensively validate Bi-Link against several recent baselines on Zeshel and Zeshel-Ind.
2 RELATED WORK
Contrastive knowledge representation. Inspired by the NCE [8] principle, CPC [22] and SimCLR [3] are particularly popular contrastive learning paradigms that learn robust representations with noisy negative samples. For semantic textual similarity (STS) tasks, SimCSE [7] significantly simplifies previous contrastive sentence embedding methods using bi-transformers. PromptBERT [10] has further improved the results with template denoising. However, empirical studies [32] show that SimCSE is not helpful for entity linking. Using bi-transformers to contrast document representations is still an ongoing research direction. Finding proper negative samples is crucial for contrastive learning [33]. GPL [32] automatically finds high-quality negative samples with a pipeline including T5 [23], a dense retriever [24], and a cross-encoder [9]. In this work, we generalize a lightweight syntactic prompt generator learnt on a subgraph to a large knowledge graph. The prompts improve the quality of negative samples by transferring unfinished edges to approximate relational expressions.
Retrieval with pretrained transformers. Bi-encoders independently map queries and documents to a shared semantic space to efficiently compute their similarity scores. By contrast, cross-encoders [37] broadcast a query and concatenate it with all possible documents, predicting relevance scores with cross-attention between the query and documents. Previous work [9] has shown that cross-encoders can produce more robust representations and achieve better results, but the cumbersome computational overhead harshly increases the inference time. To address this problem, ColBERT [12] and TwinBERT [18] concurrently propose hybrid networks consisting of bi-encoders and cross-attention layers. As a result, these works have facilitated real-world search engines by significantly reducing the computational burden while retaining the performance. On the flip side, the late-stage interaction requires additional training data and strategy. Our work differs from previous work because our prompt-based bi-encoders can be useful when queries are fragmented words.
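To make the architectural contrast concrete, here is a minimal sketch of the two scoring patterns using a generic BERT backbone; the pooling choice and the untrained relevance head are placeholders, not the configuration of any of the systems cited above.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
relevance_head = torch.nn.Linear(768, 1)   # untrained placeholder scoring head

def bi_encoder_score(query: str, doc: str) -> torch.Tensor:
    """Encode query and document independently, then compare them in a shared space."""
    def embed(text: str) -> torch.Tensor:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        return encoder(**inputs).last_hidden_state[:, 0]   # [CLS] embedding
    return torch.cosine_similarity(embed(query), embed(doc))

def cross_encoder_score(query: str, doc: str) -> torch.Tensor:
    """Concatenate query and document so cross-attention attends over both jointly."""
    inputs = tokenizer(query, doc, return_tensors="pt", truncation=True)
    cls = encoder(**inputs).last_hidden_state[:, 0]
    return relevance_head(cls).squeeze(-1)
```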
3 METHODS
Given a directed knowledge graph $\mathcal{G} = (V, E, R)$ with $|V|$ entities, $|E|$ observed edges, and $|R|$ relation types, our transductive link prediction task is to infer a missing edge $e$ described as $(h, r, ?)$ or its inverse version $e_{inv}$ described as $(?, r, t)$. As a logical operator, a triple $(h, r, t)$ consists of a head entity, a relation, and a tail entity. In this work, we present Bi-Link, a symmetric framework that learns knowledge representations from entity descriptions and reversible relational text. As in the example shown in Figure 2, we enrich $\mathcal{G}$ with inverse relational textual information phrased by probabilistic rule-based prompt expressions. Then we encode relational information with a siamese network, Bi-Link (Figure 3). The grammatical module and the Bi-Link network are updated with the EM algorithm. We use this general framework for transductive link prediction, fully-inductive link prediction [28], and inductive named entity linking.
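As a small illustration of this setup, the sketch below builds the forward query (h, r, ?) and its inverse (?, r, t) as textual relational expressions from a triple; the template strings are hypothetical stand-ins for the learnt prompts selected by the grammatical module.

```python
from dataclasses import dataclass

@dataclass
class Triple:
    head: str       # head entity name or description
    relation: str   # relation surface text
    tail: str       # tail entity name or description

def make_queries(triple: Triple, fwd_prompt: str, bwd_prompt: str):
    """Build the forward query (h, r, ?) and its inverse (?, r, t) as relational expressions.

    fwd_prompt / bwd_prompt are assumed templates with {head}/{tail} and {rel} slots,
    standing in for the reversible prompts chosen for this relation.
    """
    forward_query = fwd_prompt.format(head=triple.head, rel=triple.relation)   # predict the tail
    backward_query = bwd_prompt.format(tail=triple.tail, rel=triple.relation)  # predict the head
    return forward_query, backward_query

# Example usage with hypothetical templates:
# make_queries(Triple("Louis Pasteur", "citizenship", "France"),
#              "{head} {rel}", "{tail} is the {rel} of")
```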