In other words, it is hard for PLMs to estimate the similarity between a linearized relational expression and an entity, e.g., comparing “X inverse sibling of” with “Z”.
This phenomenon reveals at least two problems. First, different semantics need different ways to express inversion. Motivated by recent prompt-tuning studies [6, 13], we generate rule-based prompts to improve the fluency of the relational expression under different types of syntax. Using syntax as the principle to find prompts not only improves interpretability but also reduces the difficulty of the prompt search, as there are not many grammar rules in English. Specifically, we finetune PLMs on the part-of-speech tagging task to encode grammar and text into the same semantic space. Using the embeddings of PLMs, as shown in Figure 2, we train a multi-layer perceptron to predict syntactic patterns in a smooth low-dimensional space. A Gaussian mixture model then indexes a pair of reversible prompts for forward and backward link prediction. We combine the prompts and unfinished edges into relational expressions to better finetune bi-encoders for inductive link prediction. The parameters of the MLP and bi-encoders are updated with the expectation-maximization (EM) algorithm [14].
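To make the prompt-selection step concrete, the sketch below shows one possible way to couple the MLP, the Gaussian mixture model, and the reversible prompt templates; the class name, the placeholder templates, and the use of scikit-learn and PyTorch are our illustrative assumptions rather than the paper's implementation, and the M-step that updates the MLP and bi-encoders is omitted.

```python
# Hypothetical sketch of syntactic prompt indexing; names and templates are
# illustrative, not the authors' actual code.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class SyntaxMLP(nn.Module):
    """Projects PLM embeddings of an unfinished edge into a smooth
    low-dimensional space where syntactic patterns cluster."""
    def __init__(self, in_dim=768, hidden=256, out_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)

# One reversible prompt pair (forward, backward) per mixture component;
# these templates are placeholders.
PROMPT_PAIRS = [
    ("{h} {r} [MASK]", "[MASK] {r_inv} {h}"),
    ("{h} is the {r} of [MASK]", "[MASK] has {h} as its {r}"),
]

mlp = SyntaxMLP(out_dim=8)
gmm = GaussianMixture(n_components=len(PROMPT_PAIRS))

def e_step(edge_embeddings: torch.Tensor):
    """E-step: cluster unfinished edges in the low-dimensional syntactic space
    and assign each edge to a prompt pair."""
    z = mlp(edge_embeddings).detach().numpy()
    gmm.fit(z)
    return gmm.predict(z)  # index into PROMPT_PAIRS for every edge

def verbalize(head: str, rel: str, rel_inv: str, component: int):
    """Turn an unfinished edge into forward and backward relational expressions."""
    fwd, bwd = PROMPT_PAIRS[component]
    return fwd.format(h=head, r=rel), bwd.format(h=head, r=rel, r_inv=rel_inv)
```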
Another problem is that recent contrastive representation learning severely degrades symmetric relation modelling. Symmetry has deeply rooted foundations in neural computing [38]. To improve expressiveness on symmetry, we introduce bidirectional linking between relational bi-encoders (Figure 3), referred to as Bi-Link. Given relational expressions, Bi-Link understands a triple from both directions, achieving better comprehension of symmetric relations such as “sibling”. Interestingly, the bidirectional linking learnt in training accommodates flexible self-ensembling options for test-time augmentation. Bi-Link outperforms recent baselines in our experiments on transductive and inductive link prediction tasks.
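As a rough illustration of how bidirectional linking enables test-time self-ensembling, the snippet below scores a query from both directions with a single encoder and averages the two score vectors; the `encode` callable and the plain average are assumptions made for exposition rather than the exact Bi-Link architecture.

```python
# Illustrative bidirectional scoring with a shared text encoder; the templates
# and the averaging scheme are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def score(query_vec: torch.Tensor, entity_vecs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between one relational query and all candidate entities."""
    return F.cosine_similarity(query_vec.unsqueeze(0), entity_vecs, dim=-1)

def bidirectional_scores(encode, head, rel, rel_inv, entity_texts):
    """Score (head, rel, ?) from both directions and self-ensemble at test time.
    `encode` maps a text to a fixed-size embedding, e.g. a finetuned PLM."""
    entity_vecs = torch.stack([encode(e) for e in entity_texts])
    fwd = score(encode(f"{head} {rel} [MASK]"), entity_vecs)      # forward link
    bwd = score(encode(f"[MASK] {rel_inv} {head}"), entity_vecs)  # backward link
    return 0.5 * (fwd + bwd)  # simple average; other ensembling choices exist
```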
Bi-Link can be applied to knowledge-intensive tasks with minimal modification. With rich lexical and syntactical semantics, the entity linking task ranks all candidate documents to predict a link from a named entity mention in context to its referent entity document. Recently, Zeshel [16] has greatly supported zero-shot evaluation of PLMs with an entity linking corpus beyond the factual knowledge bases intensively used for pretraining. However, Zeshel combines the tasks of zero-shot learning and unsupervised domain adaptation, which in practice complicates error analysis across different models and training methods. In this work, we create an in-domain zero-shot entity linking benchmark, Zeshel-Ind, with data from Fandom. To adapt our model to zero-shot entity linking, we use shared soft prompts for mention spans instead of prompts on both sides. Besides recent contrastive techniques used in SimCSE [7] and SimKGC [33], we opt to share negative candidates retrieved by BM25 [24] among in-batch samples. Our methods achieve competitive results on both Zeshel-Ind and Zeshel. Our experiments also show interesting behaviour of contrastive learning under domain shift.
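Sharing retrieved negatives across a batch can be sketched as follows; the `rank_bm25` library and the InfoNCE-style loss are stand-ins chosen for illustration, and the paper's actual retrieval and loss details may differ.

```python
# Minimal sketch of sharing BM25-retrieved hard negatives among in-batch
# samples; the library choice and loss form are illustrative assumptions.
import torch
import torch.nn.functional as F
from rank_bm25 import BM25Okapi

def bm25_negatives(corpus_docs, query, k=4):
    """Retrieve top-k hard negative documents for one mention query."""
    bm25 = BM25Okapi([doc.split() for doc in corpus_docs])
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(corpus_docs)), key=lambda i: -scores[i])[:k]
    return [corpus_docs[i] for i in top]

def shared_negative_loss(mention_vecs, entity_vecs, shared_neg_vecs, tau=0.05):
    """InfoNCE-style loss: mention i contrasts its gold entity (row i of
    entity_vecs) against all in-batch entities plus negatives shared by
    every sample in the batch."""
    candidates = torch.cat([entity_vecs, shared_neg_vecs], dim=0)  # (B + N, d)
    logits = mention_vecs @ candidates.t() / tau                   # (B, B + N)
    labels = torch.arange(mention_vecs.size(0))                    # gold on the diagonal
    return F.cross_entropy(logits, labels)
```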
In summary, our major contributions are threefold: 1.) We propose a probabilistic syntactic prompt method to verbalize unfinished edges into natural relational expressions with a generalizable lightweight model. 2.) We design symmetric relational encoders, Bi-Link, for text-based knowledge graph link prediction, and adapt them to entity linking tasks. 3.) We build a new open-source benchmark, Zeshel-Ind, as a fully-inductive reflection of Zeshel for in-domain zero-shot performance evaluation. We extensively validate Bi-Link against several recent baselines on Zeshel and Zeshel-Ind.
2 RELATED WORK
Contrastive knowledge representation.
Inspired by the NCE [8] principle, CPC [22] and SimCLR [3] are particularly popular contrastive learning paradigms that learn robust representations from noisy negative samples. For semantic textual similarity (STS) tasks, SimCSE [7] significantly simplifies previous contrastive sentence embedding methods using bi-transformers. PromptBERT [10] further improves the results with template denoising. However, empirical studies [32] show that SimCSE is not helpful for entity linking, and using bi-transformers to contrast document representations remains an ongoing research direction. Finding proper negative samples is crucial for contrastive learning [33]. GPL [32] automatically finds high-quality negative samples with a pipeline that includes T5 [23], a dense retriever [24], and a cross-encoder [9]. In this work, we generalize a lightweight syntactic prompt generator learnt on a subgraph to a large knowledge graph. The prompts improve the quality of negative samples by transforming unfinished edges into approximate relational expressions.
Retrieval with pretrained transformers.
Bi-encoders independently map queries and documents to a shared semantic space to efficiently compute their similarity scores. By contrast, cross-encoders [37] broadcast a query and concatenate it with all possible documents, predicting relevance scores with cross-attention between the query and the documents. Previous work [9] has shown that cross-encoders can produce more robust representations and achieve better results, but their heavy computational overhead sharply increases inference time. To address this problem, ColBERT [12] and TwinBERT [18] concurrently propose hybrid networks consisting of bi-encoders and cross-attention layers. As a result, these works have facilitated real-world search engines by significantly reducing the computational burden while retaining performance. On the flip side, the late-stage interaction requires additional training data and strategies. Our work differs from previous work in that our prompt-based bi-encoders remain useful when queries are fragmented words.
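To make the trade-off explicit, the toy functions below contrast the two retrieval families; the encoders are abstracted into callables and nothing here reproduces ColBERT's or TwinBERT's actual late interaction.

```python
# Toy comparison of bi-encoder and cross-encoder scoring; `encode` and
# `score_pair` stand for arbitrary pretrained models and are assumptions.
import torch

def bi_encoder_scores(encode, query: str, docs: list[str]) -> torch.Tensor:
    """Encode query and documents independently; similarity is a dot product,
    so document vectors can be precomputed and indexed offline."""
    q = encode(query)                           # (d,)
    D = torch.stack([encode(d) for d in docs])  # (n, d)
    return D @ q                                # (n,)

def cross_encoder_scores(score_pair, query: str, docs: list[str]) -> torch.Tensor:
    """Concatenate the query with every document and run one forward pass per
    pair: cross-attention is more expressive, but cost grows with candidates."""
    return torch.tensor([score_pair(f"{query} [SEP] {d}") for d in docs])
```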
3 METHODS
Given a directed knowledge graph $\mathcal{G} = (V, E, R)$ with $|V|$ entities, $|E|$ observed edges, and $|R|$ relation types, our transductive link prediction task is to infer a missing edge $e$ described as $(h, r, ?)$ or its inverse version $e_{\mathrm{inv}}$ described as $(?, r, t)$. As a logical operator, a triple $(h, r, t)$ consists of a head entity, a relation, and a tail entity. In this work, we present Bi-Link, a symmetric framework that learns knowledge representations from entity descriptions and reversible relational text. As in the example shown in Figure 2, we enrich $\mathcal{G}$ with inverse relational textual information phrased by probabilistic rule-based prompt expressions. We then encode the relational information with a siamese network, Bi-Link (Figure 3). The grammatical module and the Bi-Link network are updated with the EM algorithm. We use this general framework for transductive link prediction, fully-inductive link prediction [28], and inductive named entity linking.