
Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation
Extraction
Zhen Wan∗1, Qianying Liu∗1,
Zhuoyuan Mao1, Fei Cheng1, Sadao Kurohashi1, Jiwei Li2
1Kyoto University, Japan
2Zhejiang University, China
{zhenwan, ying, zhuoyuanmao}@nlp.ist.i.kyoto-u.ac.jp
{feicheng, kuro}@i.kyoto-u.ac.jp
{jiwei_li}@zju.edu.cn
Abstract
Relation extraction (RE) has achieved remarkable progress with the
help of pre-trained language models. However, existing RE models
are usually incapable of handling two situations: implicit
expressions and long-tail relation types, caused by language
complexity and data sparsity. In this paper, we introduce a simple
enhancement of RE using k nearest neighbors (kNN-RE). kNN-RE allows
the model to consult training relations at test time through a
nearest-neighbor search and provides a simple yet effective means
to tackle the two issues above. Additionally, we observe that
kNN-RE serves as an effective way to leverage distant supervision
(DS) data for RE. Experimental results show that the proposed
kNN-RE achieves state-of-the-art performance on a variety of
supervised RE datasets, i.e., ACE05, SciERC, and Wiki80, and
outperforms the best model to date on the i2b2 and Wiki80 datasets
when DS data is allowed. Our code and models are available at:
https://github.com/YukinoWan/kNN-RE.
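The idea of consulting training relations at test time can be illustrated with a minimal sketch. It is not the released implementation: the toy 2-d vectors, the function names `knn_label_distribution` and `interpolate`, the temperature, and the interpolation weight `lam` are all assumptions made for illustration. The sketch stores training representations with their gold relation labels, retrieves the k closest entries for a test representation, and mixes the resulting label distribution with a classifier's prediction.

```python
import math
from collections import defaultdict


def knn_label_distribution(query, memory, k=2, temperature=1.0):
    """Turn the k nearest stored training examples into a label distribution.

    `memory` is a list of (vector, relation_label) pairs (hypothetical layout).
    """
    # Squared Euclidean distance from the query to every stored vector.
    scored = sorted(
        ((sum((q - m) ** 2 for q, m in zip(query, vec)), label)
         for vec, label in memory),
        key=lambda pair: pair[0],
    )[:k]
    # Closer neighbors receive larger weights; weights are summed per label.
    weights = defaultdict(float)
    for dist, label in scored:
        weights[label] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def interpolate(p_model, p_knn, lam=0.5):
    """Mix classifier and neighbor distributions (lam is an assumed weight)."""
    labels = set(p_model) | set(p_knn)
    return {r: lam * p_knn.get(r, 0.0) + (1 - lam) * p_model.get(r, 0.0)
            for r in labels}


# Toy memory: made-up representations paired with gold relation labels.
memory = [
    ([1.0, 0.0], "sibling to"),
    ([0.9, 0.1], "sibling to"),
    ([0.0, 1.0], "spouse"),
    ([0.1, 0.9], "spouse"),
]
query = [0.8, 0.2]                                # test-example representation
p_model = {"spouse": 0.6, "sibling to": 0.4}      # classifier alone is wrong
p_knn = knn_label_distribution(query, memory, k=2)
final = interpolate(p_model, p_knn, lam=0.5)
prediction = max(final, key=final.get)
```

In this toy setting both retrieved neighbors carry the gold label "sibling to", so the mixed distribution favors "sibling to" even though the classifier alone prefers "spouse". A real system would use representations from a trained RE model and an approximate nearest-neighbor index instead of the exhaustive scan used here.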
1 Introduction
Relation extraction (RE) aims to identify the relationship between
entities mentioned in a sentence, and is beneficial to a variety of
downstream tasks such as question answering and knowledge base
population. Recent studies (Zhang et al., 2020; Zeng et al., 2020;
Lin et al., 2020; Wang and Lu, 2020; Cheng et al., 2020; Zhong and
Chen, 2021) in supervised RE take advantage of pre-trained language
models (PLMs) and achieve SOTA performance by fine-tuning PLMs with
a relation classifier. However, we observe that existing RE models
are usually incapable of handling two RE-specific situations:
implicit expressions and long-tail relation types.
Implicit expression refers to the situation where a relation is
expressed as the underlying message
∗This denotes equal contribution.
[Figure 1. Left panel, "Implicit Expression": the test example "He is
the youngest son of Liones, comparing with Samuel Liones and Henry
Liones." (model prediction: spouse; gold label: sibling to) and its
nearest neighbor "He was the younger brother of Panagiotis and
Athanasios Sekeris." (gold label: sibling to). Right panel, "Long-tail
Relation Types": a test example and the decision boundary among the
relation types child of, spouse, sibling to, title, country of birth,
and employee of.]
Figure 1: Left: the retrieved example has a similar structure but
contains the phrase “younger brother”, which makes the relation
easier to infer. Right: referring to the gold labels of nearest
neighbors can reduce the bias. Highlighted words may directly
influence the relation prediction.
that is not explicitly stated or shown. For example, for the
relation “sibling to”, a common expression can be “He has a brother
James”, while an implicit expression could be “He is the youngest
son of Liones, comparing with Samuel Liones and Henry Liones.” In
the latter case, the relation “sibling to” between “Samuel Liones”
and “Henry Liones” is not directly expressed but can be inferred
from the fact that both are brothers of the same person. Such an
underlying message can easily confuse the relation classifier. The
problem of long-tail relation types is caused by data sparsity in
training. For example, the widely used supervised RE dataset TACRED
(Zhang et al., 2017) includes 41 relation types. The most frequent
type “per:title” has 3,862 training examples, while over 22 types
have fewer than 300 examples. The majority types can easily
dominate model predictions and lead to low performance on long-tail
types.
Inspired by recent studies (Khandelwal et al., 2020; Guu et al.,
2020; Meng et al., 2021) using kNN to retrieve diverse expressions
for language generation tasks, we introduce a simple but effective
kNN-RE framework to address the two problems mentioned above.
Specifically, we store the training examples as the memory by a
vanilla RE
arXiv:2210.11800v2 [cs.CL] 30 Jan 2023