
et al. (2020) insert both entity type information and relation descriptions into the model. Dong et al. (2021) use a relation encoder, in addition to the sentence encoder, to generate relation representations. Han et al. (2021a) propose a hybrid prototypical network that generates hybrid prototypes from context sentences and relation descriptions. Nonetheless, these methods largely assume that every support instance comes with a corresponding textual label during both learning and prediction. We argue that injecting textual labels into all support instances may render the training task unchallenging: the model can largely rely on the textual labels during training, resulting in poor performance at test time when faced with unseen relations and textual labels. Ideally, textual labels should be treated as an additional source of information, such that the model can work with or without them, as shown in the top part of Figure 1.
In this work, we propose a novel approach called Label Prompt Dropout (LPD). We directly concatenate the textual label and the context sentence, and feed them together to the Transformer encoder (Vaswani et al., 2017). The textual label serves as a label prompt² to guide and regularize the Transformer encoder, through self-attention, to output a label-aware relation representation. During training, we randomly drop out the prompt tokens to create a more challenging scenario, such that the model has to learn to work both with and without the relation descriptions. Experiments show that our approach achieves significant improvements on two standard FSRE datasets, and extensive ablation studies demonstrate its effectiveness. Furthermore, we highlight a potential issue with the evaluation setup of previous research efforts, in which the pre-training data contains relation types that overlap with those in the test set. We argue that this is not a desirable setup for few-shot learning, and show that the performance gain of existing efforts may be partly due to this "knowledge leakage" issue. We
² In this work, we use the terms label prompt, relation description, and textual label interchangeably. However, our method differs from conventional prompt-based models, which require a verbalizer (Schick and Schütze, 2021). We use the relation description to construct a natural language sentence for each instance, so as to better exploit the implicit knowledge acquired by language models during pre-training. This goal is similar to that of conventional prompt-based methods, which is why we call our method label prompt dropout.
propose to filter out all the overlapping relation types in the pre-training data and conduct more rigorous few-shot evaluation. In summary, we make the following contributions:
• We present LPD, a novel label prompt dropout approach that makes better use of the textual labels in FSRE. This simple design significantly outperforms previous attempts that fuse the textual label and the context sentence using complex network structures.
• We identify the limitation of the previous experimental setup in the literature and propose a stricter evaluation setup for FSRE. Under both setups, we show strong improvements over the previous state of the art.
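The core of LPD can be illustrated with a minimal sketch: the textual label is prepended to the context sentence, but with some probability the prompt tokens are dropped so that the model must also learn from context alone. The dropout rate, the separator token, and the entity-marker tokens below are illustrative assumptions, not the exact values used in our implementation.

```python
import random

def apply_label_prompt_dropout(label_tokens, sentence_tokens, dropout_rate, rng=None):
    """Prepend the textual label (the "label prompt") to the context sentence,
    randomly dropping the prompt with probability `dropout_rate` during training.
    Separator and entity-marker tokens are illustrative placeholders."""
    rng = rng or random
    if rng.random() < dropout_rate:
        return list(sentence_tokens)  # prompt dropped: context-only input
    return list(label_tokens) + [":"] + list(sentence_tokens)  # label-aware input

# Illustrative instance with hypothetical entity markers.
label = ["country", "of", "origin"]
sent = ["[E1]", "Kimchi", "[/E1]", "is", "a", "dish", "from", "[E2]", "Korea", "[/E2]"]

rng = random.Random(0)
kept = apply_label_prompt_dropout(label, sent, dropout_rate=0.0, rng=rng)
dropped = apply_label_prompt_dropout(label, sent, dropout_rate=1.0, rng=rng)
```

At test time, the same function covers both evaluation settings: a dropout rate of 0 corresponds to support sets that include textual labels, while a rate of 1 corresponds to support sets without them.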
2 Related Work
2.1 Few-Shot Relation Extraction
Few-shot relation extraction (FSRE) aims to train a model that can classify instances into novel relations with only a handful of training examples. Han et al. (2018) are the first to introduce a large-scale benchmark for FSRE, in which a model is evaluated under N-way-K-shot settings: each episode presents N candidate relations with K labeled support instances per relation. Gao et al. (2019a) propose a hybrid attention-based prototypical network to handle the diversity and noise of text data. Qu et al. (2020) model the relationship between different relations via Bayesian meta-learning on relation graphs. Han et al. (2021a) apply an adaptive focal loss and hybrid networks to model the varying difficulties of different relations.
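The N-way-K-shot protocol can be sketched as episode sampling: draw N relations, then K support and Q query instances per relation. The dataset layout and parameter names below are illustrative, not those of any specific benchmark implementation.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries, rng):
    """Sample one N-way-K-shot episode from `dataset`, an illustrative
    mapping of relation name -> list of instances."""
    relations = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for rel in relations:
        picked = rng.sample(dataset[rel], k_shot + q_queries)
        support[rel] = picked[:k_shot]   # K labeled examples per relation
        query[rel] = picked[k_shot:]     # held-out queries to classify
    return support, query

# Toy data: 8 relations with 10 instances each.
toy = {f"rel_{i}": [f"sent_{i}_{j}" for j in range(10)] for i in range(8)}
support, query = sample_episode(toy, n_way=5, k_shot=1, q_queries=2,
                                rng=random.Random(0))
```

A 5-way-1-shot episode thus asks the model to classify each query into one of 5 relations given a single support example per relation.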
Another line of work focuses on further training pre-trained language models (PLMs) on the task of relation extraction (RE). Based on the hypothesis that sentences sharing the same entity pair are likely to express the same relation, Baldini Soares et al. (2019) collect a large-scale pre-training dataset and propose a "matching the blanks" pre-training paradigm. Peng et al. (2020a) present an entity-masked contrastive pre-training framework for relation extraction. Dong et al. (2021) introduce a semantic mapping approach to include relation descriptions in the pre-training phase. Inspired by these works, we propose contrastive pre-training with label prompt dropout, which uses relation descriptions during pre-training while creating a more difficult setup by randomly dropping them out.
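The "matching the blanks" hypothesis underlying this line of work can be sketched as pair construction for contrastive learning: sentences sharing an entity pair are grouped as positives, with everything else serving as in-batch negatives. The field names below are illustrative assumptions, not the schema of any actual pre-training corpus.

```python
from collections import defaultdict
from itertools import combinations

def build_positive_pairs(instances):
    """Group sentences by their (head, tail) entity pair and treat sentences
    sharing an entity pair as positive pairs for contrastive pre-training.
    Dict keys ("head", "tail", "sentence") are illustrative."""
    by_pair = defaultdict(list)
    for inst in instances:
        by_pair[(inst["head"], inst["tail"])].append(inst["sentence"])
    pairs = []
    for sents in by_pair.values():
        pairs.extend(combinations(sents, 2))  # all positives within a group
    return pairs

toy = [
    {"head": "Kimchi", "tail": "Korea", "sentence": "s1"},
    {"head": "Kimchi", "tail": "Korea", "sentence": "s2"},
    {"head": "Paris", "tail": "France", "sentence": "s3"},
]
pairs = build_positive_pairs(toy)
```

In our setting, each sentence in a pair would additionally carry its relation description as a label prompt, randomly dropped as in LPD, before being encoded for the contrastive objective.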