
Recently, KGs have been widely used as auxiliary information to enhance RE methods [13–15].
The availability of resources such as distantly supervised relation extraction (DSRE)
datasets [16] and extensive knowledge bases [17,18] has facilitated the study of
knowledge-enhanced RE. However, no public RE dataset aligns sentences with their
corresponding knowledge context for training and evaluating knowledge-enhanced RE
methods. Previous researchers in this field have tended to construct their own auxiliary KGs,
create datasets from scratch, and re-evaluate prior methods on them for fair comparison
[14,15,19]. This lack of public benchmarks makes it difficult to report reproducible results or
to compare the performance of existing methods.
To address these issues, we adapt three widely used RE datasets for knowledge-enhanced
RE tasks to curate the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED
improves the data quality of the original datasets and, using information from external
knowledge bases, constructs auxiliary KGs for the entities in each dataset. We believe KERED
will foster the future development of knowledge-enhanced RE.
We began by examining the original DSRE datasets based on Wikidata [17] or Freebase [18].
Because DSRE generates large-scale data by aligning relational facts in knowledge bases with
evidence sentences [16], mentions in the corpus are already matched to their entities in the
corresponding knowledge bases. This makes it straightforward to locate entities via their
identifiers and to access their knowledge context in the knowledge bases. Therefore, it is
feasible to construct a KG for each DSRE dataset, as illustrated in Figure 1. Specifically, we
collected three DSRE datasets for KERED: NYT10m [16], Wiki20m [16], and Wiki80 [20].
We then refined them by improving their data quality and constructing a KG for each. We
also conducted comprehensive RE experiments on KERED to evaluate the performance of
existing RE methods. The experimental results show that information from auxiliary KGs
has a positive effect on RE methods. In summary, KERED offers the first standardized
datasets for knowledge-enhanced RE tasks.
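The construction idea above rests on two steps: distant supervision labels a sentence with the knowledge-base relation between its two linked entities, and those entity identifiers give direct access to the entities' neighborhood in the knowledge base. The following is a minimal Python sketch of both steps, assuming a toy in-memory knowledge base of (head_id, relation, tail_id) triples. The entity IDs, relation names, the `hops` parameter, and the functions `distant_labels` and `build_auxiliary_kg` are hypothetical illustrations, not KERED's actual construction procedure (detailed in Section 3).

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (head_id, relation, tail_id); IDs are hypothetical stand-ins for
# Wikidata/Freebase identifiers.
Triple = Tuple[str, str, str]


def distant_labels(head_id: str, tail_id: str, kb: List[Triple]) -> List[str]:
    """Distant-supervision heuristic: label a sentence that mentions both
    entities with every KB relation linking head -> tail."""
    return [r for h, r, t in kb if h == head_id and t == tail_id]


def build_auxiliary_kg(entity_ids: Set[str], kb: List[Triple], hops: int = 1) -> Set[Triple]:
    """Collect the triples within `hops` edges of the given entities,
    treating the KB as an undirected graph (the entities' knowledge context)."""
    # Index every triple under both of its endpoints.
    adjacency: DefaultDict[str, List[Triple]] = defaultdict(list)
    for triple in kb:
        head, _, tail = triple
        adjacency[head].append(triple)
        adjacency[tail].append(triple)

    subgraph: Set[Triple] = set()
    visited = set(entity_ids)
    frontier = set(entity_ids)
    for _ in range(hops):  # breadth-first expansion, one hop per iteration
        next_frontier: Set[str] = set()
        for entity in frontier:
            for head, rel, tail in adjacency[entity]:
                subgraph.add((head, rel, tail))
                for neighbor in (head, tail):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return subgraph


if __name__ == "__main__":
    # Toy KB with hypothetical entity IDs and relation names.
    kb = [
        ("E1", "capital_of", "E2"),
        ("E2", "member_of", "E3"),
        ("E4", "located_in", "E2"),
        ("E3", "based_in", "E5"),
    ]
    # Each instance stores the (already linked) entity IDs of its mentions.
    dataset = [{"text": "E1 is the capital of E2.", "head_id": "E1", "tail_id": "E2"}]

    for inst in dataset:
        print(inst["text"], "->", distant_labels(inst["head_id"], inst["tail_id"], kb))
    entity_ids = {i["head_id"] for i in dataset} | {i["tail_id"] for i in dataset}
    print(sorted(build_auxiliary_kg(entity_ids, kb, hops=1)))
```

Raising `hops` trades broader knowledge context for a larger graph. Note that the sketch keeps the triple linking the target entity pair itself; a real pipeline may need to decide whether to filter such triples so the auxiliary KG does not reveal the label.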
Our study’s contributions are twofold. First, we develop KERED, comprising three
challenging RE datasets with auxiliary KGs, which has the potential to advance
knowledge-enhanced RE research. We make our datasets publicly available on Figshare¹;
please refer to Appendix A for access to KERED. Second, we establish metrics for
knowledge-enhanced RE methods on KERED and assess state-of-the-art RE methods using our
datasets. Our experiments indicate that knowledge-enhanced RE methods can surpass
traditional approaches.
The remainder of the paper is organized as follows: Section 2 reviews widely used DSRE
datasets and knowledge-enhanced RE methods. Section 3 details the construction of KERED.
Section 4 presents a descriptive analysis of KERED. Section 5 evaluates RE models on KERED.
Section 6 discusses the experimental results. Finally, Section 7 concludes the paper.
2 Related Work
DSRE datasets. Most existing DSRE datasets are constructed by identifying entities
mentioned in evidence sentences and linking them to public knowledge bases such as
Wikidata [17] and Freebase [18]. NYT10 [21] is a large-scale dataset constructed
automatically via distant supervision. However, Han et al. [20] highlighted noisy-labeling
issues in NYT10 and other existing datasets. To mitigate this problem, Gao et al. [16] introduced two DSRE
¹ https://figshare.com/projects/KERED/134459