Knowledge-Enhanced Relation Extraction Dataset
Yucong Lin1, Hongming Xiao2, Jiani Liu2, Zichao Lin2,
Keming Lu3, Feifei Wang4*, Wei Wei5*
1School of Medical Technology, Beijing Institute of Technology,
Zhongguancun South Street No.5, Beijing, 100081, China.
2School of Computer Science and Technology, Beijing Institute of
Technology, No.5, Zhongguancun South Street, Beijing, 100081, China.
3Viterbi School of Engineering, University of Southern California,
3939 S Figueroa St, Los Angeles, CA, 90037, USA.
4Center for Applied Statistics and School of Statistics, Renmin University of
China, No. 59, Zhongguancun Street, Beijing, 100872, China.
5HSBC Business School, Peking University, Xueyuan Street, Shenzhen,
518055, Guangdong, China.
*Corresponding author(s). E-mail(s): feifei.wang@ruc.edu.cn;
weiwei@phbs.pku.edu.cn;
Contributing authors: linyucong@bit.edu.cn; xiaohongmin@bit.edu.cn;
jiani liu@bit.edu.cn; zc lin@bit.edu.cn; keminglu@usc.edu;
These authors contributed equally to this work.
Abstract
Recently, knowledge-enhanced methods leveraging auxiliary knowledge graphs have
emerged in relation extraction, surpassing traditional text-based approaches. However, to
the best of our knowledge, there is currently no public dataset that encompasses both
evidence sentences and knowledge graphs for knowledge-enhanced relation extraction.
To address this gap, we introduce the Knowledge-Enhanced Relation Extraction Dataset
(KERED). KERED annotates each sentence with a relational fact and provides knowledge
context for entities through entity linking. Using our curated dataset, we compared
contemporary relation extraction methods under two prevalent task settings: sentence-level
and bag-level. The experimental results show that the knowledge graphs provided by KERED
can support knowledge-enhanced relation extraction methods. We believe that KERED
offers high-quality relation extraction datasets with corresponding knowledge graphs
for evaluating the performance of knowledge-enhanced relation extraction methods. Our
dataset is available at: https://figshare.com/projects/KERED/134459
arXiv:2210.11231v3 [cs.LG] 25 Apr 2023
Keywords: Distant supervision, Knowledge graph, Knowledge-enhanced relation extraction,
Relation extraction
1 Introduction
Relation extraction (RE) focuses on extracting relationships between entities from natural
language sentences [1]. RE enhances various downstream tasks in natural language
processing, including question answering [2,3], knowledge graph construction [4,5], and
reading comprehension [6,7]. Knowledge graphs (KGs) store relational facts as triples
including subject entities, object entities, and the relations between them [8]. For instance,
the relational fact (James Joyce, country of citizenship, Ireland) indicates that James Joyce
was a citizen of Ireland. As a kind of structured representation of facts, KGs find extensive
applications, such as social network analysis [9,10] and recommender systems [11,12].
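The triple representation described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `build_index` helper are hypothetical names introduced here, and the paper does not prescribe any particular storage format for KGs.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One relational fact: (subject entity, relation, object entity)."""
    subject: str
    relation: str
    obj: str


def build_index(triples):
    """Map each subject entity to its outgoing (relation, object) pairs."""
    index = defaultdict(list)
    for t in triples:
        index[t.subject].append((t.relation, t.obj))
    return dict(index)


# The example fact from the text above.
triples = [Triple("James Joyce", "country of citizenship", "Ireland")]
index = build_index(triples)
print(index["James Joyce"])
```

Indexing by subject is one simple design choice; it makes looking up everything known about a given entity a single dictionary access.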
[Figure 1. Left: three instances for the entity pair Q5767067 (Ansel Adams) and Q739
(Photographer), built from the sentences "Ansel Easton Adams was an American landscape
photographer.", "Adams participated in the club's annual High Trips, later becoming official
photographer for the trips.", and "Imitating the example of photographer Alfred Stieglitz,
Adams opened his own art and photography gallery in 1933."; the relation labels shown are
P106 (Occupation) and NA. Right: a knowledge graph with the entities Q5767067 (Ansel
Adams), Q739 (Photographer), Q30 (USA), Q28640 (Profession), and Q11633 (Photography),
and the relations P106 (Occupation), P27 (Country of citizenship), P31 (Instance of), P101
(Field of work), and P3095 (Practiced by).]
Fig. 1 An illustration of instances in a DSRE dataset and the auxiliary KG for the dataset. An instance consists of
a sentence together with a relational fact expressed by the sentence. A batch of instances constitutes an RE dataset.
A KG contains some or all entities in an RE dataset and the relations between them. In this example, we link entities in
the three instances to their counterparts in the auxiliary KG. Then, a knowledge-enhanced RE method can classify
the relation between Ansel Adams and Photographer as Occupation by virtue of the information provided by the
sentences and the KG.
Recently, KGs have been widely used as auxiliary information to enhance RE methods
[13-15]. The availability of resources such as distantly supervised relation extraction
(DSRE) datasets [16] and extensive knowledge databases [17,18] has facilitated the study
of knowledge-enhanced RE. However, no public RE dataset exists that aligns sentences with
the corresponding knowledge context for training and evaluating knowledge-enhanced RE
methods. Previous researchers in this field have tended to construct their own auxiliary KGs,
create datasets from scratch, and retest prior benchmarks for fair comparisons [14,15,19].
The lack of public benchmarks makes it challenging to report reproducible results or to
compare the performance of existing methods.
To address these issues, we adapt three widely used RE datasets for knowledge-enhanced
RE tasks to curate the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED
improves the data quality of the previous datasets. In addition, using information from
external knowledge bases, KERED constructs auxiliary KGs for the entities in the dataset.
We believe KERED will foster the development of knowledge-enhanced RE in the future.
We commenced our work by examining the original DSRE datasets based on Wikidata
[17] or Freebase [18]. DSRE generates large-scale data by aligning relational facts in
knowledge bases with evidence sentences [16], so mentions in the corpus are already matched
to their entities in the corresponding knowledge bases. This makes it straightforward to
locate entities via their identifiers and to access their knowledge context in the knowledge
bases. Therefore, it is feasible to construct a KG for each DSRE dataset, as illustrated in
Figure 1. Specifically, we collected three DSRE datasets for KERED: NYT10m [16],
Wiki20m [16], and Wiki80 [20]. Then, we refined them by enhancing data quality and
constructing a KG for each. We also conducted comprehensive RE experiments on KERED
to evaluate the performance of existing RE methods. The experimental results show that
information from auxiliary KGs has positive effects on RE methods. In summary, KERED
offers the first standardized datasets for knowledge-enhanced RE tasks.
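The alignment step behind distant supervision can be illustrated with a minimal Python sketch. It assumes sentences whose two entity mentions are already linked to knowledge-base identifiers; the function name `distant_label`, the dictionary field names, and the second sentence are hypothetical and are not taken from the KERED release.

```python
def distant_label(sentences, kb_facts, na_label="NA"):
    """Label each sentence with the KB relation of its entity pair, if any.

    sentences: list of dicts with 'text', 'subject_id', and 'object_id' keys.
    kb_facts: dict mapping (subject_id, object_id) to a relation identifier.
    """
    labeled = []
    for s in sentences:
        pair = (s["subject_id"], s["object_id"])
        # Distant supervision heuristic: reuse the KB relation for every
        # sentence that mentions the pair; fall back to NA otherwise.
        labeled.append({**s, "relation": kb_facts.get(pair, na_label)})
    return labeled


# Identifiers follow the Figure 1 example; the second sentence is made up.
kb_facts = {("Q5767067", "Q739"): "P106"}
sentences = [
    {"text": "Ansel Easton Adams was an American landscape photographer.",
     "subject_id": "Q5767067", "object_id": "Q739"},
    {"text": "A hypothetical sentence about an unrelated pair.",
     "subject_id": "Q5767067", "object_id": "Q11633"},
]
labeled = distant_label(sentences, kb_facts)
print([s["relation"] for s in labeled])
```

Because each instance keeps the identifiers of its entities, the same keys can later be used to look up knowledge context for those entities in the knowledge base.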
Our study's contributions are twofold. First, we develop KERED, comprising three
challenging RE datasets with auxiliary KGs, with the potential to advance knowledge-enhanced
RE research. We make our datasets publicly available on Figshare (see footnote 1); please
refer to Appendix A for KERED access. Second, we establish metrics for knowledge-enhanced
RE methods on KERED and assess state-of-the-art RE methods using our datasets. Our
experiments indicate that knowledge-enhanced RE methods can surpass traditional approaches.
The remainder of the paper is organized as follows: Section 2 reviews widely used DSRE
datasets and knowledge-enhanced RE methods. Section 3 details the construction of KERED.
Section 4 presents KERED's descriptive analysis. Section 5 evaluates RE models on KERED.
Section 6 discusses the experimental results. Finally, Section 7 concludes the paper.
2 Related Works
DSRE datasets. The majority of existing DSRE datasets are constructed by identifying
entities mentioned in evidence sentences and linking them to public knowledge bases such as
Wikidata [17] and Freebase [18]. NYT10 [21] is a large-scale dataset automatically
constructed using DSRE. However, Han et al. [20] highlighted noisy labeling issues in NYT10
and other existing datasets. To mitigate this problem, Gao et al. [16] introduced two DSRE
datasets with manually annotated test sets, significantly enhancing the data quality of the
earlier NYT10 [21] and Wiki20 [22]. Although the noisy labeling problems were solved in the
test sets, data quality issues persist in NYT10m and Wiki20m. Hence, we further denoised
these datasets and enriched them with external KGs through entity linking. Our refined
datasets provide the community with benchmarks for evaluating knowledge-enhanced RE methods.
1 https://figshare.com/projects/KERED/134459
Knowledge-enhanced Relation Extraction. An increasing number of RE methods incorporate
auxiliary information, such as attributes and embeddings of entities, into their models
[23], where KG information plays a crucial role by revealing associations between entities
[24]. CGRE [14] derives constraint graphs from KGs to model intrinsic connections
between relations. The model generates representations for entities and relations by
encoding the graph into vectors and extracting node features. Xu and Barbosa [25] proposed
HRERE, an RE framework that jointly learns language representations and knowledge graph
embeddings. Moreover, KGPool [15] employs a graph pooling algorithm that dynamically
selects KG context to enhance model performance. This method considers only the names,
categories, aliases, and descriptions of entities as side information from the KG. REMAP
[26], a multimodal method for DSRE, combines knowledge graph embeddings with deep
language models to classify relations between entities. However, comparing the performance
of all these methods is impossible due to the absence of benchmarks. Thus, we revisit existing
knowledge-enhanced RE methods and evaluate them on our datasets to facilitate an objective
comparison.
3 Construction of KERED
We first define the problem and present an overview of previous DSRE datasets in Sections 3.1
and 3.2. Subsequently, we detail the process of entity linking in Section 3.4 and then
describe the process of dataset refinement in Section 3.5.
3.1 Problem Definition
An RE dataset consists of a collection of instances and a set of candidate relations. Each
instance contains an evidence sentence and an annotated relational fact (an entity pair
and a relation between them), as illustrated in Figure 1. All relations appearing in the dataset
are restricted to the set of candidate relations. On RE datasets, traditional RE methods
predict the relation between two entities based on the sentence of each instance. In
addition, knowledge-enhanced RE methods typically require an auxiliary KG as supplementary
information to improve RE performance. They categorize relations between entity pairs in
evidence sentences with the aid of knowledge base information [23,25]. To establish datasets
for knowledge-enhanced RE tasks, we enhance the data quality of existing RE datasets and
construct KGs for them to obtain KG-enhanced RE datasets.
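The definition above can be written down as a small data structure. The following Python sketch is illustrative only: the `Instance` fields and the `out_of_set` helper are hypothetical names, the second instance is invented to trigger the check, and treating NA as a member of the candidate set is an assumption rather than something stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """An evidence sentence with its annotated relational fact."""
    sentence: str
    subject_id: str
    object_id: str
    relation: str


def out_of_set(instances, candidate_relations):
    """Return the instances whose relation is not a candidate relation."""
    return [inst for inst in instances
            if inst.relation not in candidate_relations]


# Assumed candidate set for this toy example.
candidate_relations = {"P106", "NA"}
instances = [
    Instance("Ansel Easton Adams was an American landscape photographer.",
             "Q5767067", "Q739", "P106"),
    # Hypothetical instance whose label is outside the candidate set.
    Instance("A made-up sentence used only to trigger the check.",
             "Q5767067", "Q30", "P27"),
]
violations = out_of_set(instances, candidate_relations)
print(len(violations))
```

A check of this kind is one way to confirm that every label in a dataset stays within the set of candidate relations.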
Our experiments evaluate RE methods at two levels. Sentence-level RE considers only
one instance as input at a time and predicts the relation between the entities expressed by
the sentence. In contrast, bag-level RE takes a bag as input at a time. A bag contains all
instances in the dataset that share the same entity pair, such as the three instances depicted
in Figure 1. When categorizing the relation between two entities, bag-level RE methods take
into account all instances in the bag.
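The difference between the two settings comes down to how instances are grouped before being passed to a model. A minimal Python sketch, assuming each instance is a dictionary with hypothetical `subject_id` and `object_id` keys and using the three sentences from Figure 1:

```python
from collections import defaultdict


def group_into_bags(instances):
    """Group instances that share the same (subject, object) entity pair."""
    bags = defaultdict(list)
    for inst in instances:
        bags[(inst["subject_id"], inst["object_id"])].append(inst)
    return dict(bags)


instances = [
    {"text": "Ansel Easton Adams was an American landscape photographer.",
     "subject_id": "Q5767067", "object_id": "Q739"},
    {"text": "Adams participated in the club's annual High Trips, later "
             "becoming official photographer for the trips.",
     "subject_id": "Q5767067", "object_id": "Q739"},
    {"text": "Imitating the example of photographer Alfred Stieglitz, Adams "
             "opened his own art and photography gallery in 1933.",
     "subject_id": "Q5767067", "object_id": "Q739"},
]

# Sentence-level RE: one model input per instance.
sentence_level_inputs = instances

# Bag-level RE: one model input per entity pair.
bag_level_inputs = group_into_bags(instances)
print(len(sentence_level_inputs), len(bag_level_inputs))
```

Here the three instances yield three sentence-level inputs but a single bag, because they all mention the same entity pair.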