Knowledge-Enhanced Relation Extraction Dataset

Yucong Lin1†, Hongming Xiao2†, Jiani Liu2, Zichao Lin2, Keming Lu3, Feifei Wang4*, Wei Wei5*

1School of Medical Technology, Beijing Institute of Technology, Zhongguancun South Street No.5, Beijing, 100081, China.

2School of Computer Science and Technology, Beijing Institute of Technology, No.5, Zhongguancun South Street, Beijing, 100081, China.

3Viterbi School of Engineering, University of Southern California, 3939 S Figueroa St, Los Angeles, CA, 90037, USA.

4Center for Applied Statistics and School of Statistics, Renmin University of China, No. 59, Zhongguancun Street, Beijing, 100872, China.

5HSBC Business School, Peking University, Xueyuan Street, Shenzhen, 518055, Guangdong, China.

*Corresponding author(s). E-mail(s): feifei.wang@ruc.edu.cn; weiwei@phbs.pku.edu.cn;

Contributing authors: linyucong@bit.edu.cn; xiaohongmin@bit.edu.cn; jiani liu@bit.edu.cn; zc lin@bit.edu.cn; keminglu@usc.edu;

†These authors contributed equally to this work.

Abstract

Recently, knowledge-enhanced methods leveraging auxiliary knowledge graphs have emerged in relation extraction, surpassing traditional text-based approaches. However, to the best of our knowledge, there is currently no public dataset that encompasses both evidence sentences and knowledge graphs for knowledge-enhanced relation extraction. To address this gap, we introduce the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED annotates each sentence with a relational fact, and it provides knowledge context for entities through entity linking. Using our curated dataset, we compared contemporary relation extraction methods under two prevalent task settings: sentence-level and bag-level. The experimental results show that the knowledge graphs provided by KERED can support knowledge-enhanced relation extraction methods. We believe that KERED offers high-quality relation extraction datasets with corresponding knowledge graphs for evaluating the performance of knowledge-enhanced relation extraction methods. Our dataset is available at: https://figshare.com/projects/KERED/134459

arXiv:2210.11231v3 [cs.LG] 25 Apr 2023

Keywords: Distant supervision, Knowledge graph, Knowledge-enhanced relation extraction, Relation extraction

1 Introduction

Relation extraction (RE) focuses on extracting relationships between entities from natural language sentences [1]. RE enhances various downstream tasks in natural language processing, including question answering [2,3], knowledge graph construction [4,5], and reading comprehension [6,7]. Knowledge graphs (KGs) store relational facts as triples including subject entities, object entities, and the relations between them [8]. For instance, the relational fact (James Joyce, country of citizenship, Ireland) indicates that James Joyce was a citizen of Ireland. As a kind of structured representation of facts, KGs find extensive applications, such as social network analysis [9,10] and recommender systems [11,12].
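The triple representation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Triple` type and the `build_index` helper are hypothetical names, not part of KERED, and the second triple is added here purely as an example.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One relational fact: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


def build_index(triples):
    """Index triples by subject so an entity's outgoing facts are easy to look up."""
    index = defaultdict(list)
    for t in triples:
        index[t.subject].append((t.relation, t.obj))
    return index


# The first triple is the example from the text; the second is added for illustration.
triples = [
    Triple("James Joyce", "country of citizenship", "Ireland"),
    Triple("James Joyce", "occupation", "writer"),
]
index = build_index(triples)
print(index["James Joyce"])
```

Indexing by subject is just one convenient layout; a real KG store would typically also index by object and by relation.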

[Figure 1. Left: three example instances, each pairing a sentence with subject Q5767067 (Ansel Adams), object Q739 (Photographer), and a relation label, either P106 (Occupation) or NA. The sentences are "Ansel Easton Adams was an American landscape photographer.", "Adams participated in the club's annual High Trips, later becoming official photographer for the trips.", and "Imitating the example of photographer Alfred Stieglitz, Adams opened his own art and photography gallery in 1933." Right: the auxiliary knowledge graph, with nodes Q5767067 (Ansel Adams), Q739 (Photographer), Q30 (USA), Q28640 (Profession), and Q11633 (Photography), and relations P106 (Occupation), P27 (Country of citizenship), P31 (Instance of), P101 (Field of work), and P3095 (Practiced by).]

Fig. 1 An illustration of instances in a DSRE dataset and the auxiliary KG for the dataset. An instance consists of a sentence together with a relational fact expressed by the sentence. A batch of instances constitutes an RE dataset. A KG contains some or all entities in an RE dataset and relations between them. In this example, we link entities in the three instances to their counterparts in the auxiliary KG. Then, a knowledge-enhanced RE method can classify the relation between Ansel Adams and Photographer as Occupation by virtue of the information provided by the sentences and the KG.

Recently, KGs have been widely used as auxiliary information to enhance RE methods [13–15]. The availability of resources such as distantly supervised relation extraction (DSRE) datasets [16] and extensive knowledge databases [17,18] has facilitated the study of knowledge-enhanced RE. However, no public RE dataset exists that aligns sentences with the corresponding knowledge context for training and evaluating knowledge-enhanced RE methods. Previous researchers in this field tend to construct their own auxiliary KGs, create datasets from scratch, and retest prior benchmarks for fair comparisons [14,15,19]. The lack of public benchmarks makes it challenging to report reproducible results or compare the performance of existing methods.

To address these issues, we adapt three widely used RE datasets for knowledge-enhanced RE tasks to curate the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED improves the data quality of previous datasets. In addition, using information from external knowledge bases, KERED constructs auxiliary KGs for entities in the dataset. We believe KERED will foster the development of knowledge-enhanced RE in the future.

We commenced our work by examining the original DSRE datasets based on Wikidata [17] or Freebase [18]. Because DSRE generates large-scale data by aligning relational facts in knowledge bases with evidence sentences [16], mentions in the corpus naturally match their entities in the corresponding knowledge bases. This streamlines the process of locating entities via their identifiers and accessing their knowledge context in the knowledge bases. Therefore, it is feasible to construct a KG for each DSRE dataset, as illustrated in Figure 1. Specifically, we collected three DSRE datasets for KERED: NYT10m [16], Wiki20m [16], and Wiki80 [20]. Then, we refined them by enhancing data quality and constructing a KG for each. We also conducted comprehensive RE experiments on KERED to evaluate the performance of existing RE methods. The experimental results show that information from auxiliary KGs has positive effects on RE methods. In summary, KERED offers the first standardized datasets for knowledge-enhanced RE tasks.
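As a rough illustration of why shared identifiers make KG construction feasible, the sketch below filters a toy knowledge base down to the triples that touch entities appearing in a dataset. It is not the authors' pipeline: the function name `one_hop_context`, the one-hop filtering rule, and the toy triples are all assumptions made for this example.

```python
def one_hop_context(dataset_entities, kb_triples):
    """Keep every knowledge-base triple that touches at least one dataset entity."""
    wanted = set(dataset_entities)
    return [(s, r, o) for (s, r, o) in kb_triples if s in wanted or o in wanted]


# Toy knowledge base. The IDs in the first two triples appear in Figure 1, but the
# exact edges are assumed here for illustration; the last triple is invented.
kb_triples = [
    ("Q5767067", "P106", "Q739"),
    ("Q5767067", "P27", "Q30"),
    ("X1", "R1", "X2"),
]
dataset_entities = {"Q5767067", "Q739"}
aux_kg = one_hop_context(dataset_entities, kb_triples)
print(aux_kg)
```

Because the dataset already stores knowledge-base identifiers for each mention, no string matching is needed: membership tests on the identifiers are enough to pull out the relevant context.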

Our study's contributions are twofold. First, we develop KERED, comprising three challenging RE datasets with auxiliary KGs, with the potential to advance knowledge-enhanced RE research. We make our datasets publicly available on Figshare (see footnote 1); please refer to Appendix A for KERED access. Second, we establish metrics for knowledge-enhanced RE methods on KERED and assess state-of-the-art RE methods using our datasets. Our experiments indicate that knowledge-enhanced RE methods can surpass traditional approaches.

The remainder of the paper is organized as follows: Section 2 reviews widely used DSRE datasets and knowledge-enhanced RE methods. Section 3 details the construction of KERED. Section 4 presents KERED's descriptive analysis. Section 5 evaluates RE models on KERED. Section 6 discusses the experimental results. Finally, Section 7 concludes the paper.

2 Related Works

DSRE datasets. The majority of existing DSRE datasets are constructed by identifying entities mentioned in evidence sentences and linking them to public knowledge bases such as Wikidata [17] and Freebase [18]. NYT10 [21] is a large-scale dataset automatically constructed using DSRE. However, Han et al. [20] highlighted noisy labeling issues in NYT10 and other existing datasets. To mitigate this problem, Gao et al. [16] introduced two DSRE datasets with manually annotated test sets, significantly enhancing the data quality of the previous NYT10 [21] and Wiki20 [22]. Although noisy labeling problems were solved in the test sets, data quality issues persist in NYT10m and Wiki20m. Hence, we further denoised these datasets and enriched them with external KGs through entity linking. Our refined datasets provide the community with benchmarks for evaluating knowledge-enhanced RE methods.

1 https://figshare.com/projects/KERED/134459

Knowledge-enhanced Relation Extraction. An increasing number of RE methods incorporate auxiliary information, such as attributes and embeddings of entities, into their models [23], where KG information plays a crucial role by revealing associations between entities [24]. CGRE [14] derives constraint graphs from KGs to model intrinsic connections between relations. The model generates representations for entities and relations by encoding the graph into vectors and extracting node features. Xu and Barbosa [25] proposed an RE framework, HRERE, which jointly learns language representations and knowledge graph embeddings. Moreover, KGPool [15] employs a graph pooling algorithm that dynamically selects KG context to enhance model performance. This method considers only the names, categories, aliases, and descriptions of entities as side information from the KG. REMAP [26], a multimodal method for DSRE, combines knowledge graph embeddings with deep language models to classify relations between entities. However, comparing the performance of all these methods is impossible due to the absence of benchmarks. Thus, we revisit existing knowledge-enhanced RE methods and evaluate them on our datasets to facilitate an objective comparison.

3 Construction of KERED

We first define the problem and present an overview of previous DSRE datasets in Sections 3.1 and 3.2. Subsequently, we detail the process of entity linking in Section 3.4, and then describe the process of dataset refinement in Section 3.5.

3.1 Problem Definition

An RE dataset consists of a collection of instances and a set of candidate relations. Each instance contains an evidence sentence and an annotated relational fact (an entity pair and a relation between them), as illustrated in Figure 1. All relations appearing in the dataset are restricted to the set of candidate relations. On RE datasets, traditional RE methods predict the relation between two entities based on the sentence for each instance. In addition, knowledge-enhanced RE methods typically require an auxiliary KG as supplementary information to improve RE performance. They classify relations between entity pairs in evidence sentences with the aid of knowledge base information [23,25]. To establish datasets for knowledge-enhanced RE tasks, we enhance the data quality of existing RE datasets and construct KGs for them to obtain KG-enhanced RE datasets.
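The definition above maps naturally onto a small data structure. The following is a minimal sketch under stated assumptions: the `Instance` class, its field names, and the `invalid_instances` helper are hypothetical and do not reflect the actual KERED file format; the second instance and the label `P999` are invented to show the check.

```python
from dataclasses import dataclass

NA = "NA"  # label for "no relation from the candidate set holds"


@dataclass(frozen=True)
class Instance:
    """An evidence sentence plus its annotated relational fact."""
    sentence: str
    subject_id: str
    object_id: str
    relation: str


def invalid_instances(instances, candidate_relations):
    """Return the instances whose relation label is outside the candidate set."""
    allowed = set(candidate_relations)
    return [inst for inst in instances if inst.relation not in allowed]


instances = [
    # Sentence and identifiers taken from Figure 1.
    Instance("Ansel Easton Adams was an American landscape photographer.",
             "Q5767067", "Q739", "P106"),
    # Invented instance with a label that is not a candidate relation.
    Instance("A made-up sentence with an out-of-set label.", "X1", "X2", "P999"),
]
candidate_relations = {"P106", "P27", NA}
bad = invalid_instances(instances, candidate_relations)
print([inst.relation for inst in bad])
```

A check of this kind is one simple way to enforce the constraint that every relation in the dataset belongs to the candidate set.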

Our experiments evaluate RE methods at two levels. Sentence-level RE considers only one instance as input at a time and predicts the relation between the entities expressed by the sentence. In contrast, bag-level RE takes a bag as input at a time. A bag contains all instances in the dataset that share the same entity pair, such as the three instances depicted in Figure 1. When classifying the relation between two entities, bag-level RE methods take into account all instances in the bag.
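The difference between the two settings comes down to how instances are grouped before prediction. The sketch below groups instances into bags keyed by entity pair; the function name `group_into_bags` and the tuple layout are assumptions for illustration, and the fourth instance is invented so that a second bag exists.

```python
from collections import defaultdict


def group_into_bags(instances):
    """Group (sentence, subject_id, object_id) tuples by their entity pair."""
    bags = defaultdict(list)
    for sentence, subject_id, object_id in instances:
        bags[(subject_id, object_id)].append(sentence)
    return dict(bags)


instances = [
    # The three sentences from Figure 1, all about the same entity pair.
    ("Ansel Easton Adams was an American landscape photographer.", "Q5767067", "Q739"),
    ("Adams participated in the club's annual High Trips, later becoming "
     "official photographer for the trips.", "Q5767067", "Q739"),
    ("Imitating the example of photographer Alfred Stieglitz, Adams opened his "
     "own art and photography gallery in 1933.", "Q5767067", "Q739"),
    # Invented instance with a different (hypothetical) entity pair.
    ("A made-up sentence about two other entities.", "X1", "X2"),
]
bags = group_into_bags(instances)
for pair, sentences in bags.items():
    print(pair, len(sentences))
```

A sentence-level method would receive each of the four sentences separately, while a bag-level method would receive two inputs: one bag of three sentences and one bag of one.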
