A Decade of Knowledge Graphs in Natural Language Processing A Survey Phillip Schneider1 Tim Schopf1 Juraj Vladika1 Mikhail Galkin2 Elena Simperl3and Florian Matthes1


A Decade of Knowledge Graphs in Natural Language Processing: A Survey

Phillip Schneider¹, Tim Schopf¹, Juraj Vladika¹, Mikhail Galkin², Elena Simperl³, and Florian Matthes¹

¹ Technical University of Munich, Department of Computer Science, Germany
² Mila Quebec AI Institute & McGill University, School of Computer Science, Canada
³ King’s College London, Department of Informatics, United Kingdom

{phillip.schneider, tim.schopf, juraj.vladika, matthes}@tum.de
mikhail.galkin@mila.quebec
elena.simperl@kcl.ac.uk

Abstract

In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.

1 Introduction

Knowledge acquisition and application are inherent to natural language. Humans use language as a means of communicating facts, arguing about decisions, or questioning beliefs. Therefore, it is not surprising that computational linguists began as early as the 1950s and 1960s to work out ideas on how to represent knowledge as relations between concepts in semantic networks (Richens, 1956; Quillian, 1963; Collins and Quillian, 1969).

More recently, knowledge graphs (KGs) have emerged as an approach for semantically representing knowledge about real-world entities in a machine-readable format. They originated from research on semantic networks, domain-specific ontologies, as well as linked data, and are thus not an entirely new concept (Hitzler, 2021). Despite their growing popularity, there is still no general understanding of what exactly a KG is or for what tasks it is applicable. Although prior work has already attempted to define KGs (Pujara et al., 2013; Ehrlinger and Wöß, 2016; Paulheim, 2017; Färber et al., 2018), the term is not yet used uniformly by researchers. Most studies implicitly adopt a broad definition of KGs, where they are understood as "a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities" (Hogan et al., 2022).
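The broad definition quoted above can be made concrete with a small sketch. The following Python snippet is only an illustration, not something taken from the survey: it stores a KG as a list of (head, relation, tail) triples, where heads and tails are entity nodes and each relation is a labeled edge. The facts, relation names, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy facts, chosen only for illustration.
TRIPLES = [
    ("Munich", "located_in", "Germany"),
    ("Technical University of Munich", "located_in", "Munich"),
    ("Germany", "instance_of", "Country"),
]


def build_graph(triples):
    """Index (head, relation, tail) triples by head entity."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def neighbors(graph, entity, relation=None):
    """Return tail entities linked from `entity`, optionally filtered by relation label."""
    return [t for r, t in graph.get(entity, []) if relation is None or r == relation]


graph = build_graph(TRIPLES)
print(neighbors(graph, "Munich", "located_in"))  # ['Germany']
```

Real KGs typically add identifiers, schemas, and provenance on top of this structure; the sketch only shows the node-and-edge core that the definition describes.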

KGs have attracted a lot of research attention in both academia and industry since the introduction of Google’s KG in 2012 (Singhal, 2012). Particularly in natural language processing (NLP) research, the adoption of KGs has become increasingly popular over the past 5 years, and this trend seems to be accelerating. The underlying paradigm is that the combination of structured and unstructured knowledge can benefit all kinds of NLP tasks. For instance, structured knowledge from KGs can be injected into the contextual knowledge found in language models, which improves performance in downstream tasks (Colon-Hernandez et al., 2021). Furthermore, with the growing importance of KGs, there are also expanding efforts to construct new KGs from unstructured texts.

Ten years after Google coined the term knowledge graph in 2012, a plethora of novel approaches has been proposed by scholars. Therefore, it is important to assemble insights, consolidate existing results, and provide a structured overview. However, to our knowledge, there are no studies that offer an overview of the whole research landscape of KGs in the NLP field. Contributing to closing this gap, we performed a comprehensive survey to analyze all research performed in this area by classifying established topics, identifying trends, and outlining areas for future research. Our three main contributions are as follows:

arXiv:2210.00105v1 [cs.CL] 30 Sep 2022

Task Taxonomy of Knowledge Graphs in Natural Language Processing

Categories: Knowledge Acquisition, Knowledge Application, Knowledge Graph Construction, Knowledge Graph Reasoning, Knowledge Extraction, Knowledge Integration, Natural Language Understanding, Natural Language Generation.

Tasks: Attribute Extraction, Entity Extraction, Relation Extraction, Entity Alignment, Entity Linking, Ontology Construction, Entity Classification, Error Detection, Knowledge Graph Embedding, Link Prediction, Relation Linking, Relation Classification, Triple Classification, Natural Language Inference, Semantic Parsing, Semantic Search, Semantic Similarity, Text Analysis, Text Classification, Data-to-Text Generation, Machine Translation, Question Generation, Text Generation, Text Summarization, Augmented Language Models, Conversational Interfaces, Question Answering.

Figure 1: Taxonomy of tasks in the literature on KGs in NLP.

• We systematically extract information from 507 included papers and report insights about tasks, research types, and contributions.

• We provide a taxonomy of tasks in the literature on KGs in NLP shown in Figure 1.

• We assess the maturity of individual research streams, identify trends, and highlight directions for future work.

Our survey sheds light on the evolution and current research progress regarding KGs in NLP. Although we cannot achieve complete coverage of all relevant papers on this topic, we aim to provide a representative overview that can help both NLP scholars and practitioners by offering a starting point in the literature. Moreover, our multifaceted analysis can guide the research community in closing existing gaps and finding novel ways to combine KGs with NLP.

2 Related Work

Related literature that includes both KGs and NLP seems to be relatively scarce. Most survey papers focus either only on KGs or only on NLP. In their broad introduction to KGs, Hogan et al. (2022) point out that existing surveys on KGs tend to revolve around specific aspects of KGs, most commonly their construction and embedding.

Such surveys with a KG focus usually bring up NLP only in the context of employed NLP methods, like information extraction, being used to populate and refine graphs (Nickel et al., 2016). Other surveys on KGs mention some downstream applications of KGs for NLP tasks, such as for constructing augmented language models, question answering over knowledge bases (KBQA), or recommender systems (Ji et al., 2021).

As noted previously, related work that includes both KGs and NLP focuses strictly on a specific application or task. For example, Safavi and Koutra (2021) provide an overview of applying relational world knowledge from KGs to augment large contextual language models. Other surveys on specific applications include KG reasoning (Chen et al., 2019), biomedical KGs (Nicholson and Greene, 2020), and the task of KBQA (Fu et al., 2020).

The survey on graphs in NLP by Nastase et al. (2015) covers only smaller graphs such as dependency graphs and dialogue trees. Even though it does not include KGs, the survey concludes that graphs are a powerful representation formalism and that NLP tasks can benefit from harnessing the potential of data presented in graph structures.

To the best of our knowledge, this is the first survey covering a wide spectrum of techniques, methods, as well as applications of KGs within the NLP research field.

3 Method

To achieve our objective of providing a thorough overview of the research landscape, we conducted a systematic mapping study following the process defined by Petersen et al. (2008). Its three main steps are explained in the next subsections.

3.1 Research Questions

The goal of our study is a multifaceted analysis of KGs in the field of NLP, such as identifying and quantifying research topics, domains, and outcomes. These objectives are reflected in the research questions (RQs) stated below.

RQ1: What are the characteristics and trends of the research literature on KGs in NLP?

RQ2: What are the different tasks mentioned in the existing research studies?

RQ3: What are the research types and main contributions of the studies?

3.2 Search and Screening Procedure

After specifying the RQs, we defined a set of related keywords for KGs and NLP to be used for the database search of relevant studies. From initial test searches, we observed that including terms associated with KGs (e.g., “semantic network” or “ontology”) yielded too many irrelevant results. To restrict the research scope to the concept of KGs, we decided to use the following search string: ("knowledge graph") AND ("NLP" OR "natural language processing" OR "computational linguistics"). The search string was applied to title, abstract, and keywords. If a given paper had no keywords, we used index keywords from the database if they were available.
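The Boolean search string above can be expressed as a small filter. The sketch below is an assumption-laden illustration rather than the query engine of any of the databases: the record layout (`title`, `abstract`, `keywords`, `index_keywords`) and the function names are hypothetical, and matching is done by case-insensitive phrase search with a leading word boundary, so plural forms such as "knowledge graphs" also match.

```python
import re

KG_TERMS = ["knowledge graph"]
NLP_TERMS = ["NLP", "natural language processing", "computational linguistics"]


def _contains(text, term):
    # Case-insensitive phrase match; the leading word boundary avoids hits
    # inside longer tokens, while suffixes such as a plural "s" still match.
    return re.search(r"\b" + re.escape(term), text, flags=re.IGNORECASE) is not None


def matches_search_string(record):
    """Apply ("knowledge graph") AND ("NLP" OR ...) to title, abstract, and keywords.

    `record` is a hypothetical dict; index keywords are used only when the
    paper has no author keywords, mirroring the fallback described in the text.
    """
    keywords = record.get("keywords") or record.get("index_keywords") or []
    text = " ".join([record.get("title", ""), record.get("abstract", ""), *keywords])
    has_kg = any(_contains(text, term) for term in KG_TERMS)
    has_nlp = any(_contains(text, term) for term in NLP_TERMS)
    return has_kg and has_nlp


example = {
    "title": "Entity linking at scale",
    "abstract": "We link mentions to a knowledge graph.",
    "keywords": [],
    "index_keywords": ["Computational Linguistics"],
}
print(matches_search_string(example))  # True
```

Actual database search engines differ in tokenization and stemming, so the exact hit counts of such a filter would not necessarily match those of the databases.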

For our search of relevant publications, we queried six academic databases, as listed in Table 1. The ACL Anthology is a digital archive of prestigious conferences and journals in NLP. ACM and IEEE provide access to publications of additional reputable venues in the broader computer science field. The remaining databases are commonly chosen in other related surveys to further increase the coverage of the respective field of interest.

In the first week of 2022, we applied our search string to the databases and restricted the time window to ten years from 2012 until 2021. Then, the exported files were merged, ensuring that each publication record was either a conference or a journal paper. We automatically identified and removed duplicate records as well. Through this, we obtained a dataset of 746 unique papers. Given this initial dataset, we further filtered down the truly relevant studies by screening for the following inclusion criteria: (1) peer-reviewed studies from conferences or journals, (2) studies with a clear focus on KGs in NLP, (3) studies written in English with electronically accessible full texts. Conversely, publications that did not satisfy all three inclusion criteria were excluded from the dataset.
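The merging and deduplication step described above could look roughly like the following sketch. The text does not state how duplicates were identified, so matching on a normalized title plus publication year is an assumption, as are the record fields (`title`, `year`, `type`) and the function names.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse everything but ASCII letters and digits to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_exports(exports, first_year=2012, last_year=2021,
                  allowed_types=("conference", "journal")):
    """Merge per-database record lists, keeping each in-window paper once.

    Assumption: two records are duplicates when their normalized titles and
    publication years coincide.
    """
    seen = set()
    merged = []
    for records in exports:
        for rec in records:
            if rec["type"] not in allowed_types:
                continue  # keep only conference and journal papers
            if not first_year <= rec["year"] <= last_year:
                continue  # outside the ten-year window
            key = (normalize_title(rec["title"]), rec["year"])
            if key in seen:
                continue  # already exported from another database
            seen.add(key)
            merged.append(rec)
    return merged


# Hypothetical exports from two databases.
acl = [{"title": "Knowledge Graphs in NLP", "year": 2020, "type": "conference"}]
scopus = [
    {"title": "Knowledge graphs in NLP.", "year": 2020, "type": "conference"},
    {"title": "An Older Study", "year": 2010, "type": "journal"},
    {"title": "A KG Thesis", "year": 2019, "type": "thesis"},
    {"title": "KG Question Answering", "year": 2021, "type": "journal"},
]
unique = merge_exports([acl, scopus])
print([r["title"] for r in unique])  # ['Knowledge Graphs in NLP', 'KG Question Answering']
```

The inclusion criteria themselves (focus on KGs in NLP, language, accessibility) require human judgment and are therefore not part of this sketch.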

As part of the screening procedure, two of the authors read title, abstract, and keywords to determine if a paper matched the inclusion criteria. In ambiguous cases, the full text of the paper was examined. The two authors screened all papers and decided together on keeping or dropping records from the dataset. The final dataset included a total of 507 papers, as listed in Table 1. We make our annotated dataset available through a public GitHub repository.¹

Academic Database      No. of Papers
ACL Anthology          164
ACM Digital Library    26
IEEE Xplore            76
ScienceDirect          34
Scopus                 200
Web of Science         7
Total                  507

Table 1: Overview of academic databases and number of included papers.

3.3 Classification Scheme and Data Extraction

According to our RQs, the included papers had to be categorized with respect to three facets: task, research type, and contribution. Established classification schemes from Wieringa et al. (2006) and Shaw (2003) were adapted for the research and contribution type as presented in Appendix A. For classifying tasks, we constructed a task taxonomy, following the iterative procedure suggested by Petersen et al. (2008), in which an initial classification scheme derived from keywords continuously evolves through adding, merging, or splitting categories during the classification process. Our task taxonomy is based on existing schemes from Paulheim (2017), Liu et al. (2020a), and Ji et al. (2021).

Once the initial schemes were set up, all papers were sorted into the classes as part of the data extraction process. The 507 included studies were divided between two of the authors. In regular sessions, they discussed changes to the classification schemes or clarified uncertain labels. While each paper was assigned one label for the research type, multiple labels were possible with regard to tasks and contributions. To assess inter-annotator agreement, the two authors independently classified a random sample of 50 papers. We calculated Cohen’s Kappa coefficient of these annotations for each facet (Cohen, 1960).
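For reference, Cohen's Kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for the single-label case, which corresponds to the research type facet; how agreement was computed for the multi-label facets is not described here, so that case is left out. The label names and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who each assign one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical research type labels for four papers.
annotator_1 = ["validation", "validation", "solution", "solution"]
annotator_2 = ["validation", "solution", "solution", "solution"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

In the example, p_o = 3/4 and p_e = (2·1 + 2·3)/16 = 1/2, which gives kappa = 0.5.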

¹ https://github.com/sebischair/KG-in-NLP-survey
