this task since they require the corresponding text,
which is missing in the KGC task.
To complete uninferable relations more accu-
rately, we draw inspiration from Open-domain
Question Answering (OpenQA) models (Guu et
al., 2020; Lewis et al., 2020) and propose a novel
KGC method based on information retrieval and
machine reading comprehension, named IR4KGC.
Specifically, the triple query (h, r, ?) is first
converted to a search query that carries its knowl-
edge semantics. Then, our model uses a pre-trained knowledge
retrieval module to retrieve documents that match
the search query and generates the final predictions
based on a generative PLM. In addition, the retrieved
documents provide additional interpretability.
Most existing OpenQA models are based
on Dense Retrieval (Karpukhin et al., 2020) or
BM25 (Robertson et al., 2009). These retrieval
modules handle natural language queries well,
but they struggle with the search queries in the
KGC task, which carry rich knowledge semantics.
To solve this problem, we construct a training cor-
pus for retrieval based on the idea of distant supervi-
sion and pre-train our knowledge retrieval module
on the KGC task. The module can thus better capture
the knowledge semantics contained in the
search query and return more relevant documents.
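As a concrete illustration of the distant-supervision idea, the sketch below pairs each known triple with documents that mention both its head and tail entities; this is our assumption of the general recipe, and the function name, data, and matching rule are invented for the example rather than taken from the paper:

```python
# Distant supervision for retrieval: a document mentioning both the head
# and the tail of a known triple (h, r, t) is treated as a positive
# retrieval target for the search query built from (h, r).
def build_retrieval_corpus(triples, docs):
    corpus = []
    for h, r, t in triples:
        positives = [d for d in docs if h in d and t in d]
        if positives:
            corpus.append({"query": f"{h} {r}", "positives": positives})
    return corpus

triples = [("Douglas Adams", "place of birth", "Cambridge")]
docs = [
    "Douglas Adams was born in Cambridge.",
    "Paris is the capital of France.",
]
corpus = build_retrieval_corpus(triples, docs)
```

Simple string containment stands in here for whatever entity-linking or matching heuristic an actual system would use.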
Experimental results on two KGC datasets show
that IR4KGC outperforms KGE models on uninfer-
able relations. In addition, combining IR4KGC
with a KGE model achieves the best performance
on all datasets.
2 Related Work
2.1 Knowledge Graph Completion
KGE models are the main components of knowl-
edge graph completion models. KGE models
can be divided into four main categories: (1)
translation-based models (Bordes et al., 2013; Sun
et al., 2019); (2) models based on tensor decompo-
sition (Nickel et al., 2011; Balažević et al., 2019);
(3) models based on neural networks (Socher et al.,
2013; Dettmers et al., 2018); (4) models that intro-
duce additional information (Lin et al., 2016; Wang
et al., 2021). As we introduced in Section 1, KGE
models struggle with uninferable relations.
Several PLM-based KGC models have been
proposed in recent years, most of which use a PLM
to determine the correctness of a given triple (Yao
et al., 2019; Lv et al., 2022) or to directly gener-
ate the predicted tail entities (Saxena et al., 2022).
The implicit knowledge stored in PLMs can help such
models complete uninferable relations. However,
these models still have drawbacks, since it is difficult
for a PLM to accurately memorize all the knowledge
in the world.
RE models can also complete knowledge from
text. However, RE aims to extract all the knowl-
edge expressed in a given text, which makes targeted
completion of a specific missing triple difficult.
Furthermore, the text required for RE is likewise
missing in our setting, making RE unsuit-
able for the KGC task in this paper.
2.2 Open-domain Question Answering
Open-domain Question Answering aims to answer
open-domain questions without a given context passage. Most of
the OpenQA models in recent years have adopted
the retrieving and reading pipeline (Chen et al.,
2017; Guu et al., 2020; Lewis et al., 2020). Specif-
ically, these models use retrieval modules such
as Dense Retrieval (Karpukhin et al., 2020) or
BM25 (Robertson et al., 2009) to retrieve rele-
vant documents, and then produce answers with
extraction- or generation-based methods. However,
these retrieval modules are hard to adapt to KGC
tasks and have low retrieval efficiency. In addition, there
are some OpenQA models based on knowledge-
guided retrieval (Min et al., 2019; Asai et al., 2019),
but they are restricted to KGs with Wikipedia links
and are thus difficult to adapt to most KGs.
3 Method
Given a triple query (h, r, ?), where h is the head
entity and r is the relation, we transform it into
a search query and retrieve relevant documents
using our retrieval module. After that, the con-
ditional generation module generates predicted an-
swers based on the documents. These two modules
are optimized jointly following Lewis et al. (2020).
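As a rough illustration of this retrieve-then-generate pipeline, the sketch below substitutes a toy lexical-overlap scorer for our trained knowledge retriever and a trivial echo function for the generative PLM; the corpus, query, and function names are invented for the example and are not the paper's implementation:

```python
from collections import Counter

# Toy corpus; the real system retrieves from a large document collection.
DOCS = [
    "Douglas Adams was born in Cambridge.",
    "Cambridge is a city in England.",
]

def retrieve(query, docs, k=1):
    # Lexical-overlap scoring stands in for the pre-trained knowledge retriever.
    q = Counter(query.lower().split())
    def score(d):
        return sum((q & Counter(d.lower().split())).values())
    return sorted(docs, key=score, reverse=True)[:k]

def generate(query, support):
    # Placeholder for the generative PLM reader: echo the top document.
    return support[0]

query = "Douglas Adams place of birth"   # search query built from (h, r, ?)
answer = generate(query, retrieve(query, DOCS))
```

The point of the structure, not the toy scorer, is what matters: the reader only ever conditions on the documents the retriever returns, which is what allows the two modules to be trained jointly.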
3.1 Knowledge-based Information Retrieval
Triple Query Transformation. For a triple query
tq = (h, r, ?), we have two functions to con-
vert it into a search query, denoted as FL and FLA.
FL(tq) = LABEL(h) ‖ LABEL(r), where LABEL(x)
is the label corresponding to x and ‖ denotes the
concatenation operation. FLA uses aliases to
increase the query diversity. Specifically,
FLA(tq) = TEXT(h) ‖ TEXT(r), where TEXT(x) has a
50% probability of being a label of x and a
50% probability of being a random alias of x.
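The two transformations can be sketched as follows; the LABELS and ALIASES tables, the entity and relation identifiers, and the use of a space for concatenation are illustrative assumptions rather than the paper's implementation:

```python
import random

# Hypothetical label/alias tables; in practice these come from the KG.
LABELS = {"Q42": "Douglas Adams", "P19": "place of birth"}
ALIASES = {"Q42": ["D. Adams"], "P19": ["born in", "birthplace"]}

def f_l(tq):
    # FL: concatenate the labels of the head entity and the relation.
    h, r, _ = tq
    return LABELS[h] + " " + LABELS[r]

def text(x, rng):
    # TEXT(x): 50% the label of x, 50% a random alias of x
    # (falls back to the label when x has no aliases).
    if rng.random() < 0.5 and ALIASES.get(x):
        return rng.choice(ALIASES[x])
    return LABELS[x]

def f_la(tq, rng):
    # FLA: same shape as FL, but sampled over labels and aliases.
    h, r, _ = tq
    return text(h, rng) + " " + text(r, rng)
```

FL is deterministic, so it always yields the same query for a triple; FLA injects the alias randomness that gives the retriever more lexically diverse training queries.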
Pre-training Method. Following the training ap-
proach of DPR (Karpukhin et al., 2020), we pre-