Step out of KG: Knowledge Graph Completion via
Knowledgeable Retrieval and Reading Comprehension
Xin Lv1,2, Yankai Lin3, Zijun Yao1,2, Kaisheng Zeng1,2
Jiajie Zhang1,2, Lei Hou1,2, Juanzi Li1,2
1Department of Computer Science and Technology, BNRist
2KIRC, Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China
3Renmin University of China, Beijing 100872, China
lv-x18@mails.tsinghua.edu.cn
Abstract
Knowledge graphs, as the cornerstone of many AI applications, usually suffer from serious incompleteness. In recent years, there have been many efforts to study automatic knowledge graph completion (KGC), most of which use existing knowledge to infer new knowledge. However, in our experiments, we find that not all relations can be obtained by inference, which constrains the performance of existing models. To alleviate this problem, we propose a new model based on information retrieval and reading comprehension, namely IR4KGC. Specifically, we pre-train a knowledge-based information retrieval module that can retrieve documents related to the triples to be completed. The retrieved documents are then handed over to a reading comprehension module to generate the predicted answers. In experiments, we find that our model performs well on relations that cannot be inferred from existing knowledge and achieves good results on KGC datasets.
1 Introduction
Knowledge Graphs (KGs), which represent knowledge as structured triples, are the infrastructure for many AI studies. However, most real-world KGs suffer from serious incompleteness. For example, about 71% of people in Freebase (Bollacker et al., 2008) lack birthplace information (Dong et al., 2014), which limits the performance of downstream tasks.

To alleviate the incompleteness of KGs, the knowledge graph completion (KGC) task has been proposed, which typically uses the schema of a KG to determine which knowledge is missing and applies KGC models to complete it. Among these models, knowledge graph embedding (KGE) models (Bordes et al., 2013) are dominant; they usually embed entities and relations into vector spaces and predict missing knowledge based on vector operations.
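As a concrete illustration of such vector operations (not specific to this paper's method), the cited translation-based model TransE (Bordes et al., 2013) scores a candidate triple by treating the relation as a translation between entity embeddings:

```latex
f(h, r, t) = -\lVert \mathbf{e}_h + \mathbf{e}_r - \mathbf{e}_t \rVert
```

where e_h, e_r, e_t are learned vectors; the missing tail in (h, r, ?) is then predicted as the entity whose embedding lies closest to e_h + e_r.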
Relation                             MRR    Hits@10
cause of death                       .099   .162
/people/person/place_of_birth        .075   .134
ethnic group                         .452   .684
/people/ethnicity/languages_spoken   .325   .738

Table 1: Performance gap of the KGE model TuckER (Balažević et al., 2019) across relations. The relations cause of death and /people/person/place_of_birth (bolded in red in the original) are difficult to infer from existing knowledge, and the KGE model struggles on them.
However, the effectiveness of KGE models relies on the assumption that the missing knowledge in a KG can be inferred from existing knowledge. Although this assumption holds for most relations in KGs, there still exist several non-negligible exceptions. We refer to these exceptional relations as uninferable relations and the others as inferable relations. For example, for the relation cause of death, it is difficult to infer the cause of someone's death from what we already know. Table 1 gives the performance of a KGE model on different relations, showing that it struggles with the uninferable relations.
A reasonable solution to completing the uninferable relations is to extract the knowledge from the corresponding text. There are two main types of related KGC models: (1) KGE models that introduce description information of entities (Wang et al., 2021); (2) models based on pre-trained language models (PLMs) (Saxena et al., 2022). However, both types of models have drawbacks: the former cannot guarantee that the knowledge to be completed appears in the description, and the latter depends on the knowledge contained in the PLM. In addition to KGC models, relation extraction (RE) models (Levy et al., 2017) can also derive knowledge from text, but they do not apply to this task since they require the corresponding text, which is missing in the KGC task.
To complete uninferable relations more accurately, we draw inspiration from Open-domain Question Answering (OpenQA) models (Guu et al., 2020; Lewis et al., 2020) and propose a novel KGC method based on information retrieval and machine reading comprehension, namely IR4KGC. Specifically, the triple query (h, r, ?) is first converted into a search query that carries its knowledge semantics. Then, our model uses a pre-trained knowledge retrieval module to retrieve documents that match the search query and generates the final predictions with a generative PLM. Besides, the retrieved documents provide additional interpretability.
Most existing OpenQA models are based on Dense Retrieval (Karpukhin et al., 2020) or BM25 (Robertson et al., 2009). These retrieval modules handle natural language queries well, but struggle with search queries containing the rich knowledge semantics found in the KGC task. To solve this problem, we construct a training corpus for retrieval based on the idea of distant supervision and pre-train our knowledge retrieval module on the KGC task, so that it can better capture the knowledge semantics contained in the search query and return more relevant documents.
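A minimal sketch of how such a distantly supervised retrieval corpus could be built. The matching heuristic here (a document counts as a positive for a triple if it mentions both the head and tail entity) is our assumption about the standard distant-supervision idea, not the paper's exact procedure, and the helper names are illustrative:

```python
# Sketch: build (search query, positive document) pairs by distant supervision.
# Assumption: a document mentioning both the head and the tail entity of a
# known triple (h, r, t) is treated as a positive document for the query (h, r).

def build_retrieval_corpus(triples, documents, label):
    """triples: iterable of (h, r, t) id triples; documents: list of raw texts;
    label: dict mapping an entity/relation id to its surface label."""
    corpus = []
    for h, r, t in triples:
        h_label, t_label = label[h], label[t]
        positives = [doc for doc in documents
                     if h_label in doc and t_label in doc]
        # Concatenate head and relation labels, mirroring the F_L transformation
        # described in Section 3.1.
        query = f"{h_label} {label[r]}"
        corpus.extend((query, doc) for doc in positives)
    return corpus
```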
Experimental results on two KGC datasets show that IR4KGC achieves superior results over KGE models on uninferable relations. In addition, the combination of IR4KGC and a KGE model achieves the best performance on all datasets.
2 Related Work
2.1 Knowledge Graph Completion
KGE models are the main components of knowledge graph completion models. They can be divided into four main categories: (1) translation-based models (Bordes et al., 2013; Sun et al., 2019); (2) models based on tensor decomposition (Nickel et al., 2011; Balažević et al., 2019); (3) models based on neural networks (Socher et al., 2013; Dettmers et al., 2018); (4) models that introduce additional information (Lin et al., 2016; Wang et al., 2021). As we introduced in Section 1, KGE models struggle with uninferable relations.
Some PLM-based KGC models have been proposed in recent years, most of which use a PLM to determine the correctness of a given triple (Yao et al., 2019; Lv et al., 2022) or to directly generate the predicted tail entities (Saxena et al., 2022). The implicit knowledge in a PLM can help the model complete uninferable relations, but these models still have drawbacks, since it is difficult for a PLM to accurately remember all the knowledge in the world.
RE models can also complete knowledge from text. However, RE aims to extract all the knowledge from a given text, which makes targeted completion of specific missing knowledge difficult. Furthermore, the text required for RE is also missing, making RE unsuitable for the KGC task in this paper.
2.2 Open-domain Question Answering
Open-domain Question Answering aims to answer open-domain questions without given context. Most OpenQA models in recent years adopt a retrieve-and-read pipeline (Chen et al., 2017; Guu et al., 2020; Lewis et al., 2020). Specifically, these models use retrieval modules such as Dense Retrieval (Karpukhin et al., 2020) or BM25 (Robertson et al., 2009) to retrieve relevant documents and produce answers with extraction- or generation-based methods. However, these retrieval modules are difficult to adapt to KGC tasks and have low retrieval efficiency. In addition, there are some OpenQA models based on knowledge-guided retrieval (Min et al., 2019; Asai et al., 2019), but they rely on KGs with Wikipedia links and are difficult to adapt to most KGs.
3 Method
Given a triple query (h, r, ?), where h is the head entity and r is the relation, we transform it into a search query and retrieve relevant documents using our retrieval module. After that, the conditional generation module generates predicted answers based on the documents. These two modules are optimized jointly following Lewis et al. (2020).
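For reference, the joint objective of Lewis et al. (2020) that this follows (our sketch; the formula is not reproduced in this excerpt) marginalizes the generator's likelihood over the top-k retrieved documents:

```latex
p(y \mid x) \approx \sum_{z \in \mathrm{top}\text{-}k\,(p_\eta(\cdot \mid x))}
    p_\eta(z \mid x)\; p_\theta(y \mid x, z)
```

where x is the search query, z a retrieved document, y the generated answer, p_η the retriever, and p_θ the generator; both modules are trained end-to-end by minimizing the negative log-likelihood of the gold answer.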
3.1 Knowledge-based Information Retrieval
Triple Query Transformation. For a triple query tq = (h, r, ?), we have two functions to convert it into a search query, denoted as F_L and F_LA. F_L(tq) = LABEL(h) ∥ LABEL(r), where LABEL(x) is the label corresponding to x and ∥ denotes the concatenation operation. F_LA uses aliases to increase query diversity. Specifically, F_LA(tq) = TEXT(h) ∥ TEXT(r), where TEXT(x) has a 50% probability of being the label of x and a 50% probability of being a random alias of x.
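A minimal sketch of these two transformation functions, assuming labels and alias lists are available from the KG as dictionaries; the helper names and the fallback to the label when an element has no alias are our assumptions, not the authors' code:

```python
import random

def f_l(tq, label):
    """F_L: concatenate the labels of the head entity and the relation."""
    h, r, _ = tq
    return f"{label[h]} {label[r]}"

def f_la(tq, label, aliases):
    """F_LA: like F_L, but each element is rendered as its label with
    probability 0.5 and as a random alias with probability 0.5,
    to increase query diversity."""
    h, r, _ = tq

    def text(x):
        # Assumption: fall back to the label if x has no recorded alias.
        if aliases.get(x) and random.random() < 0.5:
            return random.choice(aliases[x])
        return label[x]

    return f"{text(h)} {text(r)}"
```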
Pre-training Method. Following the training approach of DPR (Karpukhin et al., 2020), we pre-train our knowledge retrieval module on the retrieval corpus constructed via distant supervision.
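DPR's training objective contrasts each query with its positive document against the other documents in the same batch (in-batch negatives). A minimal PyTorch sketch of that loss, under the assumption that queries and documents have already been encoded into vectors by the dual encoders:

```python
import torch
import torch.nn.functional as F

def dpr_in_batch_loss(query_vecs, doc_vecs):
    """query_vecs, doc_vecs: (batch, dim) tensors, where doc_vecs[i] is the
    positive document for query i and all other documents in the batch serve
    as negatives. Returns the mean negative log-likelihood of the positives."""
    scores = query_vecs @ doc_vecs.T  # (batch, batch) similarity matrix
    # The positive document for each query sits on the diagonal.
    targets = torch.arange(scores.size(0), device=query_vecs.device)
    return F.cross_entropy(scores, targets)
```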