PROVE: A PIPELINE FOR AUTOMATED PROVENANCE
VERIFICATION OF KNOWLEDGE GRAPHS AGAINST TEXTUAL
SOURCES
A PREPRINT
Gabriel Amaral1 [0000-0002-4482-5376]   Odinaldo Rodrigues1 [0000-0001-7823-1034]
Elena Simperl1 [0000-0003-1722-947X]
October 27, 2022
ABSTRACT
Knowledge Graphs are repositories of information that gather data from a multitude of domains and
sources in the form of semantic triples, serving as a source of structured data for various crucial ap-
plications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs
mainly serve as secondary sources of information and depend on well-documented and verifiable
provenance to ensure their trustworthiness and usability. However, their ability to systematically
assess and assure the quality of this provenance, most crucially whether it properly supports the
graph’s information, relies mainly on manual processes that do not scale with size. ProVe aims at
remedying this, consisting of a pipelined approach that automatically verifies whether a Knowl-
edge Graph triple is supported by text extracted from its documented provenance. ProVe is intended
to assist information curators and consists of four main steps involving rule-based methods and ma-
chine learning models: text extraction, triple verbalisation, sentence selection, and claim verification.
ProVe is evaluated on a Wikidata dataset, achieving promising results overall and excellent perfor-
mance on the binary classification task of detecting support from provenance, with 87.5% accuracy
and 82.9% F1-macro on text-rich sources. The evaluation data and scripts used in this paper are
available on GitHub and Figshare.
Keywords Fact Verification · Data Verbalisation · Knowledge Graphs
1 Introduction
A Knowledge Graph (KG) is a type of knowledge base that stores information in the form of semantic triples formed by
a subject, a predicate, and an object. KGs represent both real and abstract entities internally as labelled and uniquely
identifiable entities, such as The Moon or Happiness, and can amass information from a multitude of domains and
sources by connecting such entities amongst themselves or to literals through relationships, coded via uniquely iden-
tified predicates. KGs serve as sources of both human and machine-readable semantically structured data for various
crucial applications in the modern web landscape, such as Wikipedia infoboxes, search engine results, voice-activated
assistants, and information gathering projects [30].
Developed and maintained by ontology experts, data curators, and even anonymous volunteers, KGs have massively
grown in size and adoption in the last decade, mainly as secondary sources of information. This means not storing
new information, but taking it from authoritative and reliable sources which are explicitly referenced. As such, KGs
depend on well-documented and verifiable provenance to ensure they are regarded as trustworthy and usable [56].
Processes to assess and assure the quality of information provenance are thus crucial to KGs, especially measuring
and maintaining verifiability, i.e. the degree to which consumers of KG triples can attest these are truly supported by
their sources [56]. However, such processes are currently performed mostly manually, which does not scale with size.
Manually ensuring high verifiability on vital KGs such as Wikidata and DBpedia is prohibitive due to their sheer size.
ProVe (Provenance Verification) is proposed to assist data curators and editors in handling the upkeep of KG verifia-
bility. It consists of an automated approach that leverages state-of-the-art Natural Language Processing (NLP) models,
public datasets on data verbalisation and fact verification, as well as rule-based methods. ProVe consists of a pipeline
that aims at automatically verifying whether a KG triple is supported by a web page that is documented as its prove-
nance. ProVe first extracts text passages from the triple’s reference. Then, it verbalises the KG triple and ranks the
extracted passages according to their relevance to the triple. The most relevant passages have their stances towards
the KG triple determined (i.e. supporting, refuting, neither) and finally ProVe estimates whether the whole reference
supports the triple.
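To make this flow concrete, the sketch below lays out the four stages as plain Python; every helper here is a naive, hypothetical stand-in (simple splitting, word overlap, a template), not ProVe's actual models, which Section 3 describes.

```python
# Illustrative sketch of ProVe's four pipeline stages; all helper
# implementations are naive stand-ins, not the paper's actual components.
from dataclasses import dataclass

def extract_passages(reference_text: str) -> list[str]:
    # Stage 1 (text extraction): stand-in sentence split; ProVe applies
    # rule-based extraction to the reference web page.
    return [s.strip() for s in reference_text.split(".") if s.strip()]

def verbalise_triple(triple: tuple[str, str, str]) -> str:
    # Stage 2 (triple verbalisation): stand-in template; ProVe uses a fine-tuned LM.
    subject, predicate, obj = triple
    return f"{subject} {predicate} {obj}"

def relevance_score(claim: str, passage: str) -> float:
    # Stage 3 (sentence selection): stand-in word-overlap score; ProVe ranks
    # passages with a pre-trained language model.
    c, p = set(claim.lower().split()), set(passage.lower().split())
    return len(c & p) / max(len(c), 1)

def stance(claim: str, passage: str) -> str:
    # Stage 4 (claim verification): stand-in heuristic over
    # {supports, refutes, neither}; ProVe uses an LM classifier.
    return "supports" if relevance_score(claim, passage) > 0.5 else "neither"

@dataclass
class Verdict:
    supported: bool       # does the whole reference support the triple?
    evidence: list        # top passages paired with their stances

def verify(triple: tuple[str, str, str], reference_text: str, k: int = 5) -> Verdict:
    claim = verbalise_triple(triple)
    passages = extract_passages(reference_text)
    ranked = sorted(passages, key=lambda p: relevance_score(claim, p), reverse=True)[:k]
    stances = [stance(claim, p) for p in ranked]
    return Verdict(supported="supports" in stances, evidence=list(zip(ranked, stances)))
```

For example, verify(("The Moon", "orbits", "Earth"), page_text) would return a verdict plus the passages used as evidence, mirroring the reference-level output described above.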
This task is a specific application of Automated Fact Checking (AFC), also known as AFC on KGs. AFC is a currently
well explored topic of research with several published papers, surveys, and datasets [47, 57, 15, 43, 29, 27, 16, 58, 28,
59, 49, 48, 38], and is generally defined as the verification of a natural language claim by collecting and reasoning over
evidence extracted from text documents or structured data sources. Both the verification verdict and the collected
evidence are the main outputs. While general AFC takes a textual claim and a searchable evidence base as inputs,
AFC on KGs takes a single KG triple and its documented provenance in the form of an external reference.
Approaches tackling AFC on KGs are very few; to the best of the authors' knowledge, the only two works of note in
a similar direction are DeFacto [25, 14] and its successor FactCheck [45]. While they tackle this task mostly as
defined above, they rely on a searchable document base instead of a given reference and judge triples on a true-false
spectrum instead of verifiability. Like these few approaches, ProVe diverges from the general AFC framework and
introduces a few different sub-tasks. Still, it makes use of the current state of the art on the subtasks they have in
common, and is the first approach to tackle AFC on KGs with large pre-trained Language Models (LMs), which can
be expanded to work in languages other than English and can benefit from an Active Learning scenario.
ProVe is evaluated on an annotated dataset of Wikidata triples and their references, combining multiple types of prop-
erties and web domains. ProVe achieves promising results overall (75% accuracy and 68.1% F1-macro) on classifying
references as either supporting their triples or not, with an excellent performance on explicit and naturally written
references (87.5% accuracy, 82.9% F1-macro, 0.908 AUC). Additionally, ProVe assesses passage relevance with a
strong positive correlation (0.5058 Pearson’s r) to human judgements.
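For readers wanting to reproduce metrics of this kind on their own predictions, the snippet below shows the standard scikit-learn and SciPy calls on placeholder arrays; it is a generic illustration, not the paper's evaluation scripts.

```python
# Generic computation of the reported metric types (accuracy, F1-macro, AUC,
# Pearson's r) on placeholder data; not ProVe's actual evaluation code.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from scipy.stats import pearsonr

y_true = [1, 0, 1, 1, 0, 1]                 # gold labels: does the reference support the triple?
y_pred = [1, 0, 1, 0, 0, 1]                 # binary predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7]    # predicted support probabilities

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1-macro:", f1_score(y_true, y_pred, average="macro"))
print("AUC:     ", roc_auc_score(y_true, y_score))

human_relevance = [4.5, 2.0, 3.5, 1.0]      # crowdsourced passage relevance judgements
model_relevance = [0.92, 0.35, 0.70, 0.15]  # model-assigned passage relevance scores
r, _ = pearsonr(human_relevance, model_relevance)
print("Pearson's r:", r)
```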
In summary, this paper’s main contributions are:
1. A novel pipelined approach to evidence-based Automated Fact Checking on Knowledge Graphs based on
large Language Models;
2. A benchmarking dataset of Wikidata triples and references for Automated Fact Checking on Knowledge
Graphs, covering a variety of information domains as well as a balanced sample of diverse web domains;
3. Novel crowdsourcing task designs that facilitate repeatable, quick, and large-scale collection of human anno-
tations on passage relevance and textual entailment at good agreement levels.
These contributions directly aid KG curators, editors, and researchers in improving KG provenance. Properly deployed,
ProVe can do so in multiple ways. Firstly, by assisting the detection of verifiability issues in existing references,
bringing them to the attention of humans. Secondly, given a triple and its reference, it can promote re-usability of the
reference by verifying it against neighbouring triples. Finally, given a new KG triple entered by editors or suggested
by KG completion processes, it can analyse and suggest references. The remainder of this paper is structured as
follows. Section 2 explores related work on KG data quality, mainly verifiability, as well as approaches to AFC on
KGs. Section 3 presents ProVe’s formulation and covers each of its modules in detail. Section 4 presents an evaluation
dataset consisting of triple-reference pairs, including its generation and its annotation. Section 5 details the results of
ProVe’s evaluation. Finally, Section 6 delivers discussions around this work and final conclusions. All code and data
used in this paper are available on Figshare1 and GitHub.2,3
2 Related Work
ProVe attempts to solve the task of AFC on KGs, with the purpose of assisting data curators in improving the verifi-
ability of KGs. Thus, to understand how ProVe approaches this task, it is important to first understand how the data
quality dimension of verifiability is currently defined and measured in KGs, as well as how state-of-the-art approaches
to general AFC and AFC on KGs tackle these tasks and how ProVe learns or differs from them.
1https://figshare.com/s/df0ec1c233ebd50817f4
2https://anonymous.4open.science/r/RSP-F367/
3https://anonymous.4open.science/r/ClaimVerificationHIT-A04D
2.1 Verifiability in KGs
In order to properly evaluate the degree to which ProVe adequately predicts verifiability, this dimension first needs to
be well defined and a strategy needs to be established to measure it given an evaluation dataset. Verifiability in the
context of KGs, whose information is mainly secondary, is defined as the degree to which consumers of KG triples
can attest these are truly supported by their sources [56]. It is an essential aspect of trustworthiness [56, 11, 34], yet is
amongst the least explored quality dimensions [56, 34], with most measures carried out superficially, unlike correctness or
consistency [34, 40, 2, 22, 1].
For instance, Farber et al. [11] measure verifiability only by considering whether any provenance is provided at all.
Flouris et al. [12] look deeper into sources’ contents, but only verify specific and handcrafted irrationalities, such
as a city being founded before it had citizens. Algorithmic indicators are not suited to directly measure verifiabil-
ity, as sources are varied and natural language understanding is needed. As such, recent works [33, 3] measure KG
verifiability through crowdsourced manual verification, giving crowdworkers direct access to triples and references.
Crowdsourcing allows for more subjective and nuanced metrics to be implemented, as well as for natural text compre-
hension [55, 8].
Thus, this paper employs crowdsourcing in order to measure verifiability metrics of individual triple-reference pairs.
By comparing a pair’s metrics with ProVe’s outputs given said pair as input, ProVe and its components can be eval-
uated. Like similar crowdsourcing studies [33, 3], multiple quality assurance techniques are implemented to ensure
collected annotations are trustworthy [10]. To the best of the authors’ knowledge, this is the first work to use crowd-
sourcing as a tool to measure the relevance and stance of references with regard to KG triples at levels varying from
whole references to individual text passages.
2.2 Automated Fact Checking on Knowledge Graphs
General AFC
Automated Fact Checking (AFC) is a topic of several works of research, datasets, and surveys [47, 57, 15, 43, 29, 27,
16, 58, 28, 59, 49, 48, 38]. AFC is commonly defined in the Natural Language Processing (NLP) domain as a broader
category of tasks and subtasks [47, 57, 15] whose goal is to, given a textual claim and searchable document corpora as
inputs, verify said claim’s veracity or support by collecting and reasoning over evidence. Such evidence is extracted
from the input document corpora and constitutes AFC’s output alongside the claim’s verdict. While a detailed explo-
ration of individual AFC state-of-the-art approaches is out of this paper’s scope, it is crucial to define their general
framework in order to properly cover ProVe’s architecture.
A general framework for AFC has been identified by recent surveys [57, 15], and can be seen in Figure 1. Zeng et
al. [57] define it as a multi-step process where each step can be tackled as a subtask. Firstly, a claim detection step
identifies which claims need to be verified. Based on such claims, a document retrieval step gathers documents that
might contain information relevant to verifying the claim. A sentence selection step then identifies and extracts from
retrieved documents a set of few individual text passages that are deemed the most relevant. Based on these passages,
a claim verification step provides the final verdict. Guo et al. [15] add that a final justification production step is
crucial for explainability. Given the framework's nature, it is unsurprising that pipelined approaches are extremely
popular and constitute the current state of the art.
[Figure 1: flowchart of the general AFC pipeline. Statements pass through claim detection; document retrieval draws
on document corpora; claim evidence selection yields relevant evidence; claim verification produces the final verdict;
and justification production yields a justification.]
Fig. 1. Overview of a general AFC pipeline. White diamond blocks are documents and objects, and grey square blocks are AFC
subtasks. Specific formulations and implementations might, of course, differ.
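Read as code, the framework amounts to a composition of independently solvable subtasks; the interface sketch below (hypothetical names and simplified types, mirroring Figure 1 rather than any specific system) makes the division explicit.

```python
# Interface-level sketch of the general AFC framework surveyed in [57, 15];
# method names and types are illustrative, not taken from a specific system.
from typing import Protocol, Sequence

class AFCPipeline(Protocol):
    def detect_claims(self, statements: Sequence[str]) -> list[str]:
        """Claim detection: identify which statements need to be verified."""
    def retrieve_documents(self, claim: str, corpora: Sequence[str]) -> list[str]:
        """Document retrieval: gather documents possibly relevant to the claim."""
    def select_sentences(self, claim: str, documents: Sequence[str]) -> list[str]:
        """Sentence selection: extract the few most relevant text passages."""
    def verify_claim(self, claim: str, evidence: Sequence[str]) -> str:
        """Claim verification: final verdict, e.g. supported / refuted / neither."""
    def produce_justification(self, claim: str, evidence: Sequence[str], verdict: str) -> str:
        """Justification production: human-readable explanation of the verdict."""
```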
AFC mainly deals with text, both as claims to be verified and as evidence documents, due to recent advances in
this direction being greatly facilitated by textual resources like the FEVER shared task [49] and its associated large-
scale benchmark FEVER dataset [48]. Still, some tasks in AFC take semantic triples as verifiable claims, either from
KGs [41, 21] or by extracting them from text. Some also utilise KGs as reasoning structures from where to draw
evidence [15, 50, 46, 9, 42]. For instance, Thorne and Vlachos [46] directly map claims found in text to triples in a KG
to verify numerical values. Both Ciampaglia et al. [9] and Shiralkar et al. [42] use entity paths in DBpedia to verify
triples extracted from text. Other approaches based on KG embeddings associate the likelihood of a claim being true
to that of it belonging as a new triple to a KG [5, 18].
These tasks, while incorporating semantic triples and KGs, cannot be exactly defined as AFC on KGs; either the
verified triples do not come from full and consistent KGs, or the evidence used for reasoning is not taken from sources
that could serve as provenance, but inferred from the graph itself.
AFC on KGs
AFC on KGs is a more specific task within AFC, explored by a handful of approaches, the most prominent of which
are DeFacto [14] and its successor FactCheck [45]. Its main purpose is to ensure KGs are fit for use by asserting
whether their information is verifiable by trustworthy evidence. Given a KG triple and either its documented external
provenance or searchable external document corpora whose items could be used as provenance, AFC on KGs can be
defined as the automated verification of said triple’s veracity or support by collecting and reasoning over evidence
extracted from such actual or potential provenance. Its outputs are the verdict and the evidence used.
KGCleaner [31] uses string matching and manual mappings to retrieve sentences relevant to a KG triple from a docu-
ment, using embeddings and handcrafted features to predict the triple’s credibility. Leopard [44] validates KG triples
for three specific organisation properties, using specifically designed extractions from HTML content. Both approaches
entail manual work overhead, cover a limited number of predicates, and do not provide human-readable evidence.
DeFacto [14] and its successor FactCheck [45] represent the current state-of-the-art on this task. They verbalise KG
triples using text patterns and use them to retrieve web pages with related content. They then score sentences based on
relevance to the claim and use a supervised classifier to classify the entire web page. Despite their good performance,
both approaches depend on string matching, which might miss verbalisations that are more nuanced and also entail
considerable overhead for unseen predicates. ProVe, on the other hand, covers any non-ontological predicate (such as
subclass of and main category of ) by using pre-trained LMs that leverage context and meaning to infer verbalisations.
Due to its specific application scenario, approaches tackling AFC on KGs differ from the general framework [57, 15]
seen in Figure 1. A claim detection step is not deemed necessary, as triples are trivial to extract and it is commonly as-
sumed they all need verifying. Alternatively, triples with particular predicates can be easily selected. The existence of a
document retrieval step depends on whether provenance exists or needs to be searched from a repository, with the for-
mer scenario dismissing the need for the step. This is the case for ProVe, but not for DeFacto [14] and FactCheck [45],
which search for web documents.
Additionally, KG triples are often not understood by the components’ main labels alone. Descriptions, alternative
labels, editor conventions, and discussion boards help define their proper usage and interpretation, rendering their
meaning not trivial, in contrast to the natural language sentences tackled by general AFC. As such, approaches tack-
ling AFC on KGs rely on transforming KG triples into natural sentences [25, 14, 45] through an additional claim
verbalisation step. While both DeFacto [14] and FactCheck [45] rely on sentence patterns that are completed with the
components’ labels, ProVe relies on state-of-the-art Language Models (LMs) for data-to-text conversion.
Lastly, evidence document corpora normally used in general AFC tend to have a standard structure or come from a
specific source. Both FEVER [48] and VitaminC [39] take their evidence sets from Wikipedia, with FEVER’s even
coming pre-segmented as individual and clean sentences. Vo and Lee [51] use web articles from snopes.com and
politifact.com only. KGs, however, accept provenance from potentially any web domain. As such, unlike general
AFC approaches, ProVe employs a text extraction step in order to retrieve and segment text from triples’ references.
While previous approaches simply remove HTML markup, ProVe employs a rule-based approach that allows for more
flexibility.
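As a rough illustration of what such a text extraction step involves, the sketch below strips non-content HTML and segments the rest into candidate passages with BeautifulSoup; the specific rules (dropped tags, block-level elements, minimum length) are assumptions for illustration, not ProVe's actual rule set, which Section 3 details.

```python
# Illustrative rule-based extraction of text passages from an HTML reference;
# the rules below are assumptions, not ProVe's documented rule set.
from bs4 import BeautifulSoup

def extract_passages(html: str, min_words: int = 5) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    # Rule 1: drop non-content elements rather than only stripping markup.
    for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
        tag.decompose()
    # Rule 2: treat block-level elements as candidate passages.
    passages = []
    for block in soup.find_all(["p", "li", "td", "h1", "h2", "h3"]):
        text = " ".join(block.get_text(" ", strip=True).split())
        # Rule 3: keep only passages long enough to carry a verifiable statement.
        if len(text.split()) >= min_words:
            passages.append(text)
    return passages
```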
Large Pre-trained Language Models on AFC
Advances towards textual evidence-based AFC, particularly the sentence selection and claim verification subtasks,
have been facilitated by resources like the FEVER [49, 48] shared task and its benchmarking dataset. The FEVER
dataset consists of a large set of claims annotated with one of three classes: supported, refuted, and not enough in-
formation to determine (neither). The dataset also provides pre-extracted and segmented passages from Wikipedia as
evidence for each claim.
Tackling FEVER through pre-trained LMs [43, 27, 29] and graph networks [28, 59, 58] represents the current state-
of-the-art. While approaches using graph networks (such as KGAT [28], GEAR [59], and DREAM [58]) for claim
verification slightly outperform those based mainly on sequence-to-sequence LMs, they still massively depend on the
latter for sentence selection. Additionally, explainability for task-specific graph architectures, like those of KGAT and
DREAM, is harder to tackle than for generalist sequence-to-sequence LM architectures which are shared across the
research community [7, 36, 23]. Slightly decreasing potential performance in favour of a simpler and more explainable
pipeline, ProVe employs LMs for both sentence selection and claim verification.
On sentence selection, the common strategy is to assign relevance scores to text passages based on their contextual
proximity to a verifiable claim. GEAR [59] does so with LSTMs, but uses a BERT model to acquire text encodings.
Soleimani et al. [43], KGAT [28], and current state-of-the-art DREAM [58] outperform GEAR by directly using BERT
for the rankings, an approach ProVe also follows. Graph networks are employed at the claim verification subtask [28,
59, 58]. Soleimani et al. [43] are among the few to achieve near state-of-the-art results using an LM and a rule-based
aggregation instead of graph networks. While ProVe handles the subtask similarly, it uses a weak classifier as its final
aggregation.
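A minimal sketch of LM-based sentence selection is given below: each (claim, passage) pair is scored by a sequence-pair classifier and the passages are ranked by that score. The Hugging Face checkpoint named here is a generic public cross-encoder used as a stand-in; it is not ProVe's fine-tuned model.

```python
# Sketch of scoring passage relevance against a verbalised claim with a
# pre-trained sequence-pair classifier; the checkpoint is a stand-in only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"   # assumed public stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def rank_passages(claim: str, passages: list[str], k: int = 5):
    # Encode (claim, passage) pairs; use the classifier's logit as a relevance score.
    inputs = tokenizer([claim] * len(passages), passages,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    order = torch.argsort(scores, descending=True)[:k]
    return [(passages[int(i)], scores[int(i)].item()) for i in order]
```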
As a subtask of AFC on KGs, claim verbalisation is normally done through text patterns [14, 45] and by filling
templates [31], both of which can either be distantly learned or manually crafted. ProVe is the first approach to utilise
an LM for this subtask. Amaral et al. [4] show that a T5 model fine-tuned on WebNLG achieves very good results
when verbalising triples from Wikidata across many domains. ProVe follows suit by also using a T5.
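A minimal verbalisation sketch along these lines is shown below; the checkpoint name and the linearised input format are assumptions for illustration, standing in for the WebNLG fine-tuned T5 of Amaral et al. [4], so an off-the-shelf t5-base will only approximate the intended behaviour.

```python
# Sketch of KG triple verbalisation with a sequence-to-sequence LM; checkpoint
# and prompt format are illustrative assumptions, not ProVe's exact setup.
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL = "t5-base"   # stand-in; ProVe relies on a T5 fine-tuned on WebNLG
tokenizer = T5Tokenizer.from_pretrained(MODEL)
model = T5ForConditionalGeneration.from_pretrained(MODEL)

def verbalise(subject: str, predicate: str, obj: str) -> str:
    # Linearise the triple into a single input string for the encoder.
    source = f"translate Graph to English: <S> {subject} <P> {predicate} <O> {obj}"
    input_ids = tokenizer(source, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# e.g. verbalise("The Moon", "orbits", "Earth") should produce a natural
# sentence such as "The Moon orbits Earth" once a fine-tuned model is loaded.
```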
Table 1 shows a comparison of ProVe to other AFC approaches mentioned in this section grouped by specific task,
showcasing the particular subtasks each targets, as well as the datasets used as a basis for their evaluation. AFC on KGs
is amongst the least researched tasks within AFC. ProVe is the first to tackle it through fine-tuned LMs that adapt to
unseen KG predicates and to be evaluated on a Wikidata dataset consisting of multiple non-ontological predicates.
Task | Input Type | Evidence Source | Evidence Returned | Subtasks | Evaluation Dataset | Approaches
General text-based AFC | Textual claims | Text | Yes | DR, SS, CV | FEVER | [43, 29, 28, 59, 58]
Graph-based AFC | Textual claims | KG | Yes | SS, CV | Freebase | [50, 46]
KG triple prediction | KG triples | KG paths | Yes | SS, CV | DBpedia, SemMedDB, Wikipedia | [41, 21, 9, 42]
KG triple prediction | KG triples | KGE | No | RE, EL, CV | Kaggle, news articles, DBpedia, Freebase | [5, 18]
AFC on KGs | KG triples | Text | Yes | CVb, DR, TA, CV | DBpedia, FactBench | [25, 14, 45]
AFC on KGs | KG triples | Text | No | SS, CV | Wikidata (48 predicates), SWC 2017 | [31, 44]
AFC on KGs | KG triples | Text | Yes | CVb, TR, SS, CV | Wikidata (any non-ontological predicate) | ProVe
Table 1. Comparison between ProVe and others within AFC. KGE = KG Embeddings, DR = Document Retrieval, SS = Sentence
Selection, CV = Claim Verification, RE = Relation Extraction, EL = Embedding Learning, CVb = Claim Verbalisation, TA =
Trustworthiness Analysis, TR = Text Retrieval.
3 Approach
ProVe consists of a pipeline for Automated Fact Checking (AFC) on Knowledge Graphs (KG) that, provided with a KG
triple that is not ontological in nature (e.g. denoting subclasses, categories, lists, etc.) and its documented provenance
in the form of a web page or text document, automatically verifies whether the page textually supports the triple,
retrieving from it relevant text passages that can be used as evidence. This section presents an overview of ProVe and
its task, as well as detailed descriptions of its modules and the subtasks they target.