PROVE: A PIPELINE FOR AUTOMATED PROVENANCE
VERIFICATION OF KNOWLEDGE GRAPHS AGAINST TEXTUAL
SOURCES
A PREPRINT
Gabriel Amaral1 [0000-0002-4482-5376]   Odinaldo Rodrigues1 [0000-0001-7823-1034]
Elena Simperl1 [0000-0003-1722-947X]
October 27, 2022
ABSTRACT
Knowledge Graphs are repositories of information that gather data from a multitude of domains and
sources in the form of semantic triples, serving as a source of structured data for various crucial ap-
plications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs
mainly serve as secondary sources of information and depend on well-documented and verifiable
provenance to ensure their trustworthiness and usability. However, their ability to systematically
assess and assure the quality of this provenance, most crucially whether it properly supports the
graph’s information, relies mainly on manual processes that do not scale with size. ProVe aims at
remedying this, consisting of a pipelined approach that automatically verifies whether a Knowl-
edge Graph triple is supported by text extracted from its documented provenance. ProVe is intended
to assist information curators and consists of four main steps involving rule-based methods and ma-
chine learning models: text extraction, triple verbalisation, sentence selection, and claim verification.
ProVe is evaluated on a Wikidata dataset, achieving promising results overall and excellent perfor-
mance on the binary classification task of detecting support from provenance, with 87.5% accuracy
and 82.9% F1-macro on text-rich sources. The evaluation data and scripts used in this paper are
available on GitHub and Figshare.
Keywords Fact Verification · Data Verbalisation · Knowledge Graphs
1 Introduction
A Knowledge Graph (KG) is a type of knowledge base that stores information in the form of semantic triples formed by
a subject, a predicate, and an object. KGs represent both real and abstract entities internally as labelled and uniquely
identifiable entities, such as The Moon or Happiness, and can amass information from a multitude of domains and
sources by connecting such entities amongst themselves or to literals through relationships, coded via uniquely iden-
tified predicates. KGs serve as sources of both human and machine-readable semantically structured data for various
crucial applications in the modern web landscape, such as Wikipedia infoboxes, search engine results, voice-activated
assistants, and information gathering projects [30].
Developed and maintained by ontology experts, data curators, and even anonymous volunteers, KGs have massively
grown in size and adoption in the last decade, mainly as secondary sources of information. This means not storing
new information, but taking it from authoritative and reliable sources which are explicitly referenced. As such, KGs
depend on well-documented and verifiable provenance to ensure they are regarded as trustworthy and usable [56].
Processes to assess and assure the quality of information provenance are thus crucial to KGs, especially measuring
and maintaining verifiability, i.e. the degree to which consumers of KG triples can attest these are truly supported by
their sources [56]. However, such processes are currently performed mostly manually, which does not scale with size.
Manually ensuring high verifiability on vital KGs such as Wikidata and DBpedia is prohibitive due to their sheer size.
ProVe (Provenance Verification) is proposed to assist data curators and editors in handling the upkeep of KG verifia-
bility. It consists of an automated approach that leverages state-of-the-art Natural Language Processing (NLP) models,
public datasets on data verbalisation and fact verification, as well as rule-based methods. ProVe consists of a pipeline
that aims at automatically verifying whether a KG triple is supported by a web page that is documented as its prove-
nance. ProVe first extracts text passages from the triple’s reference. Then, it verbalises the KG triple and ranks the
extracted passages according to their relevance to the triple. The most relevant passages have their stances towards
the KG triple determined (i.e. supporting, refuting, neither) and finally ProVe estimates whether the whole reference
supports the triple.
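To make this flow concrete, the sketch below lays out the four stages as plain Python; every helper here is a naive, hypothetical stand-in (simple splitting, word overlap, a template), not ProVe's actual models, which Section 3 describes.

```python
# Illustrative sketch of ProVe's four pipeline stages; all helper
# implementations are naive stand-ins, not the paper's actual components.
from dataclasses import dataclass

def extract_passages(reference_text: str) -> list[str]:
    # Stage 1 (text extraction): stand-in sentence split; ProVe applies
    # rule-based extraction to the reference web page.
    return [s.strip() for s in reference_text.split(".") if s.strip()]

def verbalise_triple(triple: tuple[str, str, str]) -> str:
    # Stage 2 (triple verbalisation): stand-in template; ProVe uses a fine-tuned LM.
    subject, predicate, obj = triple
    return f"{subject} {predicate} {obj}"

def relevance_score(claim: str, passage: str) -> float:
    # Stage 3 (sentence selection): stand-in word-overlap score; ProVe ranks
    # passages with a pre-trained language model.
    c, p = set(claim.lower().split()), set(passage.lower().split())
    return len(c & p) / max(len(c), 1)

def stance(claim: str, passage: str) -> str:
    # Stage 4 (claim verification): stand-in heuristic over
    # {supports, refutes, neither}; ProVe uses an LM classifier.
    return "supports" if relevance_score(claim, passage) > 0.5 else "neither"

@dataclass
class Verdict:
    supported: bool       # does the whole reference support the triple?
    evidence: list        # top passages paired with their stances

def verify(triple: tuple[str, str, str], reference_text: str, k: int = 5) -> Verdict:
    claim = verbalise_triple(triple)
    passages = extract_passages(reference_text)
    ranked = sorted(passages, key=lambda p: relevance_score(claim, p), reverse=True)[:k]
    stances = [stance(claim, p) for p in ranked]
    return Verdict(supported="supports" in stances, evidence=list(zip(ranked, stances)))
```

For example, verify(("The Moon", "orbits", "Earth"), page_text) would return a verdict plus the passages used as evidence, mirroring the reference-level output described above.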
This task is a specific application of Automated Fact Checking (AFC), also known as AFC on KGs. AFC is a currently
well explored topic of research with several published papers, surveys, and datasets [47, 57, 15, 43, 29, 27, 16, 58, 28,
59, 49, 48, 38], and is generally defined as the verification of a natural language claim by collecting and reasoning over
evidence extracted from text documents or structured data sources. Both the verification verdict and the collected
evidence are the main outputs. While general AFC takes a textual claim and a searchable evidence base as inputs,
AFC on KGs takes a single KG triple and its documented provenance in the form of an external reference.
Approaches tackling AFC on KGs are very few; to the best of the authors' knowledge, the only two works of note in
a similar direction are DeFacto [25, 14] and its successor FactCheck [45]. While they tackle this task mostly as
defined above, they rely on a searchable document base instead of a given reference and judge triples on a true-false
spectrum instead of verifiability. Like these few approaches, ProVe diverges from the general AFC framework and
introduces a few different sub-tasks. Still, it makes use of the current state of the art on the subtasks they have in
common, and is the first approach to tackle AFC on KGs with large pre-trained Language Models (LMs), which can
be expanded to work in languages other than English and can benefit from an Active Learning scenario.
ProVe is evaluated on an annotated dataset of Wikidata triples and their references, combining multiple types of prop-
erties and web domains. ProVe achieves promising results overall (75% accuracy and 68.1% F1-macro) on classifying
references as either supporting their triples or not, with an excellent performance on explicit and naturally written
references (87.5% accuracy, 82.9% F1-macro, 0.908 AUC). Additionally, ProVe assesses passage relevance with a
strong positive correlation (0.5058 Pearson’s r) to human judgements.
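For readers wanting to reproduce metrics of this kind on their own predictions, the snippet below shows the standard scikit-learn and SciPy calls on placeholder arrays; it is a generic illustration, not the paper's evaluation scripts.

```python
# Generic computation of the reported metric types (accuracy, F1-macro, AUC,
# Pearson's r) on placeholder data; not ProVe's actual evaluation code.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from scipy.stats import pearsonr

y_true = [1, 0, 1, 1, 0, 1]                 # gold labels: does the reference support the triple?
y_pred = [1, 0, 1, 0, 0, 1]                 # binary predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7]    # predicted support probabilities

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1-macro:", f1_score(y_true, y_pred, average="macro"))
print("AUC:     ", roc_auc_score(y_true, y_score))

human_relevance = [4.5, 2.0, 3.5, 1.0]      # crowdsourced passage relevance judgements
model_relevance = [0.92, 0.35, 0.70, 0.15]  # model-assigned passage relevance scores
r, _ = pearsonr(human_relevance, model_relevance)
print("Pearson's r:", r)
```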
In summary, this paper’s main contributions are:
1. A novel pipelined approach to evidence-based Automated Fact Checking on Knowledge Graphs based on
large Language Models;
2. A benchmarking dataset of Wikidata triples and references for Automated Fact Checking on Knowledge
Graphs, covering a variety of information domains as well as a balanced sample of diverse web domains;
3. Novel crowdsourcing task designs that facilitate repeatable, quick, and large-scale collection of human anno-
tations on passage relevance and textual entailment at good agreement levels.
These contributions directly aid KG curators, editors, and researchers in improving KG provenance. Properly deployed,
ProVe can do so in multiple ways. Firstly, by assisting the detection of verifiability issues in existing references,
bringing them to the attention of humans. Secondly, given a triple and its reference, it can promote re-usability of the
reference by verifying it against neighbouring triples. Finally, given a new KG triple entered by editors or suggested
by KG completion processes, it can analyse and suggest references. The remainder of this paper is structured as
follows. Section 2 explores related work on KG data quality, mainly verifiability, as well as approaches to AFC on
KGs. Section 3 presents ProVe’s formulation and covers each of its modules in detail. Section 4 presents an evaluation
dataset consisting of triple-reference pairs, including its generation and its annotation. Section 5 details the results of
ProVe’s evaluation. Finally, Section 6 delivers discussions around this work and final conclusions. All code and data
used in this paper are available on Figshare1 and GitHub.2,3
2 Related Work
ProVe attempts to solve the task of AFC on KGs, with the purpose of assisting data curators in improving the verifi-
ability of KGs. Thus, to understand how ProVe approaches this task, it is important to first understand how the data
quality dimension of verifiability is currently defined and measured in KGs, as well as how state-of-the-art approaches
to general AFC and AFC on KGs tackle these tasks and how ProVe learns or differs from them.
1https://figshare.com/s/df0ec1c233ebd50817f4
2https://anonymous.4open.science/r/RSP-F367/
3https://anonymous.4open.science/r/ClaimVerificationHIT-A04D
2.1 Verifiability in KGs
In order to properly evaluate the degree to which ProVe adequately predicts verifiability, this dimension first needs to
be well defined and a strategy needs to be established to measure it given an evaluation dataset. Verifiability in the
context of KGs, whose information is mainly secondary, is defined as the degree to which consumers of KG triples
can attest these are truly supported by their sources [56]. It is an essential aspect of trustworthiness [56, 11, 34], yet is
amongst the least explored quality dimensions [56, 34], with most measures carried out superficially, unlike correctness or
consistency [34, 40, 2, 22, 1].
For instance, Farber et al. [11] measure verifiability only by considering whether any provenance is provided at all.
Flouris et al. [12] look deeper into sources’ contents, but only verify specific and handcrafted irrationalities, such
as a city being founded before it had citizens. Algorithmic indicators are not suited to directly measure verifiabil-
ity, as sources are varied and natural language understanding is needed. As such, recent works [33, 3] measure KG
verifiability through crowdsourced manual verification, giving crowdworkers direct access to triples and references.
Crowdsourcing allows for more subjective and nuanced metrics to be implemented, as well as for natural text compre-
hension [55, 8].
Thus, this paper employs crowdsourcing in order to measure verifiability metrics of individual triple-reference pairs.
By comparing a pair’s metrics with ProVe’s outputs given said pair as input, ProVe and its components can be eval-
uated. Like similar crowdsourcing studies [33, 3], multiple quality assurance techniques are implemented to ensure
collected annotations are trustworthy [10]. To the best of the authors’ knowledge, this is the first work to use crowd-
sourcing as a tool to measure the relevance and stance of references with regard to KG triples at levels varying from
whole references to individual text passages.
2.2 Automated Fact Checking on Knowledge Graphs
General AFC
Automated Fact Checking (AFC) is a topic of several works of research, datasets, and surveys [47, 57, 15, 43, 29, 27,
16, 58, 28, 59, 49, 48, 38]. AFC is commonly defined in the Natural Language Processing (NLP) domain as a broader
category of tasks and subtasks [47, 57, 15] whose goal is to, given a textual claim and searchable document corpora as
inputs, verify said claim’s veracity or support by collecting and reasoning over evidence. Such evidence is extracted
from the input document corpora and constitutes AFC’s output alongside the claim’s verdict. While a detailed explo-
ration of individual AFC state-of-the-art approaches is out of this paper’s scope, it is crucial to define their general
framework in order to properly cover ProVe’s architecture.
A general framework for AFC has been identified by recent surveys [57, 15], and can be seen in Figure 1. Zeng et
al. [57] define it as a multi-step process where each step can be tackled as a subtask. Firstly, a claim detection step
identifies which claims need to be verified. Based on such claims, a document retrieval step gathers documents that
might contain information relevant to verifying the claim. A sentence selection step then identifies and extracts from
retrieved documents a set of few individual text passages that are deemed the most relevant. Based on these passages,
a claim verification step provides the final verdict. Guo et al. [15] add that a final justification production step is
crucial for explainability. Given the framework's nature, it is unsurprising that pipelined approaches are extremely
popular and constitute the current state of the art.
[Figure 1: flowchart of the general AFC pipeline. Statements pass through claim detection; document retrieval draws
on document corpora; claim evidence selection yields relevant evidence; claim verification produces the final verdict;
and justification production yields a justification.]
Fig. 1. Overview of a general AFC pipeline. White diamond blocks are documents and objects, and grey square blocks are AFC
subtasks. Specific formulations and implementations might, of course, differ.
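Read as code, the framework amounts to a composition of independently solvable subtasks; the interface sketch below (hypothetical names and simplified types, mirroring Figure 1 rather than any specific system) makes the division explicit.

```python
# Interface-level sketch of the general AFC framework surveyed in [57, 15];
# method names and types are illustrative, not taken from a specific system.
from typing import Protocol, Sequence

class AFCPipeline(Protocol):
    def detect_claims(self, statements: Sequence[str]) -> list[str]:
        """Claim detection: identify which statements need to be verified."""
    def retrieve_documents(self, claim: str, corpora: Sequence[str]) -> list[str]:
        """Document retrieval: gather documents possibly relevant to the claim."""
    def select_sentences(self, claim: str, documents: Sequence[str]) -> list[str]:
        """Sentence selection: extract the few most relevant text passages."""
    def verify_claim(self, claim: str, evidence: Sequence[str]) -> str:
        """Claim verification: final verdict, e.g. supported / refuted / neither."""
    def produce_justification(self, claim: str, evidence: Sequence[str], verdict: str) -> str:
        """Justification production: human-readable explanation of the verdict."""
```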
AFC mainly deals with text, both as claims to be verified and as evidence documents, due to recent advances in
this direction being greatly facilitated by textual resources like the FEVER shared task [49] and its associated large-
scale benchmark FEVER dataset [48]. Still, some tasks in AFC take semantic triples as verifiable claims, either from
KGs [41, 21] or by extracting them from text. Some also utilise KGs as reasoning structures from where to draw
evidence [15, 50, 46, 9, 42]. For instance, Thorne and Vlachos [46] directly map claims found in text to triples in a KG
to verify numerical values. Both Ciampaglia et al. [9] and Shiralkar et al. [42] use entity paths in DBpedia to verify
triples extracted from text. Other approaches based on KG embeddings associate the likelihood of a claim being true
to that of it belonging as a new triple to a KG [5, 18].
These tasks, while incorporating semantic triples and KGs, cannot be exactly defined as AFC on KGs; either the
verified triples do not come from full and consistent KGs, or the evidence used for reasoning is not taken from sources
that could serve as provenance, but inferred from the graph itself.
AFC on KGs
AFC on KGs is a more specific task within AFC, explored by a handful of approaches, the most prominent of which
are DeFacto [14] and its successor FactCheck [45]. Its main purpose is to ensure KGs are fit for use by asserting
whether their information is verifiable by trustworthy evidence. Given a KG triple and either its documented external
provenance or searchable external document corpora whose items could be used as provenance, AFC on KGs can be
defined as the automated verification of said triple’s veracity or support by collecting and reasoning over evidence
extracted from such actual or potential provenance. Its outputs are the verdict and the evidence used.
KGCleaner [31] uses string matching and manual mappings to retrieve sentences relevant to a KG triple from a docu-
ment, using embeddings and handcrafted features to predict the triple’s credibility. Leopard [44] validates KG triples
for three specific organisation properties, using specifically designed extractions from HTML content. Both approaches
entail manual work overhead, cover a limited number of predicates, and do not provide human-readable evidence.
DeFacto [14] and its successor FactCheck [45] represent the current state-of-the-art on this task. They verbalise KG
triples using text patterns and use them to retrieve web pages with related content. They then score sentences based on
relevance to the claim and use a supervised classifier to classify the entire web page. Despite their good performance,
both approaches depend on string matching, which might miss verbalisations that are more nuanced and also entail
considerable overhead for unseen predicates. ProVe, on the other hand, covers any non-ontological predicate (such as
subclass of and main category of ) by using pre-trained LMs that leverage context and meaning to infer verbalisations.
Due to its specific application scenario, approaches tackling AFC on KGs differ from the general framework [57, 15]
seen in Figure 1. A claim detection step is not deemed necessary, as triples are trivial to extract and it is commonly as-
sumed they all need verifying. Alternatively, triples with particular predicates can be easily selected. The existence of a
document retrieval step depends on whether provenance exists or needs to be searched from a repository, with the for-
mer scenario dismissing the need for the step. This is the case for ProVe, but not for DeFacto [14] and FactCheck [45],
which search for web documents.
Additionally, KG triples are often not understood by the components’ main labels alone. Descriptions, alternative
labels, editor conventions, and discussion boards help define their proper usage and interpretation, rendering their
meaning not trivial, in contrast to the natural language sentences tackled by general AFC. As such, approaches tack-
ling AFC on KGs rely on transforming KG triples into natural sentences [25, 14, 45] through an additional claim
verbalisation step. While both DeFacto [14] and FactCheck [45] rely on sentence patterns that are completed with the
components’ labels, ProVe relies on state-of-the-art Language Models (LMs) for data-to-text conversion.
Lastly, evidence document corpora normally used in general AFC tend to have a standard structure or come from a
specific source. Both FEVER [48] and VitaminC [39] take their evidence sets from Wikipedia, with FEVER’s even
coming pre-segmented as individual and clean sentences. Vo and Lee [51] use web articles from snopes.com and
politifact.com only. KGs, however, accept provenance from potentially any web domain. As such, unlike general
AFC approaches, ProVe employs a text extraction step in order to retrieve and segment text from triples’ references.
While previous approaches simply remove HTML markup, ProVe employs a rule-based approach that allows for more
flexibility.
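As a rough illustration of what such a text extraction step involves, the sketch below strips non-content HTML and segments the rest into candidate passages with BeautifulSoup; the specific rules (dropped tags, block-level elements, minimum length) are assumptions for illustration, not ProVe's actual rule set, which Section 3 details.

```python
# Illustrative rule-based extraction of text passages from an HTML reference;
# the rules below are assumptions, not ProVe's documented rule set.
from bs4 import BeautifulSoup

def extract_passages(html: str, min_words: int = 5) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    # Rule 1: drop non-content elements rather than only stripping markup.
    for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
        tag.decompose()
    # Rule 2: treat block-level elements as candidate passages.
    passages = []
    for block in soup.find_all(["p", "li", "td", "h1", "h2", "h3"]):
        text = " ".join(block.get_text(" ", strip=True).split())
        # Rule 3: keep only passages long enough to carry a verifiable statement.
        if len(text.split()) >= min_words:
            passages.append(text)
    return passages
```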
Large Pre-trained Language Models on AFC
Advances towards textual evidence-based AFC, particularly the sentence selection and claim verification subtasks,
have been facilitated by resources like the FEVER [49, 48] shared task and its benchmarking dataset. The FEVER
dataset consists of a large set of claims annotated with one of three classes: supported, refuted, and not enough in-
formation to determine (neither). The dataset also provides pre-extracted and segmented passages from Wikipedia as
evidence for each claim.
Tackling FEVER through pre-trained LMs [43, 27, 29] and graph networks [28, 59, 58] represents the current state-
of-the-art. While approaches using graph networks (such as KGAT [28], GEAR [59], and DREAM [58]) for claim
verification slightly outperform those based mainly on sequence-to-sequence LMs, they still massively depend on the
latter for sentence selection. Additionally, explainability for task-specific graph architectures, like those of KGAT and
DREAM, is harder to tackle than for generalist sequence-to-sequence LM architectures which are shared across the
research community [7, 36, 23]. Slightly decreasing potential performance in favour of a simpler and more explainable
pipeline, ProVe employs LMs for both sentence selection and claim verification.
On sentence selection, the common strategy is to assign relevance scores to text passages based on their contextual
proximity to a verifiable claim. GEAR [59] does so with LSTMs, but uses a BERT model to acquire text encodings.
Soleimani et al. [43], KGAT [28], and current state-of-the-art DREAM [58] outperform GEAR by directly using BERT
for the rankings, an approach ProVe also follows. Graph networks are employed at the claim verification subtask [28,
59, 58]. Soleimani et al. [43] are among the few to achieve near state-of-the-art results using an LM and a rule-based
aggregation instead of graph networks. While ProVe handles the subtask similarly, it uses a weak classifier as its final
aggregation.
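A minimal sketch of LM-based sentence selection is given below: each (claim, passage) pair is scored by a sequence-pair classifier and the passages are ranked by that score. The Hugging Face checkpoint named here is a generic public cross-encoder used as a stand-in; it is not ProVe's fine-tuned model.

```python
# Sketch of scoring passage relevance against a verbalised claim with a
# pre-trained sequence-pair classifier; the checkpoint is a stand-in only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"   # assumed public stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def rank_passages(claim: str, passages: list[str], k: int = 5):
    # Encode (claim, passage) pairs; use the classifier's logit as a relevance score.
    inputs = tokenizer([claim] * len(passages), passages,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    order = torch.argsort(scores, descending=True)[:k]
    return [(passages[int(i)], scores[int(i)].item()) for i in order]
```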
As a subtask of AFC on KGs, claim verbalisation is normally done through text patterns [14, 45] and by filling
templates [31], both of which can either be distantly learned or manually crafted. ProVe is the first approach to utilise
an LM for this subtask. Amaral et al. [4] show that a T5 model fine-tuned on WebNLG achieves very good results
when verbalising triples from Wikidata across many domains. ProVe follows suit by also using a T5.
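A minimal verbalisation sketch along these lines is shown below; the checkpoint name and the linearised input format are assumptions for illustration, standing in for the WebNLG fine-tuned T5 of Amaral et al. [4], so an off-the-shelf t5-base will only approximate the intended behaviour.

```python
# Sketch of KG triple verbalisation with a sequence-to-sequence LM; checkpoint
# and prompt format are illustrative assumptions, not ProVe's exact setup.
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL = "t5-base"   # stand-in; ProVe relies on a T5 fine-tuned on WebNLG
tokenizer = T5Tokenizer.from_pretrained(MODEL)
model = T5ForConditionalGeneration.from_pretrained(MODEL)

def verbalise(subject: str, predicate: str, obj: str) -> str:
    # Linearise the triple into a single input string for the encoder.
    source = f"translate Graph to English: <S> {subject} <P> {predicate} <O> {obj}"
    input_ids = tokenizer(source, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# e.g. verbalise("The Moon", "orbits", "Earth") should produce a natural
# sentence such as "The Moon orbits Earth" once a fine-tuned model is loaded.
```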
Table 1 shows a comparison of ProVe to other AFC approaches mentioned in this section grouped by specific task,
showcasing the particular subtasks each targets, as well as the datasets used as a basis for their evaluation. AFC on KGs
is amongst the least researched tasks within AFC. ProVe is the first to tackle it through fine-tuned LMs that adapt to
unseen KG predicates and to be evaluated on a Wikidata dataset consisting of multiple non-ontological predicates.
Task | Input Type | Evidence Source | Evidence Returned | Subtasks | Evaluation Dataset | Approaches
General text-based AFC | Textual claims | Text | Yes | DR, SS, CV | FEVER | [43, 29, 28, 59, 58]
Graph-based AFC | Textual claims | KG | Yes | SS, CV | Freebase | [50, 46]
KG triple prediction | KG triples | KG paths | Yes | SS, CV | DBpedia, SemMedDB, Wikipedia | [41, 21, 9, 42]
KG triple prediction | KG triples | KGE | No | RE, EL, CV | Kaggle, news articles, DBpedia, Freebase | [5, 18]
AFC on KGs | KG triples | Text | Yes | CVb, DR, TA, CV | DBpedia, FactBench | [25, 14, 45]
AFC on KGs | KG triples | Text | No | SS, CV | Wikidata (48 predicates), SWC 2017 | [31, 44]
AFC on KGs | KG triples | Text | Yes | CVb, TR, SS, CV | Wikidata (any non-ontological predicate) | ProVe
Table 1. Comparison between ProVe and others within AFC. KGE = KG Embeddings, DR = Document Retrieval, SS = Sentence
Selection, CV = Claim Verification, RE = Relation Extraction, EL = Embedding Learning, CVb = Claim Verbalisation, TA =
Trustworthiness Analysis, TR = Text Retrieval.
3 Approach
ProVe consists of a pipeline for Automated Fact Checking (AFC) on Knowledge Graphs (KG) that, provided with a KG
triple that is not ontological in nature (e.g. denoting subclasses, categories, lists, etc.) and its documented provenance
in the form of a web page or text document, automatically verifies whether the page textually supports the triple,
retrieving from it relevant text passages that can be used as evidence. This section presents an overview of ProVe and
its task, as well as detailed descriptions of its modules and the subtasks they target.