
evidence [15, 50, 46, 9, 42]. For instance, Thorne and Vlachos [46] directly map claims found in text to triples in a KG
to verify numerical values. Both Ciampaglia et al. [9] and Shiralkar et al. [42] use entity paths in DBpedia to verify
triples extracted from text. Other approaches, based on KG embeddings, equate the likelihood of a claim being true with the likelihood of it fitting into the KG as a new triple [5, 18].
These tasks, while involving semantic triples and KGs, cannot be precisely defined as AFC on KGs: either the verified triples do not come from full and consistent KGs, or the evidence used for reasoning is not taken from sources that could serve as provenance, but is instead inferred from the graph itself.
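As a concrete illustration of the embedding-based scoring mentioned above (a generic sketch, not tied to any specific cited system), such approaches typically score a candidate triple with a link-prediction function, e.g. a TransE-style distance:

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """TransE-style plausibility: a smaller ||h + r - t|| means the triple
    (h, r, t) fits the learned embedding space better, i.e. is more likely
    to belong to the KG as a new triple."""
    return -float(np.linalg.norm(h + r - t))

# Toy embeddings; in practice these are learned from the KG itself.
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))
print(transe_score(h, r, t))
```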
AFC on KGs
AFC on KGs is a more specific task within AFC, explored by a handful of approaches, the most prominent of which
are DeFacto [14] and its successor FactCheck [45]. Its main purpose is to ensure KGs are fit for use by asserting
whether their information is verifiable by trustworthy evidence. Given a KG triple and either its documented external
provenance or searchable external document corpora whose items could be used as provenance, AFC on KGs can be
defined as the automated verification of said triple’s veracity or support by collecting and reasoning over evidence
extracted from such actual or potential provenance. Its outputs are the verdict and the evidence used.
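Under this definition, the task's interface can be summarised as follows; a minimal sketch in which all names are illustrative rather than taken from any cited system:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NOT_ENOUGH_INFO = "not enough info"

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str

@dataclass
class FactCheckResult:
    verdict: Verdict
    evidence: list[str]  # passages drawn from actual or potential provenance

def verify(triple: Triple, provenance_urls: list[str]) -> FactCheckResult:
    """AFC on KGs: collect evidence from the triple's documented or searched
    provenance, reason over it, and return a verdict plus the evidence used."""
    raise NotImplementedError  # the individual stages are discussed in the text
```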
KGCleaner [31] uses string matching and manual mappings to retrieve sentences relevant to a KG triple from a document, using embeddings and handcrafted features to predict the triple's credibility. Leopard [44] validates KG triples for three specific organisation properties, using specifically designed extractions from HTML content. Both approaches entail manual work overhead, cover a limited number of predicates, and do not provide human-readable evidence.
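To make the string-matching retrieval style concrete, a minimal sketch follows; the matching rule is a deliberate simplification, not KGCleaner's actual implementation:

```python
import re

def relevant_sentences(document: str, subject_label: str, object_label: str) -> list[str]:
    """Keep only sentences mentioning both triple components verbatim.
    Naive split on end punctuation; real systems additionally rely on
    manual label-to-surface-form mappings."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return [
        s for s in sentences
        if subject_label.lower() in s.lower() and object_label.lower() in s.lower()
    ]

doc = ("Douglas Adams was born in Cambridge. He wrote novels. "
       "Adams later moved to London.")
print(relevant_sentences(doc, "Douglas Adams", "Cambridge"))
# ['Douglas Adams was born in Cambridge.']
```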
DeFacto [14] and its successor FactCheck [45] represent the current state of the art on this task. They verbalise KG triples using text patterns and use these verbalisations to retrieve web pages with related content. They then score sentences based on relevance to the claim and use a supervised classifier to label the entire web page. Despite their good performance, both approaches depend on string matching, which can miss more nuanced verbalisations and entails considerable overhead for unseen predicates. ProVe, on the other hand, covers any non-ontological predicate (ontological predicates being those such as subclass of and main category of) by using pre-trained LMs that leverage context and meaning to infer verbalisations.
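The pattern-based verbalisation used by DeFacto and FactCheck can be sketched as below; the patterns shown are illustrative and not taken from either system:

```python
# Handcrafted patterns, one set per known predicate; unseen predicates
# require authoring new patterns, which is the overhead noted above.
PATTERNS = {
    "birthPlace": ["{s} was born in {o}", "{s}'s birthplace is {o}"],
    "author": ["{s} was written by {o}", "{o} is the author of {s}"],
}

def verbalise(subject: str, predicate: str, obj: str) -> list[str]:
    """Fill each text pattern for the predicate with the component labels."""
    return [p.format(s=subject, o=obj) for p in PATTERNS.get(predicate, [])]

print(verbalise("The Hitchhiker's Guide to the Galaxy", "author", "Douglas Adams"))
```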
Due to its specific application scenario, approaches tackling AFC on KGs differ from the general framework [57, 15] seen in Figure 1. A claim detection step is not deemed necessary, as triples are trivial to extract and it is commonly assumed that all of them need verifying; alternatively, triples with particular predicates can easily be selected. Whether a document retrieval step exists depends on whether provenance is already documented or must be searched for in a repository, with the former scenario removing the need for the step. This is the case for ProVe, but not for DeFacto [14] and FactCheck [45], which search for web documents.
Additionally, KG triples often cannot be understood from their components' main labels alone. Descriptions, alternative labels, editor conventions, and discussion boards help define their proper usage and interpretation, making their meaning far from trivial, in contrast to the natural language sentences tackled by general AFC. As such, approaches tackling AFC on KGs rely on transforming KG triples into natural language sentences [25, 14, 45] through an additional claim verbalisation step. While both DeFacto [14] and FactCheck [45] rely on sentence patterns that are completed with the components' labels, ProVe relies on state-of-the-art Language Models (LMs) for data-to-text conversion.
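As a contrast to the pattern-based approach, a minimal sketch of LM-based verbalisation with a sequence-to-sequence model follows; the checkpoint name, prompt prefix, and triple linearisation format are assumptions for illustration, not ProVe's exact configuration:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint; in practice a T5-style model fine-tuned for
# data-to-text generation (e.g. on WebNLG) would be used.
MODEL_NAME = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def verbalise_triple(subject: str, predicate: str, obj: str) -> str:
    """Linearise the triple and let the LM generate a fluent sentence,
    leveraging context and meaning rather than fixed patterns."""
    linearised = f"translate Graph to English: <{subject}> <{predicate}> <{obj}>"
    inputs = tokenizer(linearised, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(verbalise_triple("Douglas Adams", "place of birth", "Cambridge"))
```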
Lastly, the evidence corpora normally used in general AFC tend to have a standard structure or come from a specific source. Both FEVER [48] and VitaminC [39] take their evidence sets from Wikipedia, with FEVER's evidence even coming pre-segmented into clean, individual sentences. Vo and Lee [51] use web articles from snopes.com and politifact.com only. KGs, however, accept provenance from potentially any website domain. As such, unlike general AFC approaches, ProVe employs a text extraction step in order to retrieve and segment text from triples' references. While previous approaches simply remove HTML markup, ProVe employs a rule-based approach that allows for more flexibility.
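A minimal sketch of such a text extraction step, combining HTML parsing with simple rules, is given below; the specific rules shown are illustrative rather than ProVe's actual ones:

```python
import re
from bs4 import BeautifulSoup

def extract_passages(html: str, min_words: int = 5) -> list[str]:
    """Parse HTML, drop non-content tags, then keep sentence-like segments;
    the rules filter boilerplate that plain markup removal would keep."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()  # rule: discard non-content elements entirely
    text = soup.get_text(separator=" ")
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # rule: drop fragments too short to serve as evidence sentences
    return [s.strip() for s in sentences if len(s.split()) >= min_words]

html = ("<html><nav>Home | About</nav><p>ProVe verifies KG triples against "
        "their references. It segments pages into passages.</p></html>")
print(extract_passages(html))
```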
Large Pre-trained Language Models on AFC
Advances in textual evidence-based AFC, particularly on the sentence selection and claim verification subtasks, have been facilitated by resources like the FEVER shared task [49, 48] and its benchmarking dataset. The FEVER dataset consists of a large set of claims, each annotated with one of three classes: supported, refuted, or not enough information to determine either (neither). The dataset also provides pre-extracted and segmented passages from Wikipedia as evidence for each claim.
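For concreteness, a FEVER claim record has roughly the following shape; the record below is an illustrative example, with the evidence format simplified from the dataset's published JSONL schema:

```python
# Illustrative FEVER-style record (not an actual dataset entry).
fever_example = {
    "id": 0,
    "claim": "Douglas Adams was born in Cambridge.",
    "label": "SUPPORTS",  # or "REFUTES" / "NOT ENOUGH INFO"
    # Evidence points into pre-segmented Wikipedia: (page title, sentence index).
    "evidence": [[("Douglas_Adams", 0)]],
}
```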
Tackling FEVER through pre-trained LMs [43, 27, 29] and graph networks [28, 59, 58] represents the current state of the art. While approaches using graph networks for claim verification (such as KGAT [28], GEAR [59], and DREAM [58]) slightly outperform those based mainly on sequence-to-sequence LMs, they still massively depend on the