
truth” (Garfinkel, 2008). The Semantic Web reinforces this aspect since each application processing information must evaluate trustworthiness by probing the statements’ context (i.e., the provenance) (Koivunen & Miller, 2001).
Moreover, data changes over time, due either to the natural evolution of concepts or to the correction of mistakes. Indeed, the latest version of knowledge may not be the most accurate. Such phenomena are particularly tangible in the Web of Data, as highlighted in a study by the Dynamic Linked Data Observatory, which noted the modification of about 38% of the nearly 90,000 RDF documents monitored for 29 weeks and the permanent disappearance of 5% (Käfer et al., 2013).
Notwithstanding these premises, the most extensive RDF datasets to date – DBpedia, Wikidata, Yago, and the Dynamic Linked Data Observatory – either do not use RDF to track changes or do not provide provenance information at the entity level (Dooley & Božić, 2019; Orlandi & Passant, 2011; Project, 2021; Umbrich et al., 2010). Therefore, they do not allow SPARQL time-traversal queries on previous statuses of their entities together with provenance information. For instance, Wikidata allows SPARQL queries on entities temporally annotated via its proprietary RDF extension but does not allow queries on change-tracking data.
The main reason behind this phenomenon is that the founding technologies of the Semantic Web – URIs, XML, and RDF – did not initially provide an effective mechanism to annotate statements with metadata information. This lack led to the introduction of numerous metadata representation models, none of which succeeded in establishing itself over the others and becoming a widely accepted standard to track both provenance and changes to RDF entities (Berners-Lee, 2005; Board, 2020; Caplan, 2017; Carroll et al., 2005; Ciccarese et al., 2008; Damiani et al., 2019; da Silva et al., 2006; Dividino et al., 2009; Flouris et al., 2009; Hartig & Thompson, 2019; Hoffart et al., 2013; Lebo et al., 2013; Moreau et al., 2011; Nguyen et al., 2014; Pediaditis et al., 2009; Sahoo et al., 2010; Sahoo & Sheth, 2009; Suchanek et al., 2019; Zimmermann et al., 2012).
In the past, some software was developed to perform time-traversal queries on RDF datasets, enabling the reconstruction of the status of a particular entity at a given time. However, as far as we know, all existing solutions need to preprocess and index RDF data to work efficiently (Cerdeira-Pena et al., 2016; Im et al., 2012; Neumann & Weikum, 2010; Pellissier Tanon & Suchanek, 2019; Taelman et al., 2019). This requirement is impractical for linked open datasets that constantly receive many updates, such as Wikidata. For example, “Ostrich requires ∼22 hours to ingest revision 9 of DBpedia (2.43M added and 2.46M deleted triples)” (Pelgrin et al., 2021). Conversely, software operating on the fly either does not support all query types (Noy & Musen, 2002), or supports them non-generically by imposing a custom database (Graube et al., 2016) or a specific triplestore (Arndt et al., 2019; Sande et al., 2013).
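The on-the-fly alternative can be illustrated with a minimal sketch (hypothetical data structures for illustration, not any existing tool’s actual API): starting from the current set of triples and a log of timestamped deltas, a past state is rebuilt at query time by undoing every delta more recent than the target instant, with no pre-built index.

```python
from datetime import datetime

def state_at(current: set, deltas: list, target: datetime) -> set:
    """Rebuild the triple set as it was at `target` by undoing,
    from newest to oldest, every delta applied after that instant.
    Each delta is a (timestamp, added_triples, removed_triples) tuple."""
    state = set(current)
    for ts, added, removed in sorted(deltas, key=lambda d: d[0], reverse=True):
        if ts <= target:
            break  # everything older was already part of the state
        state -= added     # undo the additions of this update
        state |= removed   # restore what this update deleted
    return state

# Example: an entity whose title was corrected on 2021-06-01.
# Triples are plain (subject, predicate, object) tuples for illustration.
current = {("ex:br/1", "dcterms:title", "Correct Title")}
deltas = [
    (datetime(2021, 6, 1),
     {("ex:br/1", "dcterms:title", "Correct Title")},   # added
     {("ex:br/1", "dcterms:title", "Wrong Title")}),    # removed
]

past = state_at(current, deltas, datetime(2021, 1, 1))
# past == {("ex:br/1", "dcterms:title", "Wrong Title")}
```

The cost of this approach is paid per query (replaying deltas) rather than per ingestion (building an index), which is exactly the trade-off that makes it attractive for frequently updated datasets.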
This work introduces a methodology and a Python library enabling all the time-related retrieval functionalities identified by Fernández et al. (2016) live, i.e., allowing real-time queries and updates without preprocessing the data. Moreover, data can be stored on any RDF-compliant storage system (e.g., RDF-serialized textual files and triplestores) when the provenance and data changes are tracked according to the OpenCitations Data Model (Daquino et al., 2020).
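In the OpenCitations Data Model, each version of an entity is described by a provenance snapshot annotated, among other properties, with its generation and invalidation times (via PROV-O’s prov:generatedAtTime and prov:invalidatedAtTime). Selecting the version valid at a given instant then amounts to finding the snapshot whose validity interval contains it. A minimal sketch of this selection follows; the dictionary records are a deliberate simplification of the model, not the library’s API:

```python
from datetime import datetime

# Simplified snapshot records inspired by OCDM provenance: each version of an
# entity carries its generation time and, unless still current, its
# invalidation time.
snapshots = [
    {"id": "ex:br/1/prov/se/1",
     "generated": datetime(2020, 1, 1), "invalidated": datetime(2021, 6, 1)},
    {"id": "ex:br/1/prov/se/2",
     "generated": datetime(2021, 6, 1), "invalidated": None},  # still valid
]

def snapshot_at(snapshots: list, target: datetime):
    """Return the snapshot whose validity interval [generated, invalidated)
    contains `target`, or None if the entity did not exist yet."""
    for s in snapshots:
        if s["generated"] <= target and (
                s["invalidated"] is None or target < s["invalidated"]):
            return s
    return None

# The version of ex:br/1 in force at the start of 2021 is the first snapshot.
version = snapshot_at(snapshots, datetime(2021, 1, 1))
```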
The rest of the paper is organized as follows. Section 2 reviews the literature on metadata representation models, retrieval functionalities, and archiving policies for dynamic linked data. Section 3 showcases the methodology underlying the time-agnostic-library implementation, and Section 4 discusses the final product from a quantitative point of view, reporting the benchmark results on execution times and memory.