Conference’17, July 2017, Washington, DC, USA Simon Fowler, Vashti Galpin, and James Cheney
then onwards, and at 19:00 “Watch TV” was removed from
the table without being completed. Technically, note that
this example interprets the time annotations as transaction
time, that is, the times indicate when certain data was in the
database; there is another dimension, valid time, and we will
discuss both dimensions in greater detail later on.
The problems of querying and updating temporal databases
have been well-studied, leading to the landmark language de-
sign TSQL2 [
32
] extending SQL. However, despite decades of
eort, only limited elements of TSQL2 were eventually incor-
porated into the SQL:2011 standard [
21
] and these features
have not yet been widely adopted. Directly implementing
temporal queries in SQL is possible, but painful: a TSQL2-
style query or update operation may grow by a factor of 10
or more when translated to plain SQL, which leaves plenty of
scope for error, and thus these powerful capabilities remain
outside the grasp of non-experts. In this paper we take rst
steps towards reconciling temporal data management with
language-integrated query based on query normalisation.
We propose supporting temporal capabilities by translation
to ordinary language-integrated query and hypothesise that
this approach can make temporal data management safer,
easier to use and more accessible to non-experts than the
current state of aairs. As an initial test of this hypothesis
we present a high-level design, a working implementation,
and detail our experience with a nontrivial case study.
Although both language-integrated query and temporal
databases are now well-studied topics, we believe that their
combination has never been considered before. Doing so has
a number of potential benets, including making the power
of well-studied language designs such as TSQL2 more acces-
sible to non-expert programmers, and providing a high-level
abstraction that can be implemented eciently in dierent
ways. Our interest is particularly motivated by the needs
of scientic database development, where data versioning
for accountability and research integrity are very important
needs that are not well-supported by conventional database
systems [
4
]. Temporal data management has the potential
to become a “killer app” for language-integrated query, and
this paper takes a rst but signicant step towards this goal.
The overarching contribution of this paper is the rst ex-
tension of language-integrated query to support transaction
time and valid time temporal data.
Concretely, we make three main contributions:
1.
Based on
𝜆LINQ
(§2), a formalism based on the Nested
Relational Calculus (NRC) [
30
], we introduce typed
𝜆
-calculi to model queries and modications on trans-
action time (§3) and valid time (§4) databases. We give
semantics-preserving translations to 𝜆LINQ for both.
2.
We show how existing work on query normalisation
allows a surprisingly straightforward implementation
strategy for sequenced joins (§5).
Types 𝐴, 𝐵 ::=𝐶|𝐴→𝐸𝐵|Bag(𝐴) | ( g
ℓ:𝐴) | Table(𝐴)
Base types 𝐶::=String |Int |Bool |Time
Eects 𝑒::=read |write
Eect sets 𝐸
Terms 𝐿, 𝑀, 𝑁 ::=𝑥|𝑐|𝑡
|𝜆𝑥.𝑀 |𝑀 𝑁 | ⊙ {−→
𝑀}
|if 𝐿then 𝑀else 𝑁
|* + |*𝑀+|𝑀⊎𝑁|for (𝑥←𝑀)𝑁
| (
ℓ=𝑀) | 𝑀.ℓ |now
|query 𝑀|get 𝑀|insert 𝑀values 𝑁
|update (𝑥⇐𝐿)where 𝑀set (
ℓ=𝑁)
|delete (𝑥⇐𝑀)where 𝑁
Figure 1. Syntax of 𝜆LINQ
3.
We implement our constructs in the Links functional
web programming language, and describe a case study
based on curating COVID-19 data (§6).
Although the concepts behind translating temporal
queries and updates into non-temporal core languages are
well known [
33
], our core calculi
𝜆TLINQ
and
𝜆VLINQ
are novel
and aid us in showing (to the best of our knowledge) the rst
correctness results for the translations.
We relegate several details and all proofs to the appendices.
2 Background: Language-Integrated Query
We begin by introducing a basic
𝜆
-calculus, called
𝜆LINQ
, to
model language-integrated query in non-temporal databases.
Our calculus is based heavily on the Nested Relational Cal-
culus [
30
], with support for database modications heavily
inspired by the calculus of Fehrenbach and Cheney
[13]
. The
calculus uses a type-and-eect system to ensure database
accesses can occur in ‘safe’ places, i.e., that we do not attempt
to perform a modication operation in the middle of a query.
Eects include
read
(denoting a read from a database) and
write (denoting an update to the database).
Types
𝐴, 𝐵
include base types
𝐶
, eect-annotated function
types
𝐴→𝐸𝐵
, unordered collection types
Bag(𝐴)
, record
types
(ℓ𝑖
:
𝐴𝑖)𝑖
denoting a record with labels
ℓ𝑖
and types
𝐴𝑖
,
and handles
Table(𝐴)
for tables containing records of type
𝐴
. A record is a base record if it contains only elds of base
type. We assume that the base types includes at least
Bool
and the Time type, which denotes (abstract) timestamps.
Basic terms include variables
𝑥
, constants
𝑐
, table handles
𝑡
, functions
𝜆𝑥.𝑀
, application
𝑀 𝑁
, n-ary operations
⊙{−→
𝑀}
,
and conditionals
if 𝐿then 𝑀else 𝑁
. We assume that the
set of operations contains the usual unary and binary rela-
tion operators, as well as the
𝑛
-ary operations
greatest
and
least
on timestamps which return their largest and smallest
arguments respectively. We assume that the set of constants
contains timestamps
𝜄
of type
Time
, and two distinguished
timestamps
−∞
and
∞
of type
Time
, which denote the mini-
mum and maximum timestamps respectively. The calculus
also includes the empty multiset constructor
* +
; the single-
ton multiset constructor
*𝑀+
; multiset union
𝑀⊎𝑁
; and
comprehensions
for (𝑥←𝑀)𝑁
. We write
*𝑀1, . . . , 𝑀𝑛+
as