
iors, requires a simplification of the interface between
system and operator, rather than further sophistication.
3 Timestamp tokens
We propose that dataflow systems and operator logic
can coordinate precisely, efficiently, and ergonomically
by explicitly handling in-memory tokens that represent
their ability to produce outgoing data in the future. We
borrow and adapt this idiom from capability systems
(e.g. object-capability systems [13,15], capability-based
protection and security [12,21], hardware capabilities
[5,11]). Similarly to capabilities1, a timestamp token
represents a computing object – an operator output –
and the actions that can be performed with respect to
that object: the production of data at timestamp t and
dataflow location l.
Following Naiad we refer to the pair of timestamp t
and location l as a pointstamp (t, l). A location can be
either a node in V or an edge in E.
Definition. A timestamp token is a coordination
primitive that names an associated pointstamp (t, l),
and which gives its holder the ability to produce mes-
sages with timestamp t at location l.
The location for a timestamp token is typically one of
the output edges of the operator that holds it.
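As an illustration, a timestamp token can be thought of as an ordinary in-memory value that names its pointstamp and is the only handle through which data at that pointstamp can be produced. The Rust sketch below uses illustrative names (Pointstamp, TimestampToken, send) and is not the system's actual interface.

    // A minimal sketch (illustrative names) of a timestamp token as an
    // in-memory value: it names a pointstamp (t, l) and is the only handle
    // through which data at that pointstamp can be produced.

    #[derive(Clone, Copy)]
    struct Pointstamp {
        time: u64,      // timestamp t
        location: u32,  // location l: identifier of an edge (or node) in the dataflow graph
    }

    struct TimestampToken {
        pointstamp: Pointstamp,
    }

    impl TimestampToken {
        // Exercising the token: a message can only be formed while the token
        // is held, and it necessarily carries the token's pointstamp.
        fn send<D>(&self, payload: D) -> (Pointstamp, D) {
            (self.pointstamp, payload)
        }
    }

    fn main() {
        let token = TimestampToken { pointstamp: Pointstamp { time: 6, location: 0 } };
        let (pointstamp, _payload) = token.send("data");
        assert_eq!(pointstamp.time, 6);
        assert_eq!(pointstamp.location, 0);
    }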
Notwithstanding any other similarities to capabil-
ities, our interest is in the information that holding
timestamp tokens communicates to others. The system
tracks the set of live timestamp tokens and summarizes
this information to operators as frontiers: lower bounds
on the timestamps that operators may yet observe in
their inputs. By downgrading (to future timestamps)
or discarding their held timestamp tokens, operators al-
low frontiers to advance and the computation as a whole
to make forward progress.
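For intuition, assume totally ordered integer timestamps, so that the frontier implied by the tokens at a single location is simply the least timestamp among the live tokens. The sketch below (illustrative, not the system's implementation) shows this lower bound advancing as tokens are downgraded and discarded.

    // Frontier of a single location under totally ordered u64 timestamps:
    // the least timestamp among live tokens, or None once all are discarded.
    fn frontier(live_tokens: &[u64]) -> Option<u64> {
        live_tokens.iter().copied().min()
    }

    fn main() {
        let mut tokens = vec![3u64, 4, 6];       // live tokens at one location
        assert_eq!(frontier(&tokens), Some(3));  // no message before t=3 can appear

        tokens[0] = 5;                           // downgrade the t=3 token to t=5
        assert_eq!(frontier(&tokens), Some(4));  // the lower bound advances

        tokens.clear();                          // discard all tokens
        assert_eq!(frontier(&tokens), None);     // no further messages possible
    }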
3.1 The timestamp token life-cycle
Each dataflow operator is initially provided with a
timestamp token for each of its output edges, each bear-
ing some minimal “zero” timestamp. This gives each
operator the opportunity to be a source of timestamped
messages, even without receiving input messages. For
many operators, their first actions will be to discard
these timestamp tokens, by which they release their
ability to produce output messages unprompted, and
unblock the dataflow system at the same time.
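The following sketch illustrates this start-up behavior against a hypothetical operator interface (the trait and types are illustrative): a stateless operator such as filter drops its initial tokens immediately, declaring that it will never produce output unprompted.

    // Sketch of operator start-up under a hypothetical interface: every
    // operator is handed one token per output edge at the minimal timestamp.

    struct Token {
        time: u64,     // the minimal "zero" timestamp
        output: usize, // which output edge the token is for
    }

    trait Operator {
        fn on_startup(&mut self, initial_tokens: Vec<Token>);
    }

    struct Filter; // a stateless operator that only reacts to input messages

    impl Operator for Filter {
        fn on_startup(&mut self, initial_tokens: Vec<Token>) {
            for token in &initial_tokens {
                println!("dropping token for output {} at t={}", token.output, token.time);
            }
            // Dropping the tokens declares that this operator will never emit
            // output unprompted, allowing downstream frontiers to advance
            // past the "zero" timestamp.
            drop(initial_tokens);
        }
    }

    fn main() {
        let mut filter = Filter;
        filter.on_startup(vec![Token { time: 0, output: 0 }]);
    }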
1“Each capability [...] locates by means of a pointer some
computing object, and indicates the actions that the computation
may perform with respect to that object.” [13]
Figure 2: Timestamp token life-cycle.
As a dataflow operator executes, it can receive, ex-
ercise, downgrade, and discard timestamp tokens (Fig-
ure 2). Operators receive timestamped input messages,
each of which provides a timestamp token at that times-
tamp for each of the operator’s outputs. Operators can
produce timestamped output messages as long as they
hold a timestamp token with the corresponding times-
tamp and output edge. Lastly, operators can arbitrar-
ily hold, downgrade (to future timestamps), and discard
their timestamp tokens as their logic dictates.
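The sketch below walks through the four actions of Figure 2 against a hypothetical token and output-handle API (downgrade, send); the names are illustrative, but the constraint they encode is the one described above: output messages can only carry the timestamp of a held token, and downgrading only moves a token toward the future.

    struct Token { time: u64 }

    impl Token {
        // Downgrading only moves a token towards the future.
        fn downgrade(&mut self, new_time: u64) {
            assert!(new_time >= self.time);
            self.time = new_time;
        }
    }

    struct Output { sent: Vec<(u64, String)> }

    impl Output {
        // Exercising a token: the output message carries the token's timestamp.
        fn send(&mut self, token: &Token, data: String) {
            self.sent.push((token.time, data));
        }
    }

    // One operator invocation: a message arrives together with a token at its
    // timestamp (1. receive); the operator emits a result (2. exercise), then
    // defers further output to a later time (3. downgrade) before giving up
    // the ability to produce more output at all (4. discard).
    fn on_message(message: (u64, String), output: &mut Output) {
        let (time, data) = message;
        let mut token = Token { time };            // 1. receive
        output.send(&token, data.to_uppercase());  // 2. exercise
        token.downgrade(time + 1);                 // 3. downgrade
        drop(token);                               // 4. discard
    }

    fn main() {
        let mut out = Output { sent: Vec::new() };
        on_message((6, "hello".to_string()), &mut out);
        assert_eq!(out.sent, vec![(6, "HELLO".to_string())]);
    }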
The dataflow system is informed of the net changes to
the number of timestamp tokens for each pointstamp,
but only passively in response to operator actions,
rather than actively as a gatekeeper. Through this infor-
mation the system can inform dataflow operators about
the consequences of operator actions, without the spe-
cific details of the reasons for those actions.
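A minimal sketch of this bookkeeping, with illustrative names: operators report batches of net changes (a downgrade appears as one decrement and one increment), and the system folds them into a multiset of live pointstamps from which lower bounds can be read off.

    use std::collections::HashMap;

    type Pointstamp = (u64, u32); // (timestamp t, location l)

    #[derive(Default)]
    struct ProgressTracker {
        live: HashMap<Pointstamp, i64>, // count of live tokens per pointstamp
    }

    impl ProgressTracker {
        // Apply a batch of net changes reported by an operator; downgrading a
        // token from (3, l) to (5, l) appears as [((3,l),-1), ((5,l),+1)].
        fn apply(&mut self, deltas: &[(Pointstamp, i64)]) {
            for &(pointstamp, delta) in deltas {
                let count = self.live.entry(pointstamp).or_insert(0);
                *count += delta;
                debug_assert!(*count >= 0);
            }
            self.live.retain(|_, count| *count != 0);
        }

        // The lower bound for one location, read off from the live tokens.
        fn lower_bound(&self, location: u32) -> Option<u64> {
            self.live.keys().filter(|&&(_, l)| l == location).map(|&(t, _)| t).min()
        }
    }

    fn main() {
        let mut tracker = ProgressTracker::default();
        tracker.apply(&[((0, 7), 1)]);               // an initial token at t=0, location 7
        tracker.apply(&[((0, 7), -1), ((4, 7), 1)]); // the operator downgrades it to t=4
        assert_eq!(tracker.lower_bound(7), Some(4));
    }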
3.2 Coordination
The coordination state of the dataflow system is the set
of timestamp tokens, which when combined with the
dataflow graph determines lower bounds for the times-
tamps at each operator input. As the set of times-
tamp tokens evolves these lower bounds advance, and
the dataflow system has the responsibility of informing
operators as this happens. The difference with times-
tamp tokens is that operators drive the production of
this information, instead of the system itself.2
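The following sketch shows one way such lower bounds could be derived from the coordination state, under the simplifying assumption that timestamps are not advanced along edges (no loop counters); a live pointstamp (t, l) then bounds the timestamps that may later appear at any input reachable from l. The graph encoding and names are illustrative.

    use std::collections::HashMap;

    // Depth-first reachability over directed edges between locations.
    fn can_reach(edges: &[(u32, u32)], from: u32, to: u32) -> bool {
        if from == to { return true; }
        let mut stack = vec![from];
        let mut seen = vec![from];
        while let Some(loc) = stack.pop() {
            for &(src, dst) in edges {
                if src == loc && !seen.contains(&dst) {
                    if dst == to { return true; }
                    seen.push(dst);
                    stack.push(dst);
                }
            }
        }
        false
    }

    // Frontier at `input`: the least timestamp among live pointstamps whose
    // location can reach that input in the dataflow graph.
    fn input_frontier(
        live: &HashMap<(u64, u32), usize>, // pointstamp -> token count
        edges: &[(u32, u32)],
        input: u32,
    ) -> Option<u64> {
        live.keys()
            .filter(|&&(_, l)| can_reach(edges, l, input))
            .map(|&(t, _)| t)
            .min()
    }

    fn main() {
        // Locations 0 -> 1 -> 2; live tokens at (t=5, loc=0) and (t=2, loc=2).
        let edges: [(u32, u32); 2] = [(0, 1), (1, 2)];
        let mut live = HashMap::new();
        live.insert((5u64, 0u32), 1usize);
        live.insert((2, 2), 1);
        // Location 2 may still see t=5 messages flowing from location 0 as
        // well as t=2 messages produced locally, so its frontier is 2.
        assert_eq!(input_frontier(&live, &edges, 2), Some(2));
        // Location 1 is reachable only from location 0, so its frontier is 5.
        assert_eq!(input_frontier(&live, &edges, 1), Some(5));
    }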
Operators have a great deal of flexibility in how (or
even if) they respond to changes in their input frontiers
(timestamp lower bounds). Certain streaming opera-
tors like map and filter can be oblivious to this in-
formation and process data as it arrives. Synchronous
reduction operators like reduce should await the in-
dication that they have received all inputs for a times-
tamp before they apply their reduction function and
produce output. Hybrid operators like count may per-
2For example, Naiad does not allow operators to hold tokens
across invocations; Timely Dataflow (without timestamp tokens)
does, by allowing operators to participate directly (and often in-
correctly) in the coordination protocol. Here, timestamp tokens
are respectively more expressive, and safer.