
known pebbling upper bound, $p(G^R) \leq (d_{out} - 1)\ell + 1$. It remains an open question whether a tight boundary-complexity lower bound to the pebbling number can always be found by tailoring the choice of $r$ to the DAG, or whether there are DAGs for which the two metrics exhibit a gap for any rule.
We also exploit visits to analyze the I/O complexity of a DAG $G$, $IO(M, G)$, pioneered by Hong and Kung in [22]. This quantity is the minimum number of accesses to the second level of a two-level memory, with the first level (i.e., the cache) of size $M$, required to compute $G$.
Such a computation can be modeled by a game with pebbles of two colors. Let $k(G, M)$ be the smallest integer $k$ such that the vertices of $G$ can be topologically partitioned into a sequence of $k$ subsets, each with a dominator set and a minimum set no larger than $M$. ($D$ is a dominator of $U$ if the vertices of $U$ can be computed from those of $D$. The minimum set of $U$ contains those vertices of $U$ with no successor in $U$.) Then, $IO(M, G) \geq M(k(G, 2M) - 1)$ [22].
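To make these definitions concrete, the following sketch (illustrative only, not code from the paper; the toy DAG, vertex names, and helper functions are hypothetical) checks that a candidate topological partition of a small DAG satisfies the dominator-set and minimum-set conditions with budget $2M$:

```python
# Illustrative sketch only (not from the paper): checking that a candidate
# topological partition of a toy DAG meets the dominator-set and minimum-set
# conditions with budget 2M. The DAG, vertex names, and helpers are
# hypothetical; the requirement that the subsets be listed in topological
# order is assumed, not verified.

def minimum_set(U, succ):
    """Vertices of U with no successor inside U."""
    return {v for v in U if not (succ.get(v, set()) & U)}

def is_dominator(D, U, pred):
    """True if every vertex of U can be computed from D alone
    (input vertices can only be obtained by appearing in D)."""
    computable = set(D)
    changed = True
    while changed:
        changed = False
        for v, ps in pred.items():
            if v not in computable and ps and ps <= computable:
                computable.add(v)
                changed = True
    return U <= computable

# Toy DAG: a complete binary tree with inputs a, b, c, d and root r.
pred = {'a': set(), 'b': set(), 'c': set(), 'd': set(),
        'x': {'a', 'b'}, 'y': {'c', 'd'}, 'r': {'x', 'y'}}
succ = {}
for v, ps in pred.items():
    for p in ps:
        succ.setdefault(p, set()).add(v)

M = 2
# Candidate partition into k = 2 subsets (in topological order), with a
# candidate dominator for each subset.
partition = [{'a', 'b', 'c', 'd'}, {'x', 'y', 'r'}]
dominators = [{'a', 'b', 'c', 'd'}, {'x', 'y'}]

for U, D in zip(partition, dominators):
    assert is_dominator(D, U, pred) and len(D) <= 2 * M
    assert len(minimum_set(U, succ)) <= 2 * M
print(f"All {len(partition)} subsets satisfy the 2M = {2 * M} budget.")
```

A valid partition with $k$ subsets only witnesses $k(G, 2M) \leq k$; turning this into an I/O lower bound requires a lower bound on $k(G, 2M)$, i.e., an argument that no admissible partition with fewer subsets exists.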
Dominators play a role in the red-blue game (where pebbles are initially placed on input vertices which, if unpebbled, cannot be repebbled), but not in standard pebbling (where a pebble can be placed on an input vertex at any time, hence a dominator of what is yet to be computed need not be currently in memory). Intuitively, each segment of a computation must read a dominator set $D$ of the vertices being computed, and at least $|D| - M$ of these reads must be to the second level of the memory. It is also shown in [22] that the minimum set, say $Y$, of a segment of the computation must be present in memory at the end of such a segment, so that at least $|Y| - M$ of its elements must have been written to the second level of the memory.
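In outline, paraphrasing the argument of [22] (a sketch, not the full proof): split the I/O trace of any computation of $G$ with a cache of size $M$ into $h$ consecutive segments, each containing exactly $M$ I/O operations (except, possibly, the last). The vertices first computed in a segment admit a dominator of size at most $2M$ (the at most $M$ values in the cache when the segment starts, plus the at most $M$ values read during it), and their minimum set has size at most $2M$ (its elements must, as recalled above, reside in memory at the end of the segment: at most $M$ in the cache, the others among the at most $M$ values written during it). The segments thus induce a topological partition witnessing $k(G, 2M) \leq h$, whence
$$IO(M, G) \geq M(h - 1) \geq M(k(G, 2M) - 1).$$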
In the visit perspective, the minimum set emerges as the boundary of topological visits, capturing a space requirement at various points of the computation. In addition to providing some intuition on minimum sets, this insight suggests a generalization of the partitioning technique to any type of visit. In fact, the universal upper bounds on visit boundaries mentioned above do indicate that the singleton rule has the potential to yield better lower bounds than the topological one. Following this insight, we have developed the visit partition technique. For some DAGs for which $S$-partitions can only lead to a trivial, $\Omega(1)$, lower bound, visit partitions yield a much higher, and in fact tight, lower bound.
Further related work:
Since the work of Hong and Kung [22], I/O complexity has attracted considerable attention, thanks also to the increasing impact of the memory hierarchy on the performance of all computing systems, from general purpose processors to accelerators such as GPUs, FPGAs, and Tensor engines. Their $S$-partition technique has been the foundation of lower bounds for a number of important computational problems, such as the Fast Fourier Transform [22], definition-based matrix multiplication [3, 21, 32], sparse matrix multiplication [25], Strassen's matrix multiplication [8] (this work also introduces the “G-flow” technique, based on the Grigoriev flow of functions [18], to lower bound the size of dominator sets), and various integer multiplication algorithms [9, 15]. Ballard et al. [5, 4] generalized the results on matrix multiplication of [22] by means of the approach proposed by Irony, Toledo, and Tiskin in [21], based on the Loomis-Whitney geometric theorem [23], which captures a trade-off between dominator size and minimum set size. The same papers present tight I/O complexity bounds for various linear algebra algorithms for LU/Cholesky/LDL$^T$/QR factorization and for eigenvalue and singular value computations.
Four decades after its introduction, the $S$-partition technique [22] is still the state of the art for I/O lower bounds that hold when recomputation (the repeated evaluation of the same DAG vertex) is allowed. Savage [29] has proposed the $S$-span technique as “a slightly weaker but simpler version of the Hong-Kung lower bound on I/O time” [30]. The $S$-covering technique [10], which merges and extends aspects of both [22] and [29], is in principle