for details). In this paper, we focus on the
thermodynamic regime, i.e. we keep the quantity
$n \cdot r_n^d = \lambda$ constant. Up to a constant factor, the
quantity $n \cdot r_n^d$ is the average number of points in a
ball of radius $r_n$ [18, Section 1]. This value goes
neither to zero nor to infinity as $n \to \infty$ in the
thermodynamic regime, leading to complex topology;
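see for instance [19, Chapter 9]. Now it is
straightforward to observe that a subset of our sample
$\sigma \subseteq X$ forms a simplex in the Čech complex at
scale $r_n$ iff
\[
  \bigcap_{x \in \sigma} B_{r_n}(x) \neq \emptyset
  \iff
  \bigcap_{x \in n^{1/d}\sigma} B_{\lambda^{1/d}}(x) \neq \emptyset.
\]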
This is because for any $x \in X$, $x' \in \mathbb{R}^d$, we have
\[
  \|x' - x\| \le r_n
  \iff n^{1/d}\|x' - x\| \le n^{1/d} r_n
  \iff \|n^{1/d}x' - n^{1/d}x\| \le \lambda^{1/d}.
\]
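The last equivalence relies on the thermodynamic constraint $n \cdot r_n^d = \lambda$. Spelled out as a short worked step (the numerical values below are chosen purely for illustration):
\[
  \left(n^{1/d} r_n\right)^d = n \cdot r_n^d = \lambda
  \quad\Longrightarrow\quad
  n^{1/d} r_n = \lambda^{1/d}.
\]
For instance, with $d = 2$, $\lambda = 1$ and $n = 100$ we get $r_n = 0.1$, and scaling by $n^{1/d} = 10$ turns balls of radius $0.1$ into balls of radius $1 = \lambda^{1/d}$.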
This observation motivates us to scale a sample of
size $n$ by $n^{1/d}$. In fact, this setup aligns with the
approach of [20]. Due to this scaling, the average
number of points in a ball of radius $r = \lambda^{1/d}$
stays the same as we increase $n \to \infty$. Therefore,
it makes sense to compare ECCs at fixed
radius $r = \lambda^{1/d}$ for samples of different sizes.
Visually speaking, we can compare (expected) ECCs
from samples of different sizes in a common
coordinate system using the $r$-axis scaled in this way.
In particular, one can study the point-wise limit of
the expected ECC; that is, when the sample size
approaches infinity for a fixed $r$. Moreover, this
rescaling allows us to conduct two-sample tests
with samples of different sizes, cf. Section 2.2.
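The effect of the rescaling can be illustrated with a minimal sketch in dimension $d = 1$, where the Čech complex is easy to handle by hand. This is not the TopoTest implementation: the function names are hypothetical, the uniform distribution on $[0,1]$ is chosen only for illustration, and dividing the Euler characteristic by $n$ is an assumption made here so that curves from samples of different sizes share a common vertical scale.

```python
import math
import random


def rescaled_sample_1d(n, rng):
    """Draw n uniform points on [0, 1] and scale them by n**(1/d) with d = 1."""
    return sorted(n * rng.random() for _ in range(n))


def cech_euler_characteristic_1d(sorted_points, r):
    """Euler characteristic of the Cech complex at scale r for points on a line.

    In dimension one the union of the closed balls B_r(x) is a disjoint union
    of intervals, so (by the nerve theorem) the Euler characteristic equals
    the number of connected components: one plus the number of consecutive
    gaps strictly larger than 2 * r.
    """
    if not sorted_points:
        return 0
    gaps = (b - a for a, b in zip(sorted_points, sorted_points[1:]))
    return 1 + sum(1 for gap in gaps if gap > 2 * r)


def normalized_ecc(n, radii, seed=0):
    """ECC of one rescaled sample, divided by n (normalization assumed here)."""
    rng = random.Random(seed)
    points = rescaled_sample_1d(n, rng)
    return [cech_euler_characteristic_1d(points, r) / n for r in radii]


# Evaluate both curves on the same radius grid, as suggested in the text.
radii = [0.0, 0.25, 0.5, 1.0]
ecc_small = normalized_ecc(500, radii, seed=0)
ecc_large = normalized_ecc(4000, radii, seed=1)

for r, a, b in zip(radii, ecc_small, ecc_large):
    print(f"r={r:4.2f}  n=500: {a:.3f}  n=4000: {b:.3f}")
```

In this toy setting the two normalized curves should be close at every fixed $r$, although the sample sizes differ by a factor of eight. Without the scaling by $n^{1/d}$, the curve of the larger sample would be compressed towards small radii and the two would not be comparable on a common $r$-axis.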
1.2 Previous Work
Let us briefly review some related work on the
intersection of topology and statistics. The most
popular tool of TDA is persistent homology. Its
key property is stability [21]; informally speak-
ing, a small perturbation of the input yields a
small change in the output. However, persistent
homology is a complicated setting for statistics;
for example, there are no unique means [22].
For a survey on the topology of random
geometric complexes see [18]. A textbook for the
case of one-dimensional complexes, i.e. graphs,
is [19]. The Euler characteristic of random
geometric complexes has been studied in [23,24].
Notably, in [24], the limiting ECC in the
thermodynamic regime is computed for the uniform
distribution on $[0,1]^3$. More recently, [25] provided
a functional central limit theorem for ECCs, which
was subsequently generalized by [20]. The Euler
characteristic has been studied in the context of
random fields [26] by Adler and Taylor. Adler
suggested using it for model selection purposes
and normality testing [27, Section 7]. Building
on this work, such a normality test has been
extensively studied in [28]. Using topological sum-
maries for statistical testing has moreover been
suggested by [29] for persistence vineyards, [30]
for persistent Betti numbers and [31] for multipa-
rameter persistent Betti numbers. Mukherjee and
Vejdemo-Johansson [14] describe a framework for
multiple hypothesis testing for persistent homol-
ogy. Very recently, Vishwanath et al. [32] provided
criteria to check the injectivity of topological
summary statistics including ECCs.
1.3 Our Contributions
To the best of our knowledge, this paper
presents the first mathematically rigorous approach
that uses Euler characteristic curves to perform
general goodness-of-fit testing. Our procedure
is theoretically justified by Theorem 2.4. The
concentration inequality for Gaussian processes
(Lemma 2.2) might be of independent interest.
Simulations conducted in Sections 4 and 5
indicate that TopoTest outperforms the
Kolmogorov-Smirnov test, which we used as a baseline,
in arbitrary dimension, both in terms of test power
and in terms of computational time, for moderate
sample sizes and dimensions.
The implementation of TopoTest is publicly
available at https://github.com/dioscuri-tda/topotests.
2 Method
2.1 One-sample test
While topological descriptors are computable and
have a strong theory underlying them, they are
not complete invariants of the underlying distri-
butions, as recently pointed out in [32]. Hence
the statement of the null hypothesis and the
alternative require some care.