Computing the Matching Distance of 2-Parameter Persistence Modules from Critical Values Asilata Bapat

2025-04-29 0 0 1.34MB 30 页 10玖币
侵权投诉
Computing the Matching Distance of 2-Parameter Persistence
Modules from Critical Values
Asilata Bapat
asilata.bapat@anu.edu.au
Robyn Brooks
robyn.brooks@utah.edu
Celia Hacker
celia.hacker@epfl.ch
Claudia Landi§
claudia.landi@unimore.it
Barbara I. Mahler
bmahler@kth.se
Elizabeth R. Stephenson
elizasteprene@gmail.com
September 19, 2024
Abstract
The exact computation of the matching distance for multi-parameter persistence modules is an
active area of research in computational topology. Achieving an easily obtainable exact computation
of this distance would allow multi-parameter persistent homology to be a viable option for data
analysis. In this paper, we provide theoretical results for the computation of the matching distance
in two dimensions along with a geometric interpretation of the lines through parameter space realizing
this distance. In particular, we show that it is possible to avoid considering vertical and horizontal
lines. To prove this, lines with slope 1 play an essential role.
1 Introduction
Persistent Homology is known as one of the most-often used tools in Topological Data Analysis, allowing
one to use topological invariants to understand the underlying shape of data. In particular, single pa-
rameter persistence yields a summary of data through a one-dimensional filtration, allowing an overview
of the data at many different scales. Single parameter persistent homology has been the subject of much
study and has proven to be useful in many applications [2, 3, 16, 24, 28].
However, some data requires filtration along multiple parameters to fully capture its information:
this is the role of multi-parameter persistent homology [10, 9]. In some contexts, it can be helpful to use
multiple parameters to capture the details of the data [11, 19, 26, 17, 29]. Additionally, single parameter
persistent homology is not robust to outliers in a point cloud; these outliers can lead to a misinterpretation
of the persistent homology, with the unnecessary creation or destruction of homology classes. Multi-
parameter persistence can be a natural fix to this problem by adding a dimension depending on the
density of the samples (see, e.g., [15, 6]). Unfortunately, understanding, visualizing, and computing
invariants in multi-parameter persistent homology remains a difficult task both mathematically and
computationally. This difficulty holds as well when it comes to computing distances between such
invariants.
In the single parameter case, there are several ways to compare persistent homology modules — such
as the bottleneck distance and Wasserstein distances — which exhibit some stability properties with
respect to variations in the input [14]. For more than one parameter, there are also several definitions of
Australian National University, Canberra, Australia
Boston College, Massachusetts, United States
EPFL, Lausanne, Switzerland
§Universit`a di Modena e Reggio Emilia, DISMI, Italy
KTH Royal Institute of Technology Stockholm, Sweden
Institute of Science and Technology Austria, Klosterneuburg, Austria
1
arXiv:2210.12868v2 [math.AT] 17 Sep 2024
distances between persistence modules. Amongst them, the matching distance [27] attracts the attention
of multi-parameter persistence practitioners.
To compute the matching distance between two n-parameter persistence modules, one uses the fact
that restricting an n-parameter filtration to any affine line of positive slope through the parameter
space yields a 1-parameter filtration. There is therefore a corresponding restriction of the n-parameter
persistence module to a 1-parameter module along that line. This construction allows for the knowledge
and computational methods from the 1-dimensional case to be applied to the n-dimensional case.
Indeed, following this idea, the matching distance is defined as a supremum of the one-dimensional
bottleneck distance, over the collection of all lines of positive slope in the parameter space, i.e.,
dmatch(M, N) := sup
L:u=s ⃗m+b
ˆmL·dB(dgm ML,dgm NL),
where ˆmLis a normalization term explained in Section 2. However, exactly computing this distance is
not an easy task given the nature of its definition. As a first step towards an exact computation, several
approximations of this distance have been provided for 2-dimensional persistence modules [4, 22]. The
exact computation of the matching distance is currently only possible for 2-dimensional modules [20],
with recent computational improvements in terms of time complexity [5]. These methods exploit the
duality of points and lines in the plane.
While approximated computations can be very useful in practice, computing the exact value of the
matching distance has its own merits. One of them is that it can distinguish modules for which the rank
invariant is a complete invariant, such as rectangle decomposable modules, as the matching distance is
a metric for the rank invariant. [7]
In this paper, we propose a method to compute the matching distance based on a refinement of
the framework developed in [1], in which we compute the rank invariant of a multi-persistence module
from a finite set of critical parameter values. These critical parameter values capture all the changes
in homology occurring throughout the multi-filtration. They may also be used to partition the set of
positive lines in the parameter space into equivalence classes, where each equivalence class maintains the
same birth and death ordering within the restricted module for all possible homology classes.
Based on these results, we formulate the idea that the critical values must be relevant to the choice
of lines for the computation of the matching distance. Leveraging the framework of [1] to compute the
matching distance allows us to reduce the number of lines necessary to compute the matching distance
to a finite set, thus reducing the computation to a maximum rather than a supremum without exploiting
the point-line duality used in [20]. Through some examples, we show that considering only lines passing
through pairs of critical values is not sufficient. This is because the definition of matching distance
depends on the bottleneck distance, and lines in the same equivalence class might not be such that the
bottleneck distance is always achieved by the same pairing of births/deaths, as will be further explained
in Section 3.
To overcome this problem, we analyze where switches might happen in the matching which achieves
the bottleneck distance, identifying a set of points in the parameter space which we later on refer to
as the set of switch points. This set allows us to refine our equivalence relation on the set of positive
lines (see Section 4) by partitioning lines through the parameter space using the union of the critical
parameter values with the switch points. Using this union, it is possible to identify all the lines at which
the matching distance can be realised. We explain in detail how to compute all relevant parameter values
and show that the matching distance is attained either at: a line through a pair of points in this union,
or a line of diagonal slope through exactly one of the points in the union (see Theorem 3.1).
The contribution of this paper is to advance the state of the art, although still restricted to two dimen-
sions, in two ways: computing the matching distance in a way which is both geometrically interpretable
and implementable. In contrast to other methods such as [20, 5], we provide a geometric understanding
of different lines, including horizontal, vertical, and diagonal lines, as well as lines through critical values,
and their contribution to the matching distance. In addition, the method we propose leads to explicit
computations of important lines, which will lead to an algorithm that can be implemented.
The paper is structured as follows. In Section 2, we give necessary background from multi-parameter
persistent homology, as well as some of the concepts coming from the framework developed in [1]. We
also provide some examples of two-parameter persistence modules for which we explicitly show which
lines are necessary to compute the matching distance. In Section 3, we provide the statement and proof
of our main theoretical result Theorem 3.1 and include a discussion on the role of vertical and horizontal
lines. In Section 4, we provide guidelines for the computation of the switch points, which are necessary
to the exact computation of the matching distance. Finally, in Section 6, we discuss future directions of
study.
2
2 Background
While we will subsequently specialize to the case of n= 2 parameters, we begin with some general
definitions to set the scene.
2.1 Notation and definitions
Definition 2.1 (Parameter Spaces).Let nN. If n= 1, endow Rwith the usual order . If n > 1,
take the following partial order on Rn: for u= (ui), v = (vi)Rn, set uvif and only if uivifor
i= 1, . . . , n. The poset (Rn,) is called an nD parameter space.
For n > 1, the nD parameter space can be sliced by lines with positive slope.
A line LRnis a 1Dparameter space when considered parameterized by sRas L:u=ms +b
where m Rnand bRn.Lhas positive slope if each coordinate miof m is positive: mi>0.
Throughout the paper, we consider the following standard normalisation for parameterizations of lines:
m:= max{mi|i= 1, . . . , n}= 1,
n
X
i=1
bi= 0 (1)
This normalization is the one used in papers such as [12, 23]. Other choices are possible. Changing
the normalization would impact all the intermediate computations that follow but not the overall results.
Definition 2.2 (Persistence Modules).Let Fbe a fixed field. An (n-parameter) persistence module M
over the parameter space Rnis an assignment of an F-vector space Muto each uRn, and linear maps,
called transition or internal maps,iM(u, v): MuMvto each pair of points uvRn, satisfying the
following properties:
iM(u, u) is the identity map for all uRn.
iM(v, w)iM(u, v) = iM(u, w) for all uvwRn.
In this paper, persistence modules will always assumed to be finitely presented. To fix the ideas, we
will assume that they are obtained applying homology in a fixed degree, say q, over a fixed coefficient
field F, to a tame and one-critical (cf. [1]) n-parameter filtration Kof a finite simplicial complex: M=
Hq(K;F).
In the particular case when n= 1, a finitely presented persistence module Mcan be uniquely
decomposed as a finite sum of interval modules [30].
Definition 2.3 (Interval Module).An interval module is a 1-parameter persistence module of the form
I[b, d) with b<d≤∞∈R:= R∪ {∞} such that, for every stR,
I[b, d)s=Fif bs < d
0 otherwise , I[b, d)(st) = idFif bst<d
0 otherwise
The interval [b, d) can be represented as a point (b, d) in R×R, above the diagonal. This encoding
permits defining persistence diagrams.
Definition 2.4 (Persistence Diagram).If M
=LjJI[bj, dj)mj, then the persistence diagram of M,
denoted dgm M, is the finite multi-set of points (bj, dj) of R×Rwith multiplicity mjfor jJ.
We now review the definitions of bottleneck distance between persistence diagrams and matching
distance between two n-parameter persistence modules Mand N.
When n= 1, the bottleneck distance dBis an easily computable extended metric on persistence
diagrams [21], defined as follows.
Definition 2.5 (Bottleneck Distance).Let M,Nbe two 1-parameter persistence modules, with persis-
tence diagrams dgm Mand dgm N.
Amatching between dgm Mand dgm Nis a multi-bijection σ:PQbetween the points of a
multi-subset Pin dgm Mand a multi-subset Qin dgm N.
3
The cost of a matching σ, denoted c(σ), is the greatest among the costs of each matched pair of
points, taken to be equal to the distance of the corresponding pair of points in R×R, with the
convention that ∞−∞= 0 and ∞ − a=for every aR, and the costs of each unmatched point,
taken to be equal to the distance of that point to the diagonal of R2:
c(σ) := max max
pPpσ(p),max
p /P`Q
p2p1
2.
The bottleneck distance dBis defined as the smallest possible cost of any matching σbetween dgm M
and dgm N:
dB(dgm M, dgm N) := min
σ:PQ
Pdgm M,Qdgm N
c(σ).
Observe that a matching σthat pairs a point (b, d)R2with a point (b,)R×Rhas cost equal
to infinity. Hence, the bottleneck distance between two persistence diagrams with a different number of
points at infinity cannot be finite. On the contrary, matching points of the same type always gives a
finite (more convenient) cost that can be expressed as follows.
Remark 2.6. By definition, the cost c(σ) of σ:PQtakes one of the two forms.
If c(σ) is realized by matching pPto q=σ(p), then c(σ) = |st|, where sand tare either both
abscissas or both ordinates of pand q, respectively;
If c(σ) is realized by some pnot in P`Q, then c(σ) = |st|
2, where sand tare the abscissa and
the ordinate of p.
Briefly, we can say that c(σ) is attained by some |st|
δ, with δ∈ {1,2}and s, t coordinates of points
in dgm Mdgm N.
When the number of parameters is n2, we can use the bottleneck distance to define an extended
pseudo-metric between persistence modules Mand Nvia restrictions to lines with positive slope.
The restriction of Mto a line of positive slope LRnis the persistence module MLthat assigns
Muto each uL, and whose transition maps iML(u, v) : (ML)u(ML)vfor uvLare the same
as in M. Once a parameterization u=ms +bof Lis fixed, the persistence module MLis isomorphic to
the 1-parameter persistence module, by abuse of notation still denoted by ML, that
assigns to each sRthe vector space (ML)s:= Mu, and
whose transition map between (ML)sand (ML)tfor stis equal to that of Mbetween Muand
Mvwith u=ms +band v=mt +b.
Definition 2.7 (Matching Distance).The matching distance between Mand Nis defined by
dmatch(M, N) := sup
L:u=s ⃗m+b
ˆmL·dB(dgm ML,dgm NL)
where ˆmL:= min{mi|i= 1, . . . , n}>0 is the minimal coordinate of the direction vector m of L, and
Lvaries among all the lines of Rnwith positive slope, taken with standard normalization.
The role of the weight ˆmLin the definition of the matching distance is that of ensuring stability
of persistence modules against perturbations of the filtrations that originate them (cf. [12, 23]). Such
weight would vanish for lines parallel to the coordinate axes. Therefore, such lines are not considered in
the definition of the matching distance.
In the context of the matching distance between multi-parameter persistence modules, it is more
convenient to include the factor ˆmLwhen computing the cost of a matching σLbetween the points of
a multi-subset in dgm MLand the points of a multi-subset in dgm NLfor a fixed line Lwith positive
slope. Therefore, we introduce the following notation:
cost(σL) := ˆmLc(σL).
Examples of computation of the matching distance in some simple cases are given in Section 2.3.
By definition, the matching distance is computed by taking the supremum of the bottleneck distance
over all (infinitely many) lines with positive slope through the parameter space. However, in [1], we
4
have shown that this set of lines may be partitioned into a finite number of equivalence classes. For the
two-parameter case (i.e., for n= 2), we will use these equivalence classes to generate a finite set of lines
from which one may compute the matching distance. Therefore, we introduce notation and definitions
necessary to understand these equivalence classes below.
Henceforth in this paper, we always assume n= 2, which is the case for which we have extremely
explicit results. We introduce the following definitions (with associated figures) following [1].
Definition 2.8 (Positive Cone).For every point uin R2, let S+(u) be the positive cone with vertex u:
S+(u) := {vR2:uv}.
The boundary of the positive cone, S+(u), decomposes into open faces. In particular, S+(u) can
be partitioned by non-empty subsets Aof {1,2}in the following way. For ∅ ̸=A⊆ {1,2}, define
SA(u) :=
{u}if A={1,2},
{(x1, x2)R2|x1=u1, x2> u2}if A={1},
{(x1, x2)R2|x1> u1, x2=u2}if A={2}.
u=S{1,2}S{2}
S{1}S+(u)
Figure 1: The positive cone S+(u) of uR2and the decomposition of its boundary into S{1},S{2}and
S{1,2}, which correspond respectively to the vertical boundary, horizontal boundary, and u.
Definition 2.9. A line Lwith positive slope intersects S+(u) in exactly one point, which we call the
push of uonto L, and denote by pushL(u):
pushL(u) := LS+(u).
u
L
pushL(u)
Figure 2: The push of ualong the line L. In this case, AL
u={2}, and so uis to the left of L.
Because of the partition of S+(u) by its open faces, there is a unique non-empty subset of {1,2},
denoted AL
u, such that pushL(u) = LSAL
u(u). Concretely, in the plane, AL
u={1}means that uis
5
摘要:

ComputingtheMatchingDistanceof2-ParameterPersistenceModulesfromCriticalValuesAsilataBapat∗asilata.bapat@anu.edu.auRobynBrooks†robyn.brooks@utah.eduCeliaHacker‡celia.hacker@epfl.chClaudiaLandi§claudia.landi@unimore.itBarbaraI.Mahler¶bmahler@kth.seElizabethR.Stephenson‖elizasteprene@gmail.comSeptember...

展开>> 收起<<
Computing the Matching Distance of 2-Parameter Persistence Modules from Critical Values Asilata Bapat.pdf

共30页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:30 页 大小:1.34MB 格式:PDF 时间:2025-04-29

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 30
客服
关注