Comparing Embedded Graphs Using Average Branching Distance Levent Batakci Case Western Reserve University

2025-04-29 0 0 763.34KB 17 页 10玖币
侵权投诉
Comparing Embedded Graphs Using Average Branching Distance
Levent Batakci
Case Western Reserve University
lab192@case.edu
Abigail Branson
Union University
abby.branson@my.uu.edu
Bryan Castillo
Mesa Community College
bryancastillo98@gmail.com
Candace Todd
The Pennsylvania State
University
clt5441@psu.edu
Erin Wolf Chambers
Saint Louis University
erin.chambers@slu.edu
Elizabeth Munch
Michigan State University
muncheli@msu.edu
Abstract
Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging
from GIS analysis to image classification to shape analysis. A fundamental problem in this type of data
is comparison: given a set of such graphs, can we rank how similar they are, in such a way that we
capture their geometric “shape” in the plane?
In this paper we explore a method to compare two such embedded graphs, via a simplified combina-
torial representation called a tail-less merge tree which encodes the structure based on a fixed direction.
First, we examine the properties of a distance designed to compare merge trees called the branching dis-
tance, and show that the distance as defined in previous work fails to satisfy some of the requirements of
a metric. We incorporate this into a new distance function called average branching distance to compare
graphs by looking at the branching distance for merge trees defined over many directions. Despite the
theoretical issues, we show that the definition is still quite useful in practice by using our open-source
code to cluster data sets of embedded graphs.
1 Introduction
Embedded graphs appear in a wide variety of applications, including map reconstructions, GIS data, shape
analysis, and medical imaging. Such graphs are more than simple abstract representations, since they have
geometric information attached, usually in the form of coordinates and edge lengths. When faced with data
of this type, an immediate question to ask is then how to compare them, so that we can apply techniques
like clustering, image recognition, or machine learning. In essence, we seek a distance measure, which given
two input graphs, returns a number representing how similar they are. Ideally, this value should be zero if
they are the same, and take increasing values for increasingly different embedded graphs.
While solving this problem perfectly is of course difficult in practice, many examples of graph distances
do exist in the literature. For example, one well-known one is the graph edit distance [10], which gives a cost
to inserting or removing vertices and edges; the distance is then the minimum possible sequence of edge and
vertex operations when transforming one graph into the other. Unfortunately, this particular measure does
not take any embedding information into account, so two graphs with identical vertices and weights which
are embedded in a very different configuration will still have zero distance.
In this paper, we develop and implement a technique to compare 2-dimensional embedded graphs, such
as are generated from GIS data or skeletonization of images. We note that there are available options which
are close to our work, where distances are specifically built to take the embedding into account [2, 9, 13,
14, 16, 19]. However, we seek to develop a faster pipeline which incorporates graph simplification while still
retaining at least part of the geometric embedding information. More specifically, we compare embedded
graphs by replacing them with a simpler graph which still encodes some aspect of the structure.
Our algorithm fixes a direction in the plane and computes the merge tree of the graph, which is a tree
with a real valued function representing how the connected components of sub-level sets of the graph changes
in that direction; see e.g. Fig. 1. To be able to compare merge trees in a meaningful way, we need a distance
function that is computable, accurate, comprehensive, and versatile. There are many possible options for
metrics to compare merge trees specifically, including the edit distance [21], and interleaving distance [15,
22–24]. In this paper, we use the branching distance for merge trees [17]. Then, to avoid bias caused by
fixing a single a direction, we rotate the graphs, calculating the merge trees at each rotation, then compare
1
arXiv:2210.10181v1 [cs.CG] 18 Oct 2022
all of the resulting merge trees. We take the median of these distances, naming this distance as average
branching distance.
The branching distance for merge trees [17] builds on the idea of a branch decomposition for a broader
class of trees, introduced in [4]. However, in the course of our work, we have discovered an issue with the
definition presented in [17], caused by ignoring the infinite tail present in merge trees. The result is that
the distance as defined does not satisfy the properties of a metric, and we provide counterexamples to that
effect. We then prove that this issue propogates into the average branching distance, so it also is not a
metric, semi-metric, or pseudo-metric. We also explore its theoretical properties on simple classes of graphs
such as convex polygons.
Despite this, we show that the average branch decomposition is still a useful tool for distinguishing
embedded graphs in practice. To test its utility on practical examples, we tested it with visualizations
such as dendrograms and lower dimensional embeddings, using a range of available data sets. Our code is
posted publicly [25]. We found that despite the theoretical issues, the distance still is potentially useful for
understanding real data.
2 Background
In this section, we give the relevant background terminology and results in graph theory and merge trees,
culminating in the definition of the branching distance for merge trees (Defn. 2.4).
2.1 Graph theory
Here, we provide a brief discussion of relevant terms from traditional graph theory, following [7].
Agraph G= (V, E) consists of a set of vertices V=V(G), and a set of unordered pairs of vertices called
edges E=E(G). An edge e=uv is incident to either of its endpoints uand v. An edge deletion operation
removes an edge from E(G). A vertex deletion operation removes a vertex vfrom V(G) and removes all
edges incident to vfrom E(G)
Aloop is an edge such that both endpoints are the same vertex. Two edges are called parallel if their
endpoints share the same vertices. A simple graph has no loops or parallel edges. The degree of a vertex v
is the number of edges incident to v, with loops counting as two edges.
Asubgraph Hof a graph G, denoted HG, is a graph with V(H)V(G) and E(H)E(H). Note
that a subgraph of Gcan be obtained by performing a combination of vertex deletions and/or edge deletions
on G. An induced subgraph of Gis a subgraph obtained solely by vertex deletions in G. In this case, we say
that His a subgraph induced by Aif V(H) = AV(G). Two graphs Gand Hare isomorphic when there
exists a bijective mapping between the vertices of Gand Hthat preserves adjacency.
Apath in a graph is a sequence of ordered vertices v1,· · · , vnwith edges vivi+1 for i= 1,· · · , n 1. A
graph is connected if every pair of vertices can be connected by a path. A connected component of a graph
Gis a connected subgraph HGwhich is not contained in a larger connected subgraph. A cycle is a path
whose start and end vertices are the same vertex. A graph is acyclic if it has no subgraph which is a cycle
graph. A tree is a connected acyclic graph, and a forest is a graph solely consisting of trees.
We assume we have a topological graph, where each edge can be considered as a copy of the unit interval.
Let Gbe a topological graph and let fbe a function on G,f:GR, providing every vertex and point
on each edge of the graph with a placement on the real number line. See an example in Fig. 1. For a fixed
aR, the sub-level set at ais an induced subgraph Hof Gsuch that any vertex with a function value in
(a, ) is deleted; that is, it is the subgraph induced by the set of vertices with function value f(v)a. In
this case, we write H=f1((−∞, a]). The sublevel set below ais the subgraph induced by the set of vertices
with function value f(v)< a. A component is a set of connected vertices in the sub-level set A.
2.2 Merge Trees
A merge tree is a representation of the connected components of sublevel sets of a graph with an R-valued
function. The vertices of a merge tree represent changes in connectedness of the input graph and its edges
2
g
f
e
d
c
ab
j
C1C2
g
f
e
d
c
ab
j
C3
C4
C3
g
f
e
d
c
ab
k
g
f
e
d
c
ab
k
C5
{a} {b}
{a,b}
{a,b,d}
{d}
Figure 1: The sublevel sets of a graph below and at two values, j, k R; and the tail-lessmerge tree of the
graph.
encode the relationships between these changes. In this way, the merge tree serves as a less complex repre-
sentation of the graph. What follows is a formal definition of a merge tree closely related to the dendrogram
definition of [11].
A merge tree Mis a structure defined on a graph Gwith a function f:GRwhich encodes the
changing connected components of f1(−∞, a] for a shifting value a. For simplicity, we assume the function
is entirely determined by values on the vertices f:V(G)Rgiven by v7→ f(vi) and extended linearly to
the edges. We use the following notation to define merge trees. Fix a connected set AR. Let f1(A)
denote the induced subgraph of Gcontaining just the vertices vGsuch that f(v)A. Fix a connected
subgraph CG. Define the minima µ(C) of Cto be the set of vertices in Chaving no neighbors of lesser
function value; that is
µ(C) = {vV(C)|f(v)f(w)(vw)E(C)}.
Let the identified components on Abe defined as Γ(A) = {µ(C)|Cis a connected component of f1(A)}.
Note that Γ(A) is a set of sets and that its elements are in one to one correspondence with the connected
components. Let the change in connectedness ∆(a) at abe defined as Γ((−∞, a]) \Γ((−∞, a)). Note that
there is a change in connectedness in the sublevel sets of Gat aif and only if ∆(a)6=.
We note that we are defining merge trees as used in [17], which means that we inherited a bug in the
definition from what they work with implicitly. Normally, a merge tree would have an infinite upwards
tail representing the fact that a connected component is always visible in f1(, a] for agreater than the
maximum value. However, [17] does not utilize this tail, cutting off the merge tree at the last merge. For this
reason, we call this construction the tail-less merge tree to distinguish it from more standard mathematical
constructions. For the remainder of the paper, we call this construction simply the “merge tree” unless there
is potential for confusion.
Definition 2.1 (Tail-less Merge Tree).The (tail-less) merge tree Mof a graph Gwith function f:GR
is a graph given by
V(M) = {L|L∆(a), a R},
E(M) = {L1L2|L1∆(a)for aR, L2Γ((−∞, a))
and L2L1},
with function defined by
fM:V(M)R
L7→ a|L∆(a).
To understand these definitions, consider the example of Fig. 1. First, consider function value j=f(c)
for vertex c. The highlighted portion of the leftmost graph represents f1((−∞, j)), which has only vertices
aand b. The two connected components are identified as C1and C2. Furthermore, µ(C1) = {a}and
µ(C2) = {b}. Thus, Γ((−∞, j)) = {{a},{b}}. The second graph represents f1((−∞, j]), where the only
connected component is identified as C3. Here, µ(C3) = {a, b}and Γ((−∞, j]) = {{a, b}}. It follows that
∆(j) = {{a, b}}.
3
Figure 2: Two (nearly identical) embedded graphs at two different rotations with the associated merge trees.
Now we will compute ∆(k), where k=f(e) for vertex eby considering the third and fourth graphs
in Fig. 1. We see that µ(C3) = {a, b}and µ(C4) = {d}. Thus, Γ((−∞, k)) = {{a, b},{d}} Furthermore,
µ(C5) = {a, b, d}and Γ((−∞, k]) = {{a, b, d}}. It follows that ∆(k) = {{a, b, d}}.
In the far right of Fig. 1, we show the merge tree for the input graph and function. Each vertex vof the
merge tree with f(v) = ais labeled with its corresponding element L∆(a). Edges are included between
vertices L1and L2when L1L2.
2.3 Merge tree for a fixed direction
In this paper, we will consider graphs with a map to R2. In the case that this map is injective, the result is
an embedded graph.1We can then pick a direction in the plane by fixing an angle ωfor the orientation and
calculating function values for a vertex as the magnitude of the projection of the original position vectors
onto ω. So, from the input f:GR2, we get a function fω:GRfor each angle ω[0,2π). We
normalize this function by shifting the function values to have median vertex value equal to 0. We get the
same function if we think of rotating the graph in the plane and computing the height function in the vertical
direction, i.e. fω(v) is given by the y-coordinate of f(v). Then, we can compute the merge tree of the graph
for each direction. See the example of Fig. 2.
2.4 Branching Distance
In order to compare two merge trees, we use the following distance as defined by Beketayev et al. [17] which
we call the branching distance. Note that this definition utilizes the tail-less merge tree construction as
defined above.
In what follows, we assume the tail-less merge tree is a non-empty, connected graph. The root of the
merge tree is the vertex with highest function value. A merge tree is trivial if it consists of a single vertex.
For a non-trivial, merge tree M, a saddle is any vertex with degree 2, and a minimum is any non-root
vertex with degree 1. For a trivial merge tree, the only available vertex is considered to be both the root
and a minimum. A root branch is a pairing of some minimum mrwith the highest function value vertex sr
of the merge tree.
Definition 2.2 (Branch Decomposition).A branch decomposition Bof a tail-less merge tree Mis collection
of pairs B={(mi, si)}iconsisting of a minima miand either a saddle or the root si. Every vertex appears
in at least one pair. Further, for each pair there is a descending path from the saddle sito the minimum mi,
no two such paths share an edge, and every edge appears in at least one such path.
We note that in the case of a trivial merge tree with a single vertex v, the only possible branch decom-
position consists of the single degenerate branch {(v, v)}. For a non-trivial merge tree in general position
(i.e. the number of edges adjacent and below any vertex is at most 2), then all saddles and minima (with the
possible exception of the root) appear in exactly one pair of the branch decomposition. If a saddle vertex has
1While we will often be interested in embedded graphs in practice, much of the work in this paper does not require the
assumptions of injectivity.
4
摘要:

ComparingEmbeddedGraphsUsingAverageBranchingDistanceLeventBatakciCaseWesternReserveUniversitylab192@case.eduAbigailBransonUnionUniversityabby.branson@my.uu.eduBryanCastilloMesaCommunityCollegebryancastillo98@gmail.comCandaceToddThePennsylvaniaStateUniversityclt5441@psu.eduErinWolfChambersSaintLouisU...

展开>> 收起<<
Comparing Embedded Graphs Using Average Branching Distance Levent Batakci Case Western Reserve University.pdf

共17页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:17 页 大小:763.34KB 格式:PDF 时间:2025-04-29

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 17
客服
关注