Comparing Embedded Graphs Using Average Branching Distance Levent Batakci Case Western Reserve University

2025-04-29 0 0 763.34KB 17 页 10玖币

侵权投诉

Comparing Embedded Graphs Using Average Branching Distance

Levent Batakci

Case Western Reserve University

lab192@case.edu

Abigail Branson

Union University

abby.branson@my.uu.edu

Bryan Castillo

Mesa Community College

bryancastillo98@gmail.com

Candace Todd

The Pennsylvania State

University

clt5441@psu.edu

Erin Wolf Chambers

Saint Louis University

erin.chambers@slu.edu

Elizabeth Munch

Michigan State University

muncheli@msu.edu

Abstract

Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging

from GIS analysis to image classiﬁcation to shape analysis. A fundamental problem in this type of data

is comparison: given a set of such graphs, can we rank how similar they are, in such a way that we

capture their geometric “shape” in the plane?

In this paper we explore a method to compare two such embedded graphs, via a simpliﬁed combina-

torial representation called a tail-less merge tree which encodes the structure based on a ﬁxed direction.

First, we examine the properties of a distance designed to compare merge trees called the branching dis-

tance, and show that the distance as deﬁned in previous work fails to satisfy some of the requirements of

a metric. We incorporate this into a new distance function called average branching distance to compare

graphs by looking at the branching distance for merge trees deﬁned over many directions. Despite the

theoretical issues, we show that the deﬁnition is still quite useful in practice by using our open-source

code to cluster data sets of embedded graphs.

1 Introduction

Embedded graphs appear in a wide variety of applications, including map reconstructions, GIS data, shape

analysis, and medical imaging. Such graphs are more than simple abstract representations, since they have

geometric information attached, usually in the form of coordinates and edge lengths. When faced with data

of this type, an immediate question to ask is then how to compare them, so that we can apply techniques

like clustering, image recognition, or machine learning. In essence, we seek a distance measure, which given

two input graphs, returns a number representing how similar they are. Ideally, this value should be zero if

they are the same, and take increasing values for increasingly diﬀerent embedded graphs.

While solving this problem perfectly is of course diﬃcult in practice, many examples of graph distances

do exist in the literature. For example, one well-known one is the graph edit distance [10], which gives a cost

to inserting or removing vertices and edges; the distance is then the minimum possible sequence of edge and

vertex operations when transforming one graph into the other. Unfortunately, this particular measure does

not take any embedding information into account, so two graphs with identical vertices and weights which

are embedded in a very diﬀerent conﬁguration will still have zero distance.

In this paper, we develop and implement a technique to compare 2-dimensional embedded graphs, such

as are generated from GIS data or skeletonization of images. We note that there are available options which

are close to our work, where distances are speciﬁcally built to take the embedding into account [2, 9, 13,

14, 16, 19]. However, we seek to develop a faster pipeline which incorporates graph simpliﬁcation while still

retaining at least part of the geometric embedding information. More speciﬁcally, we compare embedded

graphs by replacing them with a simpler graph which still encodes some aspect of the structure.

Our algorithm ﬁxes a direction in the plane and computes the merge tree of the graph, which is a tree

with a real valued function representing how the connected components of sub-level sets of the graph changes

in that direction; see e.g. Fig. 1. To be able to compare merge trees in a meaningful way, we need a distance

function that is computable, accurate, comprehensive, and versatile. There are many possible options for

metrics to compare merge trees speciﬁcally, including the edit distance [21], and interleaving distance [15,

22–24]. In this paper, we use the branching distance for merge trees [17]. Then, to avoid bias caused by

ﬁxing a single a direction, we rotate the graphs, calculating the merge trees at each rotation, then compare

arXiv:2210.10181v1 [cs.CG] 18 Oct 2022

all of the resulting merge trees. We take the median of these distances, naming this distance as average

branching distance.

The branching distance for merge trees [17] builds on the idea of a branch decomposition for a broader

class of trees, introduced in [4]. However, in the course of our work, we have discovered an issue with the

deﬁnition presented in [17], caused by ignoring the inﬁnite tail present in merge trees. The result is that

the distance as deﬁned does not satisfy the properties of a metric, and we provide counterexamples to that

eﬀect. We then prove that this issue propogates into the average branching distance, so it also is not a

metric, semi-metric, or pseudo-metric. We also explore its theoretical properties on simple classes of graphs

such as convex polygons.

Despite this, we show that the average branch decomposition is still a useful tool for distinguishing

embedded graphs in practice. To test its utility on practical examples, we tested it with visualizations

such as dendrograms and lower dimensional embeddings, using a range of available data sets. Our code is

posted publicly [25]. We found that despite the theoretical issues, the distance still is potentially useful for

understanding real data.

2 Background

In this section, we give the relevant background terminology and results in graph theory and merge trees,

culminating in the deﬁnition of the branching distance for merge trees (Defn. 2.4).

2.1 Graph theory

Here, we provide a brief discussion of relevant terms from traditional graph theory, following [7].

Agraph G= (V, E) consists of a set of vertices V=V(G), and a set of unordered pairs of vertices called

edges E=E(G). An edge e=uv is incident to either of its endpoints uand v. An edge deletion operation

removes an edge from E(G). A vertex deletion operation removes a vertex vfrom V(G) and removes all

edges incident to vfrom E(G)

Aloop is an edge such that both endpoints are the same vertex. Two edges are called parallel if their

endpoints share the same vertices. A simple graph has no loops or parallel edges. The degree of a vertex v

is the number of edges incident to v, with loops counting as two edges.

Asubgraph Hof a graph G, denoted H⊆G, is a graph with V(H)⊆V(G) and E(H)⊆E(H). Note

that a subgraph of Gcan be obtained by performing a combination of vertex deletions and/or edge deletions

on G. An induced subgraph of Gis a subgraph obtained solely by vertex deletions in G. In this case, we say

that His a subgraph induced by Aif V(H) = A⊆V(G). Two graphs Gand Hare isomorphic when there

exists a bijective mapping between the vertices of Gand Hthat preserves adjacency.

Apath in a graph is a sequence of ordered vertices v1,· · · , vnwith edges vivi+1 for i= 1,· · · , n −1. A

graph is connected if every pair of vertices can be connected by a path. A connected component of a graph

Gis a connected subgraph H⊆Gwhich is not contained in a larger connected subgraph. A cycle is a path

whose start and end vertices are the same vertex. A graph is acyclic if it has no subgraph which is a cycle

graph. A tree is a connected acyclic graph, and a forest is a graph solely consisting of trees.

We assume we have a topological graph, where each edge can be considered as a copy of the unit interval.

Let Gbe a topological graph and let fbe a function on G,f:G→R, providing every vertex and point

on each edge of the graph with a placement on the real number line. See an example in Fig. 1. For a ﬁxed

a∈R, the sub-level set at ais an induced subgraph Hof Gsuch that any vertex with a function value in

(a, ∞) is deleted; that is, it is the subgraph induced by the set of vertices with function value f(v)≤a. In

this case, we write H=f−1((−∞, a]). The sublevel set below ais the subgraph induced by the set of vertices

with function value f(v)< a. A component is a set of connected vertices in the sub-level set A.

2.2 Merge Trees

A merge tree is a representation of the connected components of sublevel sets of a graph with an R-valued

function. The vertices of a merge tree represent changes in connectedness of the input graph and its edges

C1C2

{a} {b}

{a,b}

{a,b,d}

{d}

Figure 1: The sublevel sets of a graph below and at two values, j, k ∈R; and the tail-lessmerge tree of the

graph.

encode the relationships between these changes. In this way, the merge tree serves as a less complex repre-

sentation of the graph. What follows is a formal deﬁnition of a merge tree closely related to the dendrogram

deﬁnition of [11].

A merge tree Mis a structure deﬁned on a graph Gwith a function f:G→Rwhich encodes the

changing connected components of f−1(−∞, a] for a shifting value a. For simplicity, we assume the function

is entirely determined by values on the vertices f:V(G)→Rgiven by v7→ f(vi) and extended linearly to

the edges. We use the following notation to deﬁne merge trees. Fix a connected set A⊆R. Let f−1(A)

denote the induced subgraph of Gcontaining just the vertices v∈Gsuch that f(v)∈A. Fix a connected

subgraph C⊆G. Deﬁne the minima µ(C) of Cto be the set of vertices in Chaving no neighbors of lesser

function value; that is

µ(C) = {v∈V(C)|f(v)≤f(w)∀(vw)∈E(C)}.

Let the identiﬁed components on Abe deﬁned as Γ(A) = {µ(C)|Cis a connected component of f−1(A)}.

Note that Γ(A) is a set of sets and that its elements are in one to one correspondence with the connected

components. Let the change in connectedness ∆(a) at abe deﬁned as Γ((−∞, a]) \Γ((−∞, a)). Note that

there is a change in connectedness in the sublevel sets of Gat aif and only if ∆(a)6=∅.

We note that we are deﬁning merge trees as used in [17], which means that we inherited a bug in the

deﬁnition from what they work with implicitly. Normally, a merge tree would have an inﬁnite upwards

tail representing the fact that a connected component is always visible in f−1(∞, a] for agreater than the

maximum value. However, [17] does not utilize this tail, cutting oﬀ the merge tree at the last merge. For this

reason, we call this construction the tail-less merge tree to distinguish it from more standard mathematical

constructions. For the remainder of the paper, we call this construction simply the “merge tree” unless there

is potential for confusion.

Deﬁnition 2.1 (Tail-less Merge Tree).The (tail-less) merge tree Mof a graph Gwith function f:G→R

is a graph given by

V(M) = {L|L∈∆(a), a ∈R},

E(M) = {L1L2|L1∈∆(a)for a∈R, L2∈Γ((−∞, a))

and L2⊂L1},

with function deﬁned by

fM:V(M)→R

L7→ a|L∈∆(a).

To understand these deﬁnitions, consider the example of Fig. 1. First, consider function value j=f(c)

for vertex c. The highlighted portion of the leftmost graph represents f−1((−∞, j)), which has only vertices

aand b. The two connected components are identiﬁed as C1and C2. Furthermore, µ(C1) = {a}and

µ(C2) = {b}. Thus, Γ((−∞, j)) = {{a},{b}}. The second graph represents f−1((−∞, j]), where the only

connected component is identiﬁed as C3. Here, µ(C3) = {a, b}and Γ((−∞, j]) = {{a, b}}. It follows that

∆(j) = {{a, b}}.

Figure 2: Two (nearly identical) embedded graphs at two diﬀerent rotations with the associated merge trees.

Now we will compute ∆(k), where k=f(e) for vertex eby considering the third and fourth graphs

in Fig. 1. We see that µ(C3) = {a, b}and µ(C4) = {d}. Thus, Γ((−∞, k)) = {{a, b},{d}} Furthermore,

µ(C5) = {a, b, d}and Γ((−∞, k]) = {{a, b, d}}. It follows that ∆(k) = {{a, b, d}}.

In the far right of Fig. 1, we show the merge tree for the input graph and function. Each vertex vof the

merge tree with f(v) = ais labeled with its corresponding element L∈∆(a). Edges are included between

vertices L1and L2when L1⊂L2.

2.3 Merge tree for a ﬁxed direction

In this paper, we will consider graphs with a map to R2. In the case that this map is injective, the result is

an embedded graph.1We can then pick a direction in the plane by ﬁxing an angle ωfor the orientation and

calculating function values for a vertex as the magnitude of the projection of the original position vectors

onto ω. So, from the input f:G→R2, we get a function fω:G→Rfor each angle ω∈[0,2π). We

normalize this function by shifting the function values to have median vertex value equal to 0. We get the

same function if we think of rotating the graph in the plane and computing the height function in the vertical

direction, i.e. fω(v) is given by the y-coordinate of f(v). Then, we can compute the merge tree of the graph

for each direction. See the example of Fig. 2.

2.4 Branching Distance

In order to compare two merge trees, we use the following distance as deﬁned by Beketayev et al. [17] which

we call the branching distance. Note that this deﬁnition utilizes the tail-less merge tree construction as

deﬁned above.

In what follows, we assume the tail-less merge tree is a non-empty, connected graph. The root of the

merge tree is the vertex with highest function value. A merge tree is trivial if it consists of a single vertex.

For a non-trivial, merge tree M, a saddle is any vertex with degree ≥2, and a minimum is any non-root

vertex with degree 1. For a trivial merge tree, the only available vertex is considered to be both the root

and a minimum. A root branch is a pairing of some minimum mrwith the highest function value vertex sr

of the merge tree.

Deﬁnition 2.2 (Branch Decomposition).A branch decomposition Bof a tail-less merge tree Mis collection

of pairs B={(mi, si)}iconsisting of a minima miand either a saddle or the root si. Every vertex appears

in at least one pair. Further, for each pair there is a descending path from the saddle sito the minimum mi,

no two such paths share an edge, and every edge appears in at least one such path.

We note that in the case of a trivial merge tree with a single vertex v, the only possible branch decom-

position consists of the single degenerate branch {(v, v)}. For a non-trivial merge tree in general position

(i.e. the number of edges adjacent and below any vertex is at most 2), then all saddles and minima (with the

possible exception of the root) appear in exactly one pair of the branch decomposition. If a saddle vertex has

1While we will often be interested in embedded graphs in practice, much of the work in this paper does not require the

assumptions of injectivity.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

ComparingEmbeddedGraphsUsingAverageBranchingDistanceLeventBatakciCaseWesternReserveUniversitylab192@case.eduAbigailBransonUnionUniversityabby.branson@my.uu.eduBryanCastilloMesaCommunityCollegebryancastillo98@gmail.comCandaceToddThePennsylvaniaStateUniversityclt5441@psu.eduErinWolfChambersSaintLouisU...

展开>> 收起<<

Comparing Embedded Graphs Using Average Branching Distance Levent Batakci Case Western Reserve University.pdf

共17页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Comparing Embedded Graphs Using Average Branching Distance Levent Batakci Case Western Reserve University

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: