Comparing Embedded Graphs Using Average Branching Distance
Levent Batakci
Case Western Reserve University
lab192@case.edu
Abigail Branson
Union University
abby.branson@my.uu.edu
Bryan Castillo
Mesa Community College
bryancastillo98@gmail.com
Candace Todd
The Pennsylvania State
University
clt5441@psu.edu
Erin Wolf Chambers
Saint Louis University
erin.chambers@slu.edu
Elizabeth Munch
Michigan State University
muncheli@msu.edu
Abstract
Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging
from GIS analysis to image classification to shape analysis. A fundamental problem in this type of data
is comparison: given a set of such graphs, can we rank how similar they are, in such a way that we
capture their geometric “shape” in the plane?
In this paper we explore a method to compare two such embedded graphs, via a simplified combina-
torial representation called a tail-less merge tree which encodes the structure based on a fixed direction.
First, we examine the properties of a distance designed to compare merge trees called the branching dis-
tance, and show that the distance as defined in previous work fails to satisfy some of the requirements of
a metric. We incorporate this into a new distance function called average branching distance to compare
graphs by looking at the branching distance for merge trees defined over many directions. Despite the
theoretical issues, we show that the definition is still quite useful in practice by using our open-source
code to cluster data sets of embedded graphs.
1 Introduction
Embedded graphs appear in a wide variety of applications, including map reconstructions, GIS data, shape
analysis, and medical imaging. Such graphs are more than simple abstract representations, since they have
geometric information attached, usually in the form of coordinates and edge lengths. When faced with data
of this type, an immediate question to ask is then how to compare them, so that we can apply techniques
like clustering, image recognition, or machine learning. In essence, we seek a distance measure, which given
two input graphs, returns a number representing how similar they are. Ideally, this value should be zero if
they are the same, and take increasing values for increasingly different embedded graphs.
While solving this problem perfectly is of course difficult in practice, many examples of graph distances
do exist in the literature. For example, one well-known one is the graph edit distance [10], which gives a cost
to inserting or removing vertices and edges; the distance is then the minimum possible sequence of edge and
vertex operations when transforming one graph into the other. Unfortunately, this particular measure does
not take any embedding information into account, so two graphs with identical vertices and weights which
are embedded in a very different configuration will still have zero distance.
In this paper, we develop and implement a technique to compare 2-dimensional embedded graphs, such
as are generated from GIS data or skeletonization of images. We note that there are available options which
are close to our work, where distances are specifically built to take the embedding into account [2, 9, 13,
14, 16, 19]. However, we seek to develop a faster pipeline which incorporates graph simplification while still
retaining at least part of the geometric embedding information. More specifically, we compare embedded
graphs by replacing them with a simpler graph which still encodes some aspect of the structure.
Our algorithm fixes a direction in the plane and computes the merge tree of the graph, which is a tree
with a real valued function representing how the connected components of sub-level sets of the graph changes
in that direction; see e.g. Fig. 1. To be able to compare merge trees in a meaningful way, we need a distance
function that is computable, accurate, comprehensive, and versatile. There are many possible options for
metrics to compare merge trees specifically, including the edit distance [21], and interleaving distance [15,
22–24]. In this paper, we use the branching distance for merge trees [17]. Then, to avoid bias caused by
fixing a single a direction, we rotate the graphs, calculating the merge trees at each rotation, then compare
1
arXiv:2210.10181v1 [cs.CG] 18 Oct 2022