Learning Ultrametric Trees for Optimal Transport Regression Samantha Chen1 Puoya Tabaghi2 and Yusu Wang2

2025-05-02 0 0 2.21MB 22 页 10玖币
侵权投诉
Learning Ultrametric Trees for Optimal Transport
Regression
Samantha Chen 1, Puoya Tabaghi 2, and Yusu Wang 2
1Department of Computer Science and Engineering, University of California
- San Diego
2Halıcıo˘glu Data Science Institute, University of California - San Diego
January 30, 2024
Abstract
Optimal transport provides a metric which quantifies the dissimilarity between
probability measures. For measures supported in discrete metric spaces, finding the
optimal transport distance has cubic time complexity in the size of the space. How-
ever, measures supported on trees admit a closed-form optimal transport that can be
computed in linear time. In this paper, we aim to find an optimal tree structure for
a given discrete metric space so that the tree-Wasserstein distance approximates the
optimal transport distance in the original space. One of our key ideas is to cast the
problem in ultrametric spaces. This helps us optimize over the space of ultrametric
trees — a mixed-discrete and continuous optimization problem — via projected gradi-
ent decent over the space of ultrametric matrices. During optimization, we project the
parameters to the ultrametric space via a hierarchical minimum spanning tree algo-
rithm, equivalent to the closest projection to ultrametrics under the supremum norm.
Experimental results on real datasets show that our approach outperforms previous
approaches (e.g. Flowtree, Quadtree) in approximating optimal transport distances.
Finally, experiments on synthetic data generated on ground truth trees show that our
algorithm can accurately uncover the underlying trees.
1 Introduction
First formulated by Gaspard Monge in 18th-century France, the optimal transport problem
is often explained by analogy to the problem of minimizing the time spent transporting coal
from mines to factories. More formally, given two distributions and a transportation cost,
sac003@ucsd.edu
ptabaghi@ucsd.edu
yusuwang@ucsd.edu
1
arXiv:2210.12288v2 [cs.LG] 26 Jan 2024
optimal transport aims to find the lowest-cost way of moving points from the first distribution
to the second one. When the transportation cost is a metric, the optimal transport distance
is also called the 1-Wasserstein distance (Villani, 2009).
Optimal transport is applied in many areas such as machine learning (Solomon et al.,
2014; Frogner et al., 2015; Montavon et al., 2016; Kolouri et al., 2017; Arjovsky et al., 2017;
Genevay et al., 2016; Lee and Raginsky, 2018), statistics (El Moselhy and Marzouk, 2012;
Reich, 2013; Panaretos and Zemel, 2016), and computer graphics (Dominitz and Tannen-
baum, 2009; Rubner et al., 2000; Rabin et al., 2011; Lavenant et al., 2018; Solomon et al.,
2015). Since the optimal transport distance is computationally expensive (cubic in the num-
ber of points), several methods have emerged to efficiently approximate optimal transport
distance. One of the most popular methods is the Sinkhorn distance, which uses entropic
regularization to compute an approximation of optimal transport in quadratic time (Cuturi,
2013).
Another approach relies on approximating the original metric with a tree metric (Evans
and Matsen, 2012; Le et al., 2019a; Indyk and Thaper, 2003; Takezawa et al., 2021; Yamada
et al., 2022). While this approach yields a coarser approximation of the optimal transport, it
has linear time complexity with respect to the size of the metric space. The classic Quadtree
is one of the most widely used tree approximations methods. It recursively partitions a metric
space into four quadrants to construct internal nodes and represents each element of the
original discrete metric space as a leaf node. Quadtree Wasserstein distance approximates
the true optimal transport distance with a logarithmic distortion (Indyk and Thaper, 2003).
Flowtree (Backurs et al., 2019) and sliced-tree Wasserstein (Le et al., 2019b) are Quadtree-
based methods designed to improve over the 1-Wasserstein distance approximation of the
Quadtree algorithm. A main drawback of Quadtree-based methods is that they require a
Euclidean embedding of the original discrete metric space. In contrast, clustertree is a tree-
based approximation that does not require a Euclidean embedding of the original discrete
metric space (Le et al., 2019b). Given a fixed Quadtree or clustertree topology, Yamada
et al. (2022) propose a state-of-the-art method based on optimizing the weights on the tree
to best approximate Wasserstein distances but does not change the topology of the input
tree.
Our goal is to find the tree topology and weight that closely approximates the Wasserstein
distance in the original metric space. To achieve this, we introduce a projected gradient
descent procedure over the space of ultrametrics to find a tree that approximates the original
Wasserstein distances. As constraining the problem to tree metrics is challenging in general,
we instead use ultrametrics — a subfamily of tree metrics — as a proxy for tree weights and
structure.
Problem Statement (informal) Let Xbe a point set in a metric space and W1(·,·) be
the optimal transport distance. We want to learn an ultrametric on Xsuch that ultrametric
optimal transport Wuapproximates W1. To achieve this, we cast this as a regression problem
over ultrametric trees.
The key advantage of optimizing via ultrametrics is that we can project any (semi)metric
to the ultrametrics; see Section 3. This projection allows us to optimize in the space of ultra-
metrics via a projected gradient descent-type procedure. Our contributions are as follows.
1. We define a quadratic cost function to measure the discrepancy between the true and
ultrametric optimal transport distances. This cost does not require the point positions a
2
Figure 1: An overall summary of the our projected gradient descent procedure. (a) Measure-
ments are points in a metric space, vertices of a weighted graph, or a distance matrix. (b0)
Hierarchical minimum spanning tree builds an ultrametric tree given a semimetric matrix.
(b1) We compute a distance matrix for the tree leaves. (b2) We update the distance matrix
by applying gradient descent on the optimal transport regression cost.
priori but rather is parameterized by pairwise dissimilarity measures between points, i.e., it
does not assume that input points are embedded, or even equipped with a metric.
2. We then propose a projected gradient descent method to perform optimization in
ultrametric spaces. The proposed optimization process learns a weighted tree structure.
Our proposed method adjusts both tree weights and structure throughout training. This
is a novel contribution to existing work on tree-Wasserstein approximation. In previous
methods, either the tree structure is fixed Yamada et al. (2022); Backurs et al. (2019) or it
is determined by approximating the discrete metric — not the Wasserstein distances Indyk
and Thaper (2003); Le et al. (2019b).
3. The learned ultrametric trees provide more accurate optimal transport approximations
compared to Flowtree and Quadtree for various real-world datasets. Our method performs
slightly worse than the weight-optimized methods (Yamada et al., 2022) on real-world dis-
tributions with sparse support; however, for denser synthetic distributions, it outperforms
all aforementioned methods. The computational complexity of approximating the optimal
transport distance at inference time with our learned ultrametric tree is O(N) similar to
other tree approximation methods.
Notation. For NN, we define [N] = {1, . . . , N}. We denote the set of nonnegative real
numbers as R+. For a vector xRd, we denote its pnorm as xp. Given a metric space
X, we denote the space of probability measures over Xas P(X). For any x, y R, we let
xy= max{x, y}. For matrices X, Y Rd1×d2, we let X, Y = tr(XY). Let Xto be
a finite discrete set. A semimetric over Xis the function ds:X × X R+that does not
necessarily satisfy the triangle inequality.
2 Preliminaries
Wasserstein Distance. The Wasserstein distance provides a metric for the space of prob-
ability distributions supported on a compact metric space. We focus on the 1-Wasserstein
distance (or the optimal transport distance) for discrete probability distributions. For a
discrete set X={xn:n[N]}, we can compute the Wasserstein distance by solving the
3
following linear programming problem:
W1(µ, ρ) = min
ΠRN×N
+
{⟨Π, D: Π1 = ρ, Π1 = µ}(1)
where D=d(xn1, xn2)n1,n2[N]RN×N
+is the distance matrix, measures µand ρare N-
dimensional vectors, viz., 1µ= 1, µ 0. When Dis any arbitrary cost matrix, we can still
solve the optimal transport problem. Solving this linear programming problem has a time
complexity of O(N3log N) (Pele and Werman, 2009).
Tree Wasserstein Distance. Consider a weighted tree T= (V, E) with metric dT
V×VR+. For nodes v1, v2V, let Pv1,v2be the unique path between them, and let
λbe the length measure on Tsuch that dT(v1, v2) = λ(Pv1,v2). We define Tvas the set of
nodes contained in the subtree of Trooted at vrV, i.e., Tv={vV:vPvr,v}. Given
the metric space (T, dT) and measures µand ρ∈ P(T), Theorem 2.1 provides a closed-form
expression for the 1-Wasserstein distance WT(µ, ρ).
Theorem 2.1. Given two measures µ, ρ supported on T= (V, E)with metric dT, we have
WT(µ, ρ) = X
eE
we|µ(Tve)ρ(Tve)|,(2)
where weis the weight of edge eE, and veis the node of eEthat is farther from the
root (Le et al., 2019b).
From Theorem 2.1, we can compute the tree Wasserstein distance using a simple greedy
matching algorithm where mass from measures µand ρis pushed from child to parent nodes
and matched at parent nodes. This involves computing |µ(Tve)ρ(Tve)|for all nodes {ve:
eE}. We denote the optimal coupling associated with the tree Wasserstein distance as
ΠT
µ,ρ Γ(µ, ρ). Theorem 2.1 also provides a natural embedding for probability distributions
on a tree to the 1space, as stated in Corollary 2.2. See Appendix A for discussion on tree
Wasserstein.
Corollary 2.2. Let WTbe the set of probability measures defined on a tree T. Then, the
tree Wasserstein space can be isometrically embedded in the 1space.
3 Optimal Transport Regression in Ultrametric Spaces
Our goal is to learn a tree metric on a discrete point set such that its optimal transport
distance approximates the measured optimal transport distances. We define an optimization
problem on ultrametrics as a proxy for tree metrics.
Definition 3.1. Consider the set X. An ultrametric function du:X × X R+is a metric
on Xthat also satisfies the strong triangle inequality, i.e.,
x, y, z ∈ X :du(x, y)du(x, z)du(y, z).
4
Figure 2: A stylized example of how the tree structure vary with heights of least common
ancestors. Notice that in the second step, the heights of nodes fand chave increased while
the height of node ghas decreased. Due to these changes in height, the recomputed distance
between nodes cand dis smaller than the recomputed distance between aand bso cand d
are the first to be clustered in the final projection to the new ultrametric (instead of node c
clustering with aand bin the original ultrametric).
Any compact ultrametric space can be represented by a rooted tree denoted as T=
(vr, V, E), where vris the root node and Xis the set of leaves. This representation also
includes a height function h:VRwith the following property: if vis the lowest common
ancestor of the leaves xand y∈ X (denoted LCA(x, y)), then h(v) = du(x, y). Furthermore,
we induce a weight on the edges of the rooted tree Tas follows: given an edge (v1, v2)E
where v1is closer to the root (or v1is the parent of v2), we let w(v1, v2) = h(v1)h(v2).
Then, the weighted tree distance between xand yis related to the heights as follows:
dT(x, y) = 2 ·h(LCA(x, y)) h(x)h(y) (3)
= 2 ·du(x, y)du(x, x)du(y, y),
i.e., du(x, y) = 1
2dT(x, y). For a discrete ultrametric space (X, du), we compute the optimal
transport distance as:
Wu(µ, ρ) = ΠT
µ,ρ, Du
where ΠT
µ,ρ Γ(µ, ρ) is the optimal coupling for the tree Trepresenting the ultramet-
ric space — constructed via the greedy matching described in Section 2 — and Du=
(du(xi, xj))i,j[N]RN×N
+. We now formalize the problem of learning an ultrametric on
Xfor 1-Wasserstein distance approximation:
Problem 3.2. Given an arbitrary discrete set Xendowed with a semimetric, we want to
find an ultrametric function du:X × X R+such that for a given set of distributions
S ⊆ P(X), we minimize the following cost function:
C(du) = X
µ,ρ∈S
W(µ, ρ)Wu(µ, ρ)2
.
5
摘要:

LearningUltrametricTreesforOptimalTransportRegressionSamanthaChen∗1,PuoyaTabaghi†2,andYusuWang‡21DepartmentofComputerScienceandEngineering,UniversityofCalifornia-SanDiego2Halıcıo˘gluDataScienceInstitute,UniversityofCalifornia-SanDiegoJanuary30,2024AbstractOptimaltransportprovidesametricwhichquantifi...

展开>> 收起<<
Learning Ultrametric Trees for Optimal Transport Regression Samantha Chen1 Puoya Tabaghi2 and Yusu Wang2.pdf

共22页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:22 页 大小:2.21MB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 22
客服
关注