Learning Ultrametric Trees for Optimal Transport Regression Samantha Chen1 Puoya Tabaghi2 and Yusu Wang2

2025-05-02 1 0 2.21MB 22 页 10玖币

侵权投诉

Learning Ultrametric Trees for Optimal Transport

Regression

Samantha Chen ∗1, Puoya Tabaghi †2, and Yusu Wang ‡2

1Department of Computer Science and Engineering, University of California

- San Diego

2Halıcıo˘glu Data Science Institute, University of California - San Diego

January 30, 2024

Abstract

Optimal transport provides a metric which quantiﬁes the dissimilarity between

probability measures. For measures supported in discrete metric spaces, ﬁnding the

optimal transport distance has cubic time complexity in the size of the space. How-

ever, measures supported on trees admit a closed-form optimal transport that can be

computed in linear time. In this paper, we aim to ﬁnd an optimal tree structure for

a given discrete metric space so that the tree-Wasserstein distance approximates the

optimal transport distance in the original space. One of our key ideas is to cast the

problem in ultrametric spaces. This helps us optimize over the space of ultrametric

trees — a mixed-discrete and continuous optimization problem — via projected gradi-

ent decent over the space of ultrametric matrices. During optimization, we project the

parameters to the ultrametric space via a hierarchical minimum spanning tree algo-

rithm, equivalent to the closest projection to ultrametrics under the supremum norm.

Experimental results on real datasets show that our approach outperforms previous

approaches (e.g. Flowtree, Quadtree) in approximating optimal transport distances.

Finally, experiments on synthetic data generated on ground truth trees show that our

algorithm can accurately uncover the underlying trees.

1 Introduction

First formulated by Gaspard Monge in 18th-century France, the optimal transport problem

is often explained by analogy to the problem of minimizing the time spent transporting coal

from mines to factories. More formally, given two distributions and a transportation cost,

∗sac003@ucsd.edu

†ptabaghi@ucsd.edu

‡yusuwang@ucsd.edu

arXiv:2210.12288v2 [cs.LG] 26 Jan 2024

optimal transport aims to ﬁnd the lowest-cost way of moving points from the ﬁrst distribution

to the second one. When the transportation cost is a metric, the optimal transport distance

is also called the 1-Wasserstein distance (Villani, 2009).

Optimal transport is applied in many areas such as machine learning (Solomon et al.,

2014; Frogner et al., 2015; Montavon et al., 2016; Kolouri et al., 2017; Arjovsky et al., 2017;

Genevay et al., 2016; Lee and Raginsky, 2018), statistics (El Moselhy and Marzouk, 2012;

Reich, 2013; Panaretos and Zemel, 2016), and computer graphics (Dominitz and Tannen-

baum, 2009; Rubner et al., 2000; Rabin et al., 2011; Lavenant et al., 2018; Solomon et al.,

2015). Since the optimal transport distance is computationally expensive (cubic in the num-

ber of points), several methods have emerged to eﬃciently approximate optimal transport

distance. One of the most popular methods is the Sinkhorn distance, which uses entropic

regularization to compute an approximation of optimal transport in quadratic time (Cuturi,

2013).

Another approach relies on approximating the original metric with a tree metric (Evans

and Matsen, 2012; Le et al., 2019a; Indyk and Thaper, 2003; Takezawa et al., 2021; Yamada

et al., 2022). While this approach yields a coarser approximation of the optimal transport, it

has linear time complexity with respect to the size of the metric space. The classic Quadtree

is one of the most widely used tree approximations methods. It recursively partitions a metric

space into four quadrants to construct internal nodes and represents each element of the

original discrete metric space as a leaf node. Quadtree Wasserstein distance approximates

the true optimal transport distance with a logarithmic distortion (Indyk and Thaper, 2003).

Flowtree (Backurs et al., 2019) and sliced-tree Wasserstein (Le et al., 2019b) are Quadtree-

based methods designed to improve over the 1-Wasserstein distance approximation of the

Quadtree algorithm. A main drawback of Quadtree-based methods is that they require a

Euclidean embedding of the original discrete metric space. In contrast, clustertree is a tree-

based approximation that does not require a Euclidean embedding of the original discrete

metric space (Le et al., 2019b). Given a ﬁxed Quadtree or clustertree topology, Yamada

et al. (2022) propose a state-of-the-art method based on optimizing the weights on the tree

to best approximate Wasserstein distances but does not change the topology of the input

tree.

Our goal is to ﬁnd the tree topology and weight that closely approximates the Wasserstein

distance in the original metric space. To achieve this, we introduce a projected gradient

descent procedure over the space of ultrametrics to ﬁnd a tree that approximates the original

Wasserstein distances. As constraining the problem to tree metrics is challenging in general,

we instead use ultrametrics — a subfamily of tree metrics — as a proxy for tree weights and

structure.

Problem Statement (informal) Let Xbe a point set in a metric space and W1(·,·) be

the optimal transport distance. We want to learn an ultrametric on Xsuch that ultrametric

optimal transport Wuapproximates W1. To achieve this, we cast this as a regression problem

over ultrametric trees.

The key advantage of optimizing via ultrametrics is that we can project any (semi)metric

to the ultrametrics; see Section 3. This projection allows us to optimize in the space of ultra-

metrics via a projected gradient descent-type procedure. Our contributions are as follows.

1. We deﬁne a quadratic cost function to measure the discrepancy between the true and

ultrametric optimal transport distances. This cost does not require the point positions a

Figure 1: An overall summary of the our projected gradient descent procedure. (a) Measure-

ments are points in a metric space, vertices of a weighted graph, or a distance matrix. (b0)

Hierarchical minimum spanning tree builds an ultrametric tree given a semimetric matrix.

(b1) We compute a distance matrix for the tree leaves. (b2) We update the distance matrix

by applying gradient descent on the optimal transport regression cost.

priori but rather is parameterized by pairwise dissimilarity measures between points, i.e., it

does not assume that input points are embedded, or even equipped with a metric.

2. We then propose a projected gradient descent method to perform optimization in

ultrametric spaces. The proposed optimization process learns a weighted tree structure.

Our proposed method adjusts both tree weights and structure throughout training. This

is a novel contribution to existing work on tree-Wasserstein approximation. In previous

methods, either the tree structure is ﬁxed Yamada et al. (2022); Backurs et al. (2019) or it

is determined by approximating the discrete metric — not the Wasserstein distances Indyk

and Thaper (2003); Le et al. (2019b).

3. The learned ultrametric trees provide more accurate optimal transport approximations

compared to Flowtree and Quadtree for various real-world datasets. Our method performs

slightly worse than the weight-optimized methods (Yamada et al., 2022) on real-world dis-

tributions with sparse support; however, for denser synthetic distributions, it outperforms

all aforementioned methods. The computational complexity of approximating the optimal

transport distance at inference time with our learned ultrametric tree is O(N) similar to

other tree approximation methods.

Notation. For N∈N, we deﬁne [N] = {1, . . . , N}. We denote the set of nonnegative real

numbers as R+. For a vector x∈Rd, we denote its ℓpnorm as ∥x∥p. Given a metric space

X, we denote the space of probability measures over Xas P(X). For any x, y ∈R, we let

x∧y= max{x, y}. For matrices X, Y ∈Rd1×d2, we let ⟨X, Y ⟩= tr(X⊤Y). Let Xto be

a ﬁnite discrete set. A semimetric over Xis the function ds:X × X → R+that does not

necessarily satisfy the triangle inequality.

2 Preliminaries

Wasserstein Distance. The Wasserstein distance provides a metric for the space of prob-

ability distributions supported on a compact metric space. We focus on the 1-Wasserstein

distance (or the optimal transport distance) for discrete probability distributions. For a

discrete set X={xn:n∈[N]}, we can compute the Wasserstein distance by solving the

following linear programming problem:

W1(µ, ρ) = min

Π∈RN×N

{⟨Π, D⟩: Π1 = ρ, Π⊤1 = µ}(1)

where D=d(xn1, xn2)n1,n2∈[N]∈RN×N

+is the distance matrix, measures µand ρare N-

dimensional vectors, viz., 1⊤µ= 1, µ ≥0. When Dis any arbitrary cost matrix, we can still

solve the optimal transport problem. Solving this linear programming problem has a time

complexity of O(N3log N) (Pele and Werman, 2009).

Tree Wasserstein Distance. Consider a weighted tree T= (V, E) with metric dT∈

V×V→R+. For nodes v1, v2∈V, let Pv1,v2be the unique path between them, and let

λbe the length measure on Tsuch that dT(v1, v2) = λ(Pv1,v2). We deﬁne Tvas the set of

nodes contained in the subtree of Trooted at vr∈V, i.e., Tv={v′∈V:v∈Pvr,v′}. Given

the metric space (T, dT) and measures µand ρ∈ P(T), Theorem 2.1 provides a closed-form

expression for the 1-Wasserstein distance WT(µ, ρ).

Theorem 2.1. Given two measures µ, ρ supported on T= (V, E)with metric dT, we have

WT(µ, ρ) = X

e∈E

we|µ(Tve)−ρ(Tve)|,(2)

where weis the weight of edge e∈E, and veis the node of e∈Ethat is farther from the

root (Le et al., 2019b).

From Theorem 2.1, we can compute the tree Wasserstein distance using a simple greedy

matching algorithm where mass from measures µand ρis pushed from child to parent nodes

and matched at parent nodes. This involves computing |µ(Tve)−ρ(Tve)|for all nodes {ve:

e∈E}. We denote the optimal coupling associated with the tree Wasserstein distance as

ΠT

µ,ρ ∈Γ(µ, ρ). Theorem 2.1 also provides a natural embedding for probability distributions

on a tree to the ℓ1space, as stated in Corollary 2.2. See Appendix A for discussion on tree

Wasserstein.

Corollary 2.2. Let WTbe the set of probability measures deﬁned on a tree T. Then, the

tree Wasserstein space can be isometrically embedded in the ℓ1space.

3 Optimal Transport Regression in Ultrametric Spaces

Our goal is to learn a tree metric on a discrete point set such that its optimal transport

distance approximates the measured optimal transport distances. We deﬁne an optimization

problem on ultrametrics as a proxy for tree metrics.

Deﬁnition 3.1. Consider the set X. An ultrametric function du:X × X → R+is a metric

on Xthat also satisﬁes the strong triangle inequality, i.e.,

∀x, y, z ∈ X :du(x, y)≤du(x, z)∧du(y, z).

Figure 2: A stylized example of how the tree structure vary with heights of least common

ancestors. Notice that in the second step, the heights of nodes fand chave increased while

the height of node ghas decreased. Due to these changes in height, the recomputed distance

between nodes cand dis smaller than the recomputed distance between aand bso cand d

are the ﬁrst to be clustered in the ﬁnal projection to the new ultrametric (instead of node c

clustering with aand bin the original ultrametric).

Any compact ultrametric space can be represented by a rooted tree denoted as T=

(vr, V, E), where vris the root node and Xis the set of leaves. This representation also

includes a height function h:V→Rwith the following property: if vis the lowest common

ancestor of the leaves xand y∈ X (denoted LCA(x, y)), then h(v) = du(x, y). Furthermore,

we induce a weight on the edges of the rooted tree Tas follows: given an edge (v1, v2)∈E

where v1is closer to the root (or v1is the parent of v2), we let w(v1, v2) = h(v1)−h(v2).

Then, the weighted tree distance between xand yis related to the heights as follows:

dT(x, y) = 2 ·h(LCA(x, y)) −h(x)−h(y) (3)

= 2 ·du(x, y)−du(x, x)−du(y, y),

i.e., du(x, y) = 1

2dT(x, y). For a discrete ultrametric space (X, du), we compute the optimal

transport distance as:

Wu(µ, ρ) = ⟨ΠT

µ,ρ, Du⟩

where ΠT

µ,ρ ∈Γ(µ, ρ) is the optimal coupling for the tree Trepresenting the ultramet-

ric space — constructed via the greedy matching described in Section 2 — and Du=

(du(xi, xj))i,j∈[N]∈RN×N

+. We now formalize the problem of learning an ultrametric on

Xfor 1-Wasserstein distance approximation:

Problem 3.2. Given an arbitrary discrete set Xendowed with a semimetric, we want to

ﬁnd an ultrametric function du:X × X → R+such that for a given set of distributions

S ⊆ P(X), we minimize the following cost function:

C(du) = X

µ,ρ∈S

W(µ, ρ)−Wu(µ, ρ)2

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

LearningUltrametricTreesforOptimalTransportRegressionSamanthaChen∗1,PuoyaTabaghi†2,andYusuWang‡21DepartmentofComputerScienceandEngineering,UniversityofCalifornia-SanDiego2Halıcıo˘gluDataScienceInstitute,UniversityofCalifornia-SanDiegoJanuary30,2024AbstractOptimaltransportprovidesametricwhichquantifi...

展开>> 收起<<

Learning Ultrametric Trees for Optimal Transport Regression Samantha Chen1 Puoya Tabaghi2 and Yusu Wang2.pdf

共22页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Learning Ultrametric Trees for Optimal Transport Regression Samantha Chen1 Puoya Tabaghi2 and Yusu Wang2

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: