Fast Approximation of the
Generalized Sliced-Wasserstein Distance
Dung Le*, Huy Nguyen*,†, Khai Nguyen*,†, Trang Nguyen§, Nhat Ho†
École Polytechnique; University of Texas, Austin†; Hanoi University of Science and Technology§
October 20, 2022
Abstract
Generalized sliced Wasserstein distance is a variant of sliced Wasserstein distance that exploits the power of non-linear projections, through a given defining function, to better capture the complex structures of probability distributions. Similar to the sliced Wasserstein distance, the generalized sliced Wasserstein distance is defined as an expectation over random projections, which can be approximated by the Monte Carlo method. However, that approximation can be expensive in high-dimensional settings. To that end, we propose deterministic and fast approximations of the generalized sliced Wasserstein distance that use the concentration of random projections when the defining functions are polynomial functions, circular functions, and neural network type functions. Our approximations hinge upon an important result that one-dimensional projections of a high-dimensional random vector are approximately Gaussian.
1 Introduction
Sliced Wasserstein (SW) [5] distance has become a core member of the family of probability metrics that are based on optimal transport [26]. Compared to the Wasserstein distance, SW provides a lower computational cost thanks to the closed-form solution of optimal transport in one dimension. In particular, when dealing with probability measures with at most n supports, the computational complexity of SW is O(n log n), while that of the Wasserstein distance is O(n^3 log n) [24] when solved via interior point methods, or O(n^2) [1, 15, 16] when approximated by its entropic regularized version. Furthermore, the memory complexity of SW is only O(n), in comparison with O(n^2) for the Wasserstein distance (due to the storage of a cost matrix). Additionally, the statistical estimation rate (or sample complexity) of SW does not depend on the dimension (denoted by d), unlike that of the Wasserstein distance. In particular, the sample complexity of the former is O(n^{-1/2}) [3], whereas it is O(n^{-1/d}) [11] for the Wasserstein distance. Therefore, SW does not suffer from the curse of dimensionality.
Due to the practicality of SW, several improvements and variants of that distance have been explored recently in the literature. For instance, techniques for selecting discriminative projecting directions are proposed in [8, 22, 23]; a SW variant that augments the original measures to higher dimensions for better linear separation is introduced in [6]; a SW variant on the sphere is defined in [4]; and a SW variant that uses a convolution slicer for projecting images is proposed in [21]. However, the prevailing trend of current work on SW is focused on its applications.
* Dung Le, Huy Nguyen and Khai Nguyen contributed equally to this work.
arXiv:2210.10268v1 [stat.ML] 19 Oct 2022
Indeed, SW is used in generative modeling [9, 20, 7], domain adaptation [27], clustering [14], and Bayesian inference [17, 28].
To further enhance the ability of SW, Kolouri et al. [13] propose using non-linear projections via certain defining functions instead of the conventional linear projections. This extension leads to a generalized class of sliced probability distances, named the generalized sliced Wasserstein (GSW) distance. Despite being more expressive, GSW also needs to be approximated by the Monte Carlo method, as does SW. In greater detail, GSW is defined as an expectation, over random projections via certain defining functions, of the Wasserstein distance between the corresponding one-dimensional projected probability measures. In general, this expectation is intractable to compute; hence, Monte Carlo samples are used to approximate it. It has been shown in both theory and practice that the number of Monte Carlo samples (the number of projections) should be large for good performance and approximation of sliced probability metrics [18, 8].
Contribution. In this work, we aim to overcome the projection complexity of the GSW by deriving fast approximations of that distance that do not require Monte Carlo random projecting directions. We follow the approach of the deterministic approximation of the SW in [19]. The key factor in our fast approximations of the GSW is the Gaussian concentration of the distribution of low-dimensional projections of high-dimensional random variables [25, 10]. Our results cover the settings where the (non-linear) defining functions are polynomial functions with odd degree, circular functions, and of neural network type, which had been discussed and utilized in [13].
Organization. The paper is organized as follows. We provide background on the Wasserstein distance, the sliced Wasserstein distance and its fast approximation, as well as the generalized sliced Wasserstein distance in Section 2. We then study the fast approximation of the generalized sliced Wasserstein distance when the defining function is a polynomial with odd degree in Section 3, and when the defining function is of neural network type in Section 4. A discussion of an approximation when the defining function is circular is in Appendix B. Finally, we give experimental results on the approximation error of the proposed approximate generalized sliced Wasserstein distances in Section 5 and conclude the paper in Section 6. The remaining proofs of the key results in the paper are deferred to Appendix A.
Notation. We use the following notation throughout the paper. First, we denote by N the set of all positive integers. For any d ∈ N and p ∈ N, P_p(R^d) stands for the set of all probability measures on R^d with finite moments of order p, whereas S^{d-1} := {θ ∈ R^d : ‖θ‖ = 1} denotes the unit sphere in R^d, where ‖·‖ is the Euclidean norm. Additionally, γ_d represents the Gaussian distribution N(0, d^{-1} I_d) on R^d, in which I_d is the identity matrix of size d × d. Meanwhile, we denote by L^1(R^d) := {f : R^d → R : ∫_{R^d} |f(x)| dx < ∞} the set of all absolutely integrable functions on R^d. Next, for any set A, we denote by |A| its cardinality. Finally, for any two sequences (a_n) and (b_n), the notation a_n = O(b_n) indicates that a_n ≤ C b_n for all n ∈ N, where C is some universal constant.
2 Background

In this section, we first revisit the Wasserstein distance and the conditional central limit theorem for Gaussian projections. We then present background on the sliced Wasserstein distance and its fast approximation. Finally, we recall the definition of the generalized sliced Wasserstein distance, which is the focus of this paper.
2.1 Wasserstein Distance
Let p ≥ 1 and let µ, ν be two probability measures on R^d, d ≥ 1, with finite moments of order p. Then, the p-Wasserstein distance between µ and ν is defined as follows:

W_p(µ, ν) := ( inf_{π ∈ Π(µ,ν)} ∫_{R^d × R^d} ‖x − y‖^p dπ(x, y) )^{1/p},

where ‖·‖ denotes the Euclidean norm, and Π(µ, ν) is the set of all probability measures on R^d × R^d which admit µ and ν as their marginals with respect to the first and second variables.
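In one dimension, the infimum above admits a closed form: the optimal coupling matches the quantiles (order statistics) of the two measures, which is what later makes slicing computationally attractive. A minimal sketch for equally weighted empirical measures (the helper name `wasserstein_1d` is ours):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two 1-D empirical measures with the
    same number of equally weighted support points: the optimal coupling
    matches order statistics (quantiles)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

x = np.random.default_rng(0).normal(size=1000)
# Translating a point cloud moves it by exactly the size of the shift.
print(wasserstein_1d(x, x + 3.0))  # ≈ 3.0
```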
Next, we review an important result about the concentration of measure phenomenon, which
states that under mild assumptions, one-dimensional projections of a high-dimensional random
vector are approximately Gaussian. Specifically, we have the following theorem.
Theorem 1 ([13], Theorem 1). For any d ≥ 1, let µ denote the distribution of X_{1:d} = (X_1, ..., X_d) and let γ_d = N(0, d^{-1} I_d) be a Gaussian distribution. Assume that µ ∈ P_2(R^d); then, there exists a universal constant C ≥ 0 such that:

∫_{R^d} W_2^2( θ*_♯ µ, N(0, d^{-1} m_2(µ)) ) dγ_d(θ) ≤ C Ξ_d(µ),

where θ* : R^d → R denotes the linear form x ↦ ⟨θ, x⟩, θ*_♯ µ indicates the push-forward measure of µ by θ*, and

Ξ_d(µ) = d^{-1} { A(µ) + [m_2(µ) B_1(µ)]^{1/2} + m_2(µ)^{1/5} B_2(µ)^{4/5} },
m_2(µ) = E[ ‖X_{1:d}‖^2 ],    (1)
A(µ) = E[ | ‖X_{1:d}‖^2 − m_2(µ) | ],
B_k(µ) = E^{1/k}[ |⟨X_{1:d}, X'_{1:d}⟩|^k ],

with k ∈ {1, 2}, and X'_{1:d} is an independent copy of X_{1:d}.

It is worth noting that the above result only holds for the 2-Wasserstein distance.
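Theorem 1 can be illustrated numerically: projecting a decidedly non-Gaussian high-dimensional sample onto a random direction θ ~ γ_d yields a one-dimensional distribution close to N(0, d⁻¹m₂(µ)). The sketch below (all parameter choices are ours) compares the projected sample with a Gaussian sample of that variance via matched order statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 500, 20000

# A non-Gaussian law: uniform on [-sqrt(3), sqrt(3)]^d (mean 0, unit variance per coordinate).
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, d))
m2 = np.mean(np.sum(X ** 2, axis=1))               # m_2(µ) = E‖X_{1:d}‖², roughly d here

theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)  # θ ~ γ_d = N(0, d⁻¹ I_d)
proj = X @ theta                                   # one-dimensional projection θ*(X)

# Empirical W₂ between the projection and N(0, d⁻¹ m₂(µ)), via matched order statistics.
gauss = rng.normal(0.0, np.sqrt(m2 / d), size=n)
w2 = np.sqrt(np.mean((np.sort(proj) - np.sort(gauss)) ** 2))
print(w2)  # small compared to the projection's own scale (≈ 1)
```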
2.2 Sliced-Wasserstein Distance And Its Fast Approximation

To adapt the result of Theorem 1 to the sliced-Wasserstein setting, Nadjahi et al. [19] introduce a version of the SW distance in which projections are sampled from a Gaussian distribution rather than from the unit sphere as usual. In particular,

Sliced-Wasserstein Distance: Let p ≥ 1 and let γ_d = N(0, d^{-1} I_d) be a Gaussian measure, where d ≥ 1. Then, the sliced Wasserstein distance of order p with Gaussian projections between two probability measures µ ∈ P_p(R^d) and ν ∈ P_p(R^d) is defined as follows:

SW_p(µ, ν) := ( ∫_{R^d} W_p^p( θ*_♯ µ, θ*_♯ ν ) dγ_d(θ) )^{1/p}.    (2)

The notation θ*_♯ µ is equivalent to the Radon transform of µ given the projecting direction θ [13]. By leveraging Theorem 1, Nadjahi et al. [19] provide the following bound on the sliced-Wasserstein distance between any two probability measures with finite second moments.
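In practice, the expectation in (2) is estimated with Monte Carlo: draw projections θ ~ γ_d and average the one-dimensional W_p^p computed from order statistics. A sketch for p = 2 with equal sample sizes (function names are ours):

```python
import numpy as np

def sw2_monte_carlo(X, Y, n_proj=500, seed=0):
    """Monte Carlo estimate of SW₂ in (2): average the 1-D squared
    Wasserstein distances over random Gaussian projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)  # θ ~ γ_d
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)                   # 1-D W₂² via order statistics
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))
# For a pure shift by a vector c, SW₂²(µ, shifted µ) = d⁻¹‖c‖²; here c = (1, …, 1), so SW₂ ≈ 1.
print(sw2_monte_carlo(X, X + 1.0))
```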
Proposition 1 ([19], Theorem 1). Let µ, ν ∈ P_2(R^d) be two probability measures on R^d. Consider two Gaussian distributions η_µ = N(0, d^{-1} m_2(µ)) and η_ν = N(0, d^{-1} m_2(ν)), where m_2(µ) and m_2(ν) are given in equation (1). Then, there exists a universal constant C > 0 such that:

|SW_2(µ, ν) − W_2(η_µ, η_ν)| ≤ C (Ξ_d(µ) + Ξ_d(ν))^{1/2},    (3)

where Ξ_d(µ) and Ξ_d(ν) are defined in equation (1).

Note that equation (3) can be simplified by using the closed-form expression of the Wasserstein distance between the two Gaussian distributions η_µ and η_ν, which is given by

W_2(η_µ, η_ν) = d^{-1/2} | √(m_2(µ)) − √(m_2(ν)) |.
According to [19], Ξ_d(µ) and Ξ_d(ν) cannot be shown to converge to 0 if the data are not centered. Fortunately, they demonstrate that there is a relation between SW_2(µ, ν) and SW_2(µ̄, ν̄), where µ̄ and ν̄ are the centered versions of µ and ν, respectively.
Proposition 2. Let µ, ν ∈ P_2(R^d) be two probability measures on R^d with respective means m_µ and m_ν. Then, the sliced-Wasserstein distance of order 2 between µ and ν can be decomposed as:

SW_2^2(µ, ν) = SW_2^2(µ̄, ν̄) + d^{-1} ‖m_µ − m_ν‖^2.    (4)
As a consequence, Nadjahi et al. [19] successfully derive a deterministic approximation for SW_2(µ, ν) as follows:

ŜW_2^2(µ, ν) = W_2^2(η_µ̄, η_ν̄) + d^{-1} ‖m_µ − m_ν‖^2.    (5)
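The deterministic approximation (5) requires only the means and the centered second moments of the two measures, so it costs O(nd) with no random projections at all. A sketch from samples (the function name is ours), using the Gaussian closed form W₂(η_µ̄, η_ν̄) = d^(−1/2)|√m₂(µ̄) − √m₂(ν̄)|:

```python
import numpy as np

def sw2_deterministic(X, Y):
    """Deterministic approximation (5) of SW₂²(µ, ν) from samples:
    W₂²(η_µ̄, η_ν̄) + d⁻¹‖m_µ − m_ν‖², via the Gaussian closed form."""
    d = X.shape[1]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    m2x = np.mean(np.sum((X - mx) ** 2, axis=1))   # m₂ of the centered measure µ̄
    m2y = np.mean(np.sum((Y - my) ** 2, axis=1))   # m₂ of the centered measure ν̄
    gauss_term = (np.sqrt(m2x) - np.sqrt(m2y)) ** 2 / d
    return gauss_term + np.sum((mx - my) ** 2) / d

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 100))
Y = 2.0 * rng.normal(size=(5000, 100)) + 1.0
# Gaussian term ≈ (10 − 20)²/100 = 1; mean term ≈ 100/100 = 1; so ŜW₂ ≈ √2.
print(np.sqrt(sw2_deterministic(X, Y)))
```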
2.3 Generalized Sliced-Wasserstein Distance

Inspired by the approximation of the SW distance in equation (5), we extend that result to the setting of the generalized sliced-Wasserstein (GSW) distance in this work. Before exploring this extension, it is necessary to recall the definition of the GSW distance.
Generalized Sliced-Wasserstein Distance: Let g be a defining function [19] and let δ be the Dirac delta function. Then, the generalized Radon transform (GRT) of an integrable function I ∈ L^1(R^d), denoted by GI, is defined as follows:

GI(t, θ) := ∫_{R^d} I(x) δ(t − g(x, θ)) dx.    (6)

When g(x, θ) = ⟨x, θ⟩, the GRT reverts to the conventional Radon transform, which is used in the SW distance. By using the GRT, the GSW distance is given by:

GSW_p(µ, ν) := ( ∫_{R^d} W_p^p( GI_µ(·, θ), GI_ν(·, θ) ) dγ(θ) )^{1/p},

where I_µ, I_ν ∈ L^1(R^d) are the probability density functions of the measures µ and ν, respectively. Here, with a slight abuse of notation, we use W_p(µ, ν) and W_p(I_µ, I_ν) interchangeably. In this paper, we will also use the pushforward measure notation to define the GSW, e.g., g^θ_♯ µ denotes the GRT of µ given the defining function g and its parameter θ.
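Like the SW distance, the GSW distance is estimated in practice with Monte Carlo: push the samples through g(·, θ) for random θ and compare the resulting one-dimensional measures. A sketch with a circular-type defining function g(x, θ) = ‖x − rθ‖ (a non-linear choice in the spirit of [13]; the radius r and all names are ours):

```python
import numpy as np

def gsw2_monte_carlo(X, Y, g, n_proj=200, seed=0):
    """Monte Carlo estimate of GSW₂: average the 1-D squared Wasserstein
    distances between the pushforwards g(X, θ) and g(Y, θ)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)   # θ ~ γ_d
        px, py = np.sort(g(X, theta)), np.sort(g(Y, theta))
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_proj)

def g_circular(X, theta, r=10.0):
    """Circular-type defining function g(x, θ) = ‖x − rθ‖ (radius r is our choice)."""
    return np.linalg.norm(X - r * theta, axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
Y = 3.0 * rng.normal(size=(1000, 20))
print(gsw2_monte_carlo(X, X.copy(), g_circular))   # 0 for identical samples
print(gsw2_monte_carlo(X, Y, g_circular))          # clearly positive for different scales
```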
In order for the GSW distance to become a proper metric, the GRT must be essentially an injective function. There is a line of work [12, 2] studying the sufficient and necessary conditions for the injectivity of the GRT, which finds that the GRT is injective when g is either a polynomial defining function or a circular defining function. By contrast, it is non-trivial to show that the GRT is injective when g is a neural network type function; therefore, the GSW in this case is a pseudo-metric.

As mentioned in Section 2.1, the result in Theorem 1 only applies to the 2-Wasserstein distance. Thus, we only consider the GSW distance of the same order throughout this paper.
3 Polynomial Defining Function

In this section, we consider the problem of finding a deterministic approximation for the generalized sliced-Wasserstein distance in the setting where the defining function g is a polynomial function with an odd degree, which is defined as follows:
Definition 1 (Polynomial defining function). For a multi-index α = (α_1, ..., α_d) ∈ N^d and a vector x = (x_1, ..., x_d) ∈ R^d, we denote |α| = α_1 + ... + α_d and x^α = x_1^{α_1} ... x_d^{α_d}. Then, a defining function in the form of a polynomial function with an odd degree m is given by:

g_poly(x, θ) = Σ_{|α| = m} θ_α x^α,

where θ := (θ_α)_{|α|=m} ∈ S^{q-1}, with q = C(m+d-1, d-1) being the number of non-negative integer solutions to the equation α_1 + ... + α_d = m. Accordingly, the generalized sliced-Wasserstein distance in this case is denoted as polyGSW.
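The q monomials of degree m can be enumerated by choosing m coordinates with repetition, which also shows why q = C(m+d−1, d−1). A sketch (helper names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def multi_indices(d, m):
    """All multi-indices α ∈ N^d with |α| = m, i.e. the degree-m monomials."""
    alphas = []
    for combo in combinations_with_replacement(range(d), m):
        alpha = [0] * d
        for i in combo:
            alpha[i] += 1               # coordinate i is raised once more
        alphas.append(tuple(alpha))
    return alphas

def g_poly(x, theta, alphas):
    """g_poly(x, θ) = Σ_{|α|=m} θ_α x^α."""
    x = np.asarray(x, dtype=float)
    return sum(t * np.prod(x ** np.array(a)) for t, a in zip(theta, alphas))

d, m = 3, 3
alphas = multi_indices(d, m)
print(len(alphas), comb(m + d - 1, d - 1))   # both equal q = 10
```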
Subsequently, we introduce some necessary notation for our analysis. Let X = (X_1, ..., X_d)^T and Y = (Y_1, ..., Y_d)^T be random vectors following probability distributions µ ∈ P_2(R^d) and ν ∈ P_2(R^d), respectively. For an odd positive integer m ∈ N, denoting by µ_q and ν_q the probability distributions on R^q of the random vectors U := (X^α)_{|α|=m} ∈ R^q and V := (Y^α)_{|α|=m} ∈ R^q, we find that there is a connection between the GSW distance and the SW distance as follows:
Proposition 3. Let µ, ν ∈ P_2(R^d) be two probability measures on R^d with finite second moments, and let µ_q, ν_q ∈ P_2(R^q) be defined as above, where q = C(m+d-1, d-1) with m ∈ N an odd positive integer. Then, we have:

polyGSW_2(µ, ν) = SW_2(µ_q, ν_q).
Proof of Proposition 3. For θ ∈ R^q, we denote by g^θ_poly : R^d → R the function x ↦ g_poly(x, θ). It follows from the definition of the polyGSW distance that

polyGSW_2^2(µ, ν) = ∫_{R^q} W_2^2( (g^θ_poly)_♯ µ, (g^θ_poly)_♯ ν ) dγ_q(θ)
                  = ∫_{R^q} W_2^2( θ*_♯ µ_q, θ*_♯ ν_q ) dγ_q(θ) = SW_2^2(µ_q, ν_q),

where the second equality holds since g_poly(x, θ) = ⟨θ, U(x)⟩ for U(x) := (x^α)_{|α|=m}. Hence, we obtain the conclusion of this proposition.
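The proof rests on the identity g_poly(x, θ) = ⟨θ, U(x)⟩, where U(x) := (x^α)_{|α|=m} is the monomial feature map: projecting lifted samples linearly is the same as pushing the original samples through g_poly. A numerical check of the identity (helper names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def multi_indices(d, m):
    """All multi-indices α ∈ N^d with |α| = m."""
    out = []
    for combo in combinations_with_replacement(range(d), m):
        a = [0] * d
        for i in combo:
            a[i] += 1
        out.append(tuple(a))
    return out

def lift(x, alphas):
    """Monomial feature map U(x) = (x^α)_{|α|=m} ∈ R^q."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in alphas])

rng = np.random.default_rng(0)
d, m = 4, 3
alphas = multi_indices(d, m)
q = len(alphas)                          # q = C(m+d−1, d−1) = 20 here

x = rng.normal(size=d)
theta = rng.normal(size=q)
g_val = sum(t * np.prod(x ** np.array(a)) for t, a in zip(theta, alphas))
print(np.isclose(g_val, theta @ lift(x, alphas)))   # the linear form on U(x) agrees with g_poly
```

Consequently, any routine estimating SW₂ between the lifted samples (U(X_i)) and (U(Y_i)) also estimates polyGSW₂(µ, ν).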