Fast Approximation of the
Generalized Sliced-Wasserstein Distance
Dung Le*, Huy Nguyen*,†, Khai Nguyen*,†, Trang Nguyen§, Nhat Ho†
École Polytechnique; University of Texas, Austin†; Hanoi University of Science and Technology§
October 20, 2022
Abstract
Generalized sliced Wasserstein distance is a variant of sliced Wasserstein distance that exploits the power of non-linear projections, through a given defining function, to better capture the complex structures of probability distributions. Similar to the sliced Wasserstein distance, the generalized sliced Wasserstein distance is defined as an expectation over random projections, which can be approximated by the Monte Carlo method. However, that approximation can be expensive in high-dimensional settings. To that end, we propose deterministic and fast approximations of the generalized sliced Wasserstein distance that use the concentration of random projections when the defining functions are polynomial functions, circular functions, and neural network type functions. Our approximations hinge upon an important result that one-dimensional projections of a high-dimensional random vector are approximately Gaussian.
1 Introduction
Sliced Wasserstein (SW) [5] distance has become a core member of the family of probability metrics that are based on optimal transport [26]. Compared to the Wasserstein distance, SW provides a lower computational cost thanks to the closed-form solution of optimal transport in one dimension. In particular, when dealing with probability measures with at most n supports, the computational complexity of SW is O(n log n), while that of the Wasserstein distance is O(n^3 log n) [24] when solved via interior point methods, or O(n^2) [1, 15, 16] when approximated by its entropic regularized version. Furthermore, the memory complexity of SW is only O(n), in comparison with O(n^2) for the Wasserstein distance (due to the storage of a cost matrix). Additionally, the statistical estimation rate (or sample complexity) of SW does not depend on the dimension (denoted by d), unlike that of the Wasserstein distance. In particular, the sample complexity of the former is O(n^{-1/2}) [3], whereas it is O(n^{-1/d}) [11] for the Wasserstein distance. Therefore, SW does not suffer from the curse of dimensionality.
Due to the practicality of SW, several improvements and variants of that distance have been explored recently in the literature. For instance, techniques for selecting discriminative projecting directions are proposed in [8, 22, 23]; a SW variant that augments the original measures to higher dimensions for better linear separation is introduced in [6]; a SW variant on the sphere is defined in [4]; and a SW variant that uses a convolution slicer for projecting images is proposed in [21]. However, the prevailing trend of current work on SW is focused on its applications.
* Dung Le, Huy Nguyen and Khai Nguyen contributed equally to this work.
arXiv:2210.10268v1 [stat.ML] 19 Oct 2022
Indeed, SW is used in generative modeling [9, 20, 7], domain adaptation [27], clustering [14], and Bayesian inference [17, 28].
To further enhance the ability of SW, Kolouri et al. [13] propose using non-linear projections via certain defining functions instead of the conventional linear projections. This extension leads to a generalized class of sliced probability distances, named the generalized sliced Wasserstein (GSW) distance. Despite being more expressive, GSW also needs to be approximated by the Monte Carlo method, as does SW. In greater detail, GSW is defined as an expectation, over random projections via certain defining functions, of the Wasserstein distance between the corresponding one-dimensional projected probability measures. In general, this expectation is intractable to compute; hence, Monte Carlo samples are used to approximate it. It has been shown in both theory and practice that the number of Monte Carlo samples (the number of projections) should be large for good performance and approximation of sliced probability metrics [18, 8].
Contribution. In this work, we aim to overcome the projection complexity of the GSW by deriving fast approximations of that distance that do not require Monte Carlo random projecting directions. We follow the approach of the deterministic approximation of the SW in [19]. The key factor in our fast approximations of the GSW is the Gaussian concentration of the distribution of low-dimensional projections of high-dimensional random variables [25, 10]. Our results cover the settings where the (non-linear) defining functions are polynomial functions with odd degree, circular functions, and of neural network type, which had been discussed and utilized in [13].
Organization. The paper is organized as follows. We provide background on the Wasserstein distance, the sliced Wasserstein distance and its fast approximation, as well as the generalized sliced Wasserstein distance in Section 2. We then study the fast approximation of the generalized sliced Wasserstein distance when the defining function is a polynomial with odd degree in Section 3, and when the defining function is of neural network type in Section 4. A discussion of an approximation when the defining function is circular is in Appendix B. Finally, we give experimental results on the approximation error of the proposed approximate generalized sliced Wasserstein distances in Section 5 and conclude the paper in Section 6. The remaining proofs of the key results in the paper are deferred to Appendix A.
Notation. We use the following notation throughout the paper. First, we denote by N the set of all positive integers. For any d ∈ N and p ∈ N, P_p(R^d) stands for the set of all probability measures on R^d with finite moments of order p, whereas S^{d-1} := {θ ∈ R^d : ‖θ‖ = 1} denotes the unit sphere in R^d, where ‖·‖ is the Euclidean norm. Additionally, γ_d represents the Gaussian distribution N(0, d^{-1} I_d) on R^d, in which I_d is the identity matrix of size d × d. Meanwhile, we denote by L^1(R^d) := {f : R^d → R : ∫_{R^d} |f(x)| dx < ∞} the set of all absolutely integrable functions on R^d. Next, for any set A, we denote by |A| its cardinality. Finally, for any two sequences (a_n) and (b_n), the notation a_n = O(b_n) indicates that a_n ≤ C b_n for all n ∈ N, where C is some universal constant.
2 Background

In this section, we first revisit the Wasserstein distance and the conditional central limit theorem for Gaussian projections. We then present background on the sliced Wasserstein distance and its fast approximation. Finally, we recall the definition of the generalized sliced Wasserstein distance, which is the focus of this paper.
2.1 Wasserstein Distance
Let p ≥ 1 and let µ, ν be two probability measures on R^d, d ≥ 1, with finite moments of order p. Then, the p-Wasserstein distance between µ and ν is defined as follows:

W_p(µ, ν) := ( inf_{π ∈ Π(µ,ν)} ∫_{R^d × R^d} ‖x − y‖^p dπ(x, y) )^{1/p},

where ‖·‖ denotes the Euclidean norm, and Π(µ, ν) is the set of all probability measures on R^d × R^d which admit µ and ν as their marginals with respect to the first and second variables.
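In one dimension, the infimum above admits a closed form: the optimal coupling matches the quantiles (order statistics) of the two measures, which is what later makes slicing computationally attractive. A minimal sketch for equally weighted empirical measures (the helper name `wasserstein_1d` is ours):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two 1-D empirical measures with the
    same number of equally weighted support points: the optimal coupling
    matches order statistics (quantiles)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

x = np.random.default_rng(0).normal(size=1000)
# Translating a point cloud moves it by exactly the size of the shift.
print(wasserstein_1d(x, x + 3.0))  # ≈ 3.0
```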
Next, we review an important result about the concentration of measure phenomenon, which
states that under mild assumptions, one-dimensional projections of a high-dimensional random
vector are approximately Gaussian. Specifically, we have the following theorem.
Theorem 1 ([13], Theorem 1). For any d ≥ 1, let µ denote the distribution of X_{1:d} = (X_1, ..., X_d) and let γ_d = N(0, d^{-1} I_d) be a Gaussian distribution. Assume that µ ∈ P_2(R^d); then, there exists a universal constant C ≥ 0 such that:

∫_{R^d} W_2^2( θ*_♯ µ, N(0, d^{-1} m_2(µ)) ) dγ_d(θ) ≤ C Ξ_d(µ),

where θ* : R^d → R denotes the linear form x ↦ ⟨θ, x⟩, θ*_♯ µ indicates the push-forward measure of µ by θ*, and

Ξ_d(µ) = d^{-1} { A(µ) + [m_2(µ) B_1(µ)]^{1/2} + m_2(µ)^{1/5} B_2(µ)^{4/5} },
m_2(µ) = E[ ‖X_{1:d}‖^2 ],    (1)
A(µ) = E[ | ‖X_{1:d}‖^2 − m_2(µ) | ],
B_k(µ) = E^{1/k}[ |⟨X_{1:d}, X'_{1:d}⟩|^k ],

with k ∈ {1, 2}, and X'_{1:d} is an independent copy of X_{1:d}.

It is worth noting that the above result only holds for the 2-Wasserstein distance.
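Theorem 1 can be illustrated numerically: projecting a decidedly non-Gaussian high-dimensional sample onto a random direction θ ~ γ_d yields a one-dimensional distribution close to N(0, d⁻¹m₂(µ)). The sketch below (all parameter choices are ours) compares the projected sample with a Gaussian sample of that variance via matched order statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 500, 20000

# A non-Gaussian law: uniform on [-sqrt(3), sqrt(3)]^d (mean 0, unit variance per coordinate).
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, d))
m2 = np.mean(np.sum(X ** 2, axis=1))               # m_2(µ) = E‖X_{1:d}‖², roughly d here

theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)  # θ ~ γ_d = N(0, d⁻¹ I_d)
proj = X @ theta                                   # one-dimensional projection θ*(X)

# Empirical W₂ between the projection and N(0, d⁻¹ m₂(µ)), via matched order statistics.
gauss = rng.normal(0.0, np.sqrt(m2 / d), size=n)
w2 = np.sqrt(np.mean((np.sort(proj) - np.sort(gauss)) ** 2))
print(w2)  # small compared to the projection's own scale (≈ 1)
```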
2.2 Sliced-Wasserstein Distance And Its Fast Approximation

To adapt the result of Theorem 1 to the sliced-Wasserstein setting, Nadjahi et al. [19] introduce a version of the SW distance in which projections are sampled from a Gaussian distribution rather than from the unit sphere as usual. In particular,

Sliced-Wasserstein Distance: Let p ≥ 1 and let γ_d = N(0, d^{-1} I_d) be a Gaussian measure, where d ≥ 1. Then, the sliced Wasserstein distance of order p with Gaussian projections between two probability measures µ ∈ P_p(R^d) and ν ∈ P_p(R^d) is defined as follows:

SW_p(µ, ν) := ( ∫_{R^d} W_p^p( θ*_♯ µ, θ*_♯ ν ) dγ_d(θ) )^{1/p}.    (2)

The notation θ*_♯ µ is equivalent to the Radon transform of µ given the projecting direction θ [13]. By leveraging Theorem 1, Nadjahi et al. [19] provide the following bound on the sliced-Wasserstein distance between any two probability measures with finite second moments.
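In practice, the expectation in (2) is estimated with Monte Carlo: draw projections θ ~ γ_d and average the one-dimensional W_p^p computed from order statistics. A sketch for p = 2 with equal sample sizes (function names are ours):

```python
import numpy as np

def sw2_monte_carlo(X, Y, n_proj=500, seed=0):
    """Monte Carlo estimate of SW₂ in (2): average the 1-D squared
    Wasserstein distances over random Gaussian projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)  # θ ~ γ_d
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)                   # 1-D W₂² via order statistics
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))
# For a pure shift by a vector c, SW₂²(µ, shifted µ) = d⁻¹‖c‖²; here c = (1, …, 1), so SW₂ ≈ 1.
print(sw2_monte_carlo(X, X + 1.0))
```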
Proposition 1 ([19], Theorem 1). Let µ, ν ∈ P_2(R^d) be two probability measures on R^d. Consider two Gaussian distributions η_µ = N(0, d^{-1} m_2(µ)) and η_ν = N(0, d^{-1} m_2(ν)), where m_2(µ) and m_2(ν) are given in equation (1). Then, there exists a universal constant C > 0 such that:

|SW_2(µ, ν) − W_2(η_µ, η_ν)| ≤ C (Ξ_d(µ) + Ξ_d(ν))^{1/2},    (3)

where Ξ_d(µ) and Ξ_d(ν) are defined in equation (1).

Note that equation (3) can be simplified by using the closed-form expression of the Wasserstein distance between the two Gaussian distributions η_µ and η_ν, which is given by

W_2(η_µ, η_ν) = d^{-1/2} | √(m_2(µ)) − √(m_2(ν)) |.
According to [19], Ξ_d(µ) and Ξ_d(ν) cannot be shown to converge to 0 if the data are not centered. Fortunately, they demonstrate that there is a relation between SW_2(µ, ν) and SW_2(µ̄, ν̄), where µ̄ and ν̄ are the centered versions of µ and ν, respectively.
Proposition 2. Let µ, ν ∈ P_2(R^d) be two probability measures on R^d with respective means m_µ and m_ν. Then, the sliced-Wasserstein distance of order 2 between µ and ν can be decomposed as:

SW_2^2(µ, ν) = SW_2^2(µ̄, ν̄) + d^{-1} ‖m_µ − m_ν‖^2.    (4)
As a consequence, Nadjahi et al. [19] successfully derive a deterministic approximation for SW_2(µ, ν) as follows:

ŜW_2^2(µ, ν) = W_2^2(η_µ̄, η_ν̄) + d^{-1} ‖m_µ − m_ν‖^2.    (5)
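The deterministic approximation (5) requires only the means and the centered second moments of the two measures, so it costs O(nd) with no random projections at all. A sketch from samples (the function name is ours), using the Gaussian closed form W₂(η_µ̄, η_ν̄) = d^(−1/2)|√m₂(µ̄) − √m₂(ν̄)|:

```python
import numpy as np

def sw2_deterministic(X, Y):
    """Deterministic approximation (5) of SW₂²(µ, ν) from samples:
    W₂²(η_µ̄, η_ν̄) + d⁻¹‖m_µ − m_ν‖², via the Gaussian closed form."""
    d = X.shape[1]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    m2x = np.mean(np.sum((X - mx) ** 2, axis=1))   # m₂ of the centered measure µ̄
    m2y = np.mean(np.sum((Y - my) ** 2, axis=1))   # m₂ of the centered measure ν̄
    gauss_term = (np.sqrt(m2x) - np.sqrt(m2y)) ** 2 / d
    return gauss_term + np.sum((mx - my) ** 2) / d

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 100))
Y = 2.0 * rng.normal(size=(5000, 100)) + 1.0
# Gaussian term ≈ (10 − 20)²/100 = 1; mean term ≈ 100/100 = 1; so ŜW₂ ≈ √2.
print(np.sqrt(sw2_deterministic(X, Y)))
```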
2.3 Generalized Sliced-Wasserstein Distance

Inspired by the approximation of the SW distance in equation (5), we extend that result to the setting of the generalized sliced-Wasserstein (GSW) distance in this work. Before exploring this extension, it is necessary to recall the definition of the GSW distance.
Generalized Sliced-Wasserstein Distance: Let g be a defining function [19] and let δ be the Dirac delta function. Then, the generalized Radon transform (GRT) of an integrable function I ∈ L^1(R^d), denoted by GI, is defined as follows:

GI(t, θ) := ∫_{R^d} I(x) δ(t − g(x, θ)) dx.    (6)

When g(x, θ) = ⟨x, θ⟩, the GRT reverts to the conventional Radon transform, which is used in the SW distance. By using the GRT, the GSW distance is given by:

GSW_p(µ, ν) := ( ∫_{R^d} W_p^p( GI_µ(·, θ), GI_ν(·, θ) ) dγ(θ) )^{1/p},

where I_µ, I_ν ∈ L^1(R^d) are the probability density functions of the measures µ and ν, respectively. Here, with a slight abuse of notation, we use W_p(µ, ν) and W_p(I_µ, I_ν) interchangeably. In this paper, we will also use the pushforward measure notation to define the GSW, e.g., g^θ_♯ µ denotes the GRT of µ given the defining function g and its parameter θ.
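Like the SW distance, the GSW distance is estimated in practice with Monte Carlo: push the samples through g(·, θ) for random θ and compare the resulting one-dimensional measures. A sketch with a circular-type defining function g(x, θ) = ‖x − rθ‖ (a non-linear choice in the spirit of [13]; the radius r and all names are ours):

```python
import numpy as np

def gsw2_monte_carlo(X, Y, g, n_proj=200, seed=0):
    """Monte Carlo estimate of GSW₂: average the 1-D squared Wasserstein
    distances between the pushforwards g(X, θ) and g(Y, θ)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)   # θ ~ γ_d
        px, py = np.sort(g(X, theta)), np.sort(g(Y, theta))
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_proj)

def g_circular(X, theta, r=10.0):
    """Circular-type defining function g(x, θ) = ‖x − rθ‖ (radius r is our choice)."""
    return np.linalg.norm(X - r * theta, axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
Y = 3.0 * rng.normal(size=(1000, 20))
print(gsw2_monte_carlo(X, X.copy(), g_circular))   # 0 for identical samples
print(gsw2_monte_carlo(X, Y, g_circular))          # clearly positive for different scales
```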
In order for the GSW distance to become a proper metric, the GRT must be essentially an injective function. There is a line of work [12, 2] studying the sufficient and necessary conditions for the injectivity of the GRT, which finds that the GRT is injective when g is either a polynomial defining function or a circular defining function. By contrast, it is non-trivial to show that the GRT is injective when g is a neural network type function; therefore, the GSW in this case is a pseudo-metric.

As mentioned in Section 2.1, the result in Theorem 1 only applies to the 2-Wasserstein distance. Thus, we only consider the GSW distance of the same order throughout this paper.
3 Polynomial Defining Function

In this section, we consider the problem of finding a deterministic approximation for the generalized sliced-Wasserstein distance in the setting where the defining function g is a polynomial function with an odd degree, which is defined as follows:
Definition 1 (Polynomial defining function). For a multi-index α = (α_1, ..., α_d) ∈ N^d and a vector x = (x_1, ..., x_d) ∈ R^d, we denote |α| = α_1 + ... + α_d and x^α = x_1^{α_1} ... x_d^{α_d}. Then, a defining function in the form of a polynomial function with an odd degree m is given by:

g_poly(x, θ) = Σ_{|α| = m} θ_α x^α,

where θ := (θ_α)_{|α|=m} ∈ S^{q-1}, with q = C(m+d-1, d-1) being the number of non-negative integer solutions to the equation α_1 + ... + α_d = m. Accordingly, the generalized sliced-Wasserstein distance in this case is denoted as polyGSW.
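The q monomials of degree m can be enumerated by choosing m coordinates with repetition, which also shows why q = C(m+d−1, d−1). A sketch (helper names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def multi_indices(d, m):
    """All multi-indices α ∈ N^d with |α| = m, i.e. the degree-m monomials."""
    alphas = []
    for combo in combinations_with_replacement(range(d), m):
        alpha = [0] * d
        for i in combo:
            alpha[i] += 1               # coordinate i is raised once more
        alphas.append(tuple(alpha))
    return alphas

def g_poly(x, theta, alphas):
    """g_poly(x, θ) = Σ_{|α|=m} θ_α x^α."""
    x = np.asarray(x, dtype=float)
    return sum(t * np.prod(x ** np.array(a)) for t, a in zip(theta, alphas))

d, m = 3, 3
alphas = multi_indices(d, m)
print(len(alphas), comb(m + d - 1, d - 1))   # both equal q = 10
```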
Subsequently, we introduce some necessary notation for our analysis. Let X = (X_1, ..., X_d)^T and Y = (Y_1, ..., Y_d)^T be random vectors following probability distributions µ ∈ P_2(R^d) and ν ∈ P_2(R^d), respectively. For an odd positive integer m ∈ N, denoting by µ_q and ν_q the probability distributions on R^q of the random vectors U := (X^α)_{|α|=m} ∈ R^q and V := (Y^α)_{|α|=m} ∈ R^q, we find that there is a connection between the GSW distance and the SW distance as follows:
Proposition 3. Let µ, ν ∈ P_2(R^d) be two probability measures on R^d with finite second moments, and let µ_q, ν_q ∈ P_2(R^q) be defined as above, where q = C(m+d-1, d-1) with m ∈ N an odd positive integer. Then, we have:

polyGSW_2(µ, ν) = SW_2(µ_q, ν_q).
Proof of Proposition 3. For θ ∈ R^q, we denote by g^θ_poly : R^d → R the function x ↦ g_poly(x, θ). It follows from the definition of the polyGSW distance that

polyGSW_2^2(µ, ν) = ∫_{R^q} W_2^2( (g^θ_poly)_♯ µ, (g^θ_poly)_♯ ν ) dγ_q(θ)
                  = ∫_{R^q} W_2^2( θ*_♯ µ_q, θ*_♯ ν_q ) dγ_q(θ) = SW_2^2(µ_q, ν_q),

where the second equality holds since g_poly(x, θ) = ⟨θ, U(x)⟩ for U(x) := (x^α)_{|α|=m}. Hence, we obtain the conclusion of this proposition.
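The proof rests on the identity g_poly(x, θ) = ⟨θ, U(x)⟩, where U(x) := (x^α)_{|α|=m} is the monomial feature map: projecting lifted samples linearly is the same as pushing the original samples through g_poly. A numerical check of the identity (helper names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def multi_indices(d, m):
    """All multi-indices α ∈ N^d with |α| = m."""
    out = []
    for combo in combinations_with_replacement(range(d), m):
        a = [0] * d
        for i in combo:
            a[i] += 1
        out.append(tuple(a))
    return out

def lift(x, alphas):
    """Monomial feature map U(x) = (x^α)_{|α|=m} ∈ R^q."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in alphas])

rng = np.random.default_rng(0)
d, m = 4, 3
alphas = multi_indices(d, m)
q = len(alphas)                          # q = C(m+d−1, d−1) = 20 here

x = rng.normal(size=d)
theta = rng.normal(size=q)
g_val = sum(t * np.prod(x ** np.array(a)) for t, a in zip(theta, alphas))
print(np.isclose(g_val, theta @ lift(x, alphas)))   # the linear form on U(x) agrees with g_poly
```

Consequently, any routine estimating SW₂ between the lifted samples (U(X_i)) and (U(Y_i)) also estimates polyGSW₂(µ, ν).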