measures $\xi$ and $\eta$. Denoting $P(\theta, \boldsymbol{x}) := \pi(\theta) P(\boldsymbol{x} \,|\, \theta)$, we have
\begin{align*}
p &= \operatorname*{arg\,min}_{\nu \in \mathcal{P}(\mathbb{R}^d)} D(\nu \,\|\, p) \tag{1} \\
&= \operatorname*{arg\,min}_{\nu \in \mathcal{P}(\mathbb{R}^d)} \left\{ \mathbb{E}_\nu[\log \nu] - \mathbb{E}_\nu[\log P(\boldsymbol{x}, \theta)] \right\} + \log Z.
\end{align*}
Here, the set $\mathcal{P}(\mathbb{R}^d)$ contains absolutely continuous probability measures. The optimization problem (1) over the probability space is known as the Variational Bayes (VB) form of Bayes' rule [7]. Denote by $H(\nu) := -\mathbb{E}_\nu[\log \nu]$ the entropy of the measure $\nu$, and by $\Psi(\nu) := \mathbb{E}_\nu[-\log P(\boldsymbol{x}, \theta)]$ the expected negative log likelihood of the joint distribution $P(\boldsymbol{x}, \theta)$. Since $\log Z$ is a constant w.r.t. $\nu$, the VB problem (1) minimizes the negative evidence lower bound (ELBO) [7] objective $J(\nu) := \Psi(\nu) - H(\nu)$.
Equivalently, it maximizes the ELBO $-J(\nu)$, balancing a high expected log likelihood under $\nu$ (a small $\Psi(\nu)$) against a regularization term that favors a high-entropy solution $\nu$. [17] provide equivalent functional representations of the objective of (1) that arise from other perspectives.
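The terminology is justified by a one-line computation from the definitions above: since $p = P(\boldsymbol{x}, \theta)/Z$,
\[
D(\nu \,\|\, p) = \mathbb{E}_\nu[\log \nu] - \mathbb{E}_\nu[\log P(\boldsymbol{x}, \theta)] + \log Z = J(\nu) + \log Z \;\ge\; 0,
\]
so $-J(\nu) \le \log Z$ for every $\nu \in \mathcal{P}(\mathbb{R}^d)$: the ELBO is indeed a lower bound on the log evidence, with equality exactly at $\nu = p$.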
Existence, uniqueness and convergence results for VB can be obtained from representations of (1) constructed by exploiting intriguing connections between Bayesian inference, differential equations and diffusion processes. [13] provided a seminal result that the gradient flow in Wasserstein space (the metric space $\mathcal{P}(\mathbb{R}^d)$ of probability measures endowed with the 2-Wasserstein distance $W_2$) of an objective function like (1) can be equivalently expressed as the solution to a Fokker-Planck equation (FPE), a parabolic partial differential equation (PDE) on densities as $L^1$ functions. These key connections allow Bayes' rule to be expressed as the minimum of various related functionals on different metric spaces: it can be viewed as the stationary solution of a gradient flow of $J$ in the space $W_2$, as the stationary solution to an FPE in the $L^1$ space of density functions, and also as the stationary distribution of a diffusion process. These equivalent relationships are depicted in Fig. 1; see [17] for further details.
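To make the correspondence concrete, write $V(\theta) := -\log P(\boldsymbol{x}, \theta)$. Under standard smoothness assumptions, the three objects take the following well-known forms (a sketch in our notation, not a reproduction of Fig. 1):
\[
\partial_t \rho_t = \nabla \cdot (\rho_t \nabla V) + \Delta \rho_t \quad \text{(FPE; the $W_2$ gradient flow of $J$)},
\qquad
d\theta_t = -\nabla V(\theta_t)\, dt + \sqrt{2}\, dB_t \quad \text{(diffusion)},
\]
and both share the stationary density $\rho_\infty(\theta) \propto e^{-V(\theta)} = P(\boldsymbol{x}, \theta)$, which normalizes to the posterior $p$.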
Solution procedures for the several equivalent optimization representations of the posterior $p$ shown in Fig. 1 are hard to implement directly, since each still requires computationally difficult operations in functional and probability spaces. In practice, the VB problem (1) is approximated by Variational Inference (VI) procedures that replace the general set $\mathcal{P}(\mathbb{R}^d)$ with a constrained subset of feasible probability measures $\mathcal{Q} \subset \mathcal{P}$, where measures in $\mathcal{Q}$ possess structural properties that allow for a practical and efficient implementation of the optimization. The solution thus obtained is an approximation of $p$, and coincides with it only if $p \in \mathcal{Q}$.
A common choice is mean field VI (MFVI) [7], where $\mathcal{Q}$ is taken to be the mean field family $\mathcal{Q}(\mathbb{R}^d) := \prod_{i=1}^d \mathcal{P}(\mathbb{R})$, in which the components of $\theta$ are independent of each other. The MFVI approximation of $p$ is then obtained by solving the optimization problem (1) over the restricted feasible set $\nu \in \mathcal{Q}$.
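As a concrete illustration of what the restriction to $\mathcal{Q}$ buys and costs, the following sketch (ours, not from the paper; the toy model and all variable names are illustrative assumptions) runs coordinate-ascent MFVI on a bivariate Gaussian posterior, where the optimal mean field factors are available in closed form:

```python
import numpy as np

# Illustrative sketch (not from the paper): coordinate-ascent MFVI for a
# bivariate Gaussian posterior p = N(mu, inv(Lam)), with Lam the precision.
# With Gaussian mean field factors, the optimum is q_i = N(m_i, 1/Lam[i,i])
# and the coordinate-ascent fixed point satisfies
#   m_i = mu_i - (1/Lam[i,i]) * sum_{j != i} Lam[i,j] * (m_j - mu_j).

mu = np.array([1.0, -1.0])            # posterior mean
Lam = np.array([[2.0, 0.9],
                [0.9, 1.5]])          # posterior precision matrix

m = np.zeros(2)                       # variational means, initialized at 0
for _ in range(50):                   # coordinate-ascent sweeps
    for i in range(2):
        j = 1 - i
        m[i] = mu[i] - Lam[i, j] * (m[j] - mu[j]) / Lam[i, i]

print("MFVI means:", m)                                   # recovers mu
print("MFVI marginal variances:", 1.0 / np.diag(Lam))
print("True marginal variances:", np.diag(np.linalg.inv(Lam)))
```

The sweep converges to the exact posterior mean, while the mean field variances $1/\Lambda_{ii}$ understate the true marginal variances $(\Lambda^{-1})_{ii}$ whenever $\Lambda$ has nonzero off-diagonal entries, the familiar price of $p \notin \mathcal{Q}$.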
Contributions: Our main focus is to derive multiple representations of the MFVI formulation analogous to those displayed in Fig. 1. These representations are summarized concisely in Fig. 2. Specifically:
• Broadly following the alternative views available for Bayesian inference, we describe three different representations of the MFVI algorithm. The first views the mean-field approximation of the posterior as the gradient flow of a joint set of functionals, the second as the solution to a system of quasilinear partial differential equations, and the last as the stationary distribution of a diffusion process defined by a system of stochastic differential equations.