
FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

Chieh-Hsin Lai¹, Yuhta Takida¹, Naoki Murata¹, Toshimitsu Uesaka¹, Yuki Mitsufuji¹ ², Stefano Ermon³

Abstract

Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.

1. Introduction

Score-based generative models (SGMs), also referred to as diffusion models (Sohl-Dickstein et al., 2015; Song & Ermon, 2019; Ho et al., 2020; Song et al., 2020b;a), have led to major advances in the generation of synthetic images (Dhariwal & Nichol, 2021; Saharia et al., 2022; Rombach et al., 2022; Kim et al., 2022) and audio (Kong et al., 2020). In addition, SGMs have been applied to various downstream tasks such as media content editing (Meng et al., 2021b; Cheuk et al., 2022) or restoration (Kawar et al., 2022; Saito et al., 2022; Murata et al., 2023). An SGM involves a stochastic forward and backward process. In the forward process, also known as the diffusion process, noise with gradually increasing variance is added to each data point until the original structure is lost, transforming data into pure noise. The backward process attempts to reverse the diffusion process by using a neural network (called a noise-conditional score model) that is trained to gradually denoise the data, effectively transforming pure noise into clean data samples. The neural network is trained with a denoising score matching objective (Hyvärinen & Dayan, 2005; Vincent, 2011) to estimate the score (i.e., the gradient of the log-density with respect to the data) of the data density perturbed with various amounts of noise (as in the forward process).

¹Sony AI, Tokyo, Japan
²Sony Group Corporation, Tokyo, Japan
³Department of Computer Science, Stanford University, Stanford, CA, USA.
Correspondence to: Chieh-Hsin Lai <Chieh-hsin.lai@sony.com>.
Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

The training can be interpreted as a joint estimation of the scores of the original data density and all its perturbations. Crucially, all these densities are closely related to each other, as they correspond to the same data density perturbed with various amounts of noise. With sufficiently small time steps, the forward process is a diffusion (Song et al., 2020b), and the spatial-temporal evolution of the data density is thus governed by the classic Fokker-Planck partial differential equation (PDE) (Øksendal, 2003). In principle, this implies that with knowledge of the density for a single noise level, we could recover all the densities by solving the Fokker-Planck equation (FPE) without any additional learning.

Our contributions. Building on the above notions, we derive an associated system of PDEs that characterizes the evolution of the scores (i.e., gradients) of the perturbed data densities; we term it the score Fokker-Planck equation (score FPE). In theory, the ground truth scores of the perturbed data densities must satisfy the score FPE (self-consistency property). Hence, we mathematically study the implications of satisfying the score FPE. We prove the following effects of reducing the score FPE error: (a) improvement in the log-likelihood of the probability flow ordinary differential equation (ODE) diffusion mode (Song et al., 2020b) (Theorems 4.2 and 4.3); and (b) improvement in the degree of conservativity of the models (Proposition 4.4). In addition, we prove that (c) score FPE error reduction can be achieved by enforcing higher-order score matching (Meng et al., 2021a; Lu et al., 2022) (Proposition 4.6). In practice, we observe that many existing, pre-trained score models do not numerically satisfy the score FPE. Therefore, we propose a new loss function for training diffusion models that combines the traditional score matching objective with a regularization term derived from the underlying score FPE to enforce the consistency of models. We call the proposed method FP-Diffusion. We show that FP-Diffusion enables more accurate density estimation on synthetic data and improves the likelihood on the MNIST, Fashion MNIST, CIFAR-10, and ImageNet32 (ImageNet downsampled to 32 × 32) (Chrabaszcz et al., 2017) datasets.

arXiv:2210.04296v4 [cs.LG] 14 Jun 2023

2. Background

Song et al. (2020b) unified denoising score matching (Song & Ermon, 2019) and diffusion probabilistic models (Sohl-Dickstein et al., 2015; Ho et al., 2020) via a stochastic process $\{\mathbf{x}(t)\}$ with continuous time $t\in[0,T]$. The process is driven by the following forward SDE

$$\mathrm{d}\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t), t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}_t, \tag{1}$$

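where $\mathbf{f}(\cdot, t)\colon\mathbb{R}^D\to\mathbb{R}^D$ and $g(\cdot)\colon\mathbb{R}\to\mathbb{R}$ are pre-assigned, and $\mathbf{w}_t$ is a standard Wiener process. Under moderate conditions (Anderson, 1982), a reverse-time SDE from $T$ to $0$ can be obtained as

$$\mathrm{d}\mathbf{x}(t) = \big[\mathbf{f}(\mathbf{x}(t), t) - g^2(t)\nabla_{\mathbf{x}}\log q_t(\mathbf{x}(t))\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}_t, \tag{2}$$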

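where $\bar{\mathbf{w}}_t$ is a standard Wiener process in reverse time, and $q_t(\mathbf{x})$ denotes the ground truth marginal density of $\mathbf{x}(t)$ following Eq. (1). We can train a time-conditional neural network $\mathbf{s}_\theta = \mathbf{s}_\theta(\mathbf{x}, t)$ to approximate $\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ by minimizing a score matching objective (Hyvärinen & Dayan, 2005)

$$J_{\mathrm{SM}}(\theta;\lambda(\cdot)) := \frac{1}{2}\int_0^T\lambda(t)\,\mathbb{E}_{\mathbf{x}\sim q_t(\mathbf{x})}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}}\log q_t(\mathbf{x})\big\|_2^2\Big]\,\mathrm{d}t.$$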

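Since $q_t(\mathbf{x})$ is generally inaccessible, the denoising score matching (DSM) loss $J_{\mathrm{DSM}}(\theta;\lambda(\cdot))$ (Vincent, 2011; Song et al., 2020b) is used in practice instead:

$$J_{\mathrm{DSM}}(\theta;\lambda(\cdot)) := \frac{1}{2}\int_0^T\lambda(t)\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)}\log q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))\big\|_2^2\Big]\,\mathrm{d}t, \tag{3}$$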

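where $q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))$ is the forward transition probability from $\mathbf{x}(0)$ to $\mathbf{x}(t)$. After $\mathbf{s}_\theta(\mathbf{x}, t)\approx\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ is learned, we replace $\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ in Eq. (2) with $\mathbf{s}_\theta$ and obtain a parametrized reverse-time SDE for a stochastic process $\hat{\mathbf{x}}_\theta(t)$:

$$\mathrm{d}\hat{\mathbf{x}}_\theta(t) = \big[\mathbf{f}(\hat{\mathbf{x}}_\theta(t), t) - g^2(t)\,\mathbf{s}_\theta(\hat{\mathbf{x}}_\theta(t), t)\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}_t. \tag{4}$$

With specific choices of $\mathbf{f}$ and $g$, there are two common instantiations of the stochastic differential equation (SDE): VE and VP. See Appendix A for details.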
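As a concrete illustration of Eq. (3), the sketch below estimates the DSM integrand at a single time $t$ (without the weight $\lambda(t)$) by Monte Carlo. It assumes a toy setting that is our own, not the paper's: one-dimensional data $x(0)\sim\mathcal{N}(0,1)$, $f\equiv 0$ and $g\equiv 1$, so that $q_{0t}(x(t)|x(0)) = \mathcal{N}(x(0), t)$, the DSM target is $-(x(t)-x(0))/t$, and the exact marginal score is $-x/(1+t)$. The names `dsm_integrand`, `true_score`, and `wrong_score` are hypothetical.

```python
import math
import random


def dsm_integrand(score, t, n=100_000, seed=0):
    """Monte Carlo estimate of the DSM integrand of Eq. (3) at a fixed time t.

    Toy assumptions: x(0) ~ N(0, 1), f = 0, g = 1, hence
    x(t) | x(0) ~ N(x(0), t) and grad log q_0t(x(t) | x(0)) = -(x(t) - x(0)) / t.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        xt = x0 + math.sqrt(t) * rng.gauss(0.0, 1.0)  # sample the transition kernel
        target = -(xt - x0) / t                       # conditional (DSM) score target
        total += (score(xt, t) - target) ** 2
    return 0.5 * total / n


def true_score(x, t):
    # Marginal q_t = N(0, 1 + t), so grad log q_t(x) = -x / (1 + t).
    return -x / (1.0 + t)


def wrong_score(x, t):
    # Hypothetical mis-scaled "model".
    return -2.0 * x / (1.0 + t)


print(dsm_integrand(true_score, 0.5), dsm_integrand(wrong_score, 0.5))
```

Because the DSM and score matching integrands differ only by a $\theta$-independent constant, the exact score gives the smaller value; in this toy case the gap is $\tfrac{1}{2}\mathbb{E}[(x/(1+t))^2] = 1/(2(1+t))$, about $1/3$ at $t = 0.5$.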

Let $p^{\mathrm{SDE}}_{t,\theta}$ denote the marginal distribution of $\hat{\mathbf{x}}_\theta(t)$ with an initial distribution defined as the prior $\pi$, where we suppress the dependence on $\pi$ for compactness. We can design $\mathbf{f}$ and $g$ in Eq. (2) such that $q_T(\mathbf{x})$ approximates a simple prior $\pi$; samples $\hat{\mathbf{x}}_\theta(0)\sim p^{\mathrm{SDE}}_{0,\theta}$ can then be generated by numerically solving Eq. (4) backward, starting from a prior sample $\hat{\mathbf{x}}_\theta(T)\sim\pi$. Intuitively, $\hat{\mathbf{x}}_\theta(0)$ should be close to a sample from the data distribution.
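The sampling procedure just described can be sketched with an Euler-Maruyama discretization of Eq. (4). This is a minimal illustration under hypothetical toy assumptions (1D data $\mathcal{N}(0,1)$, $f\equiv 0$, $g\equiv 1$, $T=1$), where the exact score $-x/(1+t)$ stands in for a trained $\mathbf{s}_\theta$ and the prior is taken to be $\pi = \mathcal{N}(0, 1+T) = q_T$; a real model would replace `score` with a network.

```python
import math
import random


def score(x, t):
    # Stand-in for s_theta: exact score of q_t = N(0, 1 + t) in the toy setting.
    return -x / (1.0 + t)


def reverse_sde_sample(n_samples=2000, n_steps=200, T=1.0, seed=0):
    """Euler-Maruyama discretization of Eq. (4) with f = 0 and g = 1."""
    rng = random.Random(seed)
    dt = T / n_steps
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, math.sqrt(1.0 + T))  # x(T) ~ pi = N(0, 1 + T)
        t = T
        for _ in range(n_steps):
            # One step from t to t - dt: x <- x - [f - g^2 s] dt + g sqrt(dt) z
            x = x + score(x, t) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
            t -= dt
        samples.append(x)
    return samples


xs = reverse_sde_sample()
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)
```

Since each reverse step moves from $t$ to $t-\Delta t$, the drift enters as $-[f - g^2 s]\,\Delta t$. The empirical mean and variance of the returned samples should be close to the data moments 0 and 1, up to Monte Carlo and discretization error.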

Song et al. (2020b) also introduced a deterministic process (with a zero diffusion term) that describes the evolution of samples whose trajectories share the same marginal probability densities as the forward SDE (Eq. (1)). Specifically, the process evolves through time according to the following probability flow ODE

$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t}(t) = \mathbf{f}(\mathbf{x}(t), t) - \frac{1}{2}g^2(t)\nabla_{\mathbf{x}}\log q_t(\mathbf{x}(t)). \tag{5}$$

As in the SDE case, the ground truth score in Eq. (5) is approximated with the learned score model $\mathbf{s}_\theta(\mathbf{x}, t)\approx\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. This yields the following parametrized probability flow ODE

$$\frac{\mathrm{d}\tilde{\mathbf{x}}_\theta}{\mathrm{d}t}(t) = \mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t) - \frac{1}{2}g^2(t)\,\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t). \tag{6}$$

We denote the marginal density of $\tilde{\mathbf{x}}_\theta(t)$ by $p^{\mathrm{ODE}}_{t,\theta}$, with an initial condition sampled from the prior $\pi$; for compactness, we omit the dependence on $\pi$ in the notation. By numerically solving Eq. (6) from an initial value $\tilde{\mathbf{x}}_\theta(T)\sim\pi$, we can generate a sample $\tilde{\mathbf{x}}_\theta(0)\sim p^{\mathrm{ODE}}_{0,\theta}$ that approximates sampling from the data distribution. Moreover, the deterministic dynamics in Eq. (6) make it possible to compute exact likelihoods for this generative model. Let $\tilde{\mathbf{x}}_\theta(t)\in\mathbb{R}^D$ evolve in reverse time via Eq. (6), starting with $\tilde{\mathbf{x}}_\theta(T)\sim\pi$.

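The “instantaneous change of variables” (Chen et al., 2018) characterizes the temporal changes in $\log p^{\mathrm{ODE}}_{t,\theta}$ along the trajectory $\{\tilde{\mathbf{x}}_\theta(t) : t\in[0,T]\}$ via the following ODE:

$$\frac{\mathrm{d}\log p^{\mathrm{ODE}}_{t,\theta}(\tilde{\mathbf{x}}_\theta(t))}{\mathrm{d}t} = \frac{1}{2}g^2(t)\operatorname{div}_{\mathbf{x}}\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t) - \operatorname{div}_{\mathbf{x}}\mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t).$$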

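Hence, the log-likelihood can be calculated exactly by numerically solving the concatenated ODEs from $t=0$ to $t=T$, after initialization with $\tilde{\mathbf{x}}_\theta(0)\sim q_0(\mathbf{x})$:

$$\frac{\mathrm{d}}{\mathrm{d}t}\begin{bmatrix}\tilde{\mathbf{x}}_\theta(t)\\ \log p^{\mathrm{ODE}}_{t,\theta}(\tilde{\mathbf{x}}_\theta(t))\end{bmatrix} = \begin{bmatrix}\mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t) - \frac{1}{2}g^2(t)\,\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t)\\ \frac{1}{2}g^2(t)\operatorname{div}_{\mathbf{x}}\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t) - \operatorname{div}_{\mathbf{x}}\mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t)\end{bmatrix}.$$

The log-likelihood of the starting point then follows as $\log p^{\mathrm{ODE}}_{0,\theta}(\tilde{\mathbf{x}}_\theta(0)) = \log\pi(\tilde{\mathbf{x}}_\theta(T)) - \int_0^T\frac{\mathrm{d}}{\mathrm{d}t}\log p^{\mathrm{ODE}}_{t,\theta}(\tilde{\mathbf{x}}_\theta(t))\,\mathrm{d}t$.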


3. Score Fokker-Planck equation for diffusion

It is well known that the evolution of the ground truth density $q_t(\mathbf{x})$ associated with Eq. (1) is governed by the Fokker-Planck equation (FPE) (Øksendal, 2003)

$$\partial_t q_t(\mathbf{x}) = -\sum_{j=1}^{D}\partial_{x_j}\Big[\tilde{F}_j(\mathbf{x}, t)\,q_t(\mathbf{x})\Big],$$

where $\tilde{\mathbf{F}}(\mathbf{x}, t) := \mathbf{f}(\mathbf{x}, t) - \frac{1}{2}g^2(t)\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. As there is a one-to-one mapping (up to a constant) between densities and their scores, we derive (in Appendix G) an equivalent system of PDEs for the ground truth scores $\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. We designate it as the score Fokker-Planck equation, or simply the score FPE.

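Proposition 3.1 (Score FPE). Assume the ground truth density $q_t(\mathbf{x})$ is sufficiently smooth on $\mathbb{R}^D\times[0,T]$, with its score denoted as $\mathbf{s}(\mathbf{x}, t) := \nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. Then for all $(\mathbf{x}, t)\in\mathbb{R}^D\times[0,T]$, its log-density satisfies the PDE

$$\partial_t\log q_t(\mathbf{x}) = \frac{1}{2}g^2(t)\operatorname{div}_{\mathbf{x}}(\mathbf{s}(\mathbf{x}, t)) + \frac{1}{2}g^2(t)\|\mathbf{s}(\mathbf{x}, t)\|_2^2 - \langle\mathbf{f}(\mathbf{x}, t), \mathbf{s}(\mathbf{x}, t)\rangle - \operatorname{div}_{\mathbf{x}}(\mathbf{f}(\mathbf{x}, t)), \tag{7}$$

and its score $\mathbf{s}$ satisfies the following system of PDEs

$$\partial_t\mathbf{s}(\mathbf{x}, t) = \nabla_{\mathbf{x}}\Big[\frac{1}{2}g^2(t)\operatorname{div}_{\mathbf{x}}(\mathbf{s}(\mathbf{x}, t)) + \frac{1}{2}g^2(t)\|\mathbf{s}(\mathbf{x}, t)\|_2^2 - \langle\mathbf{f}(\mathbf{x}, t), \mathbf{s}(\mathbf{x}, t)\rangle - \operatorname{div}_{\mathbf{x}}(\mathbf{f}(\mathbf{x}, t))\Big]. \tag{8}$$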

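For notational simplicity, let $\mathcal{L}[\cdot] := \frac{1}{2}g^2\operatorname{div}_{\mathbf{x}}(\cdot) + \frac{1}{2}g^2\|\cdot\|_2^2 - \langle\mathbf{f}, \cdot\rangle - \operatorname{div}_{\mathbf{x}}(\mathbf{f})$ be the operator mapping vector fields to real-valued functions. Thus, Eq. (7) and Eq. (8) can be expressed as $\partial_t\log q_t(\mathbf{x}) = \mathcal{L}[\mathbf{s}](\mathbf{x}, t)$ and $\partial_t\mathbf{s}(\mathbf{x}, t) = \nabla_{\mathbf{x}}\mathcal{L}[\mathbf{s}](\mathbf{x}, t)$, respectively.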
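Proposition 3.1 can be sanity-checked numerically. The sketch below assumes a hypothetical 1D example with $f\equiv 0$ and $g\equiv 1$, where $q_0$ is a two-component Gaussian mixture (weights, means, and variances chosen arbitrarily), so $q_t$ is the same mixture with each component variance increased by $t$. All derivatives are approximated by central finite differences, and the residuals of Eq. (7) and Eq. (8) should vanish up to discretization error.

```python
import math

# Hypothetical toy setting: f = 0, g = 1 and q_0 is a 1D two-component Gaussian
# mixture, so q_t is the same mixture with every component variance increased by t.
WEIGHTS, MEANS, VARS = (0.3, 0.7), (-1.0, 2.0), (0.5, 1.0)


def log_q(x, t):
    dens = sum(
        w * math.exp(-((x - m) ** 2) / (2 * (v + t))) / math.sqrt(2 * math.pi * (v + t))
        for w, m, v in zip(WEIGHTS, MEANS, VARS)
    )
    return math.log(dens)


def d_x(fn, x, t, h):
    """Central difference in x."""
    return (fn(x + h, t) - fn(x - h, t)) / (2 * h)


def d_t(fn, x, t, h=1e-4):
    """Central difference in t."""
    return (fn(x, t + h) - fn(x, t - h)) / (2 * h)


def score(x, t):
    return d_x(log_q, x, t, 1e-4)  # s(x, t) = d/dx log q_t(x)


def fpe_operator(x, t):
    # L[s] = 0.5 g^2 div s + 0.5 g^2 |s|^2 - <f, s> - div f, here with f = 0, g = 1
    return 0.5 * d_x(score, x, t, 1e-4) + 0.5 * score(x, t) ** 2


def fpe_residuals(x, t):
    r7 = d_t(log_q, x, t) - fpe_operator(x, t)             # residual of Eq. (7)
    r8 = d_t(score, x, t) - d_x(fpe_operator, x, t, 1e-3)  # residual of Eq. (8)
    return r7, r8


for x in (-2.0, 0.0, 1.5, 3.0):
    print(x, fpe_residuals(x, 0.3))
```

The step sizes trade truncation error against round-off error; the nested differences needed for Eq. (8) are the least accurate part, so its residual is expected to be larger than that of Eq. (7).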

Proposition 3.1 shows that the time-conditional scores $\mathbf{s}_\theta(\mathbf{x}, t)$ learned by score-based models (via Eq. (3)) are highly redundant. In principle, given the ground truth score at an initial time $t_0$, we can recover the scores for all times $t\geq t_0$ by solving the score FPE. To see this intuitively, consider the special case $\mathbf{f}\equiv\mathbf{0}$ and $g\equiv 1$, i.e., when $\mathbf{x}(t)$ is obtained by adding Gaussian noise. It is well known that the densities $q_t$ and $q_{t_0}$ are related by convolution as $q_t = q_{t_0} * \mathcal{N}(\mathbf{0}, (t - t_0)\mathbf{I})$, and that $q_t$ can be analytically obtained from $q_{t_0}$ (Masry & Rice, 1992) (e.g., by applying a Fourier transform and dividing). Hence, all scores can in principle be obtained analytically from the score at a single time step, without any further learning.
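The convolution identity for $\mathbf{f}\equiv\mathbf{0}$, $g\equiv 1$ can be checked directly. This sketch uses a hypothetical 1D Gaussian mixture as $q_{t_0}$ (with $t_0 = 0.2$), computes $(q_{t_0} * \mathcal{N}(0, t-t_0))(x)$ by trapezoidal quadrature, and compares it with the closed form in which every component variance grows by $t - t_0$. In this example, the density, and hence the score, at time $t$ therefore follows analytically from the parameters at time $t_0$.

```python
import math

WEIGHTS, MEANS, VARS = (0.3, 0.7), (-1.0, 2.0), (0.5, 1.0)  # hypothetical q_0


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def q(x, t):
    """Closed form of q_t for f = 0, g = 1: component variances grow by t."""
    return sum(w * normal_pdf(x, m, v + t) for w, m, v in zip(WEIGHTS, MEANS, VARS))


def q_by_convolution(x, t, t0, lo=-20.0, hi=20.0, n=4000):
    """(q_{t0} * N(0, t - t0))(x) via the trapezoidal rule on [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * q(y, t0) * normal_pdf(x - y, 0.0, t - t0)
    return total * step


for x in (-3.0, 0.0, 1.5, 4.0):
    print(x, q(x, 1.0), q_by_convolution(x, 1.0, 0.2))
```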

We provide empirical evidence to substantiate Proposition 3.1 from two distinct perspectives, as presented in Section 6.1 and Appendix B.1, respectively.

3.1. Pre-trained scores fail to satisfy score FPEs

In theory, with sufficient data and model capacity, the optimal solution to Eq. (3) approximates the ground truth score well and should therefore satisfy Eq. (8). However, in our experiments, we observe that pre-trained scores $\mathbf{s}_\theta$ learned via Eq. (3) do not fulfill the score FPE. Therefore, we introduce an error term $\boldsymbol{\epsilon}[\mathbf{s}_\theta] := \boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, t)$ to quantify how much $\mathbf{s}_\theta$ deviates from the score FPE:

$$\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, t) := \partial_t\mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}}\mathcal{L}[\mathbf{s}_\theta](\mathbf{x}, t). \tag{9}$$

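Setting $T = 1$, we define the average residual of the score FPE, computed over $\mathbf{x}$, as a function of $t\in[0,1]$:

$$r_{\text{FP, trans.}}[\mathbf{s}_\theta](t) := \frac{1}{D}\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{\mathbf{x}(t)|\mathbf{x}(0)}\Big[\big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}(t), t)\big\|_2\Big].$$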

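We further consider the following averaged residual for DSM:

$$r_{\text{DSM-like}}[\mathbf{s}_\theta](t) := \frac{1}{D}\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{\mathbf{x}(t)|\mathbf{x}(0)}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)}\log q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))\big\|_2\Big].$$

Compared to the integrand in the standard DSM loss in Eq. (3), $r_{\text{DSM-like}}[\mathbf{s}_\theta]$ uses the $\ell_2$-norm (instead of the MSE) and drops the time-weighting function $\lambda(t)$ to be consistent with the averaged residuals of the score FPE.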
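Both residuals can be estimated by Monte Carlo once $\partial_t\mathbf{s}_\theta$ and $\nabla_{\mathbf{x}}\mathcal{L}[\mathbf{s}_\theta]$ are available. The sketch below is a toy stand-in for the experiment behind Figure 1, not a reproduction of it: it assumes 1D data $\mathcal{N}(0,1)$, $f\equiv 0$, $g\equiv 1$ (so $D=1$), computes $\boldsymbol{\epsilon}$ of Eq. (9) with central finite differences instead of automatic differentiation, and compares the exact score with a hypothetical perturbed model $-x/(1+t) + 0.1\sin x$.

```python
import math
import random


def exact_score(x, t):
    return -x / (1.0 + t)  # score of q_t = N(0, 1 + t)


def perturbed_score(x, t, delta=0.1):
    return exact_score(x, t) + delta * math.sin(x)  # hypothetical imperfect model


def fpe_error(s, x, t, h_in=1e-4, h_out=1e-3):
    """eps[s](x, t) of Eq. (9) for f = 0, g = 1, via central differences."""
    def fpe_operator(x_, t_):
        ds = (s(x_ + h_in, t_) - s(x_ - h_in, t_)) / (2 * h_in)
        return 0.5 * ds + 0.5 * s(x_, t_) ** 2  # L[s] with f = 0, g = 1

    dt_s = (s(x, t + h_in) - s(x, t - h_in)) / (2 * h_in)
    dx_l = (fpe_operator(x + h_out, t) - fpe_operator(x - h_out, t)) / (2 * h_out)
    return dt_s - dx_l


def averaged_residuals(s, t, n=10_000, seed=0):
    """Monte Carlo estimates of r_FP,trans.[s](t) and r_DSM-like[s](t) with D = 1."""
    rng = random.Random(seed)
    r_fp = r_dsm = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)  # x(0) ~ N(0, 1)
        xt = x0 + math.sqrt(t) * rng.gauss(0.0, 1.0)
        r_fp += abs(fpe_error(s, xt, t))
        r_dsm += abs(s(xt, t) + (xt - x0) / t)  # DSM target is -(x(t) - x(0)) / t
    return r_fp / n, r_dsm / n


for name, s in (("exact", exact_score), ("perturbed", perturbed_score)):
    print(name, averaged_residuals(s, 0.5))
```

Note that $r_{\text{DSM-like}}$ is not zero even for the exact score, because the conditional target $\nabla\log q_{0t}$ differs from the marginal score; $r_{\text{FP, trans.}}$, in contrast, vanishes (up to finite-difference error) only when the score FPE holds.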

Figure 1 plots these residuals for score models that were pre-trained via DSM on the MNIST and CIFAR-10 datasets. Despite achieving a low $r_{\text{DSM-like}}$ across all $t$ (orange curve), the pre-trained score models fail to satisfy the score FPE, especially for small $t$ (blue curve). This implies that a low DSM error alone does not make a model satisfy the score FPE.

4. Theoretical implications of score FPE

In this section, we first study three implications of satisfying the score FPE. Specifically, we show in Section 4.1 that simultaneous minimization of quantities related to the score FPE and the conventional score matching objective can reduce the KL divergence between the data density $q_0$ and the density $p^{\mathrm{ODE}}_{0,\theta}$ determined by the parametrized probability flow ODE (Eq. (6)). In Section 4.2, we prove that controlling $\boldsymbol{\epsilon}[\mathbf{s}_\theta]$ implicitly enforces the conservativity of $\mathbf{s}_\theta$. Moreover, in Section 4.3, we prove that if the score FPE is satisfied, then under certain conditions, $\mathbf{s}_\theta$, the ground truth score $\mathbf{s}$, $\nabla_{\mathbf{x}}\log p^{\mathrm{SDE}}_{t,\theta}$, and $\nabla_{\mathbf{x}}\log p^{\mathrm{ODE}}_{t,\theta}$ must match. Here, $p^{\mathrm{SDE}}_{t,\theta}$ and $p^{\mathrm{ODE}}_{t,\theta}$ were defined in Section 2 as the marginal densities of the parametrized diffusion process and the probability flow ODE, respectively. Finally, in Section 4.4, we investigate the connection between higher-order score matching (Meng et al., 2021a; Lu et al., 2022) and the score FPE. We provide the proofs of all theorems in Appendix G.


Figure 1. Comparison of the numerical scales of $r_{\text{DSM-like}}[\mathbf{s}_\theta](t)$ and $r_{\text{FP, trans.}}[\mathbf{s}_\theta](t)$ for pre-trained scores $\mathbf{s}_\theta$ on MNIST and CIFAR-10. Panels: (a) VE SDE, MNIST; (b) VP SDE, MNIST; (c) VE SDE, CIFAR-10; (d) VP SDE, CIFAR-10. We treat these errors as functions of time. The pre-trained models do not numerically satisfy the score FPE, in contrast to their DSM-like errors. We attempt to explain this phenomenon in Sections 4.2 and 4.4.

4.1. Minimization DKLq0

pODE

0,θ

In this section, we show that under certain regularity conditions (see Assumptions F.1 and F.2), simultaneous minimization of $J_{\mathrm{SM}}(\theta)$ and certain score-FPE-related quantities (see Eqs. (11) and (12)) can decrease the KL divergence between $q_0$ and $p^{\mathrm{ODE}}_{0,\theta}$, denoted as $D_{\mathrm{KL}}(q_0\,\|\,p^{\mathrm{ODE}}_{0,\theta})$. This is equivalent to improving the likelihood of data under $p^{\mathrm{ODE}}_{0,\theta}$.

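First, we review an equation proposed by Lu et al. (2022) that quantifies the exact gap between $D_{\mathrm{KL}}(q_0\,\|\,p^{\mathrm{ODE}}_{0,\theta})$ and the score matching loss $J_{\mathrm{SM}}(\theta)$. For compactness, we denote $\mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x}, t) := \nabla_{\mathbf{x}}\log p^{\mathrm{ODE}}_{t,\theta}(\mathbf{x})$.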

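Lemma 4.1 (Lu et al. (2022)). Set $\lambda(t) = g^2(t)$. Let $q_0$ be the data distribution, and let $q_t$ be the marginal density of $\mathbf{x}(t)$ following Eq. (1). Assume that Assumption F.1 is satisfied. Then,

$$D_{\mathrm{KL}}(q_0\,\|\,p^{\mathrm{ODE}}_{0,\theta}) = D_{\mathrm{KL}}(q_T\,\|\,p^{\mathrm{ODE}}_{T,\theta}) + J_{\mathrm{SM}}(\theta) + J_{\mathrm{Diff}}(\theta),$$

where

$$J_{\mathrm{Diff}}(\theta) = \frac{1}{2}\int_0^T g^2(t)\,\mathbb{E}_{q_t(\mathbf{x})}\Big[\big(\mathbf{s}_\theta(\mathbf{x}, t) - \mathbf{s}(\mathbf{x}, t)\big)^\top\big(\mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x}, t) - \mathbf{s}_\theta(\mathbf{x}, t)\big)\Big]\,\mathrm{d}t.$$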

We now introduce the main theoretical results of this section. First, we note that applying the Cauchy-Schwarz inequality to $J_{\mathrm{Diff}}(\theta)$ gives

$$|J_{\mathrm{Diff}}(\theta)| \leq \sqrt{J_{\mathrm{SM}}(\theta)}\cdot\sqrt{J_{\mathrm{Fisher}}(\theta)}.$$

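Here, $J_{\mathrm{Fisher}}(\theta)$ is a Fisher-like divergence between the two scores $\mathbf{s}_\theta(\mathbf{x}, t)$ and $\mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x}, t)$, defined as

$$J_{\mathrm{Fisher}}(\theta) := \frac{1}{2}\int_0^T g^2(t)\,\mathbb{E}_{\mathbf{x}\sim q_t(\mathbf{x})}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}, t) - \mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x}, t)\big\|_2^2\Big]\,\mathrm{d}t.$$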
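The Cauchy-Schwarz step can be illustrated with Monte Carlo estimates. The sketch below uses hypothetical vector fields in $D = 3$ with $g\equiv 1$ and $T = 1$: the exact score of $q_t = \mathcal{N}(\mathbf{0}, (1+t)\mathbf{I})$, a perturbed "model" score, and an arbitrary linear field standing in for $\mathbf{s}^{\mathrm{ODE}}_\theta$ (the inequality does not depend on which fields are used). Sampling $t\sim\mathcal{U}[0,1]$ turns the time integrals into expectations. Because these estimates are themselves expectations under an empirical measure, $J_{\mathrm{Diff}}^2\leq J_{\mathrm{SM}}\cdot J_{\mathrm{Fisher}}$ holds exactly for them, not only in the limit.

```python
import math
import random


def cauchy_schwarz_check(n=5000, dim=3, seed=0):
    """Monte Carlo estimates of J_SM, J_Diff, J_Fisher with g = 1, T = 1."""
    rng = random.Random(seed)
    j_sm = j_diff = j_fisher = 0.0
    for _ in range(n):
        t = rng.random()  # t ~ U[0, 1], so T * E_t[...] equals the time integral
        x = [math.sqrt(1.0 + t) * rng.gauss(0.0, 1.0) for _ in range(dim)]  # x ~ q_t
        s = [-xi / (1.0 + t) for xi in x]  # ground truth score
        s_theta = [-xi / (1.2 + t) + 0.1 * math.sin(xi) for xi in x]  # hypothetical model
        s_ode = [-xi / (1.1 + t) for xi in x]  # hypothetical stand-in for s^ODE_theta
        a = [u - v for u, v in zip(s_theta, s)]      # s_theta - s
        b = [u - v for u, v in zip(s_ode, s_theta)]  # s^ODE_theta - s_theta
        j_sm += 0.5 * sum(u * u for u in a)
        j_diff += 0.5 * sum(u * v for u, v in zip(a, b))
        j_fisher += 0.5 * sum(v * v for v in b)
    return j_sm / n, j_diff / n, j_fisher / n


j_sm, j_diff, j_fisher = cauchy_schwarz_check()
print(j_diff ** 2, j_sm * j_fisher)
```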

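Next, in Theorem 4.2, we show that under Assumption F.1, $J_{\mathrm{Fisher}}(\theta)$ can be bounded from above by the averaged residual of the score FPE, $M(\theta)$:

$$J_{\mathrm{Fisher}}(\theta) \lesssim M(\theta) + \sqrt{M(\theta)} + C_1, \tag{10}$$

where $C_1 > 0$ is a constant, $\lesssim$ denotes that multiplicative constants independent of $\theta$ are concealed, and

$$M(\theta) := \sup_{t\in[0,T]}\mathbb{E}_{\mathbf{x}\sim q_t(\mathbf{x})}\Big[\int_0^T\big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\big\|_2^2\,\mathrm{d}\tau\Big]. \tag{11}$$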

Furthermore, we have

$$M(\theta) \leq \sup_{\mathbf{x}}\Big[\int_0^T\big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\big\|_2^2\,\mathrm{d}\tau\Big],$$

meaning that this upper bound measures the worst-case time-integrated score FPE error. In Appendix G.3, we consider quantities that are more interpretable than $M(\theta)$ by introducing the density weighting $p_\tau(\mathbf{x})$ into the integrand, and we derive estimates similar to Ineq. (10).

Moreover, we prove in Theorem 4.3 that under a different regularity condition (Assumption F.2), $J_{\mathrm{Fisher}}(\theta)$ is upper bounded by $M(\theta)$ and a “time-derivative taming” term, which can be derived from Eq. (7) and is defined as

$$m(\theta) := \sup_{\mathbf{x}}\int_0^T\big|\mathcal{L}[\mathbf{s}_\theta](\mathbf{x}, \tau)\big|\,\mathrm{d}\tau. \tag{12}$$

More specifically,

$$J_{\mathrm{Fisher}}(\theta) \lesssim M(\theta) + m(\theta) + C_2, \tag{13}$$

where $C_2$ is another constant, distinct from $C_1$.

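Hence, Lemma 4.1 together with Ineq. (10) or (13) implies that the upper bound on $D_{\mathrm{KL}}(q_0\,\|\,p^{\mathrm{ODE}}_{0,\theta})$ decreases when “$M(\theta)$ and $J_{\mathrm{SM}}(\theta)$” or “$M(\theta)$, $m(\theta)$, and $J_{\mathrm{SM}}(\theta)$” are reduced simultaneously. We now state these theorems rigorously.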

Theorem 4.2. We have

$$J_{\mathrm{Diff}}(\theta)^2 \leq J_{\mathrm{SM}}(\theta)\cdot J_{\mathrm{Fisher}}(\theta). \tag{14}$$

Moreover, if Assumption F.1 is fulfilled, then there is a finite constant $C_1 > 0$ independent of $\theta$ such that we can further bound Ineq. (14) from above by

$$J_{\mathrm{Diff}}(\theta)^2 \lesssim J_{\mathrm{SM}}(\theta)\cdot\Big(M(\theta) + \sqrt{M(\theta)} + C_1\Big). \tag{15}$$

Thus,

$$D_{\mathrm{KL}}(q_0\,\|\,p^{\mathrm{ODE}}_{0,\theta}) \lesssim D_{\mathrm{KL}}(q_T\,\|\,p^{\mathrm{ODE}}_{T,\theta}) + J_{\mathrm{SM}}(\theta) + J_{\mathrm{SM}}^{1/2}(\theta)\Big(M(\theta) + \sqrt{M(\theta)} + C_1\Big)^{1/2}.$$

Theorem 4.3. If Assumption F.2 is satisfied, then there is another finite constant $C_2 > 0$ independent of $\theta$ such that

$$J_{\mathrm{Diff}}(\theta)^2 \lesssim J_{\mathrm{SM}}(\theta)\cdot\Big(M(\theta) + m(\theta) + C_2\Big). \tag{16}$$

Note that the constants $C_1$ and $C_2$ involve regularity bounds of the ground truth density and Lipschitz constants of the networks. Hence, the upper bounds in Ineqs. (15) and (16) are difficult to compare.

As the ground truth score should follow the score FPE, it is intuitive that reducing the score FPE residual encourages the network-parametrized score to approach the ground truth score (a special case is proved in Proposition 4.5). Theorems 4.2 and 4.3 suggest that reducing these score-FPE-related quantities may also reduce the gap (in KL divergence) between the corresponding densities. In Section 7, we empirically support these claims.

4.2. Conservativity

The ground truth score $\mathbf{s}(\mathbf{x}, t) = \nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ is a conservative vector field; that is, it can be expressed as the gradient of some real-valued function. However, scores learned in practice do not satisfy this property (Salimans & Ho, 2021). Below, we prove that we can implicitly enforce conservativity by minimizing the time-averaged error $\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)$ of the score FPE.

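Proposition 4.4. If there is a $t_\theta\in[0,T]$ such that $\mathbf{s}_\theta(\mathbf{x}, t_\theta) = \nabla_{\mathbf{x}}\log q_{t_\theta}(\mathbf{x})$ for all $\mathbf{x}\in\mathbb{R}^D$, then there exists a real-valued function $\Psi_\theta\colon\mathbb{R}^D\times[0,T]\to\mathbb{R}$ (with an explicit expression) that satisfies

$$\mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}}\Psi_\theta(\mathbf{x}, t) = \int_{t_\theta}^{t}\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\,\mathrm{d}\tau \tag{17}$$

for all $(\mathbf{x}, t)\in\mathbb{R}^D\times[0,T]$. In particular,

$$\big\|\mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}}\Psi_\theta(\mathbf{x}, t)\big\|_2 \leq \int_{t}^{t_\theta}\big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\big\|_2\,\mathrm{d}\tau. \tag{18}$$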
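The explicit form of $\Psi_\theta$ is deferred to the appendix; the short derivation below is our own reconstruction from Eq. (9) and the hypothesis $\mathbf{s}_\theta(\mathbf{x}, t_\theta) = \nabla_{\mathbf{x}}\log q_{t_\theta}(\mathbf{x})$, assuming enough smoothness to exchange $\nabla_{\mathbf{x}}$ with the time integral. It suggests one natural candidate for $\Psi_\theta$ that yields Eq. (17).

```latex
\begin{aligned}
\mathbf{s}_\theta(\mathbf{x}, t)
&= \mathbf{s}_\theta(\mathbf{x}, t_\theta) + \int_{t_\theta}^{t} \partial_\tau \mathbf{s}_\theta(\mathbf{x}, \tau)\,\mathrm{d}\tau \\
&= \nabla_{\mathbf{x}} \log q_{t_\theta}(\mathbf{x})
   + \int_{t_\theta}^{t} \Big( \nabla_{\mathbf{x}} \mathcal{L}[\mathbf{s}_\theta](\mathbf{x}, \tau)
   + \boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau) \Big)\,\mathrm{d}\tau
   \qquad \text{(by Eq. (9))} \\
&= \nabla_{\mathbf{x}} \underbrace{\Big( \log q_{t_\theta}(\mathbf{x})
   + \int_{t_\theta}^{t} \mathcal{L}[\mathbf{s}_\theta](\mathbf{x}, \tau)\,\mathrm{d}\tau \Big)}_{=:\,\Psi_\theta(\mathbf{x}, t)}
   + \int_{t_\theta}^{t} \boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\,\mathrm{d}\tau .
\end{aligned}
```

Taking the $\ell_2$-norm of the remainder term and moving the norm inside the time integral then gives a bound of the form of Ineq. (18).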

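Eq. (17) indicates that the error of the score FPE quantifies the degree of conservativity of $\mathbf{s}_\theta$. We further explain this idea via Ineq. (18), from which we easily obtain $\|\mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}}\Psi_\theta(\mathbf{x}, t)\|_2 \leq \int_t^{t_\theta}\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\|_2\,\mathrm{d}\tau \leq \int_0^T\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\|_2\,\mathrm{d}\tau$ for any $t$ and $t_\theta$. Thus, if the parametrized score approximately satisfies the score FPE, giving a small score FPE error $\int_0^T\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x}, \tau)\|_2\,\mathrm{d}\tau$, then the estimated score should be nearly conservative, i.e., close to the gradient of a scalar function $\Psi_\theta(\mathbf{x}, t)$. We empirically support this fact in Section 6.2.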
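One generic way to probe conservativity numerically, not necessarily the diagnostic used in Section 6.2, relies on the fact that a continuously differentiable vector field on $\mathbb{R}^D$ is a gradient field exactly when its Jacobian is symmetric. The sketch below estimates the Jacobian by central differences and reports the largest entry of $|J - J^\top|$; both 2D fields are hypothetical examples, one built as a gradient and one with an added rotation.

```python
import math


def jacobian(field, x, h=1e-5):
    """Central-difference Jacobian, jac[i][j] = d field_i / d x_j at the point x."""
    d = len(x)
    jac = [[0.0] * d for _ in range(d)]
    for j in range(d):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = field(xp), field(xm)
        for i in range(d):
            jac[i][j] = (fp[i] - fm[i]) / (2 * h)
    return jac


def asymmetry(field, x):
    """Largest entry of |J - J^T|; zero (up to FD error) for a gradient field."""
    jac = jacobian(field, x)
    d = len(x)
    return max(abs(jac[i][j] - jac[j][i]) for i in range(d) for j in range(d))


def grad_field(x):
    # Gradient of psi(x) = -0.5 * |x|^2 + sin(x0 * x1): conservative by construction.
    c = math.cos(x[0] * x[1])
    return [-x[0] + x[1] * c, -x[1] + x[0] * c]


def rotational_field(x):
    # Same field plus the rotation 0.3 * (-x1, x0), which has no potential.
    g = grad_field(x)
    return [g[0] - 0.3 * x[1], g[1] + 0.3 * x[0]]


point = [0.4, -1.1]
print(asymmetry(grad_field, point), asymmetry(rotational_field, point))
```

For the rotational field the asymmetry is $|(-0.3) - 0.3| = 0.6$ at every point, while the gradient field gives zero up to finite-difference error. For a score network, the same quantity could be evaluated at sampled points for a fixed $t$, preferably with automatic differentiation rather than finite differences.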

Proposition 4.4 requires exact alignment of the scores at a given time step. However, we propose a modification that allows for a small discrepancy by incorporating an error term into the score matching. As a result, we present an extended result, Proposition G.4, which is detailed in Appendix G.5.

4.3. Equivalence of scores

We now investigate another implication of satisfying the score FPE, which connects the score $\mathbf{s}_\theta$ with the ground truth $\mathbf{s}$, $\mathbf{s}^{\mathrm{SDE}}_\theta := \nabla_{\mathbf{x}}\log p^{\mathrm{SDE}}_{t,\theta}$, and $\mathbf{s}^{\mathrm{ODE}}_\theta$. The following proposition provides conditions under which all of these scores are identical if we train to reach a zero score FPE residual for all $(\mathbf{x}, t)$.

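Proposition 4.5. (1) Suppose that, in some suitable function space, $\mathbf{v}\equiv\mathbf{0}$ is the unique strong solution to the PDEs

$$\partial_t\mathbf{v} - \nabla_{\mathbf{x}}\Big[\frac{1}{2}g^2\operatorname{div}_{\mathbf{x}}(\mathbf{v}) + \frac{1}{2}g^2\big(\|\mathbf{v}\|_2^2 + 2\langle\mathbf{v}, \mathbf{s}\rangle\big) - \langle\mathbf{f}, \mathbf{v}\rangle\Big] = \mathbf{0}$$

with a zero initial condition $\mathbf{v}(\mathbf{x}, 0)\equiv\mathbf{0}$ and a zero boundary condition. If there is some $\theta_0$ such that $\boldsymbol{\epsilon}[\mathbf{s}_{\theta_0}](\mathbf{x}, t) = \mathbf{0}$ for all $(\mathbf{x}, t)$ and $\mathbf{s}_{\theta_0}(\mathbf{x}, 0) = \mathbf{s}(\mathbf{x}, 0)$, then $\mathbf{s}_{\theta_0}(\mathbf{x}, t) = \mathbf{s}(\mathbf{x}, t)$ for all $(\mathbf{x}, t)$.

(2) Moreover, suppose the PDEs

$$\partial_t\mathbf{v} + \nabla_{\mathbf{x}}\Big[\frac{1}{2}g^2\operatorname{div}_{\mathbf{x}}(\mathbf{v}) + \frac{1}{2}g^2\|\mathbf{v}\|_2^2 + \langle\mathbf{f}, \mathbf{v}\rangle\Big] = \mathbf{0}$$

with zero initial and boundary conditions have $\mathbf{v}\equiv\mathbf{0}$ as the unique strong solution. Then $\boldsymbol{\epsilon}[\mathbf{s}_{\theta_0}]\equiv\mathbf{0}$ and $\mathbf{s}_{\theta_0}(\mathbf{x}, 0)\equiv\mathbf{s}^{\mathrm{SDE}}_{\theta_0}(\mathbf{x}, 0)$ imply $\mathbf{s}_{\theta_0}\equiv\mathbf{s}^{\mathrm{SDE}}_{\theta_0}$.

(3) Lastly, if there is some $\theta_0$ such that the PDEs

$$\partial_t\mathbf{v} - \nabla_{\mathbf{x}}\Big\langle\frac{1}{2}g^2\mathbf{s}_{\theta_0} - \mathbf{f}, \mathbf{v}\Big\rangle = \mathbf{0}$$

with zero initial and boundary conditions admit $\mathbf{v}\equiv\mathbf{0}$ as the unique strong solution, then $\boldsymbol{\epsilon}[\mathbf{s}_{\theta_0}]\equiv\mathbf{0}$ and $\mathbf{s}_{\theta_0}(\mathbf{x}, 0)\equiv\mathbf{s}^{\mathrm{ODE}}_{\theta_0}(\mathbf{x}, 0)$ imply $\mathbf{s}_{\theta_0}\equiv\mathbf{s}^{\mathrm{ODE}}_{\theta_0}$.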

Proposition 4.5 implies that if the parametric scores match the ground truth score at the initial time, the only global minimum is the ground truth score. Essentially, the scores at any given time can be obtained solely by achieving a flawless alignment of scores at a single time step and then following the dynamics of the PDE. This indicates that the score FPE residual is a proper quantity for measuring the gap between the ground truth and parametric scores. Indeed, this proposition is an extreme case of “the continuous dependence of PDE solutions on parameters” (Artstein, 1975). A more sophisticated analysis (Lunardi, 2012; Papageorgiou, 1994) can be applied to prove, for instance, that $\|\mathbf{s}_\theta - \mathbf{s}^{\mathrm{SDE}}_\theta\|\to 0$ as $\|\boldsymbol{\epsilon}[\mathbf{s}_\theta]\|\to 0$ when $\mathbf{f}\equiv\mathbf{0}$ (with a careful choice of norms). However, such a technical generalization is outside the scope of this work.
