FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation
Chieh-Hsin Lai¹, Yuhta Takida¹, Naoki Murata¹, Toshimitsu Uesaka¹, Yuki Mitsufuji¹², Stefano Ermon³
Abstract

Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite their impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable, as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.
1. Introduction

Score-based generative models (SGMs), also referred to as diffusion models (Sohl-Dickstein et al., 2015; Song & Ermon, 2019; Ho et al., 2020; Song et al., 2020b;a), have led to major advances in the generation of synthetic images (Dhariwal & Nichol, 2021; Saharia et al., 2022; Rombach et al., 2022; Kim et al., 2022) and audio (Kong et al., 2020). In addition, SGMs have been applied to various downstream tasks such as media content editing (Meng et al., 2021b; Cheuk et al., 2022) or restoration (Kawar et al., 2022; Saito et al., 2022; Murata et al., 2023). An SGM involves a stochastic forward and backward process. In the forward process, also known as the diffusion process, noise with gradually increasing variances is added to each data point until the original structure is lost, transforming data into pure noise. The backward process attempts to reverse the diffusion process by using a neural network (called a noise-conditional score model) that is trained to gradually denoise the data, effectively transforming pure noise into clean data samples. The neural network is trained with a denoising score matching objective (Hyvärinen & Dayan, 2005; Vincent, 2011) to estimate the score (i.e., the gradient of the log-likelihood function) of the data density perturbed with various amounts of noise (as in the forward process).

¹Sony AI, Tokyo, Japan. ²Sony Group Corporation, Tokyo, Japan. ³Department of Computer Science, Stanford University, Stanford, CA, USA. Correspondence to: Chieh-Hsin Lai <Chieh-hsin.lai@sony.com>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
The training can be interpreted as a joint estimation of the scores of the original data density and all its perturbations. Crucially, all these densities are closely related to each other, as they correspond to the same data density perturbed with various amounts of noise. With sufficiently small time steps, the forward process is a diffusion (Song et al., 2020b), and the spatial-temporal evolution of the data density is thus governed by the classic Fokker-Planck partial differential equation (PDE) (Øksendal, 2003). In principle, this implies that with knowledge of the density for a single noise level, we could recover all the densities by solving the Fokker-Planck equation (FPE) without any additional learning.
Our contributions. Building on the above notions, we derive an associated system of PDEs that characterizes the evolution of the scores (i.e., gradients) of the perturbed data densities; we term it the score Fokker-Planck equation (score FPE). In theory, the ground truth scores of the perturbed data densities must satisfy the score FPE (a self-consistency property). Hence, we mathematically study the implications of satisfying the score FPE. We prove the following effects of reducing the score FPE error: (a) improvement in the log-likelihood of the probability flow ordinary differential equation (ODE) diffusion model (Song et al., 2020b) (Theorems 4.2 and 4.3); and (b) improvement in the degree of conservativity of the models (Proposition 4.4). In addition, we prove that (c) score FPE error reduction can be achieved by enforcing higher-order score matching (Meng et al., 2021a; Lu et al., 2022) (Proposition 4.6). In practice, we observe that many existing, pre-trained score models do not numerically satisfy the score FPE. Therefore, we propose a new loss function for training diffusion models by combining the traditional score matching objective with a regularization term derived from the underlying score FPE to enforce the consistency of models. Our proposed method is called FP-Diffusion. We show that FP-Diffusion enables more accurate density estimation on synthetic data and improves the likelihood on the MNIST, Fashion MNIST, CIFAR-10, and ImageNet32 (ImageNet downsampled to 32×32) (Chrabaszcz et al., 2017) datasets.

arXiv:2210.04296v4 [cs.LG] 14 Jun 2023
2. Background

Song et al. (2020b) unified denoising score matching (Song & Ermon, 2019) and diffusion probabilistic models (Sohl-Dickstein et al., 2015; Ho et al., 2020) via a stochastic process $\mathbf{x}(t)$ with continuous time $t \in [0, T]$. The process is driven by the following forward SDE

$$d\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t), t)\,dt + g(t)\,d\mathbf{w}_t, \qquad (1)$$

where $\mathbf{f}(\cdot, t): \mathbb{R}^D \to \mathbb{R}^D$ and $g(\cdot): \mathbb{R} \to \mathbb{R}$ are pre-assigned¹ and $\mathbf{w}_t$ is a standard Wiener process. Under moderate conditions (Anderson, 1982), a reverse-time SDE from $T$ to $0$ can be obtained as

$$d\mathbf{x}(t) = \big[\mathbf{f}(\mathbf{x}(t), t) - g^2(t)\,\nabla_{\mathbf{x}}\log q_t(\mathbf{x}(t))\big]\,dt + g(t)\,d\bar{\mathbf{w}}_t, \qquad (2)$$

where $\bar{\mathbf{w}}_t$ is a standard Wiener process in reverse time, and $q_t(\mathbf{x})$ denotes the ground truth marginal density of $\mathbf{x}(t)$ following Eq. (1). We can train a time-conditional neural network $\mathbf{s}_\theta = \mathbf{s}_\theta(\mathbf{x}, t)$ to approximate $\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ by minimizing a score matching objective (Hyvärinen & Dayan, 2005):

$$\mathcal{J}_{\mathrm{SM}}(\theta; \lambda(\cdot)) := \frac{1}{2}\int_0^T \lambda(t)\,\mathbb{E}_{\mathbf{x}\sim q_t(\mathbf{x})}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}}\log q_t(\mathbf{x})\big\|_2^2\Big]\,dt.$$

As $q_t(\mathbf{x})$ is generally inaccessible, the denoising score matching (DSM) loss (Vincent, 2011; Song et al., 2020b) $\mathcal{J}_{\mathrm{DSM}}(\theta; \lambda(\cdot))$ is exploited in practice instead:

$$\mathcal{J}_{\mathrm{DSM}}(\theta; \lambda(\cdot)) := \frac{1}{2}\int_0^T \lambda(t)\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}(t), t) - \nabla_{\mathbf{x}}\log q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))\big\|_2^2\Big]\,dt, \qquad (3)$$
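To make Eq. (3) concrete, the sketch below estimates the DSM objective by Monte Carlo in a one-dimensional toy setting with the simple perturbation $x(t) = x(0) + \sigma(t)\,\varepsilon$, so the conditional score is $-(x(t)-x(0))/\sigma(t)^2$. This is an illustrative assumption, not the paper's training code; the weighting $\lambda(t) = \sigma(t)^2$ and the helper names are ours.

```python
import math, random

def dsm_loss(score_fn, data_sampler, sigma_fn, n_samples=4096, T=1.0, seed=0):
    """Monte Carlo estimate of the DSM objective in Eq. (3) for the toy
    perturbation x(t) = x(0) + sigma(t) * eps in one dimension, so that
    grad_x log q_{0t}(x(t)|x(0)) = -(x(t) - x(0)) / sigma(t)^2.
    The weighting lambda(t) = sigma(t)^2 is one common choice."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.uniform(1e-3, T)              # avoid the t = 0 singularity
        x0 = data_sampler(rng)
        sigma = sigma_fn(t)
        eps = rng.gauss(0.0, 1.0)
        xt = x0 + sigma * eps
        target = -(xt - x0) / sigma**2        # = -eps / sigma
        resid = score_fn(xt, t) - target
        total += 0.5 * sigma**2 * resid**2
    return total / n_samples
```

With `data_sampler` drawing from $\mathcal{N}(0,1)$ and $\sigma(t)=t$, the perturbed marginal is $\mathcal{N}(0, 1+t^2)$, and its true score $-x/(1+t^2)$ attains a lower DSM loss than, e.g., the zero vector field, illustrating that the minimizer of Eq. (3) is the marginal score.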
where $q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))$ is the forward transition probability from $\mathbf{x}(0)$ to $\mathbf{x}(t)$. After $\mathbf{s}_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ is learned, we replace $\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ in Eq. (2) with $\mathbf{s}_\theta$ and obtain a parametrized reverse-time SDE for a stochastic process $\hat{\mathbf{x}}_\theta(t)$:

$$d\hat{\mathbf{x}}_\theta(t) = \big[\mathbf{f}(\hat{\mathbf{x}}_\theta(t), t) - g^2(t)\,\mathbf{s}_\theta(\hat{\mathbf{x}}_\theta(t), t)\big]\,dt + g(t)\,d\bar{\mathbf{w}}_t. \qquad (4)$$

¹With specific choices of $\mathbf{f}$ and $g$, there are two common instantiations of the stochastic differential equation (SDE): VE and VP. See Appendix A for details.
Let $p^{\mathrm{SDE}}_{t,\theta}$ denote the marginal distribution of $\hat{\mathbf{x}}_\theta(t)$ with an initial distribution defined as the prior $\pi$, where we suppress the dependence on $\pi$ for compactness. We can design $\mathbf{f}$ and $g$ in Eq. (1) such that $q_T(\mathbf{x})$ approximates a simple prior $\pi$; samples $\hat{\mathbf{x}}_\theta(0) \sim p^{\mathrm{SDE}}_{0,\theta}$ can then be generated by numerically solving Eq. (4) backward with an initial sample from the prior, $\hat{\mathbf{x}}_\theta(T) \sim \pi$. Intuitively, $\hat{\mathbf{x}}_\theta(0)$ should be close to a sample from the data distribution.
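The backward simulation above can be sketched with an Euler-Maruyama discretization of Eq. (4). We use a toy case (our assumption, not the paper's setup): $f = 0$, $g = 1$, data $q_0 = \mathcal{N}(0, 1)$, so the exact marginal is $q_t = \mathcal{N}(0, 1+t)$ with analytic score $-x/(1+t)$, and the prior is $\pi = \mathcal{N}(0, 1+T)$.

```python
import math, random

def reverse_sde_sample(score_fn, n_steps=100, T=1.0,
                       prior_std=math.sqrt(2.0), seed=0):
    """Euler-Maruyama discretization of the reverse-time SDE in Eq. (4)
    for the toy case f = 0, g = 1 in one dimension:
        x <- x + s(x, t) * dt + sqrt(dt) * z,   stepping t from T down to 0."""
    rng = random.Random(seed)
    dt = T / n_steps
    x = rng.gauss(0.0, prior_std)      # x(T) ~ pi = N(0, sigma0^2 + T)
    for k in range(n_steps):
        t = T - k * dt
        x = x + score_fn(x, t) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

# analytic score of q_t = N(0, 1 + t), i.e. sigma0 = 1
toy_score = lambda x, t: -x / (1.0 + t)
```

Running many independent chains and checking the sample mean and variance against $q_0 = \mathcal{N}(0,1)$ is a cheap sanity test for the discretization.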
Song et al. (2020b) also introduced a deterministic process (with a zero diffusion term) that describes the evolution of samples whose trajectories share the same marginal probability densities as the forward SDE (Eq. (1)). Specifically, the process evolves through time according to the following probability flow ODE:

$$\frac{d\mathbf{x}}{dt}(t) = \mathbf{f}(\mathbf{x}(t), t) - \frac{1}{2}g^2(t)\,\nabla_{\mathbf{x}}\log q_t(\mathbf{x}(t)). \qquad (5)$$

As in the SDE case, the ground truth score in Eq. (5) is approximated with the learned score model $\mathbf{s}_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. This yields the following parametrized probability flow ODE:

$$\frac{d\tilde{\mathbf{x}}_\theta}{dt}(t) = \mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t) - \frac{1}{2}g^2(t)\,\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t). \qquad (6)$$
We denote the marginal density of $\tilde{\mathbf{x}}_\theta$ as $p^{\mathrm{ODE}}_{t,\theta}$ with an initial condition sampled from the prior $\pi$. For compactness, we omit the dependence on $\pi$ in the notation. By solving Eq. (6) numerically using an initial value $\tilde{\mathbf{x}}_\theta(T) \sim \pi$, we can generate a sample $\tilde{\mathbf{x}}_\theta(0) \sim p^{\mathrm{ODE}}_{0,\theta}$ to approximate sampling from the data distribution. Indeed, the deterministic dynamics in Eq. (6) make it possible to compute exact likelihoods for this generative model. Let $\tilde{\mathbf{x}}_\theta(t) \in \mathbb{R}^D$ evolve in reverse time via Eq. (6), starting with $\tilde{\mathbf{x}}_\theta(T) \sim \pi$. The "instantaneous change of variables" (Chen et al., 2018) characterizes the temporal changes in $\log p^{\mathrm{ODE}}_{t,\theta}$ along the trajectory $\{\tilde{\mathbf{x}}_\theta(t) : t \in [0, T]\}$ via the following ODE:

$$\frac{d\log p^{\mathrm{ODE}}_{t,\theta}(\tilde{\mathbf{x}}_\theta(t))}{dt} = \frac{1}{2}g^2(t)\,\mathrm{div}_{\mathbf{x}}\,\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t) - \mathrm{div}_{\mathbf{x}}\,\mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t).$$

Hence, the log-likelihood can be exactly calculated by numerically solving the concatenated ODEs (in the reverse of the generative direction, i.e., from $t = 0$ to $T$) after initialization with $\tilde{\mathbf{x}}_\theta(0) \sim q_0(\mathbf{x})$:

$$\frac{d}{dt}\begin{pmatrix}\tilde{\mathbf{x}}_\theta(t)\\[2pt] \log p^{\mathrm{ODE}}_{t,\theta}(\tilde{\mathbf{x}}_\theta(t))\end{pmatrix} = \begin{pmatrix}\mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t) - \frac{1}{2}g^2(t)\,\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t)\\[2pt] \frac{1}{2}g^2(t)\,\mathrm{div}_{\mathbf{x}}\,\mathbf{s}_\theta(\tilde{\mathbf{x}}_\theta(t), t) - \mathrm{div}_{\mathbf{x}}\,\mathbf{f}(\tilde{\mathbf{x}}_\theta(t), t)\end{pmatrix}.$$
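The concatenated-ODE likelihood computation can be sketched in one dimension for a case with a closed-form answer (our illustrative assumption: $f = 0$, $g = 1$, $q_0 = \mathcal{N}(0, \sigma_0^2)$, so the analytic score is $s(x,t) = -x/(\sigma_0^2+t)$ and $\mathrm{div}_x\,s = -1/(\sigma_0^2+t)$). We integrate $x$ and the log-density change jointly from $t=0$ to $T$ and read off the prior density at the terminal point.

```python
import math

def pf_ode_loglik(x0, sigma0=1.0, T=1.0, n_steps=2000):
    """log q_0(x0) via the probability-flow ODE (f = 0, g = 1, D = 1)
    with the analytic score s(x, t) = -x / (sigma0^2 + t).
    Integrates x and Delta = integral of d log p_t forward from 0 to T,
    then uses log p_0(x0) = log p_T(x(T)) - Delta."""
    dt = T / n_steps
    x, delta = float(x0), 0.0
    for k in range(n_steps):
        t = k * dt
        # midpoint (RK2) step for dx/dt = -0.5 * s(x, t)
        s = -x / (sigma0**2 + t)
        x_mid = x + 0.5 * dt * (-0.5 * s)
        s_mid = -x_mid / (sigma0**2 + t + 0.5 * dt)
        x = x + dt * (-0.5 * s_mid)
        # d log p_t / dt = 0.5 * g^2 * div_x s - div_x f = -0.5 / (sigma0^2 + t)
        delta += dt * (-0.5 / (sigma0**2 + t + 0.5 * dt))
    var_T = sigma0**2 + T
    log_pT = -0.5 * math.log(2 * math.pi * var_T) - x * x / (2 * var_T)
    return log_pT - delta
```

Since $q_0 = \mathcal{N}(0, \sigma_0^2)$ here, the recovered value can be checked directly against the Gaussian log-density, which makes this a useful unit test for any likelihood-ODE implementation.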
3. Score Fokker-Planck equation for diffusion

It is well known that the evolution of the ground truth density $q_t(\mathbf{x})$ associated with Eq. (1) is governed by the Fokker-Planck equation (FPE) (Øksendal, 2003):

$$\partial_t q_t(\mathbf{x}) = -\sum_{j=1}^{D} \partial_{x_j}\big[\tilde{F}_j(\mathbf{x}, t)\,q_t(\mathbf{x})\big],$$

where $\tilde{\mathbf{F}}(\mathbf{x}, t) := \mathbf{f}(\mathbf{x}, t) - \frac{1}{2}g^2(t)\,\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. As there is a one-to-one mapping (up to a constant) between densities and their scores, we derive (in Appendix G) an equivalent system of PDEs for the ground truth scores $\nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. We designate it as the score Fokker-Planck equation, or simply the score FPE.
Proposition 3.1 (Score FPE). Assume the ground truth density $q_t(\mathbf{x})$ is sufficiently smooth on $\mathbb{R}^D\times[0,T]$, with its score denoted as $\mathbf{s}(\mathbf{x},t) := \nabla_{\mathbf{x}}\log q_t(\mathbf{x})$. Then for all $(\mathbf{x},t)\in\mathbb{R}^D\times[0,T]$, its log-density satisfies the PDE

$$\partial_t \log q_t(\mathbf{x}) = \frac{1}{2}g^2(t)\,\mathrm{div}_{\mathbf{x}}(\mathbf{s}(\mathbf{x},t)) + \frac{1}{2}g^2(t)\,\|\mathbf{s}(\mathbf{x},t)\|_2^2 - \langle \mathbf{f}(\mathbf{x},t), \mathbf{s}(\mathbf{x},t)\rangle - \mathrm{div}_{\mathbf{x}}(\mathbf{f}(\mathbf{x},t)) \qquad (7)$$

and its score $\mathbf{s}$ satisfies the following system of PDEs:

$$\partial_t \mathbf{s}(\mathbf{x},t) = \nabla_{\mathbf{x}}\Big[\frac{1}{2}g^2(t)\,\mathrm{div}_{\mathbf{x}}(\mathbf{s}(\mathbf{x},t)) + \frac{1}{2}g^2(t)\,\|\mathbf{s}(\mathbf{x},t)\|_2^2 - \langle \mathbf{f}(\mathbf{x},t), \mathbf{s}(\mathbf{x},t)\rangle - \mathrm{div}_{\mathbf{x}}(\mathbf{f}(\mathbf{x},t))\Big]. \qquad (8)$$

For notational simplicity, let $\mathcal{L}[\cdot] := \frac{1}{2}g^2\,\mathrm{div}_{\mathbf{x}}(\cdot) + \frac{1}{2}g^2\,\|\cdot\|_2^2 - \langle \mathbf{f}, \cdot\rangle - \mathrm{div}_{\mathbf{x}}(\mathbf{f})$ be the operator mapping vector fields to real-valued functions. Thus, Eq. (7) and Eq. (8) can be expressed as $\partial_t \log q_t(\mathbf{x}) = \mathcal{L}[\mathbf{s}](\mathbf{x},t)$ and $\partial_t \mathbf{s}(\mathbf{x},t) = \nabla_{\mathbf{x}}\mathcal{L}[\mathbf{s}](\mathbf{x},t)$, respectively.
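Eq. (8) can be verified numerically in a case where the ground-truth score is analytic. As a sketch (our assumption: $f = 0$, $g = 1$, $D = 1$, $q_0 = \mathcal{N}(0, \sigma_0^2)$, so $q_t = \mathcal{N}(0, \sigma_0^2 + t)$), we estimate both sides of $\partial_t s = \partial_x \mathcal{L}[s]$ by central differences:

```python
def score_1d(x, t, sigma0=1.0):
    """Ground-truth score of q_t = N(0, sigma0^2 + t), the marginal of
    dx = dw (f = 0, g = 1) started from q_0 = N(0, sigma0^2)."""
    return -x / (sigma0**2 + t)

def fpe_residual(s, x, t, h=1e-4):
    """eps[s](x, t) = d_t s - d_x L[s] in one dimension with f = 0, g = 1,
    where L[s] = 0.5 * d_x s + 0.5 * s^2; derivatives by central differences."""
    L = lambda xx, tt: (0.5 * (s(xx + h, tt) - s(xx - h, tt)) / (2 * h)
                        + 0.5 * s(xx, tt) ** 2)
    dt_s = (s(x, t + h) - s(x, t - h)) / (2 * h)
    dx_L = (L(x + h, t) - L(x - h, t)) / (2 * h)
    return dt_s - dx_L
```

`fpe_residual(score_1d, x, t)` vanishes (up to discretization error) at every point, whereas a rescaled field such as `lambda x, t: 1.1 * score_1d(x, t)` leaves a clearly nonzero residual, in line with the self-consistency property stated in Proposition 3.1.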
Proposition 3.1 shows that the time-conditional scores $\mathbf{s}_\theta(\mathbf{x},t)$ learned by score-based models (via Eq. (3)) are highly redundant. In principle, given a ground truth score at an initial time $t_0$, we can theoretically recover the scores for all times $t \geq t_0$ by solving the score FPE. We explain this intuitively by considering the special case where $\mathbf{f}\equiv 0$ and $g\equiv 1$, i.e., when $\mathbf{x}(t)$ is obtained by adding Gaussian noise. It is well known that the densities $q_t$ and $q_{t_0}$ are related by convolution, $q_t = q_{t_0} * \mathcal{N}(0, (t-t_0)I)$, and that $q_t$ can be analytically obtained from $q_{t_0}$ (Masry & Rice, 1992) (e.g., by applying a Fourier transform and dividing). Hence, all scores can in principle be obtained analytically from the score at a single time-step, without any further learning.

We provide empirical evidence to substantiate Proposition 3.1 from two distinct perspectives, as presented in Section 6.1 and Appendix B.1, respectively.
3.1. Pre-trained scores fail to satisfy score FPEs

Theoretically, with sufficient data and model capacity, score matching ensures that the optimal solution to Eq. (3) should satisfy Eq. (8), as it should approximate the ground truth score well. However, in our experiments, we observe that pre-trained scores $\mathbf{s}_\theta$ learned via Eq. (3) do not fulfill the score FPE. Therefore, we introduce an error term $\boldsymbol{\epsilon}[\mathbf{s}_\theta] := \boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},t)$ to quantify how much $\mathbf{s}_\theta$ deviates from the score FPE:

$$\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},t) := \partial_t \mathbf{s}_\theta(\mathbf{x},t) - \nabla_{\mathbf{x}}\mathcal{L}[\mathbf{s}_\theta](\mathbf{x},t). \qquad (9)$$

Setting $T = 1$, we define the average residual of the score FPE, computed over $\mathbf{x}$, as a function of $t\in[0,1]$:

$$r_{\mathrm{FP,\,trans.}}[\mathbf{s}_\theta](t) := \frac{1}{D}\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{\mathbf{x}(t)|\mathbf{x}(0)}\Big[\big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},t)\big\|_2\Big].$$

We further consider the following averaged residual for DSM:

$$r_{\mathrm{DSM\text{-}like}}[\mathbf{s}_\theta](t) := \frac{1}{D}\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{\mathbf{x}(t)|\mathbf{x}(0)}\Big[\big\|\mathbf{s}_\theta(\mathbf{x}(t),t) - \nabla_{\mathbf{x}(t)}\log q_{0t}(\mathbf{x}(t)|\mathbf{x}(0))\big\|_2\Big].$$

Compared to the integrand in the standard DSM loss in Eq. (3), $r_{\mathrm{DSM\text{-}like}}[\mathbf{s}_\theta]$ uses the $\ell_2$-norm (instead of the MSE) and drops the time-weighting function $\lambda(t)$, to be consistent with the averaged residual of the score FPE.
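A Monte Carlo analogue of $r_{\mathrm{FP}}$ can be sketched in the toy case $f = 0$, $g = 1$, $D = 1$ (the sampling scheme, finite-difference residual, and the "wobbly" perturbation below are our illustrative assumptions). A model that is pointwise close to the true score can still carry a large FPE residual, mirroring the phenomenon in Figure 1:

```python
import math, random

def avg_fpe_residual(score_fn, n=2000, h=1e-4, seed=0, sigma0=1.0, T=1.0):
    """Monte Carlo analogue of r_FP for f = 0, g = 1, D = 1: sample
    x(t) = x(0) + sqrt(t) * eps with x(0) ~ N(0, sigma0^2), then average
    |eps[s](x, t)| with eps[s] = d_t s - d_x(0.5*d_x s + 0.5*s^2),
    estimated by central differences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.uniform(0.05, T)
        x = rng.gauss(0.0, sigma0) + math.sqrt(t) * rng.gauss(0.0, 1.0)
        L = lambda xx, tt: (0.5 * (score_fn(xx + h, tt)
                                   - score_fn(xx - h, tt)) / (2 * h)
                            + 0.5 * score_fn(xx, tt) ** 2)
        dt_s = (score_fn(x, t + h) - score_fn(x, t - h)) / (2 * h)
        dx_L = (L(x + h, t) - L(x - h, t)) / (2 * h)
        total += abs(dt_s - dx_L)
    return total / n
```

For example, `lambda x, t: -x / (1 + t) + 0.01 * math.cos(50 * t)` differs from the true score of $q_t = \mathcal{N}(0, 1+t)$ by at most $0.01$ pointwise (so its DSM-like error is nearly unchanged), yet its rapid time-oscillation inflates $\partial_t s$ and hence the averaged FPE residual by orders of magnitude.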
Figure 1 plots these residuals for score models that were pre-trained via DSM on the MNIST and CIFAR-10 datasets. Despite achieving a low $r_{\mathrm{DSM\text{-}like}}$ across all $t$ (orange curve), the pre-trained score models fail to satisfy the score FPE, especially for small $t$ (blue curve). This implies that models learned by DSM do not satisfy the score FPE.
4. Theoretical implications of score FPE

In this section, we first study three implications of satisfying the score FPE. Specifically, we show in Section 4.1 that simultaneous minimization of quantities related to the score FPE and the conventional score matching objective can reduce the KL divergence between the data density $q_0$ and the density $p^{\mathrm{ODE}}_{0,\theta}$ determined by the parametrized probability flow ODE (Eq. (6)). In Section 4.2 we prove that controlling $\boldsymbol{\epsilon}[\mathbf{s}_\theta]$ implicitly enforces the conservativity of $\mathbf{s}_\theta$. Moreover, in Section 4.3 we prove that if the score FPE is satisfied, then under certain conditions, $\mathbf{s}_\theta$, the ground truth score $\mathbf{s}$, $\nabla_{\mathbf{x}}\log p^{\mathrm{SDE}}_{t,\theta}$, and $\nabla_{\mathbf{x}}\log p^{\mathrm{ODE}}_{t,\theta}$ must match. Here $p^{\mathrm{SDE}}_{t,\theta}$ and $p^{\mathrm{ODE}}_{t,\theta}$ were defined in Section 2 as the marginal densities of the parametrized diffusion process and the probability flow ODE, respectively. Finally, in Section 4.4, we investigate the connection between higher-order score matching (Meng et al., 2021a; Lu et al., 2022) and the score FPE. We provide the proofs of all theorems in Appendix G.
Figure 1. Comparison of the numerical scales of $r_{\mathrm{DSM\text{-}like}}[\mathbf{s}_\theta](t)$ and $r_{\mathrm{FP,\,trans.}}[\mathbf{s}_\theta](t)$ for pre-trained scores $\mathbf{s}_\theta$ on MNIST and CIFAR-10. Panels: (a) VE SDE, MNIST; (b) VP SDE, MNIST; (c) VE SDE, CIFAR-10; (d) VP SDE, CIFAR-10. We treat these errors as functions of time. The pre-trained models do not numerically satisfy the score FPE, in contrast to their DSM-like errors. We attempt to explain this phenomenon in Sections 4.2 and 4.4.
4.1. Minimization of $D_{\mathrm{KL}}\big(q_0 \,\|\, p^{\mathrm{ODE}}_{0,\theta}\big)$

In this section, we show that under certain regularity conditions (see Assumptions F.1 and F.2), simultaneous minimization of $\mathcal{J}_{\mathrm{SM}}(\theta)$ and certain score FPE related quantities (see Eqs. (11) and (12)) can decrease the KL divergence between $q_0$ and $p^{\mathrm{ODE}}_{0,\theta}$, denoted as $D_{\mathrm{KL}}\big(q_0 \,\|\, p^{\mathrm{ODE}}_{0,\theta}\big)$. This is equivalent to improving the likelihood of data under $p^{\mathrm{ODE}}_{0,\theta}$.

First, we review an equation proposed by Lu et al. (2022) that quantifies the exact gap between $D_{\mathrm{KL}}\big(q_0 \,\|\, p^{\mathrm{ODE}}_{0,\theta}\big)$ and the score matching loss $\mathcal{J}_{\mathrm{SM}}(\theta)$. For compactness, we denote $\mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x},t) := \nabla_{\mathbf{x}}\log p^{\mathrm{ODE}}_{t,\theta}(\mathbf{x})$.
Lemma 4.1 (Lu et al. (2022)). Set $\lambda(t) = g^2(t)$. Let $q_0$ be the data distribution, and $q_t$ be the marginal density of $\mathbf{x}(t)$ following Eq. (1). Assume that Assumption F.1 is satisfied. Then,

$$D_{\mathrm{KL}}\big(q_0 \,\|\, p^{\mathrm{ODE}}_{0,\theta}\big) = D_{\mathrm{KL}}\big(q_T \,\|\, p^{\mathrm{ODE}}_{T,\theta}\big) + \mathcal{J}_{\mathrm{SM}}(\theta) + \mathcal{J}_{\mathrm{Diff}}(\theta),$$

where

$$\mathcal{J}_{\mathrm{Diff}}(\theta) = \frac{1}{2}\int_0^T g^2(t)\,\mathbb{E}_{q_t(\mathbf{x})}\Big[\big\langle \mathbf{s}_\theta(\mathbf{x},t) - \mathbf{s}(\mathbf{x},t),\, \mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x},t) - \mathbf{s}_\theta(\mathbf{x},t)\big\rangle\Big]\,dt.$$
We now introduce the main theoretical results of this section. First, we note that application of the Cauchy-Schwarz inequality to $\mathcal{J}_{\mathrm{Diff}}(\theta)$ gives

$$|\mathcal{J}_{\mathrm{Diff}}(\theta)| \leq \sqrt{\mathcal{J}_{\mathrm{SM}}(\theta)}\cdot\sqrt{\mathcal{J}_{\mathrm{Fisher}}(\theta)}.$$

Here, $\mathcal{J}_{\mathrm{Fisher}}(\theta)$ is a Fisher-like divergence in terms of the two scores $\mathbf{s}_\theta(\mathbf{x},t)$ and $\mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x},t)$, defined as

$$\mathcal{J}_{\mathrm{Fisher}}(\theta) := \frac{1}{2}\int_0^T g^2(t)\,\mathbb{E}_{\mathbf{x}\sim q_t(\mathbf{x})}\Big[\big\|\mathbf{s}_\theta(\mathbf{x},t) - \mathbf{s}^{\mathrm{ODE}}_\theta(\mathbf{x},t)\big\|_2^2\Big]\,dt.$$
Next, in Theorem 4.2, we show that under Assumption F.1, $\mathcal{J}_{\mathrm{Fisher}}(\theta)$ can be bounded from above by the averaged residual of the score FPE, $M(\theta)$:

$$\mathcal{J}_{\mathrm{Fisher}}(\theta) \lesssim M(\theta) + \sqrt{M(\theta)} + C_1, \qquad (10)$$

where $C_1 > 0$ is a constant, $\lesssim$ denotes that multiplicative constants independent of $\theta$ are concealed, and

$$M(\theta) := \sup_{t\in[0,T]}\ \mathbb{E}_{\mathbf{x}\sim q_t(\mathbf{x})}\left[\int_0^T \big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\big\|_2^2\,d\tau\right]. \qquad (11)$$

Furthermore, we can compute

$$M(\theta) \leq \sup_{\mathbf{x}}\left[\int_0^T \big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\big\|_2^2\,d\tau\right],$$

meaning that this upper bound measures the worst time-averaged score FPE error. In Appendix G.3, we consider more interpretable quantities than $M(\theta)$ by introducing the density weighting $p_\tau(\mathbf{x})$ in the $\tau$-integrand, and derive estimations similar to Ineq. (10).
Moreover, we prove in Theorem 4.3 that under a different regularity condition (Assumption F.2), $\mathcal{J}_{\mathrm{Fisher}}(\theta)$ is upper bounded by $M(\theta)$ and a "time-derivative taming" term that can be derived from Eq. (7), defined as

$$m(\theta) := \sup_{\mathbf{x}}\int_0^T \big|\mathcal{L}[\mathbf{s}_\theta](\mathbf{x},\tau)\big|\,d\tau. \qquad (12)$$

More specifically,

$$\mathcal{J}_{\mathrm{Fisher}}(\theta) \lesssim M(\theta) + m(\theta) + C_2, \qquad (13)$$

where $C_2$ is another constant, distinct from $C_1$.

Hence, Lemma 4.1 together with Ineq. (10) or (13) implies that $D_{\mathrm{KL}}\big(q_0 \,\|\, p^{\mathrm{ODE}}_{0,\theta}\big)$ decreases when "$M(\theta)$ and $\mathcal{J}_{\mathrm{SM}}(\theta)$" or "$M(\theta)$, $m(\theta)$, and $\mathcal{J}_{\mathrm{SM}}(\theta)$" are reduced simultaneously. We now rigorously state these theorems.
Theorem 4.2. We have

$$\mathcal{J}_{\mathrm{Diff}}(\theta)^2 \leq \mathcal{J}_{\mathrm{SM}}(\theta)\cdot\mathcal{J}_{\mathrm{Fisher}}(\theta). \qquad (14)$$

Moreover, if Assumption F.1 is fulfilled, then there is another finite constant $C_1 > 0$ independent of $\theta$ such that we can further bound Ineq. (14) above by

$$\mathcal{J}_{\mathrm{Diff}}(\theta)^2 \lesssim \mathcal{J}_{\mathrm{SM}}(\theta)\cdot\big(M(\theta) + \sqrt{M(\theta)} + C_1\big). \qquad (15)$$

Thus,

$$D_{\mathrm{KL}}\big(q_0 \,\|\, p^{\mathrm{ODE}}_{0,\theta}\big) \lesssim D_{\mathrm{KL}}\big(q_T \,\|\, p^{\mathrm{ODE}}_{T,\theta}\big) + \mathcal{J}_{\mathrm{SM}}(\theta) + \mathcal{J}_{\mathrm{SM}}^{1/2}(\theta)\,\big(M(\theta) + \sqrt{M(\theta)} + C_1\big)^{1/2}.$$

Theorem 4.3. If Assumption F.2 is satisfied, then there is another finite constant $C_2 > 0$ independent of $\theta$ such that

$$\mathcal{J}_{\mathrm{Diff}}(\theta)^2 \lesssim \mathcal{J}_{\mathrm{SM}}(\theta)\cdot\big(M(\theta) + m(\theta) + C_2\big). \qquad (16)$$

Note that the constants $C_1$ and $C_2$ involve regularity bounds on the ground truth density and Lipschitz constants of the networks. Hence, the upper bounds in Ineqs. (15) and (16) are difficult to compare.
As the ground truth score must satisfy the score FPE, it is intuitive that reducing the score FPE residual encourages the network-parametrized score to approach the ground truth score (a special case is proved in Proposition 4.5). Theorems 4.2 and 4.3 show that reducing these score-FPE-related quantities may also reduce the gap (in KL divergence) between the corresponding densities. In Section 7, we empirically support these claims.
4.2. Conservativity

The ground truth score $\mathbf{s}(\mathbf{x},t) = \nabla_{\mathbf{x}}\log q_t(\mathbf{x})$ is a conservative vector field. That is, it can be expressed as the gradient of some real-valued function. However, scores learned in practice do not satisfy this property (Salimans & Ho, 2021). Below, we prove that we can implicitly enforce conservativity by minimizing the time-averaged error $\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)$ of the score FPE.

Proposition 4.4. If there is a $t_\theta\in[0,T]$ so that $\mathbf{s}_\theta(\mathbf{x},t_\theta) = \nabla_{\mathbf{x}}\log q_{t_\theta}(\mathbf{x})$ for all $\mathbf{x}\in\mathbb{R}^D$, then there exists a real-valued function $\Psi_\theta:\mathbb{R}^D\times[0,T]\to\mathbb{R}$ (with an explicit expression) that satisfies

$$\mathbf{s}_\theta(\mathbf{x},t) - \nabla_{\mathbf{x}}\Psi_\theta(\mathbf{x},t) = \int_{t_\theta}^{t}\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\,d\tau \qquad (17)$$

for all $(\mathbf{x},t)\in\mathbb{R}^D\times[0,T]$. In particular,

$$\big\|\mathbf{s}_\theta(\mathbf{x},t) - \nabla_{\mathbf{x}}\Psi_\theta(\mathbf{x},t)\big\|_2 \leq \left|\int_{t_\theta}^{t}\big\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\big\|_2\,d\tau\right|. \qquad (18)$$

Eq. (17) indicates that the error of the score FPE quantifies the degree of conservativity of $\mathbf{s}_\theta$. We further explain this idea via Ineq. (18), from which we easily obtain $\|\mathbf{s}_\theta(\mathbf{x},t) - \nabla_{\mathbf{x}}\Psi_\theta(\mathbf{x},t)\|_2 \leq \big|\int_{t_\theta}^{t}\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\|_2\,d\tau\big| \leq \int_0^T\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\|_2\,d\tau$ for any $\mathbf{x}$ and $t$. Thus, if the $\theta$-parametrized score approximately satisfies the score FPE, giving a small score FPE error $\int_0^T\|\boldsymbol{\epsilon}[\mathbf{s}_\theta](\mathbf{x},\tau)\|_2\,d\tau$, then the estimated score should be nearly conservative, i.e., close to the gradient of a scalar function $\Psi_\theta(\mathbf{x},t)$. We empirically support this fact in Section 6.2.
Proposition 4.4 requires a precise alignment of scores at a given timestep. However, we propose a modification that allows for a small discrepancy by incorporating an error term into the score matching process. As a result, we present an extended proposition, Proposition G.4, which is detailed in Appendix G.5.
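Conservativity itself is easy to probe numerically: a gradient field has a symmetric Jacobian ($\partial_{x_j} s_i = \partial_{x_i} s_j$), so the asymmetry of a finite-difference Jacobian measures how far a field is from being conservative. The sketch below (our illustrative check, not the paper's experiment) contrasts a conservative field with one carrying a rotational component:

```python
def jacobian_asymmetry(field, x, h=1e-5):
    """Conservativity check for a 2-D vector field at point x = (x1, x2):
    a gradient field has a symmetric Jacobian, so the central-difference
    estimate of |d s_1/d x_2 - d s_2/d x_1| should vanish."""
    x1, x2 = x
    ds1_dx2 = (field(x1, x2 + h)[0] - field(x1, x2 - h)[0]) / (2 * h)
    ds2_dx1 = (field(x1 + h, x2)[1] - field(x1 - h, x2)[1]) / (2 * h)
    return abs(ds1_dx2 - ds2_dx1)

# conservative: the score of N(0, I), i.e. the gradient of -0.5 * |x|^2
conservative = lambda x1, x2: (-x1, -x2)
# non-conservative: adding a rotational component breaks Jacobian symmetry
rotational = lambda x1, x2: (-x1 + 0.5 * x2, -x2 - 0.5 * x1)
```

The rotational field changes the density transport not at all in some settings, yet it cannot be written as $\nabla_{\mathbf{x}}\Psi$; Proposition 4.4 says that driving the score FPE residual to zero rules such components out.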
4.3. Equivalence of scores

We now investigate another implication of satisfying the score FPE, which connects the score $\mathbf{s}_\theta$ with the ground truth $\mathbf{s}$, $\mathbf{s}^{\mathrm{SDE}}_\theta$, and $\mathbf{s}^{\mathrm{ODE}}_\theta$. The following proposition provides conditions under which all of these scores are identical if training reaches a zero residual of the score FPE for all $(\mathbf{x},t)$.

Proposition 4.5. (1) Suppose that, in some suitable function space, $0$ is the unique strong solution to the PDEs

$$\partial_t\mathbf{v} - \nabla_{\mathbf{x}}\Big[\frac{1}{2}g^2\,\mathrm{div}_{\mathbf{x}}(\mathbf{v}) + \frac{1}{2}g^2\big(\|\mathbf{v}\|_2^2 + 2\langle\mathbf{v},\mathbf{s}\rangle\big) - \langle\mathbf{f},\mathbf{v}\rangle\Big] = 0$$

with a zero initial condition $\mathbf{v}(\mathbf{x},0)\equiv 0$ and a zero boundary condition. If there is some $\theta_0$ so that $\boldsymbol{\epsilon}[\mathbf{s}_{\theta_0}](\mathbf{x},t) = 0$ for all $(\mathbf{x},t)$ and $\mathbf{s}_{\theta_0}(\mathbf{x},0) = \mathbf{s}(\mathbf{x},0)$, then $\mathbf{s}_{\theta_0}(\mathbf{x},t) = \mathbf{s}(\mathbf{x},t)$ for all $(\mathbf{x},t)$.

(2) Moreover, suppose the PDEs

$$\partial_t\mathbf{v} + \nabla_{\mathbf{x}}\Big[\frac{1}{2}g^2\,\mathrm{div}_{\mathbf{x}}(\mathbf{v}) + \frac{1}{2}g^2\|\mathbf{v}\|_2^2 + \langle\mathbf{f},\mathbf{v}\rangle\Big] = 0$$

with zero initial and boundary conditions have $0$ as the unique strong solution. Then $\boldsymbol{\epsilon}[\mathbf{s}_{\theta_0}]\equiv 0$ and $\mathbf{s}_{\theta_0}(\mathbf{x},0)\equiv\mathbf{s}^{\mathrm{SDE}}_{\theta_0}(\mathbf{x},0)$ imply $\mathbf{s}_{\theta_0}\equiv\mathbf{s}^{\mathrm{SDE}}_{\theta_0}$.

(3) Lastly, if there is some $\theta_0$ such that the PDEs

$$\partial_t\mathbf{v} - \nabla_{\mathbf{x}}\Big\langle\frac{1}{2}g^2\,\mathbf{s}_{\theta_0} - \mathbf{f},\,\mathbf{v}\Big\rangle = 0$$

with zero initial and boundary conditions admit $0$ as the unique strong solution, then $\boldsymbol{\epsilon}[\mathbf{s}_{\theta_0}]\equiv 0$ and $\mathbf{s}_{\theta_0}(\mathbf{x},0)\equiv\mathbf{s}^{\mathrm{ODE}}_{\theta_0}(\mathbf{x},0)$ imply $\mathbf{s}_{\theta_0}\equiv\mathbf{s}^{\mathrm{ODE}}_{\theta_0}$.

Proposition 4.5 implies that if the parametric score matches the ground truth score at the initial time, the only global minimum is the ground truth score. Essentially, the scores at any given time can be obtained solely by achieving a flawless alignment of scores at a single timestep, through the dynamics of the PDE. This indicates that the score FPE residual is a proper quantity for measuring the gap between the ground truth and parametric scores. Indeed, this proposition is an extreme case of "the continuous dependence of PDE solutions on parameters $\theta$" (Artstein, 1975). A more sophisticated analysis (Lunardi, 2012; Papageorgiou, 1994) can be applied to prove, for instance, that $\|\mathbf{s}_\theta - \mathbf{s}^{\mathrm{SDE}}_\theta\| \to 0$ as $\|\boldsymbol{\epsilon}[\mathbf{s}_\theta]\| \to 0$ if $\mathbf{f}\equiv 0$ (with a careful choice of norms). However, such technical generalization is outside the scope of this work.
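The role of the initial-time condition in Proposition 4.5 can be illustrated with a toy family (our example, assuming $f = 0$, $g = 1$, $D = 1$): for every $\theta > 0$, $s_\theta(x,t) = -x/(\theta + t)$ is the score of $\mathcal{N}(0, \theta + t)$ and therefore satisfies the score FPE exactly, yet only one member of the family matches the ground-truth score at $t = 0$.

```python
def family_score(theta):
    """Score of N(0, theta + t): for every theta > 0 this is an exact
    solution of the score FPE for dx = dw (f = 0, g = 1, D = 1)."""
    return lambda x, t: -x / (theta + t)

def fpe_residual_1d(s, x, t, h=1e-4):
    # eps[s] = d_t s - d_x(0.5 * d_x s + 0.5 * s^2), via central differences
    L = lambda xx, tt: (0.5 * (s(xx + h, tt) - s(xx - h, tt)) / (2 * h)
                        + 0.5 * s(xx, tt) ** 2)
    return ((s(x, t + h) - s(x, t - h)) / (2 * h)
            - (L(x + h, t) - L(x - h, t)) / (2 * h))
```

If the data density is $q_0 = \mathcal{N}(0,1)$, then `family_score(2.0)` has a (numerically) zero FPE residual everywhere while being the wrong score; only the additional condition $s_\theta(\cdot,0) = s(\cdot,0)$ singles out `family_score(1.0)`. This is why the proposition pairs a vanishing residual with initial-time alignment.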