
FP-Diffusion: Improving Score-based Diffusion Models by Enforcing
the Underlying Score Fokker-Planck Equation
Chieh-Hsin Lai 1, Yuhta Takida 1, Naoki Murata 1, Toshimitsu Uesaka 1, Yuki Mitsufuji 1 2, Stefano Ermon 3
Abstract
Score-based generative models (SGMs) learn a
family of noise-conditional score functions cor-
responding to the data density perturbed with
increasingly large amounts of noise. These
perturbed data densities are linked together by
the Fokker-Planck equation (FPE), a partial dif-
ferential equation (PDE) governing the spatial-
temporal evolution of a density undergoing a dif-
fusion process. In this work, we derive a cor-
responding equation called the score FPE that
characterizes the noise-conditional scores of the
perturbed data densities (i.e., the gradients of their
log-densities). Surprisingly, despite their impressive empirical
performance, we observe that scores learned through
denoising score matching (DSM) fail to fulfill the
underlying score FPE, which is an inherent self-
consistency property of the ground truth score.
We prove that satisfying the score FPE is desir-
able as it improves the likelihood and the degree
of conservativity. Hence, we propose to regular-
ize the DSM objective to enforce satisfaction of
the score FPE, and we show the effectiveness of
this approach across various datasets.
1. Introduction
Score-based generative models (SGMs), also referred to as
diffusion models (Sohl-Dickstein et al., 2015; Song & Ermon,
2019; Ho et al., 2020; Song et al., 2020b;a), have led to
major advances in the generation of synthetic images (Dhariwal
& Nichol, 2021; Saharia et al., 2022; Rombach et al.,
2022; Kim et al., 2022) and audio (Kong et al., 2020). In
addition, SGMs have been applied to various downstream
tasks such as media content editing (Meng et al., 2021b;
Cheuk et al., 2022) or restoration (Kawar et al., 2022; Saito
et al., 2022; Murata et al., 2023). An SGM involves a
stochastic forward process and a stochastic backward process.
In the forward process, also known as the diffusion process,
noise with gradually increasing variance is added to each data
point until the original structure is lost, transforming data into
pure noise. The backward process attempts to reverse the
diffusion process by using a neural network (called a
noise-conditional score model) that is trained to gradually denoise
the data, effectively transforming pure noise into clean data
samples. The neural network is trained with a denoising
score matching objective (Hyvärinen & Dayan, 2005; Vincent,
2011) to estimate the score (i.e., the gradient of the
log-likelihood function) of the data density perturbed with
various amounts of noise (as in the forward process).
1 Sony AI, Tokyo, Japan. 2 Sony Group Corporation, Tokyo, Japan. 3 Department of Computer Science, Stanford University, Stanford, CA, USA. Correspondence to: Chieh-Hsin Lai <Chieh-hsin.lai@sony.com>.
Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
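As a rough illustration of the denoising score matching objective described above, the following Python sketch estimates the loss at a single noise level. Everything in it is an assumption made for illustration: the toy standard-Gaussian data, the Gaussian perturbation kernel with standard deviation `sigma`, and the hypothetical helpers `dsm_loss`, `true_score`, and `zero_score`. It is not the training code used in the paper.

```python
import numpy as np

def dsm_loss(score_fn, x0, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss at one noise level."""
    noise = rng.standard_normal(x0.shape)
    x_noisy = x0 + sigma * noise  # forward perturbation of clean data
    # Score of the Gaussian perturbation kernel N(x_noisy; x0, sigma^2 I)
    target = -(x_noisy - x0) / sigma**2
    diff = score_fn(x_noisy, sigma) - target
    return 0.5 * np.mean(np.sum(diff**2, axis=-1))

def true_score(x, sigma, data_std=1.0):
    # Toy assumption: data ~ N(0, data_std^2 I), so the perturbed density is
    # N(0, (data_std^2 + sigma^2) I) and its score is available in closed form.
    return -x / (data_std**2 + sigma**2)

def zero_score(x, sigma):
    # Deliberately poor baseline model that always predicts a zero score.
    return np.zeros_like(x)

x0 = np.random.default_rng(0).standard_normal((20000, 2))  # toy clean data
sigma = 0.5
# Reuse the same noise seed so both models are compared on identical perturbations.
loss_true = dsm_loss(true_score, x0, sigma, np.random.default_rng(1))
loss_zero = dsm_loss(zero_score, x0, sigma, np.random.default_rng(1))
print(f"DSM loss, true score: {loss_true:.3f}")
print(f"DSM loss, zero score: {loss_zero:.3f}")
```

In this toy setting the closed-form score attains a lower loss than the zero baseline. The loss does not reach zero even for the exact score, because the regression target is the score of the perturbation kernel rather than that of the perturbed data density.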
The training can be interpreted as a joint estimation of the
scores of the original data density and all its perturbations.
Crucially, all these densities are closely related to each other,
as they correspond to the same data density perturbed with
various amounts of noise. With sufficiently small time steps,
the forward process is a diffusion (Song et al., 2020b) and
the spatial-temporal evolution of the data density is thus
governed by the classic Fokker-Planck partial differential
equation (PDE) (Øksendal, 2003). In principle, this implies
that with knowledge of the density for a single noise level,
we could recover all the densities by solving the Fokker-
Planck equation (FPE) without any additional learning.
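To make the role of the FPE concrete, the sketch below numerically checks it in the simplest case we could think of. For a forward SDE dx = f(x, t) dt + g(t) dw, the FPE reads d/dt q_t = -div(f q_t) + (g(t)^2 / 2) Laplacian(q_t). The example assumes a one-dimensional process with zero drift, constant g, and a Gaussian initial density, so that q_t is known in closed form. These choices, and the helper names `gaussian_density` and `fpe_residual`, are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def gaussian_density(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def fpe_residual(x, t, var0=1.0, g=1.0, dt=1e-4):
    """Residual d/dt q - (g^2 / 2) d^2/dx^2 q on the interior grid points.

    Assumes dx = g dw started from N(0, var0), so q_t = N(0, var0 + g^2 t).
    Both derivatives are approximated with central finite differences.
    """
    dx = x[1] - x[0]

    def q(time):
        return gaussian_density(x, var0 + g**2 * time)

    dq_dt = (q(t + dt) - q(t - dt)) / (2.0 * dt)
    qt = q(t)
    d2q_dx2 = (qt[2:] - 2.0 * qt[1:-1] + qt[:-2]) / dx**2
    return dq_dt[1:-1] - 0.5 * g**2 * d2q_dx2

x = np.linspace(-5.0, 5.0, 2001)
residual = fpe_residual(x, t=0.5)
max_residual = np.max(np.abs(residual))
print(f"max |FPE residual| on the grid: {max_residual:.2e}")
```

The residual should be small, limited only by discretization error, which reflects the point made above: once the density is known at one noise level, the FPE determines it at every other level.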
Our contributions. Building on the above notions, we derive
an associated system of PDEs that characterizes the
evolution of the scores of the perturbed data densities (i.e.,
the gradients of their log-densities); we term it the score Fokker-Planck equation (score
FPE). In theory, the ground truth scores of the perturbed
data densities must satisfy the score FPE (self-consistency
property). Hence, we mathematically study the implica-
tions of satisfying the score FPE. We prove the following
effects of reducing the score FPE error: (a) improvement
in the log-likelihood of the probability flow ordinary differential
equation (ODE) diffusion model (Song et al., 2020b)
(Theorems 4.2 and 4.3); and (b) improvement in the degree
of conservativity of the models (Proposition 4.4). In addi-
tion, we prove that (c) score FPE error reduction can be
achieved by enforcing higher-order score matching (Meng
arXiv:2210.04296v4 [cs.LG] 14 Jun 2023