Score-based Denoising Diffusion with
Non-Isotropic Gaussian Noise Models
Vikram Voleti
Mila, University of Montreal
Canada
Adam Oberman
Mila, McGill University
Canada
Christopher Pal
Mila, Polytechnique Montréal
CIFAR AI Chair, ServiceNow
Canada
Abstract
Generative models based on denoising diffusion techniques have led to an unprecedented increase in the quality and diversity of imagery that is now possible to create with neural generative models. However, most contemporary state-of-the-art methods are derived from a standard isotropic Gaussian formulation. In this work we examine the situation where non-isotropic Gaussian distributions are used. We present the key mathematical derivations for creating denoising diffusion models using an underlying non-isotropic Gaussian noise model. We also provide initial experiments with the CIFAR10 dataset to help verify empirically that this more general modelling approach can also yield high-quality samples.
1 Introduction
Figure 1: Gaussian noise samples. (a) Isotropic; (b) Non-isotropic.
Score-based denoising diffusion models [16, 6, 18] have seen great success as generative models for images [4, 17], as well as other modalities such as video [7, 22, 19], audio [9, 3], etc. The underlying framework relies on a noising "forward" process that adds noise to real images (or other data), and a denoising "reverse" process that iteratively removes noise. In most cases, the noise distribution used is the isotropic Gaussian, i.e. noise samples are independently and identically distributed (IID) as the standard normal at each pixel.
In this work, we lay the theoretical foundations and derive the key mathematics for a non-isotropic Gaussian formulation of denoising diffusion models. It is our hope that these insights may open the door to new classes of models. One type of non-isotropic Gaussian noise arises in a family of models known as Gaussian Free Fields (GFFs) [14, 1, 2, 20] (a.k.a. Gaussian Random Fields). GFF noise can be obtained either by convolving isotropic Gaussian noise with a filter, or by applying frequency masking to noise. In either case, this procedure allows one to model or generate smoother, correlated types of Gaussian noise. In Figures 1 and 2, we compare examples of isotropic Gaussian noise with GFF noise obtained using a frequency-space window function $w(f) = 1/f$.
Our contributions here consist of the following: (1) deriving the key mathematics for score-based denoising diffusion models using non-isotropic multivariate Gaussian distributions, (2) examining the special case of a GFF and the corresponding non-isotropic Gaussian noise model, and (3) showing that diffusion models trained using a GFF noise process (e.g. on the CIFAR-10 dataset [10]) are also capable of yielding high-quality samples comparable to models based on isotropic Gaussian noise.
NeurIPS 2022 Workshop on Score-Based Methods.
2 Isotropic Gaussian denoising diffusion models
We perform our analysis below within the Denoising Diffusion Probabilistic Models (DDPM) [6] framework, but it is valid for all other types of score-based denoising diffusion models.
In DDPM, for a fixed sequence of positive scales $0 < \beta_1 < \cdots < \beta_L < 1$, $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$, and a noise sample $\boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, the cumulative "forward" noising process is:

$$q_t(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\mathbf{I}\big) \;\Longrightarrow\; \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon \tag{1}$$
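To make eq. (1) concrete, here is a minimal NumPy sketch of the forward process; the linear $\beta$ schedule and the `forward_sample` helper are illustrative assumptions on our part, not the exact configuration used in the experiments below.

```python
import numpy as np

# Illustrative linear beta schedule (an assumption; the exact schedule used in
# our experiments is not specified here).
L = 1000
betas = np.linspace(1e-4, 0.02, L)
alpha_bars = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def forward_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q_t(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), eq. (1)."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps
```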
The "reverse" process involves iteratively sampling $\mathbf{x}_{t-1}$ from $\mathbf{x}_t$ conditioned on $\mathbf{x}_0$, i.e. $p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$, obtained from $q_t(\mathbf{x}_t \mid \mathbf{x}_0)$ using Bayes' rule. For this, first $\boldsymbol\epsilon$ is estimated using a neural network $\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)$. Then, using $\hat{\mathbf{x}}_0 = \big(\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)\big)/\sqrt{\bar\alpha_t}$ from eq. (1), $\mathbf{x}_{t-1}$ is sampled:

$$p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\big(\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0),\ \tilde\beta_t\,\mathbf{I}\big) \;\Longrightarrow\; \mathbf{x}_{t-1} = \tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) + \sqrt{\tilde\beta_t}\,\mathbf{z}_t; \quad \text{where} \tag{2}$$

$$\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,\hat{\mathbf{x}}_0 + \frac{\sqrt{1-\beta_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,\mathbf{x}_t; \qquad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t; \qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{3}$$
The objective function to train $\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)$ is simply an expected reconstruction loss against the true $\boldsymbol\epsilon$:

$$L(\theta) = \mathbb{E}_{t \sim \mathcal{U}(1,\dots,L),\ \mathbf{x}_0 \sim p(\mathbf{x}_0),\ \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})} \Big[ \big\| \boldsymbol\epsilon_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon,\ t\big) - \boldsymbol\epsilon \big\|_2^2 \Big] \tag{4}$$
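A corresponding sketch of the loss in eq. (4), reusing the snippet above; `eps_theta` is a placeholder for any noise-prediction network (e.g. a U-Net), which we do not specify here.

```python
def ddpm_loss(eps_theta, x0_batch, rng=np.random.default_rng()):
    """Monte-Carlo estimate of eq. (4): E || eps_theta(x_t, t) - eps ||_2^2.
    x0_batch is assumed batched, shape (B, ...)."""
    t = int(rng.integers(L))                   # t ~ U(1, ..., L), 0-indexed here
    xt, eps = forward_sample(x0_batch, t, rng)
    residual = eps_theta(xt, t) - eps
    return np.mean(np.sum(residual.reshape(residual.shape[0], -1) ** 2, axis=1))
```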
From the perspective of score matching, the score of the DDPM forward process is:

$$\mathbf{s} = \nabla_{\mathbf{x}_t} \log q_t(\mathbf{x}_t \mid \mathbf{x}_0) = -\frac{1}{1-\bar\alpha_t}\big(\mathbf{x}_t - \sqrt{\bar\alpha_t}\,\mathbf{x}_0\big) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol\epsilon \tag{5}$$
Thus, the overall score-matching objective for a score estimation network $\mathbf{s}_\theta(\mathbf{x}_t, t)$ is the weighted sum of the loss $\ell_s(\theta; t)$ for each $t$, the weight being the inverse of the score variance at $t$, i.e. $(1-\bar\alpha_t)$:

$$L_s(\theta) = \mathbb{E}_t\,(1-\bar\alpha_t)\,\ell_s(\theta; t) = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol\epsilon}\,\big\| \sqrt{1-\bar\alpha_t}\,\mathbf{s}_\theta(\mathbf{x}_t, t) + \boldsymbol\epsilon \big\|_2^2 \tag{6}$$
When the score network output is redefined as per the score–noise relationship in eq. (5):

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \;\Longrightarrow\; L_s(\theta) = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol\epsilon}\,\big\| {-\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)} + \boldsymbol\epsilon \big\|_2^2 = L(\theta) \tag{7}$$

Thus $L_s = L$, i.e. the score-matching and noise reconstruction objectives are equivalent.
From [13], the Expected Denoised Sample (EDS) $\mathbf{x}_0^*(\mathbf{x}_t, t) \triangleq \mathbb{E}_{\mathbf{x}_0 \sim p_t(\mathbf{x}_0 \mid \mathbf{x}_t)}[\mathbf{x}_0]$ and the score $\mathbf{s}$, estimated optimally as $\mathbf{s}_\theta$, are related as:

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbb{E}\big[\|\nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0)\|_2^2\big]\,\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) = \frac{1}{1-\bar\alpha_t}\,\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) \tag{8}$$

$$\Longrightarrow\; \mathbf{x}_0^*(\mathbf{x}_t, t) = \mathbf{x}_t + (1-\bar\alpha_t)\,\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{9}$$

The EDS is often used to further improve the quality of the final image at $t = 0$.
3 Non-isotropic Gaussian denoising diffusion models
We formulate the Non-Isotropic DDPM (NI-DDPM) using a non-isotropic Gaussian noise distribution with a positive semi-definite covariance matrix $\boldsymbol\Sigma$ in place of $\mathbf{I}$. The forward noising process is:

$$q_t(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon \tag{10}$$
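Continuing the sketch above, the only change needed for eq. (10) is that the white noise is colored by $\sqrt{\boldsymbol\Sigma}$; `sqrt_Sigma` below is an assumed callable applying the matrix square root of $\boldsymbol\Sigma$ to white noise (for the GFF case, the FFT-based operator sketched in Section 4):

```python
def ni_forward_sample(x0, t, sqrt_Sigma, rng=np.random.default_rng()):
    """Sample x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) Sigma), eq. (10).
    sqrt_Sigma: callable that colors white noise, i.e. applies sqrt(Sigma)."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * sqrt_Sigma(eps), eps
```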
Thus, the score of NI-DDPM is (see Appendix B.1 for derivation):

$$\mathbf{s} = \nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0) = -\boldsymbol\Sigma^{-1}\,\frac{\mathbf{x}_t - \sqrt{\bar\alpha_t}\,\mathbf{x}_0}{1-\bar\alpha_t} = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon \tag{11}$$
The score-matching objective for a score estimation network $\mathbf{s}_\theta(\mathbf{x}_t, t)$ at each noise level $t$ is now:

$$\ell(\theta; t) = \mathbb{E}_{\mathbf{x}_0 \sim p(\mathbf{x}_0),\ \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})}\,\Big\| \mathbf{s}_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon,\ t\big) + \frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon \Big\|_2^2 \tag{12}$$
The variance of this score is:

$$\mathbb{E}\big[\|\nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0)\|_2^2\big] = \mathbb{E}\bigg[\Big\|\frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon\Big\|_2^2\bigg] = \frac{1}{1-\bar\alpha_t}\,\boldsymbol\Sigma^{-1}\,\mathbb{E}\big[\|\boldsymbol\epsilon\|_2^2\big] \tag{13}$$
The overall objective is a weighted sum, the weight being the inverse of the score variance, i.e. $(1-\bar\alpha_t)\,\boldsymbol\Sigma$:

$$L(\theta) = \mathbb{E}_{t \sim \mathcal{U}(1,\dots,L)}\,(1-\bar\alpha_t)\,\boldsymbol\Sigma\,\ell(\theta; t) = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol\epsilon}\,\big\| \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\mathbf{s}_\theta(\mathbf{x}_t, t) + \boldsymbol\epsilon \big\|_2^2 \tag{14}$$
Following the score–noise relationship in eq. (11):

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{15}$$
The objective function now becomes (expanding $\mathbf{s}_\theta$ as per eq. (15)):

$$L(\theta) = \mathbb{E}_{t \sim \mathcal{U}(1,\dots,L),\ \mathbf{x}_0 \sim p(\mathbf{x}_0),\ \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})}\,\big\| {-\boldsymbol\epsilon_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon,\ t\big)} + \boldsymbol\epsilon \big\|_2^2 \tag{16}$$
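A sketch of eq. (16), mirroring `ddpm_loss` above; note that only the forward-sampling call changes, which is exactly the point made in the next paragraph:

```python
def ni_ddpm_loss(eps_theta, x0_batch, sqrt_Sigma, rng=np.random.default_rng()):
    """Monte-Carlo estimate of eq. (16): same form as eq. (4), except that
    x_t is produced with colored noise sqrt(Sigma) eps via eq. (10)."""
    t = int(rng.integers(L))
    xt, eps = ni_forward_sample(x0_batch, t, sqrt_Sigma, rng)
    residual = eps_theta(xt, t) - eps
    return np.mean(np.sum(residual.reshape(residual.shape[0], -1) ** 2, axis=1))
```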
This objective function for NI-DDPM has the same form as the $L$ of DDPM, but DDPM's $\boldsymbol\epsilon_\theta$ network cannot be re-used here since their forward processes differ: DDPM produces $\mathbf{x}_t$ from $\mathbf{x}_0$ using eq. (1), while NI-DDPM uses eq. (10). See Appendix B.4 for alternate formulations of the score network.
Sampling involves computing $p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0)$ (see Appendix B.6 for derivation):

$$q_t(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \hat{\mathbf{x}}_0 = \frac{1}{\sqrt{\bar\alpha_t}}\Big(\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)\Big) \tag{17}$$

$$p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\big(\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0),\ \tilde\beta_t\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \mathbf{x}_{t-1} = \tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) + \sqrt{\tilde\beta_t}\,\sqrt{\boldsymbol\Sigma}\,\mathbf{z}_t \tag{18}$$

where $\tilde{\boldsymbol\mu}_t$, $\tilde\beta_t$ and $\mathbf{z}_t$ are the same as in eq. (3).
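A sketch of one reverse step combining eqs. (17), (18) and (3); suppressing the noise at the final step is a common convention we assume here rather than one stated above.

```python
def ni_ddpm_step(eps_theta, xt, t, sqrt_Sigma, rng=np.random.default_rng()):
    """One reverse (ancestral) sampling step of NI-DDPM, eqs. (17)-(18)."""
    a_bar = alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    beta = betas[t]
    # eq. (17): estimate x_0 from the predicted noise
    x0_hat = (xt - np.sqrt(1.0 - a_bar) * sqrt_Sigma(eps_theta(xt, t))) / np.sqrt(a_bar)
    # eq. (3): posterior mean mu_tilde and variance scale beta_tilde
    mu = (np.sqrt(a_bar_prev) * beta * x0_hat
          + np.sqrt(1.0 - beta) * (1.0 - a_bar_prev) * xt) / (1.0 - a_bar)
    beta_tilde = (1.0 - a_bar_prev) / (1.0 - a_bar) * beta
    # eq. (18): add colored Gaussian noise (none at the final step, by convention)
    z = rng.standard_normal(xt.shape) if t > 0 else np.zeros_like(xt)
    return mu + np.sqrt(beta_tilde) * sqrt_Sigma(z)
```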
Alternatively, [18] mentions using $\beta_t$ instead of $\tilde\beta_t$:

$$p_{t-1}^{\beta_t}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\big(\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0),\ \beta_t\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \mathbf{x}_{t-1} = \tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) + \sqrt{\beta_t}\,\sqrt{\boldsymbol\Sigma}\,\mathbf{z}_t \tag{19}$$
Alternatively, sampling using DDIM [15] invokes the following distribution for $\mathbf{x}_{t-1}$:

$$p_{t-1}^{\text{DDIM}}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\Big(\sqrt{\bar\alpha_{t-1}}\,\hat{\mathbf{x}}_0 + \sqrt{1-\bar\alpha_{t-1}}\,\frac{\mathbf{x}_t - \sqrt{\bar\alpha_t}\,\hat{\mathbf{x}}_0}{\sqrt{1-\bar\alpha_t}},\ \mathbf{0}\Big) \tag{20}$$

$$\Longrightarrow\; \mathbf{x}_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{\mathbf{x}}_0 + \sqrt{1-\bar\alpha_{t-1}}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{21}$$
The Expected Denoised Sample $\mathbf{x}_0^*(\mathbf{x}_t, t)$ and the optimal score $\mathbf{s}_\theta$ are now related as:

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbb{E}\big[\|\nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0)\|_2^2\big]\,\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) = \frac{1}{1-\bar\alpha_t}\,\boldsymbol\Sigma^{-1}\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) \tag{22}$$

$$\Longrightarrow\; \mathbf{x}_0^*(\mathbf{x}_t, t) = \mathbf{x}_t + (1-\bar\alpha_t)\,\boldsymbol\Sigma\,\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{23}$$
SDE formulation: Score-based diffusion models have also been analyzed as stochastic differential equations (SDEs) [18]. The SDE version of NI-DDPM, which we call the Non-Isotropic Variance Preserving (NIVP) SDE, is (see Appendix B.9 for derivation):

$$d\mathbf{x} = -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,dt + \sqrt{\beta(t)}\,\sqrt{\boldsymbol\Sigma}\,d\mathbf{w} \tag{24}$$

$$\Longrightarrow\; p_{0t}\big(\mathbf{x}(t) \mid \mathbf{x}(0)\big) = \mathcal{N}\Big(\mathbf{x}(0)\,e^{-\frac{1}{2}\int_0^t \beta(s)\,ds},\ \boldsymbol\Sigma\big(\mathbf{I} - \mathbf{I}\,e^{-\int_0^t \beta(s)\,ds}\big)\Big) \tag{25}$$
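As an illustration, a simple Euler–Maruyama discretization of the forward SDE in eq. (24); `beta_fn` is an assumed continuous-time schedule $\beta(t)$, not one specified above.

```python
def nivp_euler_maruyama_step(x, t, dt, beta_fn, sqrt_Sigma, rng=np.random.default_rng()):
    """One Euler-Maruyama step of the forward NIVP SDE, eq. (24):
    dx = -(1/2) beta(t) x dt + sqrt(beta(t)) sqrt(Sigma) dw."""
    dw = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increment
    return x - 0.5 * beta_fn(t) * x * dt + np.sqrt(beta_fn(t)) * sqrt_Sigma(dw)
```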
Finally, Appendix A and Appendix B contain more detailed derivations of the above equations for DDPM [6] and our NI-DDPM. See Appendix D and Appendix E for the equivalent derivations for Score Matching Langevin Dynamics (SMLD) [16, 17], and our Non-Isotropic SMLD (NI-SMLD).
4 Gaussian Free Field (GFF) images
A GFF image $\mathbf{g}$ can be obtained from a normal noise image $\mathbf{z}$ as follows [14] (see Appendix C for more details):
1. First, sample an $n \times n$ noise image $\mathbf{z}$ from the standard complex normal distribution with covariance matrix $\boldsymbol\Gamma = \mathbf{I}_N$, where $N = n^2$ is the total number of pixels, and pseudo-covariance matrix $\mathbf{C} = \mathbf{0}$: $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_N, \mathbf{0})$. (In principle, real noise could be used.)
2. Apply the Discrete Fourier Transform using its $N \times N$ weights matrix $\mathbf{W}_N$: $\mathbf{W}_N \mathbf{z}$.
3. Consider a diagonal $N \times N$ matrix of the reciprocal of an index value $|k_{ij}|$ per pixel $(i, j)$ in Fourier space, $\mathbf{K}^{-1} = [1/|k_{ij}|]_{(i,j)}$, and multiply this with the above: $\mathbf{K}^{-1}\mathbf{W}_N \mathbf{z}$.
4. Take its Inverse Discrete Fourier Transform ($\mathbf{W}_N^{-1}$) to make the raw GFF image: $\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N \mathbf{z}$. However, this results in a GFF image with a small non-unit variance.
5. Normalize the above GFF image with the standard deviation $\sigma_N$ at its resolution $N$, so that it has unit variance (see Appendix C.1 for the derivation of $\sigma_N$): $\mathbf{g}_{\text{complex}} = \frac{1}{\sigma_N}\,\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N \mathbf{z}$
6. Extract only the real part of $\mathbf{g}_{\text{complex}}$, and normalize (see Appendix C.2 for derivation):

$$\mathbf{g} = \frac{1}{\sqrt{2N}\,\sigma_N}\,\mathrm{Real}\big(\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N\,\mathbf{z}\big) \tag{26}$$
See Figures 1 and 2 for examples of GFF images. Effectively, this procedure prioritizes lower frequencies over higher frequencies, thereby making the noise smoother, and hence correlated. The probability distribution of GFF images $\mathbf{g}$ can be seen as a non-isotropic multivariate Gaussian with mean $\mathbf{0}$ and a non-diagonal covariance matrix $\boldsymbol\Sigma$ (see Appendices C.1 and C.2 for derivation):

$$p(\mathbf{g}) = \mathcal{N}(\mathbf{0}, \boldsymbol\Sigma); \quad \boldsymbol\Sigma = \sqrt{\boldsymbol\Sigma}\,\sqrt{\boldsymbol\Sigma}^{\mathsf{T}}; \quad \sqrt{\boldsymbol\Sigma} = \frac{1}{\sqrt{2N}\,\sigma_N}\,\mathrm{Real}\big(\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N\big) \;\Longrightarrow\; \mathbf{g} = \sqrt{\boldsymbol\Sigma}\,\mathbf{z} \tag{27}$$
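The procedure above maps directly onto FFT code. The following NumPy sketch builds the $\sqrt{\boldsymbol\Sigma}$ operator of eq. (27); as noted in step 1, it uses real rather than complex noise, and it normalizes empirically instead of with the analytic constant $1/(\sqrt{2N}\,\sigma_N)$ derived in Appendix C. Both are simplifying assumptions.

```python
def make_gff_sqrt_Sigma(n):
    """Return a callable that colors white noise with the 1/|k| spectral
    filter of Section 4, i.e. a numerically normalized stand-in for the
    sqrt(Sigma) of eq. (27), applied over the trailing two image axes."""
    kx = np.fft.fftfreq(n)[None, :]
    ky = np.fft.fftfreq(n)[:, None]
    k = np.sqrt(kx ** 2 + ky ** 2)
    inv_k = np.zeros_like(k)
    inv_k[k > 0] = 1.0 / k[k > 0]      # K^{-1}; the DC mode is zeroed out

    def color(z):
        # W_N^{-1} K^{-1} W_N z, then keep only the real part (step 6)
        return np.real(np.fft.ifft2(inv_k * np.fft.fft2(z)))

    # Empirical unit-variance normalization, used here in place of the
    # analytic constant 1/(sqrt(2N) sigma_N).
    scale = 1.0 / color(np.random.default_rng(0).standard_normal((n, n))).std()
    return lambda z: scale * color(z)

# Example: colored noise for 32x32 RGB images.
gff_sqrt_Sigma = make_gff_sqrt_Sigma(32)
noise = gff_sqrt_Sigma(np.random.default_rng().standard_normal((3, 32, 32)))
```

The returned callable can be plugged in as the `sqrt_Sigma` argument of the NI-DDPM sketches above.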
5 Results
Table 1: Image generation metrics FID, Precision (P), and Recall (R) for CIFAR10 using DDPM and NI-DDPM, with different numbers of generation steps.

Model     Steps   FID     P      R
DDPM      1000    6.05    0.66   0.54
          100     12.25   0.62   0.48
          50      16.61   0.60   0.43
          20      26.35   0.56   0.24
          10      44.95   0.49   0.24
NI-DDPM   1000    6.95    0.62   0.53
          100     12.68   0.60   0.49
          50      16.91   0.57   0.45
          20      30.41   0.52   0.35
          10      60.32   0.43   0.23
We train two models on CIFAR10, one using DDPM and the other using NI-DDPM, with the exact same hyperparameters (batch size, learning rate, etc.) for 300,000 iterations. We then sample 50,000 images from each, and calculate the image generation metrics of Fréchet Inception Distance (FID) [5], Precision (P), and Recall (R). Although the models were trained with 1000 steps between data and noise, we report these metrics while sampling images using 1000 as well as fewer steps: 100, 50, 20, 10.

As can be seen from Table 1, our non-isotropic variant performs comparably to the isotropic baseline. The difference between them increases as the number of steps between noise and data decreases. This provides a reasonable proof of concept that non-isotropic Gaussian noise works just as well as isotropic noise when used in denoising diffusion models for image generation.
6 Conclusion
We have presented the key mathematics behind non-isotropic Gaussian DDPMs, as well as a complete example using a GFF. We then provided a quantitative comparison of using GFF noise vs. regular isotropic noise on the CIFAR-10 dataset. In the appendix, we also include further derivations for non-isotropic SMLD models. GFFs are just one example of a well-known class of models that are a subset of non-isotropic Gaussian distributions. In the same way that other work has examined non-Gaussian formulations such as the Gamma distribution [11], the Poisson distribution [21], and heat dissipation processes [12], we hope that our work here may lay the foundation for other new denoising diffusion formulations.
References
[1] Nathanaël Berestycki. Introduction to the Gaussian free field and Liouville quantum gravity. Lecture notes, 2015.
[2] Maury Bramson, Jian Ding, and Ofer Zeitouni. Convergence in law of the maximum of the two-dimensional discrete Gaussian free field. Communications on Pure and Applied Mathematics, 69(1):62–123, 2016.
[3] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, and William Chan. WaveGrad: Estimating gradients for waveform generation. In International Conference on Learning Representations, 2021.
[4] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 2021.
[5] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
[6] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020.
[7] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models, 2022.
[8] Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, and Ioannis Mitliagkas. Adversarial score matching and improved sampling for image generation. International Conference on Learning Representations, 2021.
[9] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations, 2021.
[10] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.
[11] Eliya Nachmani, Robin San Roman, and Lior Wolf. Denoising diffusion gamma models, 2021.
[12] Severi Rissanen, Markus Heinonen, and Arno Solin. Generative modelling with inverse heat dissipation, 2022.
[13] Saeed Saremi and Aapo Hyvärinen. Neural empirical Bayes. Journal of Machine Learning Research, 20:1–23, 2019.
[14] Scott Sheffield. Gaussian free fields for mathematicians. Probability Theory and Related Fields, 139(3):521–541, 2007.
[15] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. International Conference on Learning Representations, 2021.
[16] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 2019.
[17] Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 2020.
[18] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021.
[19] Vikram Voleti, Alexia Jolicoeur-Martineau, and Christopher Pal. MCVD: Masked conditional video diffusion for prediction, generation, and interpolation. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[20] Wendelin Werner and Ellen Powell. Lecture notes on the Gaussian free field. arXiv preprint arXiv:2004.04720, 2020.
[21] Yilun Xu, Ziming Liu, Max Tegmark, and Tommi Jaakkola. Poisson flow generative models, 2022.
[22] Ruihan Yang, Prakhar Srivastava, and Stephan Mandt. Diffusion probabilistic modeling for video generation. arXiv preprint arXiv:2203.09481, 2022.