Score-based Denoising Diffusion with
Non-Isotropic Gaussian Noise Models
Vikram Voleti
Mila, University of Montreal
Canada
Adam Oberman
Mila, McGill University
Canada
Christopher Pal
Mila, Polytechnique Montréal
CIFAR AI Chair, ServiceNow
Canada
Abstract
Generative models based on denoising diffusion techniques have led to an unprecedented increase in the quality and diversity of imagery that is now possible to create with neural generative models. However, most contemporary state-of-the-art methods are derived from a standard isotropic Gaussian formulation. In this work we examine the situation where non-isotropic Gaussian distributions are used. We present the key mathematical derivations for creating denoising diffusion models using an underlying non-isotropic Gaussian noise model. We also provide initial experiments with the CIFAR10 dataset to help verify empirically that this more general modelling approach can also yield high-quality samples.
1 Introduction
Figure 1: Gaussian noise samples. (a) Isotropic; (b) Non-isotropic.
Score-based denoising diffusion models [16, 6, 18] have seen great success as generative models for images [4, 17], as well as other modalities such as video [7, 22, 19], audio [9, 3], etc. The underlying framework relies on a noising "forward" process that adds noise to real images (or other data), and a denoising "reverse" process that iteratively removes noise. In most cases, the noise distribution used is the isotropic Gaussian, i.e. noise samples are independently and identically distributed (IID) as the standard normal at each pixel.
In this work, we lay the theoretical foundations and derive the key mathematics for a non-isotropic Gaussian formulation of denoising diffusion models. It is our hope that these insights may open the door to new classes of models. One type of non-isotropic Gaussian noise arises in a family of models known as Gaussian Free Fields (GFFs) [14, 1, 2, 20] (a.k.a. Gaussian Random Fields). GFF noise can be obtained either by convolving isotropic Gaussian noise with a filter, or by applying frequency masking to noise. In either case, this procedure allows one to model or generate smoother, correlated types of Gaussian noise. In Figures 1 and 2, we compare examples of isotropic Gaussian noise with GFF noise obtained using a frequency-space window function $w(f) = 1/f$.
Our contributions here consist of the following: (1) deriving the key mathematics for score-based denoising diffusion models using non-isotropic multivariate Gaussian distributions, (2) examining the special case of a GFF and the corresponding non-isotropic Gaussian noise model, and (3) showing that diffusion models trained using a GFF noise process (e.g. on the CIFAR-10 dataset [10]) are also capable of yielding high-quality samples comparable to models based on isotropic Gaussian noise.
NeurIPS 2022 Workshop on Score-Based Methods.
2 Isotropic Gaussian denoising diffusion models
We perform our analysis below within the Denoising Diffusion Probabilistic Models (DDPM) [6] framework, but it is valid for all other types of score-based denoising diffusion models.
In DDPM, for a fixed sequence of positive scales $0 < \beta_1 < \cdots < \beta_L < 1$, $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$, and a noise sample $\boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, the cumulative "forward" noising process is:

$$q_t(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\mathbf{I}\big) \;\Longrightarrow\; \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon \tag{1}$$
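To make eq. (1) concrete, here is a minimal NumPy sketch of the forward process; the linear $\beta$ schedule and the `forward_sample` helper are illustrative assumptions on our part, not the exact configuration used in the experiments below.

```python
import numpy as np

# Illustrative linear beta schedule (an assumption; the exact schedule used in
# our experiments is not specified here).
L = 1000
betas = np.linspace(1e-4, 0.02, L)
alpha_bars = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def forward_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q_t(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), eq. (1)."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps
```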
The "reverse" process involves iteratively sampling $\mathbf{x}_{t-1}$ from $\mathbf{x}_t$ conditioned on $\mathbf{x}_0$, i.e. $p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$, obtained from $q_t(\mathbf{x}_t \mid \mathbf{x}_0)$ using Bayes' rule. For this, first $\boldsymbol\epsilon$ is estimated using a neural network $\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)$. Then, using $\hat{\mathbf{x}}_0 = \big(\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)\big)/\sqrt{\bar\alpha_t}$ from eq. (1), $\mathbf{x}_{t-1}$ is sampled:

$$p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\big(\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0),\ \tilde\beta_t\,\mathbf{I}\big) \;\Longrightarrow\; \mathbf{x}_{t-1} = \tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) + \sqrt{\tilde\beta_t}\,\mathbf{z}_t; \quad \text{where} \tag{2}$$

$$\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,\hat{\mathbf{x}}_0 + \frac{\sqrt{1-\beta_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,\mathbf{x}_t; \qquad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t; \qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{3}$$
The objective function to train $\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)$ is simply an expected reconstruction loss against the true $\boldsymbol\epsilon$:

$$L(\theta) = \mathbb{E}_{t \sim \mathcal{U}(1,\dots,L),\ \mathbf{x}_0 \sim p(\mathbf{x}_0),\ \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})} \Big[ \big\| \boldsymbol\epsilon_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon,\ t\big) - \boldsymbol\epsilon \big\|_2^2 \Big] \tag{4}$$
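A corresponding sketch of the loss in eq. (4), reusing the snippet above; `eps_theta` is a placeholder for any noise-prediction network (e.g. a U-Net), which we do not specify here.

```python
def ddpm_loss(eps_theta, x0_batch, rng=np.random.default_rng()):
    """Monte-Carlo estimate of eq. (4): E || eps_theta(x_t, t) - eps ||_2^2.
    x0_batch is assumed batched, shape (B, ...)."""
    t = int(rng.integers(L))                   # t ~ U(1, ..., L), 0-indexed here
    xt, eps = forward_sample(x0_batch, t, rng)
    residual = eps_theta(xt, t) - eps
    return np.mean(np.sum(residual.reshape(residual.shape[0], -1) ** 2, axis=1))
```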
From the perspective of score matching, the score of the DDPM forward process is:

$$\mathbf{s} = \nabla_{\mathbf{x}_t} \log q_t(\mathbf{x}_t \mid \mathbf{x}_0) = -\frac{1}{1-\bar\alpha_t}\big(\mathbf{x}_t - \sqrt{\bar\alpha_t}\,\mathbf{x}_0\big) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol\epsilon \tag{5}$$
Thus, the overall score-matching objective for a score estimation network $\mathbf{s}_\theta(\mathbf{x}_t, t)$ is the weighted sum of the loss $\ell_s(\theta; t)$ for each $t$, the weight being the inverse of the score variance at $t$, i.e. $(1-\bar\alpha_t)$:

$$L_s(\theta) = \mathbb{E}_t\,(1-\bar\alpha_t)\,\ell_s(\theta; t) = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol\epsilon}\,\big\| \sqrt{1-\bar\alpha_t}\,\mathbf{s}_\theta(\mathbf{x}_t, t) + \boldsymbol\epsilon \big\|_2^2 \tag{6}$$
When the score network output is redefined as per the score–noise relationship in eq. (5):

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \;\Longrightarrow\; L_s(\theta) = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol\epsilon}\,\big\| {-\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)} + \boldsymbol\epsilon \big\|_2^2 = L(\theta) \tag{7}$$

Thus $L_s = L$, i.e. the score-matching and noise reconstruction objectives are equivalent.
From [13], the Expected Denoised Sample (EDS) $\mathbf{x}_0^*(\mathbf{x}_t, t) \triangleq \mathbb{E}_{\mathbf{x}_0 \sim p_t(\mathbf{x}_0 \mid \mathbf{x}_t)}[\mathbf{x}_0]$ and the score $\mathbf{s}$, estimated optimally as $\mathbf{s}_\theta$, are related as:

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbb{E}\big[\|\nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0)\|_2^2\big]\,\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) = \frac{1}{1-\bar\alpha_t}\,\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) \tag{8}$$

$$\Longrightarrow\; \mathbf{x}_0^*(\mathbf{x}_t, t) = \mathbf{x}_t + (1-\bar\alpha_t)\,\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{9}$$

The EDS is often used to further improve the quality of the final image at $t = 0$.
3 Non-isotropic Gaussian denoising diffusion models
We formulate the Non-Isotropic DDPM (NI-DDPM) using a non-isotropic Gaussian noise distribution with a positive semi-definite covariance matrix $\boldsymbol\Sigma$ in place of $\mathbf{I}$. The forward noising process is:

$$q_t(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon \tag{10}$$
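Continuing the sketch above, the only change needed for eq. (10) is that the white noise is colored by $\sqrt{\boldsymbol\Sigma}$; `sqrt_Sigma` below is an assumed callable applying the matrix square root of $\boldsymbol\Sigma$ to white noise (for the GFF case, the FFT-based operator sketched in Section 4):

```python
def ni_forward_sample(x0, t, sqrt_Sigma, rng=np.random.default_rng()):
    """Sample x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) Sigma), eq. (10).
    sqrt_Sigma: callable that colors white noise, i.e. applies sqrt(Sigma)."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * sqrt_Sigma(eps), eps
```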
Thus, the score of NI-DDPM is (see Appendix B.1 for derivation):

$$\mathbf{s} = \nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0) = -\boldsymbol\Sigma^{-1}\,\frac{\mathbf{x}_t - \sqrt{\bar\alpha_t}\,\mathbf{x}_0}{1-\bar\alpha_t} = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon \tag{11}$$
The score-matching objective for a score estimation network $\mathbf{s}_\theta(\mathbf{x}_t, t)$ at each noise level $t$ is now:

$$\ell(\theta; t) = \mathbb{E}_{\mathbf{x}_0 \sim p(\mathbf{x}_0),\ \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})}\,\Big\| \mathbf{s}_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon,\ t\big) + \frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon \Big\|_2^2 \tag{12}$$
The variance of this score is:

$$\mathbb{E}\big[\|\nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0)\|_2^2\big] = \mathbb{E}\bigg[\Big\|\frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon\Big\|_2^2\bigg] = \frac{1}{1-\bar\alpha_t}\,\boldsymbol\Sigma^{-1}\,\mathbb{E}\big[\|\boldsymbol\epsilon\|_2^2\big] \tag{13}$$
The overall objective is a weighted sum, the weight being the inverse of the score variance, i.e. $(1-\bar\alpha_t)\,\boldsymbol\Sigma$:

$$L(\theta) = \mathbb{E}_{t \sim \mathcal{U}(1,\dots,L)}\,(1-\bar\alpha_t)\,\boldsymbol\Sigma\,\ell(\theta; t) = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol\epsilon}\,\big\| \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\mathbf{s}_\theta(\mathbf{x}_t, t) + \boldsymbol\epsilon \big\|_2^2 \tag{14}$$
Following the score–noise relationship in eq. (11):

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = -\frac{1}{\sqrt{1-\bar\alpha_t}}\,\sqrt{\boldsymbol\Sigma}^{-1}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{15}$$
The objective function now becomes (expanding $\mathbf{s}_\theta$ as per eq. (15)):

$$L(\theta) = \mathbb{E}_{t \sim \mathcal{U}(1,\dots,L),\ \mathbf{x}_0 \sim p(\mathbf{x}_0),\ \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})}\,\big\| {-\boldsymbol\epsilon_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon,\ t\big)} + \boldsymbol\epsilon \big\|_2^2 \tag{16}$$
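A sketch of eq. (16), mirroring `ddpm_loss` above; note that only the forward-sampling call changes, which is exactly the point made in the next paragraph:

```python
def ni_ddpm_loss(eps_theta, x0_batch, sqrt_Sigma, rng=np.random.default_rng()):
    """Monte-Carlo estimate of eq. (16): same form as eq. (4), except that
    x_t is produced with colored noise sqrt(Sigma) eps via eq. (10)."""
    t = int(rng.integers(L))
    xt, eps = ni_forward_sample(x0_batch, t, sqrt_Sigma, rng)
    residual = eps_theta(xt, t) - eps
    return np.mean(np.sum(residual.reshape(residual.shape[0], -1) ** 2, axis=1))
```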
This objective function for NI-DDPM has the same form as the $L$ of DDPM, but DDPM's $\boldsymbol\epsilon_\theta$ network cannot be re-used here since their forward processes differ: DDPM produces $\mathbf{x}_t$ from $\mathbf{x}_0$ using eq. (1), while NI-DDPM uses eq. (10). See Appendix B.4 for alternate formulations of the score network.
Sampling involves computing $p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0)$ (see Appendix B.6 for derivation):

$$q_t(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \hat{\mathbf{x}}_0 = \frac{1}{\sqrt{\bar\alpha_t}}\Big(\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)\Big) \tag{17}$$

$$p_{t-1}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\big(\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0),\ \tilde\beta_t\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \mathbf{x}_{t-1} = \tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) + \sqrt{\tilde\beta_t}\,\sqrt{\boldsymbol\Sigma}\,\mathbf{z}_t \tag{18}$$

where $\tilde{\boldsymbol\mu}_t$, $\tilde\beta_t$ and $\mathbf{z}_t$ are the same as in eq. (3).
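A sketch of one reverse step combining eqs. (17), (18) and (3); suppressing the noise at the final step is a common convention we assume here rather than one stated above.

```python
def ni_ddpm_step(eps_theta, xt, t, sqrt_Sigma, rng=np.random.default_rng()):
    """One reverse (ancestral) sampling step of NI-DDPM, eqs. (17)-(18)."""
    a_bar = alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    beta = betas[t]
    # eq. (17): estimate x_0 from the predicted noise
    x0_hat = (xt - np.sqrt(1.0 - a_bar) * sqrt_Sigma(eps_theta(xt, t))) / np.sqrt(a_bar)
    # eq. (3): posterior mean mu_tilde and variance scale beta_tilde
    mu = (np.sqrt(a_bar_prev) * beta * x0_hat
          + np.sqrt(1.0 - beta) * (1.0 - a_bar_prev) * xt) / (1.0 - a_bar)
    beta_tilde = (1.0 - a_bar_prev) / (1.0 - a_bar) * beta
    # eq. (18): add colored Gaussian noise (none at the final step, by convention)
    z = rng.standard_normal(xt.shape) if t > 0 else np.zeros_like(xt)
    return mu + np.sqrt(beta_tilde) * sqrt_Sigma(z)
```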
Alternatively, [18] mentions using $\beta_t$ instead of $\tilde\beta_t$:

$$p_{t-1}^{\beta_t}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\big(\tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0),\ \beta_t\,\boldsymbol\Sigma\big) \;\Longrightarrow\; \mathbf{x}_{t-1} = \tilde{\boldsymbol\mu}_t(\mathbf{x}_t, \hat{\mathbf{x}}_0) + \sqrt{\beta_t}\,\sqrt{\boldsymbol\Sigma}\,\mathbf{z}_t \tag{19}$$
Alternatively, sampling using DDIM [15] invokes the following distribution for $\mathbf{x}_{t-1}$:

$$p_{t-1}^{\text{DDIM}}(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \hat{\mathbf{x}}_0) = \mathcal{N}\Big(\sqrt{\bar\alpha_{t-1}}\,\hat{\mathbf{x}}_0 + \sqrt{1-\bar\alpha_{t-1}}\,\frac{\mathbf{x}_t - \sqrt{\bar\alpha_t}\,\hat{\mathbf{x}}_0}{\sqrt{1-\bar\alpha_t}},\ \mathbf{0}\Big) \tag{20}$$

$$\Longrightarrow\; \mathbf{x}_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{\mathbf{x}}_0 + \sqrt{1-\bar\alpha_{t-1}}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{21}$$
The Expected Denoised Sample $\mathbf{x}_0^*(\mathbf{x}_t, t)$ and the optimal score $\mathbf{s}_\theta$ are now related as:

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbb{E}\big[\|\nabla_{\mathbf{x}_t}\log q_t(\mathbf{x}_t \mid \mathbf{x}_0)\|_2^2\big]\,\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) = \frac{1}{1-\bar\alpha_t}\,\boldsymbol\Sigma^{-1}\big(\mathbf{x}_0^*(\mathbf{x}_t, t) - \mathbf{x}_t\big) \tag{22}$$

$$\Longrightarrow\; \mathbf{x}_0^*(\mathbf{x}_t, t) = \mathbf{x}_t + (1-\bar\alpha_t)\,\boldsymbol\Sigma\,\mathbf{s}_\theta(\mathbf{x}_t, t) = \mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\sqrt{\boldsymbol\Sigma}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) \tag{23}$$
SDE formulation: Score-based diffusion models have also been analyzed as stochastic differential equations (SDEs) [18]. The SDE version of NI-DDPM, which we call the Non-Isotropic Variance Preserving (NIVP) SDE, is (see Appendix B.9 for derivation):

$$d\mathbf{x} = -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,dt + \sqrt{\beta(t)}\,\sqrt{\boldsymbol\Sigma}\,d\mathbf{w} \tag{24}$$

$$\Longrightarrow\; p_{0t}\big(\mathbf{x}(t) \mid \mathbf{x}(0)\big) = \mathcal{N}\Big(\mathbf{x}(0)\,e^{-\frac{1}{2}\int_0^t \beta(s)\,ds},\ \boldsymbol\Sigma\big(\mathbf{I} - \mathbf{I}\,e^{-\int_0^t \beta(s)\,ds}\big)\Big) \tag{25}$$
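As an illustration, a simple Euler–Maruyama discretization of the forward SDE in eq. (24); `beta_fn` is an assumed continuous-time schedule $\beta(t)$, not one specified above.

```python
def nivp_euler_maruyama_step(x, t, dt, beta_fn, sqrt_Sigma, rng=np.random.default_rng()):
    """One Euler-Maruyama step of the forward NIVP SDE, eq. (24):
    dx = -(1/2) beta(t) x dt + sqrt(beta(t)) sqrt(Sigma) dw."""
    dw = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increment
    return x - 0.5 * beta_fn(t) * x * dt + np.sqrt(beta_fn(t)) * sqrt_Sigma(dw)
```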
Finally, Appendix A and Appendix B contain more detailed derivations of the above equations for DDPM [6] and our NI-DDPM. See Appendix D and Appendix E for the equivalent derivations for Score Matching Langevin Dynamics (SMLD) [16, 17], and our Non-Isotropic SMLD (NI-SMLD).
4 Gaussian Free Field (GFF) images
A GFF image $\mathbf{g}$ can be obtained from a normal noise image $\mathbf{z}$ as follows [14] (see Appendix C for more details):
1. First, sample an $n \times n$ noise image $\mathbf{z}$ from the standard complex normal distribution with covariance matrix $\boldsymbol\Gamma = \mathbf{I}_N$, where $N = n^2$ is the total number of pixels, and pseudo-covariance matrix $\mathbf{C} = \mathbf{0}$: $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_N, \mathbf{0})$. (In principle, real noise could be used.)
2. Apply the Discrete Fourier Transform using its $N \times N$ weights matrix $\mathbf{W}_N$: $\mathbf{W}_N \mathbf{z}$.
3. Consider a diagonal $N \times N$ matrix of the reciprocal of an index value $|k_{ij}|$ per pixel $(i, j)$ in Fourier space, $\mathbf{K}^{-1} = [1/|k_{ij}|]_{(i,j)}$, and multiply this with the above: $\mathbf{K}^{-1}\mathbf{W}_N \mathbf{z}$.
4. Take its Inverse Discrete Fourier Transform ($\mathbf{W}_N^{-1}$) to make the raw GFF image: $\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N \mathbf{z}$. However, this results in a GFF image with a small non-unit variance.
5. Normalize the above GFF image with the standard deviation $\sigma_N$ at its resolution $N$, so that it has unit variance (see Appendix C.1 for the derivation of $\sigma_N$): $\mathbf{g}_{\text{complex}} = \frac{1}{\sigma_N}\,\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N \mathbf{z}$
6. Extract only the real part of $\mathbf{g}_{\text{complex}}$, and normalize (see Appendix C.2 for derivation):

$$\mathbf{g} = \frac{1}{\sqrt{2N}\,\sigma_N}\,\mathrm{Real}\big(\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N\,\mathbf{z}\big) \tag{26}$$
See Figures 1 and 2 for examples of GFF images. Effectively, this procedure prioritizes lower frequencies over higher frequencies, thereby making the noise smoother, and hence correlated. The probability distribution of GFF images $\mathbf{g}$ can be seen as a non-isotropic multivariate Gaussian with mean $\mathbf{0}$ and a non-diagonal covariance matrix $\boldsymbol\Sigma$ (see Appendices C.1 and C.2 for derivation):

$$p(\mathbf{g}) = \mathcal{N}(\mathbf{0}, \boldsymbol\Sigma); \quad \boldsymbol\Sigma = \sqrt{\boldsymbol\Sigma}\,\sqrt{\boldsymbol\Sigma}^{\mathsf{T}}; \quad \sqrt{\boldsymbol\Sigma} = \frac{1}{\sqrt{2N}\,\sigma_N}\,\mathrm{Real}\big(\mathbf{W}_N^{-1}\mathbf{K}^{-1}\mathbf{W}_N\big) \;\Longrightarrow\; \mathbf{g} = \sqrt{\boldsymbol\Sigma}\,\mathbf{z} \tag{27}$$
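The procedure above maps directly onto FFT code. The following NumPy sketch builds the $\sqrt{\boldsymbol\Sigma}$ operator of eq. (27); as noted in step 1, it uses real rather than complex noise, and it normalizes empirically instead of with the analytic constant $1/(\sqrt{2N}\,\sigma_N)$ derived in Appendix C. Both are simplifying assumptions.

```python
def make_gff_sqrt_Sigma(n):
    """Return a callable that colors white noise with the 1/|k| spectral
    filter of Section 4, i.e. a numerically normalized stand-in for the
    sqrt(Sigma) of eq. (27), applied over the trailing two image axes."""
    kx = np.fft.fftfreq(n)[None, :]
    ky = np.fft.fftfreq(n)[:, None]
    k = np.sqrt(kx ** 2 + ky ** 2)
    inv_k = np.zeros_like(k)
    inv_k[k > 0] = 1.0 / k[k > 0]      # K^{-1}; the DC mode is zeroed out

    def color(z):
        # W_N^{-1} K^{-1} W_N z, then keep only the real part (step 6)
        return np.real(np.fft.ifft2(inv_k * np.fft.fft2(z)))

    # Empirical unit-variance normalization, used here in place of the
    # analytic constant 1/(sqrt(2N) sigma_N).
    scale = 1.0 / color(np.random.default_rng(0).standard_normal((n, n))).std()
    return lambda z: scale * color(z)

# Example: colored noise for 32x32 RGB images.
gff_sqrt_Sigma = make_gff_sqrt_Sigma(32)
noise = gff_sqrt_Sigma(np.random.default_rng().standard_normal((3, 32, 32)))
```

The returned callable can be plugged in as the `sqrt_Sigma` argument of the NI-DDPM sketches above.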
5 Results
Table 1: Image generation metrics FID, Precision (P), and Recall (R) for CIFAR10 using DDPM and NI-DDPM, with different numbers of generation steps.

Model     Steps   FID     P      R
DDPM      1000    6.05    0.66   0.54
          100     12.25   0.62   0.48
          50      16.61   0.60   0.43
          20      26.35   0.56   0.24
          10      44.95   0.49   0.24
NI-DDPM   1000    6.95    0.62   0.53
          100     12.68   0.60   0.49
          50      16.91   0.57   0.45
          20      30.41   0.52   0.35
          10      60.32   0.43   0.23
We train two models on CIFAR10, one using DDPM and the other using NI-DDPM, with the exact same hyperparameters (batch size, learning rate, etc.) for 300,000 iterations. We then sample 50,000 images from each, and calculate the image generation metrics of Fréchet Inception Distance (FID) [5], Precision (P), and Recall (R). Although the models were trained with 1000 steps between data and noise, we report these metrics while sampling images using 1000 as well as fewer steps: 100, 50, 20, 10.

As can be seen from Table 1, our non-isotropic variant performs comparably to the isotropic baseline. The difference between them increases as the number of steps between noise and data decreases. This provides a reasonable proof of concept that non-isotropic Gaussian noise works just as well as isotropic noise when used in denoising diffusion models for image generation.
6 Conclusion
We have presented the key mathematics behind non-isotropic Gaussian DDPMs, as well as a complete example using a GFF. We then provided a quantitative comparison of using GFF noise vs. regular isotropic noise on the CIFAR-10 dataset. In the appendix, we also include further derivations for non-isotropic SMLD models. GFFs are just one example of a well-known class of models that are a subset of non-isotropic Gaussian distributions. In the same way that other work has examined non-Gaussian formulations such as the Gamma distribution [11], the Poisson distribution [21], and heat dissipation processes [12], we hope that our work here may lay the foundation for other new denoising diffusion formulations.
References
[1] Nathanaël Berestycki. Introduction to the Gaussian free field and Liouville quantum gravity. Lecture notes, 2015.
[2] Maury Bramson, Jian Ding, and Ofer Zeitouni. Convergence in law of the maximum of the two-dimensional discrete Gaussian free field. Communications on Pure and Applied Mathematics, 69(1):62–123, 2016.
[3] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, and William Chan. WaveGrad: Estimating gradients for waveform generation. In International Conference on Learning Representations, 2021.
[4] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 2021.
[5] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
[6] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020.
[7] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models, 2022.
[8] Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, and Ioannis Mitliagkas. Adversarial score matching and improved sampling for image generation. International Conference on Learning Representations, 2021.
[9] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations, 2021.
[10] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.
[11] Eliya Nachmani, Robin San Roman, and Lior Wolf. Denoising diffusion gamma models, 2021.
[12] Severi Rissanen, Markus Heinonen, and Arno Solin. Generative modelling with inverse heat dissipation, 2022.
[13] Saeed Saremi and Aapo Hyvärinen. Neural empirical Bayes. Journal of Machine Learning Research, 20:1–23, 2019.
[14] Scott Sheffield. Gaussian free fields for mathematicians. Probability Theory and Related Fields, 139(3):521–541, 2007.
[15] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. International Conference on Learning Representations, 2021.
[16] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 2019.
[17] Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 2020.
[18] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021.
[19] Vikram Voleti, Alexia Jolicoeur-Martineau, and Christopher Pal. MCVD: Masked conditional video diffusion for prediction, generation, and interpolation. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[20] Wendelin Werner and Ellen Powell. Lecture notes on the Gaussian free field. arXiv preprint arXiv:2004.04720, 2020.
[21] Yilun Xu, Ziming Liu, Max Tegmark, and Tommi Jaakkola. Poisson flow generative models, 2022.
[22] Ruihan Yang, Prakhar Srivastava, and Stephan Mandt. Diffusion probabilistic modeling for video generation. arXiv preprint arXiv:2203.09481, 2022.