Evaluating Unsupervised Denoising Requires Unsupervised Metrics


Adrià Marcos Morales 1 2 3, Matan Leibovich 4, Sreyas Mohan 1, Joshua Lawrence Vincent 5, Piyush Haluai 5, Mai Tan 5, Peter Crozier 5, Carlos Fernandez-Granda 1 4

Abstract

Unsupervised denoising is a crucial challenge in real-world imaging applications. Unsupervised deep-learning methods have demonstrated impressive performance on benchmarks based on synthetic noise. However, no metrics exist to evaluate these methods in an unsupervised fashion. This is highly problematic for the many practical applications where ground-truth clean images are not available. In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data. We provide a theoretical analysis of these metrics, showing that they are asymptotically consistent estimators of the supervised MSE and PSNR. Controlled numerical experiments with synthetic noise confirm that they provide accurate approximations in practice. We validate our approach on real-world data from two imaging modalities: videos in raw format and transmission electron microscopy. Our results demonstrate that the proposed metrics enable unsupervised evaluation of denoising methods based exclusively on noisy data.

1. Introduction

Image denoising is a fundamental challenge in image and signal processing, as well as a key preprocessing step for computer vision tasks. Convolutional neural networks achieve state-of-the-art performance for this problem, when trained using databases of clean images corrupted with simulated noise (Zhang et al., 2017a). However, in real-world imaging applications such as microscopy, noiseless ground truth videos are often not available. This has motivated the development of unsupervised denoising approaches that can be trained using only noisy measurements (Lehtinen et al., 2018; Xie et al., 2020; Laine et al., 2019; Sheth et al., 2021; Huang et al., 2021). These methods have demonstrated impressive performance on natural-image benchmarks, essentially on par with the supervised state of the art. However, to the best of our knowledge, no unsupervised metrics are currently available to evaluate them using only noisy data.

1 Center for Data Science, New York University, New York, NY. 2 Centre de Formació Interdisciplinària Superior, Universitat Politècnica de Catalunya, Barcelona, Spain. 3 Radiomics Group, Vall d’Hebron Institute of Oncology, Vall d’Hebron Barcelona Hospital Campus, Barcelona, Spain. 4 Courant Institute of Mathematical Sciences, New York University, New York, NY. 5 School for Engineering of Matter, Transport & Energy, Arizona State University, Tempe, AZ. Correspondence to: Adrià Marcos Morales <adriamm98@gmail.com>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Reliance on supervised metrics makes it very challenging to create benchmark datasets using real-world measurements, because obtaining the ground-truth clean images required by these metrics is often either impossible or very constraining. In practice, clean images are typically estimated through temporal averaging, which suppresses dynamic information that is often crucial in scientific applications. Consequently, quantitative evaluation of unsupervised denoising methods is currently almost completely dominated by natural image benchmark datasets with simulated noise (Lehtinen et al., 2018; Xie et al., 2020; Laine et al., 2019; Sheth et al., 2021; Huang et al., 2021), which are not always representative of the signal and noise characteristics that arise in real-world imaging applications.

The lack of unsupervised metrics also limits the application of unsupervised denoising techniques in practice. In the absence of quantitative metrics, domain scientists must often rely on visual inspection to evaluate performance on real measurements. This is particularly restrictive for deep-learning approaches, because it makes it impossible to perform systematic hyperparameter optimization and model selection on the data of interest.

In this work, we propose two novel unsupervised metrics to address these issues: the unsupervised mean-squared error (uMSE) and the unsupervised peak signal-to-noise ratio (uPSNR), which are computed exclusively from noisy data. These metrics build upon existing unsupervised denoising methods, which minimize an unsupervised cost function equal to the difference between the denoised estimate and additional noisy copies of the signal of interest (Lehtinen et al., 2018). The uMSE is equal to this cost function modified with a correction term, which renders it an unbiased estimator of the supervised MSE.

arXiv:2210.05553v3 [cs.CV] 30 May 2023

We provide a theoretical analysis of the uMSE and uPSNR, proving that they are asymptotically consistent estimators of the supervised MSE and PSNR respectively. Controlled experiments on supervised benchmarks, where the true MSE and PSNR can be computed exactly, confirm that the uMSE and uPSNR provide accurate approximations. In addition, we validate the metrics on video data in RAW format, contaminated with real noise that does not follow a known predefined model.

In order to illustrate the potential impact of the proposed metrics on imaging applications where no ground-truth is available, we apply them to transmission-electron-microscopy (TEM) data. Recent advances in direct electron detection systems make it possible for experimentalists to acquire highly time-resolved movies of dynamic events at frame rates in the kilohertz range (Faruqi & McMullan, 2018; Ercius et al., 2020), which is critical to advance our understanding of functional materials. Acquisition at such high temporal resolution results in severe degradation by shot noise. We show that unsupervised methods based on deep learning can be effective in removing this noise, and that our proposed metrics can be used to evaluate their performance quantitatively using only noisy data.

To summarize, our contributions are (1) two novel unsupervised metrics presented in Section 3, (2) a theoretical analysis providing an asymptotic characterization of their statistical properties (Section 4), (3) experiments showing the accuracy of the metrics in a controlled situation where ground-truth clean images are available (Section 5), (4) validation on real-world videos in RAW format (Section 6), and (5) an application to a real-world electron-microscopy dataset, which illustrates the challenges of unsupervised denoising in scientific imaging (Section 7).

Code to reproduce all computational experiments is available at https://github.com/adriamm98/umse

2. Background and Related work

Unsupervised denoising. The past few years have seen ground-breaking progress in unsupervised denoising, pioneered by Noise2Noise, a technique where a neural network is trained on pairs of noisy images (Lehtinen et al., 2018). Our unsupervised metrics are inspired by Noise2Noise, which optimizes a cost function equal to our proposed unsupervised MSE, but without a correction term (which is not needed for training models). Subsequent work focused on performing unsupervised denoising from single images using variations of the blind-spot method, where a model is trained to estimate each noisy pixel value using its neighborhood but not the noisy pixel itself (to avoid the trivial identity solution) (Krull et al., 2019; Laine et al., 2019; Batson & Royer, 2019a; Sheth et al., 2021; Xie et al., 2020). More recently, Neighbor2Neighbor revisited the Noise2Noise method, generating noisy image pairs from a single noisy image via spatial subsampling (Huang et al., 2021), an insight that can also be leveraged in combination with our proposed metrics, as explained in Section B. Our contribution with respect to these methods is a novel unsupervised metric that can be used for evaluation, as it is designed to be an unbiased and consistent estimator of the MSE.

Stein’s unbiased risk estimator (SURE) provides an asymptotically unbiased estimator of the MSE for i.i.d. Gaussian noise (Donoho & Johnstone, 1995). This cost function has been used for training unsupervised denoisers (Metzler et al., 2018; Soltanayev & Chun, 2018; Zhussip et al., 2019; Mohan et al., 2021). In principle, SURE could be used to compute the MSE for evaluation, but it has certain limitations: (1) a closed form expression of the noise likelihood is required, including the value of the noise parameters (for example, this is not known for the real-world datasets in Sections 6 and 7), and (2) computing SURE requires approximating the divergence of a denoiser (usually via Monte Carlo methods (Ramani et al., 2008)), which is computationally very expensive. Developing practical unsupervised metrics based on SURE and studying their theoretical properties is an interesting direction for future research.
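To make the two limitations of SURE concrete, here is a minimal sketch of a Monte Carlo SURE estimate for i.i.d. Gaussian noise, using the standard random-probe finite-difference approximation of the divergence. The function name `mc_sure`, the step size `eps`, and the toy linear denoiser are illustrative assumptions and are not taken from this paper's code. Note that the estimate needs the noise level `sigma` as an input, and that every probe vector costs an additional evaluation of the denoiser.

```python
import numpy as np

def mc_sure(f, y, sigma, eps=1e-3, seed=0):
    """Monte Carlo SURE estimate of the MSE of denoiser f at noisy input y.

    Assumes additive i.i.d. Gaussian noise with KNOWN standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=np.float64)
    fy = f(y)
    probe = rng.standard_normal(y.shape)
    # Finite-difference estimate of the divergence of f at y along one random probe.
    divergence = np.sum(probe * (f(y + eps * probe) - fy)) / eps
    fidelity = np.mean((fy - y) ** 2)
    return float(fidelity - sigma ** 2 + 2.0 * sigma ** 2 * divergence / y.size)

# Toy check on synthetic data (all values are hypothetical).
rng = np.random.default_rng(1)
n, sigma = 200_000, 0.1
x = rng.uniform(0.0, 1.0, size=n)            # stand-in clean signal
y = x + sigma * rng.standard_normal(n)       # noisy observation
shrink = lambda v: 0.5 * v                   # toy linear denoiser (assumption)
true_mse = float(np.mean((x - shrink(y)) ** 2))
sure_estimate = mc_sure(shrink, y, sigma)
print(true_mse, sure_estimate)
```

For a deep denoiser the two calls to `f` are full forward passes, and in practice several probes are often averaged to reduce the variance of the divergence estimate.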

Existing evaluation approaches. In the literature, quantitative evaluation of unsupervised denoising techniques has mostly relied on images and videos corrupted with synthetic noise (Lehtinen et al., 2018; Krull et al., 2019; Laine et al., 2019; Batson & Royer, 2019a; Sheth et al., 2021; Xie et al., 2020). Recently, a few datasets containing real noisy data have been created (Abdelhamed et al., 2018; Plotz & Roth, 2017; Xu et al., 2018; Zhang et al., 2019). Evaluation on these datasets is based on supervised MSE and PSNR computed from estimated clean images obtained by averaging multiple noisy frames. Unfortunately, as a result, the metrics cannot capture dynamically-changing features, which are of interest in many applied domains. In addition, unless the signal-to-noise ratio is quite high, it is necessary to average over a large number of frames to approximate the MSE. For example, as explained in Section D, for an image corrupted by additive Gaussian noise with standard deviation $\sigma = 15$ we need to average more than 1500 noisy images to achieve the same approximation accuracy as our proposed approach (see Figure 10), which only requires 3 noisy images, and can also be computed from a single noisy image.

Figure 1. MSE vs uMSE. The traditional supervised mean squared error (MSE) is computed by comparing the denoised estimate to the clean ground truth (left). The proposed unsupervised MSE is computed only from noisy data, via comparison with a noisy reference corresponding to the same ground-truth but corrupted with independent noise (right). A correction term based on two additional noisy references debiases the estimator.

Noise-Level Estimation. The correction term in uMSE can be interpreted as an estimate of the noise level, obtained by cancelling out the clean signal. In this sense, it is related to noise-level estimation methods (Liu et al., 2013; Lebrun et al., 2015; Arias & Morel, 2018). However, unlike uMSE, these methods typically assume a parametric model for the noise, and are not used for evaluation.

No-reference image quality assessment methods evaluate the perceptual quality of an image (Li, 2002; Mittal et al., 2012), but not whether it is consistent with an underlying ground-truth corresponding to the observed noisy measurements, which is the goal of our proposed metrics.

3. Unsupervised Metrics For Unsupervised Denoising

3.1. The Unsupervised Mean Squared Error

The goal of denoising is to estimate a clean signal from noisy measurements. Let $x \in \mathbb{R}^n$ be a signal or a set of signals with $n$ total entries. We denote the corresponding noisy data by $y \in \mathbb{R}^n$. A denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$ is a function that maps the input $y$ to an estimate of $x$. A common metric to evaluate the quality of a denoiser is the mean squared error between the clean signal and the estimate,

$$\mathrm{MSE} := \frac{1}{n} \sum_{i=1}^{n} (x_i - f(y)_i)^2. \tag{1}$$

Unfortunately, in most real-world scenarios clean ground-truth signals are not available and evaluation can only be carried out in an unsupervised fashion, i.e. exclusively from the noisy measurements. In this section we propose an unsupervised estimator of MSE inspired by recent advances in unsupervised denoising (Lehtinen et al., 2018). The key idea is to compare the denoised signal to a noisy reference, which corresponds to the same clean signal corrupted by independent noise.

In order to motivate our approach, let us assume that the noise is additive, so that $y := x + z$ for a zero-mean noise vector $z \in \mathbb{R}^n$. Imagine that we have access to a noisy reference $a := x + w$ corresponding to the same underlying signal $x$, but corrupted with a different noise realization $w \in \mathbb{R}^n$ independent from $z$ (Section 3.3 explains how to obtain such references in practice). The mean squared difference between the denoised estimate and the reference is approximately equal to the sum of the MSE and the variance $\sigma^2$ of the noise,

$$\frac{1}{n}\sum_{i=1}^{n} (a_i - f(y)_i)^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i + w_i - f(y)_i)^2 \approx \frac{1}{n}\sum_{i=1}^{n} (x_i - f(y)_i)^2 + \frac{1}{n}\sum_{i=1}^{n} w_i^2 \approx \mathrm{MSE} + \sigma^2, \tag{2}$$

because the cross-term $\frac{2}{n}\sum_{i=1}^{n} w_i (x_i - f(y)_i)$ cancels out if $w_i$ and $y_i$ (and hence $f(y_i)$) are independent (and the mean of the noise is zero).
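The approximation in equation 2 can be checked numerically. The sketch below uses synthetic data and a toy shrinkage denoiser, both of which are assumptions made purely for illustration: it compares the mean squared difference to an independent noisy reference against the supervised MSE, and the gap should be close to the noise variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200_000, 0.1

x = rng.uniform(0.0, 1.0, size=n)           # hypothetical clean signal
y = x + sigma * rng.standard_normal(n)      # noisy input, y = x + z
a = x + sigma * rng.standard_normal(n)      # noisy reference, a = x + w, with w independent of z

def f(v):
    # Toy stand-in denoiser (assumption): shrink every entry toward the global mean.
    return 0.5 * v + 0.5 * v.mean()

mse = float(np.mean((x - f(y)) ** 2))                 # supervised MSE, Eq. 1
noisy_ref_loss = float(np.mean((a - f(y)) ** 2))      # left-hand side of Eq. 2
gap = noisy_ref_loss - mse                            # should be close to sigma ** 2

print(mse, noisy_ref_loss, gap, sigma ** 2)
```

The gap does not depend on the quality of the denoiser, which is why it is harmless as a training objective but distorts comparisons across datasets with different noise levels.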

Approximations to equation 2 are used by different unsupervised methods to train neural networks for denoising (Lehtinen et al., 2018; Xie et al., 2020; Laine et al., 2019; Huang et al., 2021). The noise term $\frac{1}{n}\sum_{i=1}^{n} w_i^2$ in equation 2 is not problematic for training denoisers as long as it is independent from the input $y$. However, it is definitely problematic for evaluating denoisers, as the additional term would change for different images and datasets, making it impossible to perform quantitative comparisons. In order to address this limitation we propose to modify the cost function to neutralize the noise term. This can be achieved by using two other noisy references $b := x + v$ and $c := x + u$, which are noisy measurements corresponding to the clean signal $x$, but corrupted with different, independent noise realizations $v$ and $u$ (just like $a$). Subtracting these references, squaring the difference, and dividing by two yields an estimate of the noise variance,

$$\frac{1}{n}\sum_{i=1}^{n} \frac{(b_i - c_i)^2}{2} = \frac{1}{n}\sum_{i=1}^{n} \frac{(v_i - u_i)^2}{2} \approx \frac{1}{2n}\sum_{i=1}^{n} v_i^2 + \frac{1}{2n}\sum_{i=1}^{n} u_i^2 \approx \sigma^2, \tag{3}$$

which can then be subtracted from equation 2 to estimate the MSE. This yields our proposed unsupervised metric, which we call unsupervised mean squared error (uMSE), depicted in Figure 1.

Figure 2. Noisy references. The proposed metrics require noisy references corresponding to the same clean image corrupted by independent noise. These references can be obtained from a single image via spatial subsampling (above) or from consecutive frames (below). In both cases, there may be small differences in the signal content of the references, shown by the corresponding heatmaps. [Panels: Spatial subsampling, Difference (above); Consecutive frames, Difference (below).]

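Definition 3.1 (Unsupervised mean squared error). Given a noisy input signal $y \in \mathbb{R}^n$ and three noisy references $a, b, c \in \mathbb{R}^n$, the unsupervised mean squared error of a denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$ is

$$\mathrm{uMSE} := \frac{1}{n}\sum_{i=1}^{n} \left[ (a_i - f(y)_i)^2 - \frac{(b_i - c_i)^2}{2} \right]. \tag{4}$$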
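Definition 3.1 translates directly into a few lines of array code. The following is a minimal sketch in NumPy; the function name `umse`, the synthetic data, and the toy denoiser are our own illustrative choices, not the authors' released implementation.

```python
import numpy as np

def umse(denoised, a, b, c):
    """Unsupervised MSE of Definition 3.1 (Eq. 4).

    denoised is f(y); a, b, c are three noisy references of the same clean signal.
    """
    denoised, a, b, c = (np.asarray(v, dtype=np.float64) for v in (denoised, a, b, c))
    return float(np.mean((a - denoised) ** 2 - (b - c) ** 2 / 2.0))

# Synthetic sanity check where the clean signal is known (hypothetical values).
rng = np.random.default_rng(0)
n, sigma = 200_000, 0.1
x = rng.uniform(0.0, 1.0, size=n)
y, a, b, c = (x + sigma * rng.standard_normal(n) for _ in range(4))
denoised = 0.8 * y + 0.2 * y.mean()          # toy denoiser (assumption)

mse = float(np.mean((x - denoised) ** 2))    # supervised MSE, needs x
estimate = umse(denoised, a, b, c)           # unsupervised estimate, does not use x
print(mse, estimate)
```

Only `denoised`, `a`, `b` and `c` enter the estimate; the clean signal `x` is used here solely to verify the result.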

Theorem 4.2 in Section 4 establishes that the uMSE is a consistent estimator of the MSE as long as (1) the noisy input and the noisy references are independent, (2) their means equal the corresponding entries of the ground-truth clean signal, and (3) their higher-order moments are bounded. These conditions are satisfied by most noise models of interest in signal and image processing, such as Poisson shot noise or additive Gaussian noise. In Section 3.3 we address the question of how to obtain the noisy references required to estimate the uMSE. Section A explains how to compute confidence intervals for the uMSE via bootstrapping.
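As a rough illustration of what a bootstrap interval for the uMSE can look like, the sketch below applies a generic percentile bootstrap that resamples entries with replacement. This is an assumption on our part: the procedure described in Section A is not reproduced in this excerpt and may differ. The function name `umse_bootstrap_ci` and its parameters are hypothetical.

```python
import numpy as np

def umse_bootstrap_ci(denoised, a, b, c, n_boot=500, level=0.95, seed=0):
    """Percentile-bootstrap interval for the uMSE, resampling entries with replacement."""
    denoised, a, b, c = (np.asarray(v, dtype=np.float64).ravel()
                         for v in (denoised, a, b, c))
    terms = (a - denoised) ** 2 - (b - c) ** 2 / 2.0    # per-entry summands of Eq. 4
    rng = np.random.default_rng(seed)
    n = terms.size
    boot = np.empty(n_boot)
    for k in range(n_boot):
        boot[k] = terms[rng.integers(0, n, size=n)].mean()
    lower, upper = np.quantile(boot, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return float(terms.mean()), float(lower), float(upper)

# Toy usage on synthetic data (hypothetical values).
rng = np.random.default_rng(1)
n, sigma = 5_000, 0.1
x = rng.uniform(0.0, 1.0, size=n)
y, a, b, c = (x + sigma * rng.standard_normal(n) for _ in range(4))
denoised = 0.8 * y + 0.2 * y.mean()          # toy denoiser (assumption)
point, lower, upper = umse_bootstrap_ci(denoised, a, b, c)
print(point, lower, upper)
```

Resampling individual entries treats the per-entry terms as independent, which is reasonable under the independence condition on the references but ignores spatial correlations introduced by the denoiser.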

3.2. The Unsupervised Peak Signal-To-Noise Ratio

Peak signal-to-noise ratio (PSNR) is currently the most popular metric to evaluate denoising quality. It is a logarithmic function of MSE defined on a decibel scale,

$$\mathrm{PSNR} := 10 \log \frac{M^2}{\mathrm{MSE}}, \tag{5}$$

where $M$ is a fixed constant representing the maximum possible value of the signal of interest, which is usually set equal to 255 for images. Our definition of uMSE can be naturally extended to yield an unsupervised PSNR (uPSNR).

Definition 3.2 (Unsupervised peak signal-to-noise ratio). Given a noisy input signal $y \in \mathbb{R}^n$ and three noisy references $a, b, c \in \mathbb{R}^n$, the unsupervised peak signal-to-noise ratio of a denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$ is

$$\mathrm{uPSNR} := 10 \log \frac{M^2}{\mathrm{uMSE}}, \tag{6}$$

where $M$ is the maximum possible value of the signal of interest.
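A minimal sketch of the uPSNR of Definition 3.2 follows. It assumes a base-10 logarithm, as is standard for decibel scales, and returns NaN when the uMSE estimate is not positive (which can happen because the estimator fluctuates around the true MSE); that fallback, the function name `upsnr`, and the synthetic data are our own assumptions.

```python
import numpy as np

def upsnr(denoised, a, b, c, max_value=255.0):
    """Unsupervised PSNR (Eq. 6) in dB, assuming a base-10 logarithm."""
    denoised, a, b, c = (np.asarray(v, dtype=np.float64) for v in (denoised, a, b, c))
    umse = np.mean((a - denoised) ** 2 - (b - c) ** 2 / 2.0)   # Eq. 4
    if umse <= 0.0:
        # The logarithm is undefined here; returning NaN is our choice, not the paper's.
        return float("nan")
    return float(10.0 * np.log10(max_value ** 2 / umse))

# Synthetic comparison against the supervised PSNR (hypothetical values).
rng = np.random.default_rng(0)
n, sigma, M = 400_000, 25.0, 255.0
x = rng.uniform(0.0, M, size=n)
y, a, b, c = (x + sigma * rng.standard_normal(n) for _ in range(4))
denoised = 0.8 * y + 0.2 * y.mean()          # toy denoiser (assumption)

psnr_value = float(10.0 * np.log10(M ** 2 / np.mean((x - denoised) ** 2)))
upsnr_value = upsnr(denoised, a, b, c, max_value=M)
print(psnr_value, upsnr_value)
```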

Corollary 4.3 establishes that the uPSNR is a consistent estimator of the PSNR, under the same conditions that guarantee consistency of the uMSE. Section A explains how to compute confidence intervals for the uPSNR via bootstrapping.

3.3. Computing Noisy References In Practice

Our proposed metrics rely on the availability of three noisy references, which ideally should correspond to the same clean image contaminated with independent noise. Deviations between the clean signal in each reference violate Condition 2 in Section 4, and introduce a bias in the metrics. We propose two approaches to compute the references in practice, illustrated in Figure 2.

Multiple images: The references can be computed from consecutive frames acquired within a short time interval. This approach is preferable for datasets where the image content does not experience rapid dynamic changes from frame to frame. We apply this approach to the RAW videos in Section 6, where the content is static.

Single image: The references can be computed from a single image via spatial subsampling, as described in Section B. Section B shows that this approach is effective as long as the image content is sufficiently smooth with respect to the pixel resolution. We apply this approach to the electron-microscopy data in Section 7, where preserving dynamic content is important.

Figure 3. The uMSE is a consistent estimator of the MSE. The histograms at the top show the distribution of the uMSE computed from $n$ pixels ($n \in \{20, 100, 1000\}$) of a natural image corrupted with additive Gaussian noise ($\sigma = 55$) and denoised via a deep-learning denoiser (DnCNN). Each point in the histogram corresponds to a different sample of the three noisy references used to compute the uMSE ($\tilde{a}_i$, $\tilde{b}_i$ and $\tilde{c}_i$ in Eq. 8 for $1 \le i \le n$), with the same underlying clean pixels. The distributions are centered at the MSE, showing that the estimator is unbiased (Theorem 4.1), and are well approximated by a Gaussian fit (Theorem 4.4). As the number of pixels $n$ grows, the standard deviation of the uMSE decreases proportionally to $n^{-1/2}$, and the uMSE converges asymptotically to the MSE (Theorem 4.2), as depicted in the scatterplot below ($\alpha$ is a constant). [Histogram panels: 20 pixels, 100 pixels, 1,000 pixels.]
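To give a feel for the single-image approach of Section 3.3, the sketch below splits one noisy image into four half-resolution images by taking the four positions of every 2x2 pixel block. This fixed assignment is an assumption made for illustration; the scheme of Section B is not reproduced in this excerpt and may differ (for example, by randomizing the assignment within each block). The function name `subsample_2x2` is hypothetical.

```python
import numpy as np

def subsample_2x2(image):
    """Split an image into four sub-images, one per position in each 2x2 block.

    Odd trailing rows or columns are dropped so that all four outputs share a shape.
    """
    image = np.asarray(image)
    h, w = image.shape[:2]
    image = image[: h - h % 2, : w - w % 2]
    return (image[0::2, 0::2],   # top-left pixels, e.g. used as the noisy input y
            image[0::2, 1::2],   # top-right pixels, e.g. reference a
            image[1::2, 0::2],   # bottom-left pixels, e.g. reference b
            image[1::2, 1::2])   # bottom-right pixels, e.g. reference c

# Toy usage: a 6x6 "image" whose values equal their flat pixel index.
demo = np.arange(36).reshape(6, 6)
y_sub, a_sub, b_sub, c_sub = subsample_2x2(demo)
print(y_sub.shape, y_sub[0, 0], a_sub[0, 0], b_sub[0, 0], c_sub[0, 0])
```

The four outputs see independent noise when the noise is independent across pixels, but their clean content matches only approximately, which is why smoothness with respect to the pixel resolution matters.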

4. Statistical Properties of the Proposed Metrics

In this section, we establish that the proposed unsupervised metrics provide a consistent estimate of the MSE and PSNR. In our analysis, the ground truth signal or set of signals is represented as a deterministic vector $x \in \mathbb{R}^n$. The corresponding noisy data are also modeled as a deterministic vector $y \in \mathbb{R}^n$ that is fed into a denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$ to produce the denoised estimate $f(y)$. The MSE of the estimate is a deterministic quantity equal to

$$\mathrm{MSE} := \frac{1}{n}\sum_{i=1}^{n} \mathrm{SE}_i, \qquad \mathrm{SE}_i := (x_i - f(y)_i)^2. \tag{7}$$

Noise Model. The uMSE estimator in Definition 3.1 depends on three noisy references $\tilde{a}$, $\tilde{b}$, $\tilde{c}$, which we model as random variables.¹ Our analysis assumes that these random variables satisfy two conditions:

Condition 1 (independence): The entries of $\tilde{a}$, $\tilde{b}$, $\tilde{c}$ are all mutually independent.

Condition 2 (centered noise): The mean of the $i$th entry of $\tilde{a}$, $\tilde{b}$, $\tilde{c}$ equals the corresponding entry of the clean signal,

$$\mathbb{E}[\tilde{a}_i] = \mathbb{E}[\tilde{b}_i] = \mathbb{E}[\tilde{c}_i] = x_i, \qquad 1 \le i \le n.$$

Two popular noise models that satisfy these conditions are:

• Additive Gaussian, where $\tilde{a}_i := x_i + \tilde{w}_i$, $\tilde{b}_i := x_i + \tilde{v}_i$, $\tilde{c}_i := x_i + \tilde{u}_i$, for i.i.d. Gaussian $\tilde{w}_i$, $\tilde{v}_i$, $\tilde{u}_i$.

• Poisson, where $\tilde{a}_i$, $\tilde{b}_i$, $\tilde{c}_i$ are i.i.d. Poisson random variables with parameter $x_i$.

¹ In our analysis, all random quantities are marked with a tilde for clarity.
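The role of the two conditions can be seen in a short worked calculation. The sketch below additionally assumes that the three references share the same per-entry variance, written $\sigma_i^2$; this is our assumption for the illustration, and it holds for both the i.i.d. Gaussian and the Poisson models listed above. Because $y$ is deterministic in this analysis, $f(y)_i$ is a constant inside the expectations.

```latex
% Assumption (ours): Var[\tilde a_i] = Var[\tilde b_i] = Var[\tilde c_i] = \sigma_i^2.
\begin{align*}
\mathbb{E}\big[(\tilde a_i - f(y)_i)^2\big]
  &= \big(\mathbb{E}[\tilde a_i] - f(y)_i\big)^2 + \operatorname{Var}[\tilde a_i]
   = \mathrm{SE}_i + \sigma_i^2
   && \text{(Condition 2)} \\
\mathbb{E}\Big[\tfrac{1}{2}(\tilde b_i - \tilde c_i)^2\Big]
  &= \tfrac{1}{2}\big(\operatorname{Var}[\tilde b_i] + \operatorname{Var}[\tilde c_i]\big)
   = \sigma_i^2
   && \text{(Conditions 1 and 2)} \\
\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^{n}
  \Big((\tilde a_i - f(y)_i)^2 - \tfrac{1}{2}(\tilde b_i - \tilde c_i)^2\Big)\Big]
  &= \frac{1}{n}\sum_{i=1}^{n} \mathrm{SE}_i = \mathrm{MSE}.
\end{align*}
```

The second line uses $\mathbb{E}[\tilde b_i - \tilde c_i] = 0$ together with independence, so the clean signal cancels and only the noise variance remains, which is exactly the correction needed in the first line.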
