Evaluating Unsupervised Denoising Requires Unsupervised Metrics

Adrià Marcos Morales 1 2 3, Matan Leibovich 4, Sreyas Mohan 1, Joshua Lawrence Vincent 5, Piyush Haluai 5,
Mai Tan 5, Peter Crozier 5, Carlos Fernandez-Granda 1 4
Abstract
Unsupervised denoising is a crucial challenge in
real-world imaging applications. Unsupervised
deep-learning methods have demonstrated impres-
sive performance on benchmarks based on syn-
thetic noise. However, no metrics exist to eval-
uate these methods in an unsupervised fashion.
This is highly problematic for the many practi-
cal applications where ground-truth clean images
are not available. In this work, we propose two
novel metrics: the unsupervised mean squared
error (MSE) and the unsupervised peak signal-
to-noise ratio (PSNR), which are computed us-
ing only noisy data. We provide a theoretical
analysis of these metrics, showing that they are
asymptotically consistent estimators of the super-
vised MSE and PSNR. Controlled numerical ex-
periments with synthetic noise confirm that they
provide accurate approximations in practice. We
validate our approach on real-world data from
two imaging modalities: videos in raw format and
transmission electron microscopy. Our results
demonstrate that the proposed metrics enable un-
supervised evaluation of denoising methods based
exclusively on noisy data.
1 Center for Data Science, New York University, New York, NY. 2 Centre de Formació Interdisciplinària Superior, Universitat Politècnica de Catalunya, Barcelona, Spain. 3 Radiomics Group, Vall d'Hebron Institute of Oncology, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain. 4 Courant Institute of Mathematical Sciences, New York University, New York, NY. 5 School for Engineering of Matter, Transport & Energy, Arizona State University, Tempe, AZ. Correspondence to: Adrià Marcos Morales <adriamm98@gmail.com>.
Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
1. Introduction
Image denoising is a fundamental challenge in image and
signal processing, as well as a key preprocessing step for
computer vision tasks. Convolutional neural networks
achieve state-of-the-art performance for this problem, when
trained using databases of clean images corrupted with sim-
ulated noise (Zhang et al.,2017a). However, in real-world
imaging applications such as microscopy, noiseless ground
truth videos are often not available. This has motivated the
development of unsupervised denoising approaches that can
be trained using only noisy measurements (Lehtinen et al.,
2018;Xie et al.,2020;Laine et al.,2019;Sheth et al.,2021;
Huang et al.,2021). These methods have demonstrated
impressive performance on natural-image benchmarks, es-
sentially on par with the supervised state of the art. However,
to the best of our knowledge, no unsupervised metrics are
currently available to evaluate them using only noisy data.
Reliance on supervised metrics makes it very challenging to
create benchmark datasets using real-world measurements,
because obtaining the ground-truth clean images required by
these metrics is often either impossible or very constraining.
In practice, clean images are typically estimated through
temporal averaging, which suppresses dynamic information
that is often crucial in scientific applications. Consequently,
quantitative evaluation of unsupervised denoising methods
is currently almost completely dominated by natural image
benchmark datasets with simulated noise (Lehtinen et al.,
2018;Xie et al.,2020;Laine et al.,2019;Sheth et al.,2021;
Huang et al.,2021), which are not always representative of
the signal and noise characteristics that arise in real-world
imaging applications.
The lack of unsupervised metrics also limits the applica-
tion of unsupervised denoising techniques in practice. In
the absence of quantitative metrics, domain scientists must
often rely on visual inspection to evaluate performance on
real measurements. This is particularly restrictive for deep-
learning approaches, because it makes it impossible to per-
form systematic hyperparameter optimization and model
selection on the data of interest.
arXiv:2210.05553v3 [cs.CV] 30 May 2023
In this work, we propose two novel unsupervised metrics
to address these issues: the unsupervised mean-squared error
(uMSE) and the unsupervised peak signal-to-noise ratio
(uPSNR), which are computed exclusively from noisy data.
These metrics build upon existing unsupervised denoising
methods, which minimize an unsupervised cost function
equal to the difference between the denoised estimate and
additional noisy copies of the signal of interest (Lehtinen
et al.,2018). The uMSE is equal to this cost function mod-
ified with a correction term, which renders it an unbiased
estimator of the supervised MSE.
We provide a theoretical analysis of the uMSE and uPSNR,
proving that they are asymptotically consistent estimators of
the supervised MSE and PSNR respectively. Controlled ex-
periments on supervised benchmarks, where the true MSE
and PSNR can be computed exactly, confirm that the uMSE
and uPSNR provide accurate approximations. In addition,
we validate the metrics on video data in RAW format, con-
taminated with real noise that does not follow a known
predefined model.
In order to illustrate the potential impact of the pro-
posed metrics on imaging applications where no ground-
truth is available, we apply them to transmission-electron-
microscopy (TEM) data. Recent advances in direct electron
detection systems make it possible for experimentalists to
acquire highly time-resolved movies of dynamic events at
frame rates in the kilohertz range (Faruqi & McMullan,
2018;Ercius et al.,2020), which is critical to advance our
understanding of functional materials. Acquisition at such
high temporal resolution results in severe degradation by
shot noise. We show that unsupervised methods based on
deep learning can be effective in removing this noise, and
that our proposed metrics can be used to evaluate their per-
formance quantitatively using only noisy data.
To summarize, our contributions are (1) two novel unsu-
pervised metrics presented in Section 3, (2) a theoretical
analysis providing an asymptotic characterization of their
statistical properties (Section 4), (3) experiments showing
the accuracy of the metrics in a controlled situation where
ground-truth clean images are available (Section 5), (4) val-
idation on real-world videos in RAW format (Section 6),
and (5) an application to a real-world electron-microscopy
dataset, which illustrates the challenges of unsupervised
denoising in scientific imaging (Section 7).
Code to reproduce all computational experiments is avail-
able at https://github.com/adriamm98/umse
2. Background and Related work
Unsupervised denoising The past few years have seen
ground-breaking progress in unsupervised denoising, pio-
neered by Noise2Noise, a technique where a neural network
is trained on pairs of noisy images (Lehtinen et al.,2018).
Our unsupervised metrics are inspired by Noise2Noise,
which optimizes a cost function equal to our proposed
unsupervised MSE, but without a correction term (which
is not needed for training models). Subsequent work fo-
cused on performing unsupervised denoising from single
images using variations of the blind-spot method, where
a model is trained to estimate each noisy pixel value us-
ing its neighborhood but not the noisy pixel itself (to avoid
the trivial identity solution) (Krull et al.,2019;Laine et al.,
2019;Batson & Royer,2019a;Sheth et al.,2021;Xie et al.,
2020). More recently, Neighbor2Neighbor revisited the
Noise2Noise method, generating noisy image pairs from a
single noisy image via spatial subsampling (Huang et al.,
2021), an insight that can also be leveraged in combination
with our proposed metrics, as explained in Section B. Our
contribution with respect to these methods is a novel un-
supervised metric that can be used for evaluation, as it is
designed to be an unbiased and consistent estimator of the
MSE.
Stein’s unbiased risk estimator (SURE) provides an
asymptotically unbiased estimator of the MSE for i.i.d.
Gaussian noise (Donoho & Johnstone,1995). This cost
function has been used for training unsupervised denois-
ers (Metzler et al.,2018;Soltanayev & Chun,2018;Zhussip
et al.,2019;Mohan et al.,2021). In principle, SURE could
be used to compute the MSE for evaluation, but it has certain
limitations: (1) a closed form expression of the noise likeli-
hood is required, including the value of the noise parameters
(for example, this is not known for the real-world datasets in
Sections 6 and 7), (2) computing SURE requires approximating
the divergence of a denoiser (usually via Monte Carlo
methods (Ramani et al.,2008)), which is computationally
very expensive. Developing practical unsupervised metrics
based on SURE and studying their theoretical properties is
an interesting direction for future research.
Existing evaluation approaches In the literature, quanti-
tative evaluation of unsupervised denoising techniques has
mostly relied on images and videos corrupted with synthetic
noise (Lehtinen et al.,2018;Krull et al.,2019;Laine et al.,
2019;Batson & Royer,2019a;Sheth et al.,2021;Xie et al.,
2020). Recently, a few datasets containing real noisy data
have been created (Abdelhamed et al.,2018;Plotz & Roth,
2017;Xu et al.,2018;Zhang et al.,2019). Evaluation on
these datasets is based on supervised MSE and PSNR com-
puted from estimated clean images obtained by averaging
multiple noisy frames. Unfortunately, as a result, the metrics
cannot capture dynamically-changing features, which are
of interest in many applied domains. In addition, unless the
signal-to-noise ratio is quite high, it is necessary to average
over a large number of frames to approximate the MSE. For
example, as explained in Section D, for an image corrupted
by additive Gaussian noise with standard deviation σ = 15
we need to average more than 1500 noisy images to achieve the
same approximation accuracy as our proposed approach
(see Figure 10), which only requires 3 noisy images, and
can also be computed from a single noisy image.
Figure 1. MSE vs uMSE. The traditional supervised mean squared error (MSE) is computed by comparing the denoised estimate to the
clean ground truth (left). The proposed unsupervised MSE is computed only from noisy data, via comparison with a noisy reference
corresponding to the same ground-truth but corrupted with independent noise (right). A correction term based on two additional noisy
references debiases the estimator.
Noise-Level Estimation. The correction term in uMSE can
be interpreted as an estimate of the noise level, obtained by
cancelling out the clean signal. In this sense, it is related
to noise-level estimation methods (Liu et al.,2013;Lebrun
et al.,2015;Arias & Morel,2018). However, unlike uMSE,
these methods typically assume a parametric model for the
noise, and are not used for evaluation.
No-reference image quality assessment methods evaluate
the perceptual quality of an image (Li,2002;Mittal et al.,
2012), but not whether it is consistent with an underlying
ground-truth corresponding to the observed noisy measure-
ments, which is the goal of our proposed metrics.
3. Unsupervised Metrics For Unsupervised
Denoising
3.1. The Unsupervised Mean Squared Error
The goal of denoising is to estimate a clean signal from noisy
measurements. Let $x \in \mathbb{R}^n$ be a signal or a set of signals
with $n$ total entries. We denote the corresponding noisy data
by $y \in \mathbb{R}^n$. A denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$ is a function that
maps the input $y$ to an estimate of $x$. A common metric to
evaluate the quality of a denoiser is the mean squared error
between the clean signal and the estimate,
$$\mathrm{MSE} := \frac{1}{n} \sum_{i=1}^{n} (x_i - f(y)_i)^2. \tag{1}$$
Unfortunately, in most real-world scenarios clean ground-
truth signals are not available and evaluation can only be
carried out in an unsupervised fashion, i.e. exclusively from
the noisy measurements. In this section we propose an
unsupervised estimator of MSE inspired by recent advances
in unsupervised denoising (Lehtinen et al.,2018). The key
idea is to compare the denoised signal to a noisy reference,
which corresponds to the same clean signal corrupted by
independent noise.
In order to motivate our approach, let us assume that the
noise is additive, so that $y := x + z$ for a zero-mean noise
vector $z \in \mathbb{R}^n$. Imagine that we have access to a noisy
reference $a := x + w$ corresponding to the same underlying
signal $x$, but corrupted with a different noise realization
$w \in \mathbb{R}^n$ independent from $z$ (Section 3.3 explains how to
obtain such references in practice). The mean squared difference
between the denoised estimate and the reference is
approximately equal to the sum of the MSE and the variance
$\sigma^2$ of the noise,
$$\frac{1}{n} \sum_{i=1}^{n} (a_i - f(y)_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i + w_i - f(y)_i)^2 \approx \frac{1}{n} \sum_{i=1}^{n} (x_i - f(y)_i)^2 + \frac{1}{n} \sum_{i=1}^{n} w_i^2 \approx \mathrm{MSE} + \sigma^2, \tag{2}$$
because the cross-term $\frac{2}{n} \sum_{i=1}^{n} w_i (x_i - f(y)_i)$ cancels
out if $w_i$ and $y$ (and hence $f(y)_i$) are independent (and the
mean of the noise is zero).
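The approximation in equation 2 can be made explicit by expanding the square. The following worked step is a sketch under the additive, zero-mean, independent noise assumption stated above; it uses only the symbols already defined in this section.

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} (a_i - f(y)_i)^2
&= \frac{1}{n}\sum_{i=1}^{n} \bigl( (x_i - f(y)_i) + w_i \bigr)^2 \\
&= \underbrace{\frac{1}{n}\sum_{i=1}^{n} (x_i - f(y)_i)^2}_{\mathrm{MSE}}
 + \underbrace{\frac{1}{n}\sum_{i=1}^{n} w_i^2}_{\approx\, \sigma^2}
 + \underbrace{\frac{2}{n}\sum_{i=1}^{n} w_i \,(x_i - f(y)_i)}_{\approx\, 0}.
\end{align*}
```

The last term has zero mean because each $w_i$ is zero-mean and independent of $y$, so for large $n$ it is small relative to the other two terms.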
Approximations to equation 2 are used by different unsupervised
methods to train neural networks for denoising (Lehtinen
et al., 2018; Xie et al., 2020; Laine et al., 2019; Huang
et al., 2021). The noise term $\frac{1}{n} \sum_{i=1}^{n} w_i^2$ in equation 2
is not problematic for training denoisers as long as it is
independent from the input $y$. However, it is definitely
problematic for evaluating denoisers, as the additional term
would change for different images and datasets, making it
impossible to perform quantitative comparisons. In order to
address this limitation we propose to modify the cost func-
tion to neutralize the noise term. This can be achieved by
using two other noisy references $b := x + v$ and $c := x + u$,
which are noisy measurements corresponding to the clean
signal $x$, but corrupted with different, independent noise
realizations $v$ and $u$ (just like $a$). Subtracting these references
and dividing by two yields an estimate of the noise variance,
$$\frac{1}{n} \sum_{i=1}^{n} \frac{(b_i - c_i)^2}{2} = \frac{1}{n} \sum_{i=1}^{n} \frac{(v_i - u_i)^2}{2} \approx \frac{1}{2n} \sum_{i=1}^{n} v_i^2 + \frac{1}{2n} \sum_{i=1}^{n} u_i^2 \approx \sigma^2, \tag{3}$$
which can then be subtracted from equation 2 to estimate the
MSE. This yields our proposed unsupervised metric, which
we call unsupervised mean squared error (uMSE), depicted
in Figure 1.
[Figure 2 panels: spatial subsampling (above) and consecutive frames (below), each with a difference heatmap.]
Figure 2. Noisy references. The proposed metrics require noisy references corresponding to the same clean image corrupted by
independent noise. These references can be obtained from a single image via spatial subsampling (above) or from consecutive frames
(below). In both cases, there may be small differences in the signal content of the references, shown by the corresponding heatmaps.
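Definition 3.1 (Unsupervised mean squared error). Given a
noisy input signal $y \in \mathbb{R}^n$ and three noisy references
$a, b, c \in \mathbb{R}^n$, the unsupervised mean squared error of a denoiser
$f: \mathbb{R}^n \to \mathbb{R}^n$ is
$$\mathrm{uMSE} := \frac{1}{n} \sum_{i=1}^{n} \left[ (a_i - f(y)_i)^2 - \frac{(b_i - c_i)^2}{2} \right]. \tag{4}$$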
Theorem 4.2 in Section 4 establishes that the uMSE is a consistent
estimator of the MSE as long as (1) the noisy input
and the noisy references are independent, (2) their means
equal the corresponding entries of the ground-truth clean
signal, and (3) their higher-order moments are bounded.
These conditions are satisfied by most noise models of in-
terest in signal and image processing, such as Poisson shot
noise or additive Gaussian noise. In Section 3.3 we address
the question of how to obtain the noisy references required
to estimate the uMSE. Section A explains how to compute
confidence intervals for the uMSE via bootstrapping.
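As an illustration of Definition 3.1, the following is a minimal NumPy sketch that compares the uMSE with the supervised MSE on synthetic data. The sinusoidal signal, the noise level, and the `moving_average` denoiser are hypothetical stand-ins chosen only for this example; they are not the denoisers or datasets used in the paper.

```python
import numpy as np


def mse(denoised, clean):
    """Supervised MSE (equation 1); requires the clean signal."""
    denoised = np.asarray(denoised, dtype=np.float64)
    clean = np.asarray(clean, dtype=np.float64)
    return float(np.mean((clean - denoised) ** 2))


def umse(denoised, a, b, c):
    """Unsupervised MSE (equation 4); uses only the noisy references a, b, c."""
    denoised, a, b, c = (np.asarray(v, dtype=np.float64) for v in (denoised, a, b, c))
    return float(np.mean((a - denoised) ** 2 - (b - c) ** 2 / 2.0))


def moving_average(signal, width=9):
    """Hypothetical stand-in denoiser: a simple moving-average filter."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")


rng = np.random.default_rng(0)
n, sigma = 200_000, 0.1
x = np.sin(np.linspace(0.0, 20.0 * np.pi, n))  # synthetic clean signal (assumption)

# Noisy input y and three references, each with independent Gaussian noise.
y, a, b, c = (x + sigma * rng.standard_normal(n) for _ in range(4))

estimate = moving_average(y)
true_mse = mse(estimate, x)              # needs the clean signal x
unsupervised = umse(estimate, a, b, c)   # needs only noisy data
print(f"MSE = {true_mse:.6f}, uMSE = {unsupervised:.6f}")
```

With this many entries the two printed values should be close, and the gap is expected to shrink as `n` grows, in line with the consistency result referenced above.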
3.2. The Unsupervised Peak Signal-To-Noise Ratio
Peak signal-to-noise ratio (PSNR) is currently the most popular
metric to evaluate denoising quality. It is a logarithmic
function of MSE defined on a decibel scale,
$$\mathrm{PSNR} := 10 \log \left( \frac{M^2}{\mathrm{MSE}} \right), \tag{5}$$
where $M$ is a fixed constant representing the maximum
possible value of the signal of interest, which is usually set
equal to 255 for images. Our definition of uMSE can be
naturally extended to yield an unsupervised PSNR (uPSNR).
Definition 3.2 (Unsupervised peak signal-to-noise ratio).
Given a noisy input signal $y \in \mathbb{R}^n$ and three noisy references
$a, b, c \in \mathbb{R}^n$, the unsupervised peak signal-to-noise ratio of a
denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$ is
$$\mathrm{uPSNR} := 10 \log \left( \frac{M^2}{\mathrm{uMSE}} \right), \tag{6}$$
where $M$ is the maximum possible value of the signal of
interest.
Corollary 4.3 establishes that the uPSNR is a consistent
estimator of the PSNR, under the same conditions that guarantee
consistency of the uMSE. Section A explains how to
compute confidence intervals for the uPSNR via bootstrapping.
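Definition 3.2 can be sketched in a few lines of NumPy. The sketch assumes a base-10 logarithm, following the usual decibel convention, and returns NaN when the uMSE estimate is not positive (which can happen with few pixels because the uMSE is itself a noisy estimate); that fallback is a choice made for this example, not something specified in the text. The identity "denoiser" used in the demonstration is likewise only a placeholder.

```python
import math

import numpy as np


def umse(denoised, a, b, c):
    """Unsupervised MSE (equation 4)."""
    denoised, a, b, c = (np.asarray(v, dtype=np.float64) for v in (denoised, a, b, c))
    return float(np.mean((a - denoised) ** 2 - (b - c) ** 2 / 2.0))


def upsnr(denoised, a, b, c, max_value=255.0):
    """Unsupervised PSNR (equation 6), assuming a base-10 logarithm."""
    value = umse(denoised, a, b, c)
    if value <= 0.0:
        # The logarithm is undefined for a non-positive uMSE estimate.
        return float("nan")
    return 10.0 * math.log10(max_value ** 2 / value)


rng = np.random.default_rng(0)
n, sigma = 100_000, 25.0
x = rng.uniform(50.0, 200.0, size=n)  # synthetic clean pixel values (assumption)
y, a, b, c = (x + sigma * rng.standard_normal(n) for _ in range(4))

denoised = y  # placeholder: no denoising at all, so the MSE is close to sigma**2
supervised_psnr = 10.0 * math.log10(255.0 ** 2 / float(np.mean((x - denoised) ** 2)))
unsupervised_psnr = upsnr(denoised, a, b, c)
print(f"PSNR = {supervised_psnr:.2f} dB, uPSNR = {unsupervised_psnr:.2f} dB")
```

Because the uPSNR is a nonlinear function of the uMSE, small relative errors in the uMSE translate into small additive errors in decibels, which is why the two printed values should agree closely for large `n`.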
3.3. Computing Noisy References In Practice
Our proposed metrics rely on the availability of three noisy
references, which ideally should correspond to the same
clean image contaminated with independent noise. Devi-
ations between the clean signal in each reference violate
Condition 2 in Section 4, and introduce a bias in the metrics.
We propose two approaches to compute the references in
practice, illustrated in Figure 2.
Multiple images: The references can be computed from
consecutive frames acquired within a short time interval.
[Figure 3 panels: 20 pixels, 100 pixels, 1,000 pixels]
Figure 3. The uMSE is a consistent estimator of the MSE. The histograms at the top show the distribution of the uMSE computed from
$n$ pixels ($n \in \{20, 100, 1000\}$) of a natural image corrupted with additive Gaussian noise ($\sigma = 55$) and denoised via a deep-learning
denoiser (DnCNN). Each point in the histogram corresponds to a different sample of the three noisy references used to compute the uMSE
($\tilde{a}_i$, $\tilde{b}_i$ and $\tilde{c}_i$ in Eq. 8 for $1 \le i \le n$), with the same underlying clean pixels. The distributions are centered at the MSE, showing that the
estimator is unbiased (Theorem 4.1), and are well approximated by a Gaussian fit (Theorem 4.4). As the number of pixels $n$ grows, the
standard deviation of the uMSE decreases proportionally to $n^{-1/2}$, and the uMSE converges asymptotically to the MSE (Theorem 4.2), as
depicted in the scatterplot below ($\alpha$ is a constant).
This approach is preferable for datasets where the image
content does not experience rapid dynamic changes from
frame to frame. We apply this approach to the RAW videos
in Section 6, where the content is static.
Single image: The references can be computed from a sin-
gle image via spatial subsampling, as described in Section B.
Section B shows that this approach is effective as long as
the image content is sufficiently smooth with respect to the
pixel resolution. We apply this approach to the electron-
microscopy data in Section 7, where preserving dynamic
content is important.
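The single-image approach can be illustrated with a short NumPy sketch. It assumes one particular subsampling scheme, taking each of the four positions in every 2x2 pixel block as a separate half-resolution image; the exact procedure is described in Section B and may differ from this. The function name `subsample_2x2` and the synthetic image are hypothetical.

```python
import numpy as np


def subsample_2x2(image):
    """Split an image into four half-resolution images, one per 2x2 block position."""
    image = np.asarray(image)
    height, width = image.shape[:2]
    # Crop to even dimensions so that all four sub-images have the same shape.
    image = image[: height - height % 2, : width - width % 2]
    return (
        image[0::2, 0::2],
        image[0::2, 1::2],
        image[1::2, 0::2],
        image[1::2, 1::2],
    )


rng = np.random.default_rng(0)
sigma = 10.0
rows, cols = np.mgrid[0:256, 0:256]
# Smooth synthetic clean image (assumption): neighboring pixels are nearly equal.
clean = 100.0 + 50.0 * np.sin(2 * np.pi * rows / 256) * np.cos(2 * np.pi * cols / 256)
noisy = clean + sigma * rng.standard_normal(clean.shape)

# One sub-image plays the role of the noisy input, the other three are references.
y_sub, a, b, c = subsample_2x2(noisy)

# Correction term of the uMSE: an estimate of the noise variance from b and c.
noise_variance_estimate = float(np.mean((b - c) ** 2) / 2.0)
print(f"estimated noise variance = {noise_variance_estimate:.2f} (true {sigma**2:.2f})")
```

When the image is smooth relative to the pixel grid, the clean content of the sub-images nearly coincides, so the variance estimate stays close to the true value. Sharp edges or fine texture would add signal differences to `b - c` and bias the estimate upward.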
4. Statistical Properties of the Proposed
Metrics
In this section, we establish that the proposed unsupervised
metrics provide a consistent estimate of the MSE and PSNR.
In our analysis, the ground truth signal or set of signals is
represented as a deterministic vector $x \in \mathbb{R}^n$. The corresponding
noisy data are also modeled as a deterministic
vector $y \in \mathbb{R}^n$ that is fed into a denoiser $f: \mathbb{R}^n \to \mathbb{R}^n$
to produce the denoised estimate $f(y)$. The MSE of the
estimate is a deterministic quantity equal to
$$\mathrm{MSE} := \frac{1}{n} \sum_{i=1}^{n} \mathrm{SE}_i, \qquad \mathrm{SE}_i := (x_i - f(y)_i)^2. \tag{7}$$
Noise Model. The uMSE estimator in Definition 3.1 depends
on three noisy references $\tilde{a}$, $\tilde{b}$, $\tilde{c}$, which we model as
random variables.[1] Our analysis assumes that these random
variables satisfy two conditions:
Condition 1 (independence): The entries of $\tilde{a}$, $\tilde{b}$, $\tilde{c}$ are all
mutually independent.
Condition 2 (centered noise): The mean of the $i$th entry
of $\tilde{a}$, $\tilde{b}$, $\tilde{c}$ equals the corresponding entry of the clean signal,
$$\mathbb{E}[\tilde{a}_i] = \mathbb{E}[\tilde{b}_i] = \mathbb{E}[\tilde{c}_i] = x_i, \qquad 1 \le i \le n.$$
Two popular noise models that satisfy these conditions are:
Additive Gaussian, where $\tilde{a}_i := x_i + \tilde{w}_i$, $\tilde{b}_i := x_i + \tilde{v}_i$,
$\tilde{c}_i := x_i + \tilde{u}_i$, for i.i.d. Gaussian $\tilde{w}_i$, $\tilde{v}_i$, $\tilde{u}_i$.
Poisson, where $\tilde{a}_i$, $\tilde{b}_i$, $\tilde{c}_i$ are i.i.d. Poisson random variables
with parameter $x_i$.
[1] In our analysis, all random quantities are marked with a tilde
for clarity.
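The two noise models above can be checked numerically. The following Monte Carlo sketch keeps the clean signal and the denoised estimate fixed, as in the analysis of this section, and redraws only the three references; the signal values, noise levels, and number of trials are arbitrary choices for illustration. Zero-mean Gaussian noise is assumed so that Condition 2 holds.

```python
import numpy as np


def umse(denoised, a, b, c):
    """Unsupervised MSE (Definition 3.1)."""
    return float(np.mean((a - denoised) ** 2 - (b - c) ** 2 / 2.0))


rng = np.random.default_rng(1)
n, trials = 1000, 2000

# Deterministic quantities: clean signal x and a fixed stand-in for f(y).
x = rng.uniform(5.0, 50.0, size=n)
estimate = x + rng.normal(0.0, 2.0, size=n)
mse = float(np.mean((x - estimate) ** 2))


def gaussian_references():
    """Three references with i.i.d. zero-mean Gaussian noise (std 3, an assumption)."""
    return x + 3.0 * rng.standard_normal((3, n))


def poisson_references():
    """Three references with i.i.d. Poisson entries whose parameter is x_i."""
    return rng.poisson(x, size=(3, n)).astype(np.float64)


def average_umse(sampler):
    """Average the uMSE over many independent draws of the references."""
    return float(np.mean([umse(estimate, *sampler()) for _ in range(trials)]))


gaussian_average = average_umse(gaussian_references)
poisson_average = average_umse(poisson_references)
print(f"MSE = {mse:.3f}")
print(f"mean uMSE (Gaussian) = {gaussian_average:.3f}")
print(f"mean uMSE (Poisson)  = {poisson_average:.3f}")
```

In the Poisson case the noise variance equals $x_i$ and therefore changes from entry to entry, yet the averaged uMSE should still match the MSE, since the correction term $(\tilde{b}_i - \tilde{c}_i)^2 / 2$ has the same expected value as the variance of $\tilde{a}_i$ at every entry.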