
Evaluating Unsupervised Denoising Requires Unsupervised Metrics
methods, which minimize an unsupervised cost function
equal to the difference between the denoised estimate and
additional noisy copies of the signal of interest (Lehtinen
et al., 2018). The uMSE is equal to this cost function, modified
with a correction term that renders it an unbiased
estimator of the supervised MSE.
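The role of the correction term can be illustrated with a minimal sketch. The exact definition of the uMSE is given in Section 3; the form used below (three independent noisy copies `a`, `b`, `c` with zero-mean noise, and a correction equal to half the mean squared difference between two of them) is an assumption made for illustration, as are all function names, the toy denoiser, and the synthetic data.

```python
import numpy as np

def noise2noise_cost(denoised, a):
    """Mean squared difference between the estimate and a noisy copy."""
    return np.mean((denoised - a) ** 2)

def umse_sketch(denoised, a, b, c):
    """Noise2Noise-style cost minus an estimate of the noise variance.

    Assumes a, b, c are noisy copies of the same clean signal with
    independent zero-mean noise, so that mean((b - c)**2) / 2
    estimates the noise variance (hypothetical form, for illustration).
    """
    correction = np.mean((b - c) ** 2) / 2.0
    return noise2noise_cost(denoised, a) - correction

def psnr_from_mse(mse, max_value=1.0):
    """Standard PSNR formula applied to an MSE value."""
    return 10.0 * np.log10(max_value ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(256, 256))   # synthetic "ground truth"
sigma = 0.1
# One noisy input for the denoiser plus three extra noisy copies.
y, a, b, c = (clean + sigma * rng.standard_normal(clean.shape) for _ in range(4))

denoised = 0.5 * y + 0.5 * y.mean()              # toy denoiser: shrink toward the mean
true_mse = np.mean((denoised - clean) ** 2)      # needs the clean image
raw_cost = noise2noise_cost(denoised, a)         # biased upward by about sigma**2
estimate = umse_sketch(denoised, a, b, c)        # uses noisy data only

print(f"true MSE {true_mse:.4f}  raw cost {raw_cost:.4f}  corrected {estimate:.4f}")
print(f"true PSNR {psnr_from_mse(true_mse):.2f} dB  estimated PSNR {psnr_from_mse(estimate):.2f} dB")
```

In this sketch the uncorrected cost exceeds the true MSE by roughly the noise variance, which is why it is adequate for training (the offset does not depend on the denoiser) but not for reporting an error value.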
We provide a theoretical analysis of the uMSE and uPSNR,
proving that they are asymptotically consistent estimators of
the supervised MSE and PSNR respectively. Controlled ex-
periments on supervised benchmarks, where the true MSE
and PSNR can be computed exactly, confirm that the uMSE
and uPSNR provide accurate approximations. In addition,
we validate the metrics on video data in RAW format, con-
taminated with real noise that does not follow a known
predefined model.
To illustrate the potential impact of the proposed metrics
on imaging applications where no ground truth is available,
we apply them to transmission-electron-microscopy (TEM) data.
Recent advances in direct electron
detection systems make it possible for experimentalists to
acquire highly time-resolved movies of dynamic events at
frame rates in the kilohertz range (Faruqi & McMullan,
2018; Ercius et al., 2020), which is critical to advancing our
understanding of functional materials. Acquisition at such
high temporal resolution results in severe degradation by
shot noise. We show that unsupervised methods based on
deep learning can be effective in removing this noise, and
that our proposed metrics can be used to evaluate their per-
formance quantitatively using only noisy data.
To summarize, our contributions are (1) two novel unsu-
pervised metrics presented in Section 3, (2) a theoretical
analysis providing an asymptotic characterization of their
statistical properties (Section 4), (3) experiments showing
the accuracy of the metrics in a controlled situation where
ground-truth clean images are available (Section 5), (4) val-
idation on real-world videos in RAW format (Section 6),
and (5) an application to a real-world electron-microscopy
dataset, which illustrates the challenges of unsupervised
denoising in scientific imaging (Section 7).
Code to reproduce all computational experiments is available at
https://github.com/adriamm98/umse.
2. Background and Related Work
Unsupervised denoising. The past few years have seen
ground-breaking progress in unsupervised denoising, pio-
neered by Noise2Noise, a technique where a neural network
is trained on pairs of noisy images (Lehtinen et al., 2018).
Our unsupervised metrics are inspired by Noise2Noise,
which optimizes a cost function equal to our proposed
unsupervised MSE, but without a correction term (which
is not needed for training models). Subsequent work fo-
cused on performing unsupervised denoising from single
images using variations of the blind-spot method, where
a model is trained to estimate each noisy pixel value us-
ing its neighborhood but not the noisy pixel itself (to avoid
the trivial identity solution) (Krull et al., 2019; Laine et al.,
2019; Batson & Royer, 2019a; Sheth et al., 2021; Xie et al.,
2020). More recently, Neighbor2Neighbor revisited the
Noise2Noise method, generating noisy image pairs from a
single noisy image via spatial subsampling (Huang et al.,
2021), an insight that can also be leveraged in combination
with our proposed metrics, as explained in Section B. Our
contribution with respect to these methods is a novel un-
supervised metric that can be used for evaluation, as it is
designed to be an unbiased and consistent estimator of the
MSE.
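As a rough illustration of how a pair of noisy images can be generated from a single noisy image via spatial subsampling, the following sketch picks two adjacent pixels from every 2x2 cell. It is a simplified stand-in written for this text, not the Neighbor2Neighbor implementation; the function name `subsample_pair` and the sampling details are assumptions.

```python
import numpy as np

def subsample_pair(noisy, rng):
    """Split one noisy image into two half-resolution noisy images.

    For every 2x2 cell, two adjacent pixels (horizontal or vertical
    neighbors) are chosen at random; one goes to each output image.
    Hypothetical helper, for illustration only.
    """
    h2, w2 = noisy.shape[0] // 2, noisy.shape[1] // 2
    # Rearrange into an (h2, w2) grid of cells, each flattened to 4 pixels:
    # position 0=(0,0), 1=(0,1), 2=(1,0), 3=(1,1) within the cell.
    cells = (noisy[: 2 * h2, : 2 * w2]
             .reshape(h2, 2, w2, 2)
             .transpose(0, 2, 1, 3)
             .reshape(h2, w2, 4))
    # All ordered pairs of positions that are adjacent inside a cell.
    neighbor_pairs = np.array([[0, 1], [1, 0], [0, 2], [2, 0],
                               [1, 3], [3, 1], [2, 3], [3, 2]])
    choice = rng.integers(0, len(neighbor_pairs), size=(h2, w2))
    idx = neighbor_pairs[choice]                      # shape (h2, w2, 2)
    first = np.take_along_axis(cells, idx[..., :1], axis=2)[..., 0]
    second = np.take_along_axis(cells, idx[..., 1:], axis=2)[..., 0]
    return first, second

rng = np.random.default_rng(0)
image = np.arange(16, dtype=float).reshape(4, 4)      # stand-in for a noisy image
first, second = subsample_pair(image, rng)
print(first.shape, second.shape)
```

If the noise is independent across pixels, the two outputs carry independent noise, while their underlying clean signals differ only by the small shift between neighboring pixels.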
Stein’s unbiased risk estimator (SURE) provides an
asymptotically unbiased estimator of the MSE for i.i.d.
Gaussian noise (Donoho & Johnstone, 1995). This cost
function has been used for training unsupervised denoisers
(Metzler et al., 2018; Soltanayev & Chun, 2018; Zhussip
et al., 2019; Mohan et al., 2021). In principle, SURE could
be used to compute the MSE for evaluation, but it has certain
limitations: (1) a closed-form expression of the noise likelihood
is required, including the values of the noise parameters
(for example, these are not known for the real-world datasets in
Sections 6 and 7), and (2) computing SURE requires approximating
the divergence of the denoiser (usually via Monte Carlo
methods (Ramani et al., 2008)), which is computationally
very expensive. Developing practical unsupervised metrics
based on SURE and studying their theoretical properties is
an interesting direction for future research.
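To make the two limitations concrete, the following is a minimal sketch of a SURE-based MSE estimate for i.i.d. Gaussian noise, with the divergence approximated by a single-probe Monte Carlo finite difference. The standard deviation `sigma` must be supplied, and each divergence estimate costs additional denoiser evaluations. The toy linear denoiser, the function names, and the synthetic image are assumptions made for illustration.

```python
import numpy as np

def box_denoiser(y):
    """Toy linear denoiser (an assumption): average each pixel with its 4 neighbors."""
    return (y + np.roll(y, 1, 0) + np.roll(y, -1, 0)
              + np.roll(y, 1, 1) + np.roll(y, -1, 1)) / 5.0

def mc_divergence(denoiser, y, rng, eps=1e-3):
    """Monte Carlo divergence estimate using one random probe vector.

    Requires extra forward passes of the denoiser, which is the
    source of the computational cost.
    """
    b = rng.standard_normal(y.shape)
    return np.sum(b * (denoiser(y + eps * b) - denoiser(y))) / eps

def sure_mse(denoiser, y, sigma, rng):
    """Per-pixel SURE estimate of the MSE; sigma must be known in advance."""
    fidelity = np.mean((denoiser(y) - y) ** 2)
    divergence = mc_divergence(denoiser, y, rng)
    return fidelity - sigma ** 2 + 2.0 * sigma ** 2 * divergence / y.size

rng = np.random.default_rng(0)
grid = np.arange(512)
# Smooth synthetic "clean" image with values in [0, 1].
clean = 0.5 + 0.5 * np.outer(np.sin(2 * np.pi * grid / 64),
                             np.cos(2 * np.pi * grid / 64))
sigma = 0.1
y = clean + sigma * rng.standard_normal(clean.shape)

sure_estimate = sure_mse(box_denoiser, y, sigma, rng)   # uses y and sigma only
true_mse = np.mean((box_denoiser(y) - clean) ** 2)      # needs the clean image
print(f"SURE estimate {sure_estimate:.5f}  true MSE {true_mse:.5f}")
```

The estimate depends directly on `sigma`: passing a wrong value shifts the result, which is the practical obstacle when the noise model of a real dataset is unknown.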
Existing evaluation approaches. In the literature, quantitative
evaluation of unsupervised denoising techniques has
mostly relied on images and videos corrupted with synthetic
noise (Lehtinen et al., 2018; Krull et al., 2019; Laine et al.,
2019; Batson & Royer, 2019a; Sheth et al., 2021; Xie et al.,
2020). Recently, a few datasets containing real noisy data
have been created (Abdelhamed et al., 2018; Plotz & Roth,
2017; Xu et al., 2018; Zhang et al., 2019). Evaluation on
these datasets is based on supervised MSE and PSNR com-
puted from estimated clean images obtained by averaging
multiple noisy frames. Unfortunately, as a result, the metrics
cannot capture dynamically changing features, which are
of interest in many applied domains. In addition, unless the
signal-to-noise ratio is quite high, it is necessary to average
over a large number of frames to approximate the MSE. For
example, as explained in Section D, for an image corrupted
by additive Gaussian noise with standard deviation σ = 15,
we need to average more than 1500 noisy images to achieve the
same approximation accuracy as our proposed approach
(see Figure 10), which only requires 3 noisy images and
can also be computed from a single noisy image.
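The reason averaging needs many frames can be seen in a small simulation. When the reference image is the mean of K noisy frames of a static scene, the residual noise in the reference inflates the measured MSE by roughly σ²/K, provided the frames are independent of the estimate being evaluated. The sketch below assumes additive Gaussian noise with σ = 15 on a synthetic image; the function name and the stand-in for the denoiser output are hypothetical, and the sketch is not intended to reproduce the analysis in Section D.

```python
import numpy as np

def mse_against_frame_average(estimate, clean, sigma, num_frames, rng):
    """MSE of `estimate` measured against the mean of `num_frames` noisy frames."""
    running_sum = np.zeros_like(clean)
    for _ in range(num_frames):
        running_sum += clean + sigma * rng.standard_normal(clean.shape)
    reference = running_sum / num_frames          # approximate "clean" image
    return np.mean((estimate - reference) ** 2)

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 255.0, size=(128, 128))  # synthetic static scene
sigma = 15.0
# Stand-in for a denoiser output: the clean image plus a small error.
estimate = clean + 5.0 * rng.standard_normal(clean.shape)
true_mse = np.mean((estimate - clean) ** 2)

# Excess of the averaged-reference MSE over the true MSE for several K.
excess = {k: mse_against_frame_average(estimate, clean, sigma, k, rng) - true_mse
          for k in (1, 10, 100)}
for k, value in excess.items():
    print(f"K={k:4d}  excess {value:8.3f}  sigma^2/K {sigma ** 2 / k:8.3f}")
```

Because the excess decays only as 1/K, it becomes negligible relative to the true MSE only when K is large or the noise level is low.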