
Evaluating the Robustness of Real-World Super Resolution
LR images for testing because no ground truth is available. These measures are objective and do not take subjective human perception into account. We will thus use other, perception-based metrics for qualitative evaluation on our dataset, which consists of real-world images from a variety of sources.
Additionally, we note a lack of datasets that contain a wide variety of naturally downsampled, real images from different types of sources. A wide variety of sources would allow more accurate measurement of the generalisability of real-world super-resolution models. Popular existing datasets such as DPED and RealSR do not achieve this. DPED contains images taken with 3 smartphone cameras, whose noise characteristics may differ from those of TV streams, CCTV footage and satellite images. RealSR contains images taken with the same camera and therefore only helps evaluate model performance on the particular downsampling process of that camera. In this paper, we introduce a carefully curated dataset, WideRealSR, that can be used as a test set for thoroughly evaluating real-world performance.
In short, the goals of this paper are to:
1. Evaluate the generalisability of various supervised and unsupervised super-resolution models,
2. Identify the reasons why the models perform poorly or well, and
3. Investigate a practical solution that could alleviate the generalisation problem.
2. Dataset and task
As mentioned in the previous section, most existing datasets used in the real-world super-resolution field lack a diverse range of sensor noise. The DPED dataset (Ignatov et al., 2017a), for example, only contains low-resolution images from 3 different smartphone cameras (iPhone 3GS, BlackBerry Passport and Sony Xperia Z), while the corresponding high-resolution counterparts are captured with a Canon 70D DSLR. This dataset enabled the authors to present an end-to-end deep learning approach that translates ordinary photos into higher-quality, DSLR-like images. They proposed learning the translation function with a residual convolutional neural network that improves both colour rendition and image sharpness (Ignatov et al., 2017b). Since its release, DPED has served as the base dataset for numerous super-resolution model proposals, including the award-winning RealSR model.
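The residual design mentioned above can be illustrated with a single residual block, in which a small stack of convolutions learns a correction that is added back to the input through a skip connection. The sketch below is a simplified, single-channel NumPy version that assumes 3x3 kernels and omits the batch normalisation and multiple feature channels a real network would use; the helper names `conv2d_same` and `residual_block` are hypothetical and are not taken from the DPED implementation.

```python
import numpy as np


def conv2d_same(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 2D cross-correlation (the 'convolution' used in CNNs)
    with zero padding, so the output has the same size as the input.
    Assumes an odd-sized kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    # Accumulate each kernel tap as a shifted, weighted copy of the image.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def residual_block(x: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """conv -> ReLU -> conv, then add the input back (skip connection)."""
    hidden = np.maximum(conv2d_same(x, k1), 0.0)  # first conv + ReLU
    return x + conv2d_same(hidden, k2)           # identity + learned correction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))
    k1 = rng.normal(scale=0.1, size=(3, 3))
    zero = np.zeros((3, 3))
    # With a zero second kernel the block reduces to the identity mapping.
    print(np.allclose(residual_block(image, k1, zero), image))  # True
```

Because the skip connection passes the input through unchanged, the convolutional layers only have to model the difference between the phone image and its DSLR-like target rather than the whole output image.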
A popular alternative to the DPED dataset is DIV2K (Agustsson & Timofte, 2017), which contains 800 high-resolution training images and their corresponding low-resolution images, obtained artificially through a variety of downsampling methods. Recent work in the super-resolution field (Cai et al., 2019b) aims to identify the image degradation methods that generalise best to real-world images, i.e. images with unknown sensor noise. The DIV2K dataset is commonly used in this line of work.
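To make the idea of artificially obtained LR images concrete, the sketch below implements a classical synthetic degradation of the form $y = (x \otimes k)\downarrow_s + n$: blur the HR image $x$ with a kernel $k$, subsample by a factor $s$, and add noise $n$. The isotropic Gaussian kernel, the scale factor of 4, the noise level and all function names are illustrative assumptions; they are not the exact procedures used to produce the DIV2K low-resolution images.

```python
import numpy as np


def gaussian_kernel(size: int = 7, sigma: float = 1.5) -> np.ndarray:
    """Normalised isotropic 2D Gaussian blur kernel (odd size assumed)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def conv2d_same(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 2D cross-correlation with 'same' output size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def synthetic_degrade(hr, scale=4, kernel=None, noise_sigma=0.01, seed=0):
    """Blur -> subsample by `scale` -> add Gaussian noise; clip to [0, 1].

    Assumes a single-channel HR image with values in [0, 1]."""
    if kernel is None:
        kernel = gaussian_kernel()
    blurred = conv2d_same(hr, kernel)
    lr = blurred[::scale, ::scale]  # direct subsampling by the scale factor
    rng = np.random.default_rng(seed)
    lr = lr + rng.normal(0.0, noise_sigma, size=lr.shape)  # additive noise
    return np.clip(lr, 0.0, 1.0)


if __name__ == "__main__":
    hr = np.full((64, 64), 0.5)
    lr = synthetic_degrade(hr, scale=4, noise_sigma=0.0)
    print(lr.shape)  # (16, 16)
```

Because every LR image produced this way shares one fixed kernel and one noise level, such data covers only a narrow slice of the degradations found in real-world images.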
To evaluate the performance of existing super-resolution models, it is not sufficient to sample from a limited number of sensor noise types and downsampling techniques because, as mentioned previously, real-world images inherently come with arbitrary blur kernels and noise. To address this limitation, we carefully curated a dataset for evaluating several super-resolution models. We obtained 1 to 3 images for each of 35 different sensor noise types, from sources including (but not limited to) Google Maps, satellites, drones, microscopes, smartphones (iPhone, BlackBerry, Samsung Galaxy), WhatsApp, Facebook, tablets (iPad, Samsung Galaxy Tab) and BBC broadcasts. To preserve the real-world character of the images, we kept preprocessing to a minimum, cropping images only when necessary. We call this dataset WideRealSR. Figure 1 shows sample images from 10 of its sources.
The main task in this paper is to perform real-world super-resolution with state-of-the-art models on images captured under a diverse range of sensor noise. This allows us to identify which models generalise best and therefore perform real-world super-resolution to a satisfactory level. Because no ground-truth high-resolution images are available, we cannot use full-reference quantitative metrics such as PSNR to evaluate the models. Instead, we perform an extensive user study to obtain human ratings of the super-resolved images, as was also done by Rad et al. (2021). The details of the survey are discussed in the following sections.
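The contrast between the two evaluation routes can be sketched as follows. PSNR, defined as $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, cannot be computed at all without a reference HR image, whereas a mean opinion score only needs the ratings collected from participants. The 1 to 5 rating scale, the model names and the use of the standard error are hypothetical choices for illustration and are not taken from the survey described later.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Full-reference PSNR in dB; requires the ground-truth HR `reference`."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def mean_opinion_scores(ratings):
    """No-reference summary of a user study.

    `ratings` maps a model name to the list of scores it received
    (a hypothetical 1-5 scale). Returns {model: (mean, standard error)}.
    """
    summary = {}
    for model, scores in ratings.items():
        s = np.asarray(scores, dtype=float)
        sem = s.std(ddof=1) / np.sqrt(s.size) if s.size > 1 else 0.0
        summary[model] = (float(s.mean()), float(sem))
    return summary


if __name__ == "__main__":
    hr = np.zeros((4, 4))
    sr = np.full((4, 4), 0.1)
    print(round(psnr(hr, sr), 2))  # 20.0
    # Hypothetical ratings for two hypothetical models.
    print(mean_opinion_scores({"model_a": [4, 5, 3], "model_b": [2, 2, 2]}))
```

Reporting a spread alongside each mean is a design choice that makes it easier to judge whether differences between models exceed the variability among raters.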
3. Methodology
3.1. Real-World Super Resolution Methods
RealSR
The authors of the RealSR paper (Cai et al., 2019a) propose a novel degradation framework based on kernel estimation and noise injection. Their method consists of two stages: the first estimates the degradation from real data and generates realistic LR images, and the second trains the SR model on the constructed data. They take two sets of images as input: a real-world image set $X$ and an HR image set $Y$. They estimate the kernel, using KernelGAN (Bell-Kligler et al., 2019), and noise