switch to deep learning based image restoration algorithms, as these methods have proven very powerful at generalizing priors from a large number of images.
Unfortunately, despite the great advances in performance, research on image restoration and enhancement using deep learning often overlooks the previously stated need for algorithms with low computational complexity and execution time; therefore, many of these methods cannot be integrated into modern smartphones due to their complexity, i.e., FLOPs or memory requirements.
In this paper, we therefore aim at defining a new image enhancement algorithm that achieves competitive results compared with state-of-the-art methods on different related tasks while, at the same time, presenting low complexity and competitive execution time on current off-the-shelf smartphones, as measured by the AIScore [32]. A first example of this behaviour is shown in Figure 1, where we compare our method to the current state-of-the-art in image denoising. As can be seen, our method is within 0.02 SSIM of the state-of-the-art while requiring at least 50× fewer MACs. More details appear in Section 5.
In summary, our contributions are as follows:
• We propose a lightweight U-Net based architecture characterized by the inverted residual attention (IRA) block. It is similar to contemporary works in this field, yet more efficient and smaller.
• We optimize our model in terms of parameter count and computational cost (i.e., FLOPs, MACs), thus achieving real-time performance on current smartphone GPUs at FullHD input image resolution. This improvement is illustrated in Figure 1.
• We propose a new type of analysis, from a production
point of view, observing the behaviour of our model
when deployed on commercial smartphones.
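For context on the cost figures above, the MACs and parameter counts of a plain convolution layer can be derived analytically. The sketch below is a generic accounting helper; the layer sizes in the example are illustrative and do not correspond to our architecture:

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    # One multiply-accumulate per kernel tap per output element.
    return c_in * c_out * k * k * h_out * w_out

def conv_params(c_in, c_out, k, bias=True):
    # Weights plus an optional bias per output channel.
    return c_out * (c_in * k * k + (1 if bias else 0))

# Example: a hypothetical 3x3 conv, 16->16 channels, stride 1,
# evaluated at FullHD (1920x1080) output resolution:
macs = conv_macs(16, 16, 3, 1080, 1920)  # ~4.78 GMACs for this single layer
```

Summing such terms over all layers gives the model-level MACs figures commonly reported for mobile-oriented architectures.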
2. Related Work
Image restoration is split into a large number of sub-problems; in this paper we focus on four of the most popular in current research: image denoising, image deblurring, HDR image reconstruction from a single image, and Under-Display-Camera (UDC) image restoration.
Image denoising Image denoising has been a topic of re-
search for more than 30 years. The most famous traditional
image denoising methods are the non-local ones, such as
Non-Local-Means [8] and BM3D [19]. More recently, multiple methods have studied different image representations to facilitate the denoising problem for these well-behaved algorithms [25, 52].
As in other image restoration problems, research on im-
age denoising has moved towards deep learning models.
The first remarkable work on denoising with deep learning is probably DnCNN by Zhang et al. [67], who proposed to learn a CNN that estimates the noise of the input image. Since then, plenty of other deep learning methods have appeared [4, 11, 29, 36, 55, 61, 62, 64, 65]. This has also been possible thanks to the appearance of challenges such as [1, 2], which provide a benchmark for comparing methods together with training and testing images. For a deeper analysis, we refer the reader to the survey in [50].
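The residual-learning idea behind DnCNN can be illustrated with a toy forward pass: the network predicts a noise map, and the clean estimate is the input minus that prediction. The sketch below uses plain NumPy with illustrative layer shapes, not the original DnCNN configuration:

```python
import numpy as np

def conv2d(x, w, b):
    # Naive 3x3 "same" convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3).
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims
    out = np.zeros((c_out, h, wd))
    for co in range(c_out):
        for ci in range(c_in):
            for i in range(kh):
                for j in range(kw):
                    out[co] += w[co, ci, i, j] * xp[ci, i:i + h, j:j + wd]
        out[co] += b[co]
    return out

def residual_denoise(x, weights):
    # weights: list of (w, b) pairs; hidden layers use ReLU,
    # the last layer is linear and predicts the noise map.
    h = x
    for w, b in weights[:-1]:
        h = np.maximum(conv2d(h, w, b), 0.0)
    noise = conv2d(h, *weights[-1])
    return x - noise  # residual learning: subtract predicted noise
```

Training such a model amounts to regressing the noise map against the difference between noisy and clean image pairs.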
Image deblurring Image deblurring is a traditional problem in image restoration. Its main objective is to remove the blur in the input image, which can be caused by different factors (e.g., camera shake, object motion, or lack of focus), and output a sharp image. As is the case for image denoising, early algorithms were based on hand-crafted priors or constraints, mostly treating image restoration as an inverse filtering problem [15, 24, 58], but these methods were surpassed by deep learning models [16, 35, 41, 49, 51]. For a more in-depth analysis of deep learning methods applied to this problem, we point the reader to the survey in [66].
HDR reconstruction Digital cameras can only capture around two orders of magnitude in luminance and therefore have a very limited dynamic range. This is far below the range of luminance differences that appear in the real world, and the reason why, when capturing images, we may end up with clipped highlights in very bright regions or information loss in very dark ones.
The first works aiming at recovering the dynamic range of images were based on a set of multiple images of the same scene. The seminal work of Debevec and Malik [20] assumed that, from a set of multiple images of the same scene, it is possible to recover a single Camera Response Function (CRF) and undo the camera processing. This hypothesis held for film cameras, but as recently proven by Gil Rodriguez et al. [45], it does not hold for current digital cameras, as color channels are not independent and the camera modifies its non-linearity for different exposure values.
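The single-CRF assumption can be illustrated with a toy multi-exposure merge: invert the CRF, normalize by exposure time, and average the per-exposure radiance estimates. The gamma curve and hat weighting below are illustrative stand-ins for a recovered CRF and the original weighting scheme:

```python
import numpy as np

def merge_exposures(images, times, gamma=2.2):
    # images: arrays in [0, 1]; times: exposure times in seconds.
    # Assumes one global CRF f(E*t) = (E*t)^(1/gamma) for all exposures,
    # as in the classic single-CRF hypothesis.
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(images[0], dtype=np.float64)
    for img, t in zip(images, times):
        e = img ** gamma / t                 # invert CRF, divide out exposure
        w = 1.0 - np.abs(2.0 * img - 1.0)    # "hat" weight: trust mid-tones
        num += w * e
        den += w
    return num / np.maximum(den, 1e-8)       # weighted-average radiance map
```

Under this assumption, well-exposed pixels from every bracket agree on the same radiance value, which is exactly what breaks down when the camera changes its non-linearity across exposures.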
Currently, thanks to deep learning and benchmark challenges [44], there has been a surge of methods aiming at recovering the full High Dynamic Range (HDR) from a single input image, a problem also known as inverse tone mapping [5, 6]. This surge started with the work of Eilertsen et al. [21], who proposed a U-Net architecture for solving this problem. Other deep learning based methods for single-image HDR reconstruction are those in [37, 48]. Regarding deep learning methods with multiple input images, we should also mention the work in [10].