Perceptual Image Enhancement for Smartphone Real-Time Applications
Marcos V. Conde 1, Florin Vasluianu 1, Javier Vazquez-Corral 2, Radu Timofte 1
1Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany
2Computer Vision Center and Computer Science Dept., Universitat Autònoma de Barcelona, Spain
{marcos.conde-osorio,radu.timofte}@uni-wuerzburg.de
https://github.com/mv-lab/AISP
Abstract
Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements.
In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with a focus on deploying it on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images in under 1 second on mid-level commercial smartphones.
1. Introduction
In recent years the number of images that are captured
has increased exponentially. The main reason for this surge
comes from the ubiquitous presence of smartphones in our
daily life. Phone manufacturers are continuously competing
with the goal of delivering better images to their consumers
in order to increase their sales. Therefore, a lot of research
has been focused on improving the perceptual quality of
these sRGB images.
Image restoration aims at improving the images captured by cameras by removing different degradations introduced during image acquisition. These degradations can be introduced due to the physical limitations of cameras, for
Figure 1. Comparison of computational cost and performance of state-of-the-art methods for image denoising (SIDD) [2,12,51,54,62]. We can process 2K resolution images in 0.4s, and 4K images in 1.5s, on regular smartphone GPUs.
example the small aperture and limited dynamic range of smartphone cameras [64], or by inappropriate lighting conditions (i.e. images captured in low light). To solve these problems, image restoration is usually understood as an ill-posed problem in which, given the degraded image, the algorithm needs to output a clean image.
To be embedded in-camera by a manufacturer, an image restoration algorithm should comply with strong requirements in terms of quality, robustness, computational complexity, and execution time. In general, digital cameras have a set of resources in which to allocate all the operations in the ISP pipeline [18]. Therefore, any new operation to be introduced in this pipeline should be of good enough quality to “pay” for the resources it will consume. Moreover, for an algorithm to be embedded in a camera, it is required to always improve over the input image, i.e. to be robust for any possible circumstance and input signal.
Image restoration is a traditional problem whose study began as soon as we started to capture images; many famous methods, such as Non-local Means for image denoising [8], are almost 20 years old. These traditional methods were usually defined by hand-crafted priors that narrowed the ill-posed nature of the problems by reducing the set of plausible solutions. However, since 2012 there has been a switch to deep learning based image restoration algorithms, as these methods have proven to be very powerful at generalizing priors from a large number of images.
arXiv:2210.13552v2 [cs.CV] 23 Nov 2023
Unfortunately, despite the great advances and performance, research on image restoration and enhancement using deep learning usually neglects the previously stated need for algorithms with low computational complexity and execution time; therefore, many of them cannot be integrated into modern smartphones due to their complexity, i.e. FLOPs or memory requirements.
In this paper, we therefore aim at defining a new image enhancement algorithm that achieves competitive results in comparison with state-of-the-art methods on different related tasks, yet, at the same time, presents low complexity and a competitive execution time on current off-the-shelf smartphones, as proven by the use of the AIScore [32]. A first example of this behaviour is shown in Figure 1, where we compare our method to the current state of the art in image denoising. As can be seen, our method is within 0.02 SSIM of the state of the art, while requiring at least ×50 fewer MACs. More details appear in Section 5.
In summary, our contributions are as follows:
• We propose a lightweight U-Net based architecture characterized by the inverted residual attention (IRA) block; it is similar to contemporary works in this field, yet more efficient and smaller.
• We optimize our model in terms of parameters and computational cost (i.e. FLOPs, MACs), thus being able to achieve real-time performance on current smartphone GPUs at FullHD input image resolution. This improvement is illustrated in Figure 1.
• We propose a new type of analysis, from a production point of view, observing the behaviour of our model when deployed on commercial smartphones.
2. Related Work
Image restoration is split in a large number of sub-
problems, and in this paper we focus on four of the most
popular in current research: image denoising, image de-
blurring, HDR image reconstruction from a single image,
and Under-Display-Camera (UDC) image restoration.
Image denoising Image denoising has been a topic of research for more than 30 years. The most famous traditional image denoising methods are the non-local ones, such as Non-Local Means [8] and BM3D [19]. More recently, multiple methods have studied different image representations to facilitate the denoising problem for these well-behaved algorithms [25,52].
As in other image restoration problems, research on image denoising has moved towards deep learning models. The first remarkable work on denoising with deep learning is probably DnCNN by Zhang et al. [67], where the authors proposed to learn a CNN to estimate the noise distribution of the input image. Since then, plenty of other deep learning methods have appeared [4,11,29,36,55,61,62,64,65]. This has also been possible thanks to the appearance of challenges such as [1,2] that provide a benchmark for comparing methods together with training and testing images. For a deeper analysis we refer the reader to the survey in [50].
Image deblurring Image deblurring is a traditional problem in image restoration. Its main objective is to remove the blur that appears in the input image, which can be caused by different factors (e.g. camera shake, object motion, or lack of focus), and output a sharp image. As is the case for image denoising, the first algorithms were based on hand-made priors or constraints, mostly treating image restoration as an inverse filtering problem [15,24,58], but these methods were surpassed with the appearance of deep learning models [16,35,41,49,51]. For a more in-depth analysis of deep learning methods applied to this problem, we point the reader to the survey in [66].
HDR reconstruction Digital cameras can only capture around two orders of magnitude in luminance, and therefore have a very limited dynamic range. This is far less than the range of luminance values that appear in the real world, which is why, when capturing images, we may end up with clipped highlights in very bright regions or information loss in very dark ones.
The first works aiming at recovering the dynamic range of images were based on a set of multiple images of the same scene. The seminal work of Debevec and Malik [20] assumed that, from a set of multiple images of the same scene, it is possible to recover a single Camera Response Function (CRF) and undo the camera process. This hypothesis was true for film cameras, but as recently proved by Gil Rodriguez et al. [45], it does not hold for current digital cameras, as color channels are not independent and the camera modifies the non-linearity for different exposure values.
Currently, thanks to deep learning and different benchmark challenges [44], there has been a surge of methods aiming at recovering the full High Dynamic Range (HDR) from a single input image, a problem also named inverse tone mapping [5,6]. This surge started with the work of Eilertsen et al. [21], who proposed a U-Net architecture for solving this problem. Other deep learning based methods for single-image HDR reconstruction are the ones in [37,48]. Regarding deep learning methods that take multiple images as input, we should also mention the work in [10].
[Figure 2 diagram: the LPIENet encoder-decoder (E1, E2, E3, D1, D2) with skip connections, and the IRA block composed of inverted residual blocks (IRB), channel attention, and spatial attention, built from 1x1 convolutions, 3x3 depthwise convolutions, ReLU activations, and a sigmoid gate.]
Figure 2. Architecture of the proposed LPIENet network and the IRA Block. Our model is designed considering currently supported TFLite operations and mobile device limitations. The attention mechanisms [12,57] have been optimized to require less memory and computation.
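As a rough sketch of how such attention mechanisms gate features, the NumPy snippet below follows the common squeeze-and-gate pattern of [57] (global pooling, a learned mixing, then a sigmoid gate); it is our own simplification, not the exact LPIENet variant, and the mixing matrix `w` is a hypothetical stand-in for the learned 1x1 convolutions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w):
    """Gate each channel of x (C, H, W) using its global average.
    w is a hypothetical (C, C) mixing matrix standing in for learned 1x1 convs."""
    pooled = x.mean(axis=(1, 2))            # global average pool -> (C,)
    gate = sigmoid(w @ pooled)              # per-channel gate in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Gate each spatial location using the channel-wise mean."""
    pooled = x.mean(axis=0, keepdims=True)  # (1, H, W)
    return x * sigmoid(pooled)              # per-pixel gate in (0, 1)

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 8, 8))
out = spatial_attention(channel_attention(feat, np.eye(16)))
```

Because both gates lie in (0, 1), the output is an attenuated copy of the input feature map: attention re-weights channels and pixels rather than adding new content.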
UDC image restoration Recently, a new imaging system, the Under-Display Camera (UDC), has appeared. The UDC system consists of a camera module placed underneath and closely attached to the semi-transparent Organic Light-Emitting Diode (OLED) display [72]. This solution provides an advantage in terms of user experience, with the full-screen design providing a higher level of comfort. The disadvantage of this solution is that the OLED display acts as an obstacle for the light reaching the camera sensor, inducing additional reflections, refractions and other effects connected to the Image Signal Processing (ISP) [18] model characterizing the camera.
Zhou et al. [73] and their 2020 ECCV challenge [72] were the first works to directly address this novel restoration problem using deep learning. The Baidu Research team [72] proposed a residual dense network based on shade correction for T-OLED UDC image restoration. In [73], the authors devised a monitor-camera imaging system (MCIS) to capture paired images, and solved the UDC image restoration problem as a blind deconvolution problem. More recently, Feng et al. [22] used one of the first production UDC devices for data collection, experiments, and evaluations. They also proposed a new model called DISCNet [22], and provided a benchmark for multiple blind and non-blind methods on UDC datasets.
General problem formulation It is important to note that the UDC problem can be seen as a generalization of the other three listed above. Following the formulation proposed in [22], the UDC problem can be formulated as:

y = γ(C(x ∗ k + n)),  (1)

where x represents the clean image with high dynamic range, k is the point spread function (PSF), i.e. the blurring kernel, ∗ represents the 2D convolution operator, and n denotes the camera noise. Also, C(·) emulates the reduction of dynamic range, following C(x) = min(x, x_max), where x_max is a range threshold, and γ(·) represents a tone-mapping function.
From this formulation we can clearly see that if we suppose there is no noise (n = 0 at all pixels), x_max is large enough, and we do not perform any tone mapping, we end up with the traditional deblurring formulation y = x ∗ k. Similarly, if we suppose that the kernel k is a Dirac delta, x_max is equal to the maximum possible value of the input signal, and we do not perform any tone mapping, we end up with the traditional image denoising formulation y = x + n. As we will be using the SIDD dataset [2] in this paper, we will suppose that n ∼ N(0, β), where β²(y) = β1·y + β2, with β1 representing the shot noise and β2 the signal-independent additive Gaussian noise. Finally, supposing that there is no noise (n = 0 at all pixels) and that the kernel k is a Dirac delta, we end up with the HDR image reconstruction problem.
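The degradation model of Eq. (1) and its special cases can be sketched numerically. The following NumPy snippet is a minimal illustration under stated assumptions: a simple power curve stands in for γ(·), and the β1, β2 values are hypothetical; it is not the authors' implementation.

```python
import numpy as np

def conv2d(x, k):
    """2D convolution of image x with kernel k, using edge padding."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(x)
    kf = k[::-1, ::-1]  # flip the kernel (convolution, not correlation)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kf)
    return out

def degrade(x, k, beta1=0.01, beta2=1e-4, x_max=1.0, gamma=1 / 2.2, rng=None):
    """Apply y = gamma(C(x * k + n)) with signal-dependent Gaussian noise,
    where beta^2(y) = beta1*y + beta2 and C(x) = min(x, x_max)."""
    if rng is None:
        rng = np.random.default_rng(0)
    blurred = conv2d(x, k)
    sigma = np.sqrt(np.clip(beta1 * blurred + beta2, 0, None))
    noisy = blurred + rng.normal(0.0, 1.0, x.shape) * sigma
    clipped = np.minimum(noisy, x_max)       # dynamic-range reduction C(.)
    return np.clip(clipped, 0, None) ** gamma  # simple tone-mapping stand-in

# Special cases: a Dirac-delta kernel makes blurring the identity,
# and beta1 = beta2 = 0 removes the noise term entirely.
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
x = np.random.default_rng(1).uniform(0, 1, (8, 8))
y = degrade(x, delta, beta1=0.0, beta2=0.0, gamma=1.0, x_max=10.0)
```

With a Dirac-delta kernel, zero noise, a large x_max, and γ as the identity, the pipeline reduces to y = x, matching the degenerate case described above.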
However, as explained in the introduction, none of the aforementioned methods have analyzed these problems from the efficiency point of view. The proposed models can generate high quality results but cannot be integrated into modern smartphones due to their complexity, i.e. FLOPs.
3. Proposed Method
We propose a new model called LPIENet, following a
U-Net [46] like architecture, standard in image restora-
tion [22,55,62,66,73]. The main building blocks are in-
verted residual attention blocks, we refer to them as IRA
blocks. These are selected due to their efficiency [31]. This
is a blind image restoration method, and therefore, we do
not rely on the PSF or other information about the cam-
era sensor. This architecture is illustrated in Figure 2. The
initial model consists on 5 blocks (3 encoders and 2 de-
coders) with [16,32,64,32,16] channels respectively, and
0.13M parameters.
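To see why blocks built on depthwise convolutions, as in the inverted residual design [31], are cheap, one can count multiply-accumulate operations (MACs). The helper below is our own back-of-the-envelope estimate, with an arbitrary example layer, not the paper's measurement tool:

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k

def separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1x1 conv mixes the channels
    return depthwise + pointwise

# Example: a 3x3 layer with 64 input and output channels at 256 x 256 resolution.
std = conv_macs(256, 256, 64, 64, 3)
sep = separable_macs(256, 256, 64, 64, 3)
ratio = std / sep  # approaches k*k (here 9x) as c_out grows past k*k
```

The savings ratio is c_out·k²/(k² + c_out), so for wide layers a depthwise-separable stack costs nearly k² times fewer MACs than a standard convolution, which is one reason such blocks suit mobile deployment.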