Perceptual Image Enhancement for Smartphone Real-Time Applications
Marcos V. Conde 1, Florin Vasluianu 1, Javier Vazquez-Corral 2, Radu Timofte 1
1Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany
2Computer Vision Center and Computer Science Dept., Universitat Autònoma de Barcelona, Spain
{marcos.conde-osorio,radu.timofte}@uni-wuerzburg.de
https://github.com/mv-lab/AISP
Abstract
Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements.
In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with a focus on deploying it on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images in under 1 second on mid-level commercial smartphones.
1. Introduction
In recent years the number of images that are captured
has increased exponentially. The main reason for this surge
comes from the ubiquitous presence of smartphones in our
daily life. Phone manufacturers are continuously competing
with the goal of delivering better images to their consumers
in order to increase their sales. Therefore, a lot of research
has been focused on improving the perceptual quality of
these sRGB images.
Image restoration aims at improving the images captured by cameras by removing different degradations introduced during image acquisition. These degradations can be introduced due to the physical limitations of cameras, for
Figure 1. Comparison of computational cost and performance of state-of-the-art methods for image denoising (SIDD) [2,12,51,54,62]. We can process 2K resolution images in 0.4s, and 4K images in 1.5s, on regular smartphone GPUs.
example the small aperture and limited dynamic range of smartphone cameras [64], or by inappropriate lighting conditions (i.e. images captured in low light). To solve these problems, image restoration is usually understood as an ill-posed problem in which, given the degraded image, the algorithm needs to output a clean image.
To be embedded in-camera by a manufacturer, an image restoration algorithm should comply with strong requirements in terms of quality, robustness, computational complexity, and execution time. In general, digital cameras have a set of resources in which to allocate all the operations in the ISP pipeline [18]. Therefore, any new operation to be introduced in this pipeline should be of good enough quality to “pay” for the resources it will consume. Moreover, for an algorithm to be embedded in a camera, it is required to always improve over the input image, i.e. to be robust for any possible circumstance and input signal.
Image restoration is a traditional problem whose study began as soon as we started to capture images; many famous methods, such as Non-local Means for image denoising [8], are almost 20 years old. These traditional methods were usually defined by hand-crafted priors that narrowed the ill-posed nature of the problems by reducing the set of plausible solutions. However, since 2012 there has been a switch to deep learning based image restoration algorithms, as these methods have proven to be very powerful at generalizing priors from a large number of images.
arXiv:2210.13552v2 [cs.CV] 23 Nov 2023
Unfortunately, despite the great advances and performance, research on image restoration and enhancement using deep learning usually neglects the previously stated need for algorithms with low computational complexity and execution time; therefore, many of them cannot be integrated into modern smartphones due to their complexity, i.e. FLOPs or memory requirements.
In this paper, we therefore aim at defining a new image enhancement algorithm that achieves competitive results in comparison with state-of-the-art methods on different related tasks, yet, at the same time, presents low complexity and a competitive execution time on current off-the-shelf smartphones, as proven by the use of the AIScore [32]. A first example of this behaviour is shown in Figure 1, where we compare our method to the current state of the art in image denoising. As can be seen, our method is within 0.02 SSIM of the state of the art, while requiring at least ×50 fewer MACs. More details appear in Section 5.
In summary, our contributions are as follows:
• We propose a lightweight U-Net based architecture characterized by the inverted residual attention (IRA) block; it is similar to contemporary works in this field, yet more efficient and smaller.
• We optimize our model in terms of parameters and computational cost (i.e. FLOPs, MACs), thus being able to achieve real-time performance on current smartphone GPUs at FullHD input image resolution. This improvement is illustrated in Figure 1.
• We propose a new type of analysis, from a production point of view, observing the behaviour of our model when deployed on commercial smartphones.
2. Related Work
Image restoration is split in a large number of sub-
problems, and in this paper we focus on four of the most
popular in current research: image denoising, image de-
blurring, HDR image reconstruction from a single image,
and Under-Display-Camera (UDC) image restoration.
Image denoising Image denoising has been a topic of research for more than 30 years. The most famous traditional image denoising methods are the non-local ones, such as Non-Local Means [8] and BM3D [19]. More recently, multiple methods have studied different image representations to facilitate the denoising problem for these well-behaved algorithms [25,52].
As in other image restoration problems, research on image denoising has moved towards deep learning models. The first remarkable work on denoising with deep learning is probably DnCNN by Zhang et al. [67], where the authors proposed to learn a CNN to estimate the noise distribution of the input image. Since then, plenty of other deep learning methods have appeared [4,11,29,36,55,61,62,64,65]. This has also been possible thanks to the appearance of challenges such as [1,2] that provide a benchmark for comparing methods together with training and testing images. For a deeper analysis we refer the reader to the survey in [50].
Image deblurring Image deblurring is a traditional problem in image restoration. Its main objective is to remove the blur that appears in the input image, which can be caused by different factors (e.g. camera shake, object motion, or lack of focus), and output a sharp image. As is the case for image denoising, the first algorithms were based on hand-made priors or constraints, mostly treating image restoration as an inverse filtering problem [15,24,58], but these methods were surpassed with the appearance of deep learning models [16,35,41,49,51]. For a more in-depth analysis of deep learning methods applied to this problem, we point the reader to the survey in [66].
HDR reconstruction Digital cameras can only capture around two orders of magnitude in luminance, and therefore have a very limited dynamic range. This is far less than the range of luminance values that appear in the real world, which is why, when capturing images, we may end up with clipped highlights in very bright regions or information loss in very dark ones.
The first works aiming at recovering the dynamic range of images were based on a set of multiple images of the same scene. The seminal work of Debevec and Malik [20] assumed that, from a set of multiple images of the same scene, it is possible to recover a single Camera Response Function (CRF) and undo the camera process. This hypothesis was true for film cameras, but as recently proved by Gil Rodriguez et al. [45], it does not hold for current digital cameras, as color channels are not independent and the camera modifies the non-linearity for different exposure values.
Currently, thanks to deep learning and different benchmark challenges [44], there has been a surge of methods aiming at recovering the full High Dynamic Range (HDR) from a single input image, a problem also named inverse tone mapping [5,6]. This surge started with the work of Eilertsen et al. [21], who proposed a U-Net architecture for solving this problem. Other deep learning based methods for single-image HDR reconstruction are the ones in [37,48]. Regarding deep learning methods that take multiple images as input, we should also mention the work in [10].
[Figure 2 diagram: the LPIENet encoder-decoder (E1, E2, E3, D1, D2) with skip connections, and the IRA block composed of inverted residual blocks (IRB), channel attention, and spatial attention, built from 1x1 convolutions, 3x3 depthwise convolutions, ReLU activations, and a sigmoid gate.]
Figure 2. Architecture of the proposed LPIENet network and the IRA Block. Our model is designed considering currently supported TFLite operations and mobile device limitations. The attention mechanisms [12,57] have been optimized to require less memory and computation.
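As a rough sketch of how such attention mechanisms gate features, the NumPy snippet below follows the common squeeze-and-gate pattern of [57] (global pooling, a learned mixing, then a sigmoid gate); it is our own simplification, not the exact LPIENet variant, and the mixing matrix `w` is a hypothetical stand-in for the learned 1x1 convolutions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w):
    """Gate each channel of x (C, H, W) using its global average.
    w is a hypothetical (C, C) mixing matrix standing in for learned 1x1 convs."""
    pooled = x.mean(axis=(1, 2))            # global average pool -> (C,)
    gate = sigmoid(w @ pooled)              # per-channel gate in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Gate each spatial location using the channel-wise mean."""
    pooled = x.mean(axis=0, keepdims=True)  # (1, H, W)
    return x * sigmoid(pooled)              # per-pixel gate in (0, 1)

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 8, 8))
out = spatial_attention(channel_attention(feat, np.eye(16)))
```

Because both gates lie in (0, 1), the output is an attenuated copy of the input feature map: attention re-weights channels and pixels rather than adding new content.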
UDC image restoration Recently, a new imaging system, the Under-Display Camera (UDC), has appeared. The UDC system consists of a camera module placed underneath and closely attached to the semi-transparent Organic Light-Emitting Diode (OLED) display [72]. This solution provides an advantage in terms of user experience, with the full-screen design providing a higher level of comfort. The disadvantage of this solution is that the OLED display acts as an obstacle for the light reaching the camera sensor, inducing additional reflections, refractions and other effects connected to the Image Signal Processing (ISP) [18] model characterizing the camera.
Zhou et al. [73] and their 2020 ECCV challenge [72] were the first works to directly address this novel restoration problem using deep learning. The Baidu Research team [72] proposed a residual dense network based on shade correction for T-OLED UDC image restoration. In [73], the authors devised a monitor-camera imaging system (MCIS) to capture paired images, and solved the UDC image restoration problem as a blind deconvolution problem. More recently, Feng et al. [22] used one of the first production UDC devices for data collection, experiments, and evaluations. They also proposed a new model called DISCNet [22], and provided a benchmark for multiple blind and non-blind methods on UDC datasets.
General problem formulation It is important to note that the UDC problem can be seen as a generalization of the other three listed above. Following the formulation proposed in [22], the UDC problem can be formulated as:

y = γ(C(x ∗ k + n)),  (1)

where x represents the clean image with high dynamic range, k is the point spread function (PSF), i.e. the blurring kernel, ∗ represents the 2D convolution operator, and n denotes the camera noise. Also, C(·) emulates the reduction of dynamic range, following C(x) = min(x, x_max), where x_max is a range threshold, and γ(·) represents a tone-mapping function.
From this formulation we can clearly see that if we suppose there is no noise (n = 0 at all pixels), x_max is large enough, and we do not perform any tone mapping, we end up with the traditional deblurring formulation y = x ∗ k. Similarly, if we suppose that the kernel k is a Dirac delta, x_max is equal to the maximum possible value of the input signal, and we do not perform any tone mapping, we end up with the traditional image denoising formulation y = x + n. As we will be using the SIDD dataset [2] in this paper, we will suppose that n ∼ N(0, β), where β²(y) = β1·y + β2, with β1 representing the shot noise and β2 the signal-independent additive Gaussian noise. Finally, supposing that there is no noise (n = 0 at all pixels) and that the kernel k is a Dirac delta, we end up with the HDR image reconstruction problem.
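The degradation model of Eq. (1) and its special cases can be sketched numerically. The following NumPy snippet is a minimal illustration under stated assumptions: a simple power curve stands in for γ(·), and the β1, β2 values are hypothetical; it is not the authors' implementation.

```python
import numpy as np

def conv2d(x, k):
    """2D convolution of image x with kernel k, using edge padding."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(x)
    kf = k[::-1, ::-1]  # flip the kernel (convolution, not correlation)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kf)
    return out

def degrade(x, k, beta1=0.01, beta2=1e-4, x_max=1.0, gamma=1 / 2.2, rng=None):
    """Apply y = gamma(C(x * k + n)) with signal-dependent Gaussian noise,
    where beta^2(y) = beta1*y + beta2 and C(x) = min(x, x_max)."""
    if rng is None:
        rng = np.random.default_rng(0)
    blurred = conv2d(x, k)
    sigma = np.sqrt(np.clip(beta1 * blurred + beta2, 0, None))
    noisy = blurred + rng.normal(0.0, 1.0, x.shape) * sigma
    clipped = np.minimum(noisy, x_max)       # dynamic-range reduction C(.)
    return np.clip(clipped, 0, None) ** gamma  # simple tone-mapping stand-in

# Special cases: a Dirac-delta kernel makes blurring the identity,
# and beta1 = beta2 = 0 removes the noise term entirely.
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
x = np.random.default_rng(1).uniform(0, 1, (8, 8))
y = degrade(x, delta, beta1=0.0, beta2=0.0, gamma=1.0, x_max=10.0)
```

With a Dirac-delta kernel, zero noise, a large x_max, and γ as the identity, the pipeline reduces to y = x, matching the degenerate case described above.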
However, as explained in the introduction, none of the aforementioned methods have analyzed these problems from the efficiency point of view. The proposed models can generate high quality results but cannot be integrated into modern smartphones due to their complexity, i.e. FLOPs.
3. Proposed Method
We propose a new model called LPIENet, following a
U-Net [46] like architecture, standard in image restora-
tion [22,55,62,66,73]. The main building blocks are in-
verted residual attention blocks, we refer to them as IRA
blocks. These are selected due to their efficiency [31]. This
is a blind image restoration method, and therefore, we do
not rely on the PSF or other information about the cam-
era sensor. This architecture is illustrated in Figure 2. The
initial model consists on 5 blocks (3 encoders and 2 de-
coders) with [16,32,64,32,16] channels respectively, and
0.13M parameters.
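To see why blocks built on depthwise convolutions, as in the inverted residual design [31], are cheap, one can count multiply-accumulate operations (MACs). The helper below is our own back-of-the-envelope estimate, with an arbitrary example layer, not the paper's measurement tool:

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k

def separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1x1 conv mixes the channels
    return depthwise + pointwise

# Example: a 3x3 layer with 64 input and output channels at 256 x 256 resolution.
std = conv_macs(256, 256, 64, 64, 3)
sep = separable_macs(256, 256, 64, 64, 3)
ratio = std / sep  # approaches k*k (here 9x) as c_out grows past k*k
```

The savings ratio is c_out·k²/(k² + c_out), so for wide layers a depthwise-separable stack costs nearly k² times fewer MACs than a standard convolution, which is one reason such blocks suit mobile deployment.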