
PSENet: Progressive Self-Enhancement Network
for Unsupervised Extreme-Light Image Enhancement
Hue Nguyen Diep Tran Khoi Nguyen Rang Nguyen
VinAI Research, Vietnam
{v.huent88, v.diepttn147, v.khoindm, v.rangnhm}@vinai.io
Abstract
The extremes of lighting (e.g., too much or too little light) usually cause many troubles for machine and human vision. Many recent works have focused on under-exposure cases, where images are often captured in low-light conditions (e.g., nighttime), and have achieved promising results for enhancing image quality. However, they are inferior at handling over-exposed images. To mitigate this limitation, we propose a novel unsupervised enhancement framework that is robust against various lighting conditions and does not require any well-exposed images to serve as ground truth. Our main concept is to construct pseudo-ground-truth images, synthesized from multiple source images that simulate all potential exposure scenarios, to train the enhancement network. Our extensive experiments show that the proposed approach consistently outperforms current state-of-the-art unsupervised counterparts on several public datasets in terms of both quantitative metrics and qualitative results. Our code is available at https://github.com/VinAIResearch/PSENet-Image-Enhancement.
1. Introduction
Producing images with high contrast, vivid color, and rich details is one of the important goals of photography. However, acquiring such pleasing images is not always a trivial task due to harsh lighting conditions, including extremely low lighting or unbalanced lighting caused by backlighting. The resulting under-/over-exposed images usually degrade not only human satisfaction but also the performance of computer vision systems on several downstream tasks such as object detection [33] or image segmentation [37]. Exposure problems occur early in the capturing process and are difficult to fix once the final 8-bit image has been rendered, because in-camera image signal processors usually use highly nonlinear operations to generate the final 8-bit standard RGB image [28, 12, 29].
[Figure 1 panels: Input (PSNR 13.63 / SSIM 0.713), ZeroDCE (PSNR 9.95 / SSIM 0.513), Afifi et al. (PSNR 21.42 / SSIM 0.817), Ours (PSNR 23.96 / SSIM 0.847), Ground truth]
Figure 1. Visual comparison on an over-exposed scene. Most previous state-of-the-art methods fail to recover the over-exposed case, except the recent work by Afifi et al. [1], which is trained with full supervision.
Many recent works have mainly focused on under-exposure cases, where images are often captured in low-light conditions (e.g., nighttime). These works have achieved promising results for enhancing the quality of images even captured under extremely low light. However, they fail to handle over-exposed images, as shown in Fig. 1. The recent work by Afifi et al. [1] achieves impressive results in improving both under- and over-exposed cases. However, their method is designed to work in a supervised manner, requiring a large dataset of wrongly exposed images paired with well-exposed ground-truth (GT) images. Such data collection is typically time-consuming and expensive.
In this paper, we propose a novel unsupervised approach that does not require any well-exposed GT images. The key idea is to generate a pseudo GT image from the given wrongly exposed input in order to train an enhancement network. The pseudo GT image is progressively generated across training epochs by choosing the visually best regions from multiple sources: the output for the same input image from the previous epoch, brighter/darker reference images obtained by changing the gamma value of the input image, and the input image itself. The selection criteria are well-exposedness, local contrast, and color saturation, which are driven by human knowledge of what makes an image visually good and have been shown to be effective for measuring perceptual image quality [25]. In this way, generating pseudo GT images reduces to comparing and selecting the best regions from different sources, so that almost all possible exposure cases are simulated during training. Furthermore, by using the output of the previous epoch as one of the candidate sources, we ensure that the output of the current epoch is better than or at least equal to that of the previous one, giving our approach its name: PSENet – Progressive Self-Enhancement Network.
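
For concreteness, the three selection criteria can be sketched following the exposure-fusion measures of Mertens et al. [25]. The formulation below is one common variant; the Gaussian width sigma and the equal weighting of the three cues are our assumptions rather than the paper's exact choices:

import numpy as np
from scipy.ndimage import laplace

def well_exposedness(img, sigma=0.2):
    # Closeness of each channel to mid-gray 0.5, combined across channels.
    w = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))
    return np.prod(w, axis=-1)

def local_contrast(img):
    # Magnitude of the Laplacian response of the grayscale image.
    return np.abs(laplace(img.mean(axis=-1)))

def color_saturation(img):
    # Standard deviation across the R, G, B channels.
    return img.std(axis=-1)

def quality_score(img):
    # img: H x W x 3 float array in [0, 1]; a higher score marks a
    # visually better region under the three cues above.
    return well_exposedness(img) * local_contrast(img) * color_saturation(img)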
Our contributions are summarized as follows:
• We introduce a new method for generating effective pseudo-GT images from given wrongly exposed images. The generation process is driven by a new non-reference score reflecting the human evaluation of a visually good image.
• We propose a novel unsupervised progressive pseudo-GT-based approach that is robust to various severe lighting conditions, i.e., under- and over-exposure. As a result, the burden of gathering matched image pairs is removed.
• Comprehensive experiments show that our approach outperforms previous unsupervised methods by large margins on the SICE [3] and Afifi [1] datasets and obtains results comparable with supervised counterparts.
2. Related Work
Image enhancement approaches can be divided into two categories: traditional and learning-based methods.
Traditional methods. One of the simplest and fastest approaches is to transform individual pixels of an input image by a mathematical function such as a linear, gamma, or logarithmic function. For example, histogram equalization-based algorithms stretch out the image's intensity range using the cumulative distribution function, increasing the image's global contrast. The Retinex theory [16], on the other hand, argues that an image is composed of two components: reflectance and illumination. By estimating the illumination component of an image, the dynamic range of the image can be easily adjusted to reproduce images with better color contrast. However, most Retinex algorithms use Gaussian convolution to estimate illumination, leading to blurred edges [40]. Frequency-domain methods, by contrast, preserve edges by employing a high-pass filter to enhance the high-frequency components in the Fourier transform domain [38]. However, the adaptability of such traditional methods is often limited because they are unaware of the overall and local gray-level distribution of an image [40]. For a systematic review of conventional approaches, we refer readers to the work of Wang et al. [40].
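
As a concrete illustration of the CDF-based stretching mentioned above, here is a minimal histogram-equalization sketch for an 8-bit grayscale image (the standard textbook formulation, not a method from this paper):

import numpy as np

def histogram_equalize(img):
    # img: uint8 grayscale array. Map each gray level through the
    # normalized cumulative distribution function of the histogram.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img]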
Learning-based methods. In recent years, learning-based photo-enhancement methods have attracted increasing attention in both supervised and unsupervised settings. Supervised methods aim to recover natural images by either directly outputting high-quality images [22, 23, 20, 42] or learning the parameters of a parametric model (e.g., a Retinex model) [6, 39, 26] from a paired dataset. SID [5] is a typical example of the first direction: the authors collect a short-exposure low-light image dataset and adopt a vanilla U-Net architecture [34] to produce an enhanced sRGB image from raw data, thus replacing the traditional image processing pipeline. Following this work, Lamba and Mitra [14] present a novel network architecture that concurrently processes all scales of an image and reduces latency by 30% without decreasing image quality. Different from the previously mentioned approaches, Cai et al. [3] explore a new direction in which both under- and over-exposed images are considered. They introduce a novel two-stage framework trained on their own multi-exposure image dataset, which enhances the low-frequency and high-frequency components separately before refining the whole image in the second stage. Afifi et al. [1] go a step further in this direction by introducing a larger dataset along with a coarse-to-fine neural network that enhances image quality in both under- and over-exposure cases. For learning a parametric model, the Retinex theory [15] is often adopted [41, 19, 39]. Benefiting from paired data, the authors focus on designing networks to estimate the reflectance and illumination of an input image. Approaching the image enhancement task differently, HDRNet [6] presents a convolutional neural network that predicts the coefficients of a locally affine model in bilateral space using pairs of input/output images.
Unsupervised learning. Collecting paired training data is always time-consuming and expensive. To address this issue, an unpaired GAN-based method named EnlightenGAN is proposed in [10]. The network, comprising an attention-guided U-Net generator and global-local discriminators, shows promising results even though corresponding ground-truth images are absent. To further reduce the cost of collecting reference ground-truth images, a set of methods [44, 46, 8, 18] that require neither paired nor unpaired training data has been proposed. Two recent methods in this category, ZeroDCE [8] and the work of Zheng and Gupta [45], show impressive results on low-light image enhancement by training a CNN under a set of non-reference loss functions to learn an image-specific curve that produces a high-quality output image. However, these methods perform poorly when extended to correcting over-exposed images, as shown in Fig. 1.
Our proposed method, in contrast, is the first deep-learning work to handle these extreme lighting conditions in an unsupervised manner.
3. Methodology
Given an sRGB image I captured under a harsh lighting condition, exhibiting low contrast and washed-out color, our approach aims to reconstruct the corresponding enhanced image Y, which is visibly better and visually pleasing in terms of contrast and color, without any supervision.

[Figure 2 diagram: the input image feeds the reference image generator (producing darker and brighter reference images) and the enhancement network (producing a gamma map and the output image); the pseudo GT image generator combines the input image, the previous-epoch output, and the reference images into the pseudo GT image, which drives the reconstruction loss; a total variation loss is applied to the gamma map; forward and backward paths are shown]

Figure 2. Overview of our proposed framework, which comprises three main modules: reference image generator, pseudo GT image generator, and enhancement network. Given the input image I, the reference image generator randomly generates multiple reference images with different exposure values; half of them are brighter than the input image while the rest are darker. The pseudo GT image generator then takes the input image, the output image from the previous epoch, and the generated reference images as input to produce the pseudo GT image T, which is visually better than each of the input components alone according to our proposed non-reference scoring criteria. Finally, the enhancement network predicts the gamma map γ to transform the original image I into the output image Y. The enhancement network is trained with two loss functions: a reconstruction loss between the output image Y and the pseudo GT image T, and a total variation loss applied to the predicted gamma map γ to encourage smooth predictions. Note that only the enhancement network is used at test time.
To address this problem, our key contribution is a new self-supervised learning strategy for training the image enhancement network: we randomly synthesize a set of reference images that are combined to produce a synthetic high-quality GT image for training. The combination is driven by human knowledge of what makes an image visually good. To the best of our knowledge, our unsupervised method is the first to produce pseudo-GT images for training on a large set of ill-exposed images, whereas other data-synthesis methods use well-exposed images as GT and generate the corresponding ill-exposed inputs. As a result, our model does not suffer from the domain-gap issue. Compared with image fusion, which produces only a single output image per input, our pseudo GT images are progressively improved after each epoch, allowing our model to adapt to a wide range of lighting conditions (see Sec. 4 for empirical evidence).
In detail, our reference image generator first takes an image as input and generates 2N images, where the first N images are darker and the remaining N are brighter than the original input. Then, the pseudo GT generator module uses these reference images, along with the input and the previous prediction of the enhancement network, to create the pseudo GT image. Notably, including the previous prediction in the set of references ensures that the quality of the pseudo GT image is greater than or at least equal to that of the previous prediction according to our proposed non-reference score, so training improves progressively. Our training framework is illustrated in Fig. 2, and the details of each module are described in the following sections.
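
Before detailing each module, the overall training step can be summarized in schematic PyTorch code. Here quality_score, the per-pixel winner-take-all selection, the weight lambda_tv, and the plain img ** gamma mapping are illustrative assumptions; the paper's actual scoring criteria and gamma transform are given in the following sections and Eq. (1):

import torch

def tv_loss(gamma_map):
    # Total variation on the predicted gamma map, encouraging smoothness.
    dh = (gamma_map[..., 1:, :] - gamma_map[..., :-1, :]).abs().mean()
    dw = (gamma_map[..., :, 1:] - gamma_map[..., :, :-1]).abs().mean()
    return dh + dw

def training_step(net, img, prev_out, refs, quality_score, lambda_tv=0.1):
    # Candidate sources: input, previous-epoch output, and 2N references.
    cands = torch.stack([img, prev_out] + refs)                  # (C, B, 3, H, W)
    with torch.no_grad():
        scores = torch.stack([quality_score(c) for c in cands])  # (C, B, 1, H, W)
        best = scores.argmax(dim=0, keepdim=True)                # winning source
        idx = best.expand(-1, -1, cands.size(2), -1, -1)         # broadcast to RGB
        pseudo_gt = torch.gather(cands, 0, idx).squeeze(0)       # (B, 3, H, W)

    gamma = net(img)                     # per-pixel gamma map
    out = img.clamp(min=1e-3) ** gamma   # assumed gamma transform of the input
    loss = (out - pseudo_gt).abs().mean() + lambda_tv * tv_loss(gamma)
    return loss, out.detach()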
3.1. Random Reference Image Generation
To synthesize an under-/over-exposed image, we employ a gamma mapping function, a nonlinear operation often used to adjust the overall brightness of an image in the image processing pipeline [31]. The gamma mapping function is based on the observation that human eyes perceive relative changes in light following a power-law function rather than the linear response of cameras [13]. This connection between the gamma mapping function and the human visual system has made it widely used in image contrast enhancement [7, 32, 36]. However, rather than applying the gamma function directly to the original image, we adopt a haze-removal technique in which we apply it to the inverted image to generate 2N reference images Y_n, as shown in Eq. (1). The reason is that hazy images and poorly lit images normally share the same properties of low dynamic range and high noise levels. Therefore, haze-removal techniques (e.g., using an inverted image) can be used to enhance such images as well.
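
Eq. (1) itself is not reproduced in this excerpt, so the exact form below is an assumption: applying the gamma curve to the inverted image as described above suggests Y_n = 1 - (1 - I)^{γ_n}. The following sketch uses that form, with the gamma sampling ranges chosen purely for illustration:

import numpy as np

def generate_references(img, n=2, seed=None):
    # img: float array scaled to [0, 1]. With Y = 1 - (1 - I)**g,
    # g < 1 darkens the image and g > 1 brightens it, so we sample
    # n gamma values on each side of 1.
    rng = np.random.default_rng(seed)
    gammas = np.concatenate([rng.uniform(0.1, 1.0, n),    # darker references
                             rng.uniform(1.0, 5.0, n)])   # brighter references
    return [1.0 - (1.0 - img) ** g for g in gammas]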