selecting the best regions from different sources, so that almost all possible exposure cases are simulated during training.
Furthermore, by using the output of the previous epoch as a source for selection, we ensure that the output of the current epoch is better than, or at least equal to, that of the previous one, which gives our approach its name: PSENet – Progressive Self Enhancement Network.
Our contributions are summarized as follows:
• We introduce a new method for generating effective pseudo-GT images from given wrongly exposed images. The generation process is driven by a new no-reference score reflecting human judgments of what makes an image visually good.
• We propose a novel unsupervised progressive pseudo-GT-based approach that is robust to severe lighting conditions, i.e., both under- and over-exposure. As a result, the burden of gathering matched image pairs is removed.
• Comprehensive experiments show that our approach outperforms previous unsupervised methods by large margins on the SICE [3] and Afifi [1] datasets and achieves results comparable to its supervised counterparts.
2. Related Work
Image enhancement approaches can be divided into two
categories: traditional and learning-based methods.
Traditional methods. One of the simplest and fastest ap-
proaches is to transform single pixels of an input image
by a mathematical function such as linear function, gamma
function, or logarithmic function. For example, histogram
equalization-based algorithms stretch out the image’s inten-
sity range using the cumulative distribution function, result-
ing in the image’s increased global contrast. The Retinex
theory [16], on the other hand, posits that an image is the product of two components: reflectance and illumination. By estimating the illumination component, the dynamic range of the image can be adjusted to reproduce images with better color contrast. However, most Retinex algorithms estimate illumination with Gaussian convolution, which blurs edges [40]. Frequency-domain methods, by contrast, preserve edges by applying a high-pass filter to enhance the high-frequency components in the Fourier domain [38]. However, the adaptability of such traditional methods is often limited because they ignore the global and local gray-level distributions of an image [40]. For a systematic review of conventional approaches, we refer readers to the work of Wang et al. [40].
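The CDF-based remapping behind histogram equalization can be sketched in a few lines of NumPy (a minimal single-channel illustration; the function name and toy data are ours):

```python
import numpy as np

def histogram_equalize(img):
    """Histogram equalization for a single-channel uint8 image: remap each
    intensity through the normalized cumulative distribution function (CDF),
    stretching the occupied intensity range toward the full [0, 255]."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest occupied bin
    lut = (cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min)
    lut = np.clip(np.round(lut), 0, 255).astype(np.uint8)
    return lut[img]

# Toy usage: a low-contrast image confined to [40, 80] spreads to [0, 255].
rng = np.random.default_rng(0)
dark = rng.integers(40, 81, size=(64, 64), dtype=np.uint8)
eq = histogram_equalize(dark)
```

Because the lookup table is built once from the histogram, the whole operation is a single O(N) pass plus a 256-entry table, which is why such methods are among the fastest enhancement baselines.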
Learning-based methods. In recent years, learning-based photo-enhancement methods, in both supervised and unsupervised settings, have attracted increasing attention.
Supervised learning methods aim to recover natural images by either directly outputting high-quality images [22, 23, 20, 42] or learning the parameters of a parametric model (e.g., the Retinex model) [6, 39, 26] from a paired dataset. SID [5] is a typical example of the first direction. In this work, the authors collect a short-exposure low-light image dataset and adopt a vanilla U-Net architecture [34] to produce an enhanced sRGB image from raw data, thereby replacing the traditional image processing pipeline. Following this work, Lamba and Mitra [14] present a novel network architecture that processes all scales of an image concurrently and reduces latency by 30% without degrading image quality. Departing from the previously mentioned approaches, Cai et al. [3] explore a new direction in which both under- and over-exposed images are considered. They introduce a two-stage framework, trained on their own multi-exposure image dataset, that enhances the low-frequency and high-frequency components separately before refining the whole image in the second stage. Afifi et al. [1] take this direction a step further by introducing a larger dataset along with a coarse-to-fine neural network that enhances image quality in both
under- and over-exposure cases. When learning a parametric model, Retinex theory [15] is often adopted [41, 19, 39]: benefiting from paired data, these works design networks to estimate the reflectance and illumination of an input image. Approaching image enhancement differently, HDRNet [6] presents a convolutional neural network that predicts the coefficients of a locally affine model in bilateral space from pairs of input/output images.
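The reflectance/illumination decomposition these networks learn can be illustrated without any learning, using a Gaussian blur as a crude illumination estimate and a gamma curve to compress it (a classical, non-learned sketch; the function names, the small sigma, and the gamma value are our choices for illustration only):

```python
import numpy as np

def gaussian_blur(img, sigma=3.0):
    """Separable Gaussian blur in pure NumPy, used here as a crude
    illumination estimate (kernel must be shorter than each image side)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()

    def blur_1d(m):
        return np.convolve(m, kernel, mode="same")

    return np.apply_along_axis(blur_1d, 1, np.apply_along_axis(blur_1d, 0, img))

def retinex_enhance(img, sigma=3.0, gamma=0.5):
    """Decompose I = R * L (reflectance times illumination), compress L
    with a gamma curve, and recompose: brightens dark regions while
    roughly preserving the reflectance (texture) component."""
    i = img.astype(np.float64) / 255.0
    illum = np.clip(gaussian_blur(i, sigma), 1e-6, 1.0)  # estimated L
    refl = i / illum                                      # R = I / L
    return np.clip(refl * illum ** gamma, 0.0, 1.0)      # R * L^gamma

# Toy usage: brighten an under-exposed image confined to the darkest 30%.
rng = np.random.default_rng(0)
dark = (rng.random((32, 32)) * 0.3 * 255).astype(np.uint8)
out = retinex_enhance(dark)
```

The learned methods cited above replace the fixed Gaussian estimate and the hand-picked gamma with network predictions, which is precisely what paired supervision buys them.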
Unsupervised learning. Collecting paired training data is
always time-consuming and expensive. To address this is-
sue, an unpaired GAN-based method named EnlightenGAN
is proposed in [10]. The network, comprising an attention-guided U-Net generator and global-local discriminators, shows promising results even though corresponding ground-truth images are absent. To further reduce the cost of collecting reference ground-truth images, several methods [44, 46, 8, 18] that require neither paired nor unpaired training data have been proposed. Two recent methods in this category, ZeroDCE [8] and that of Zheng and Gupta [45], achieve impressive results on low-light image enhancement by training a CNN with a set of no-reference loss functions to learn an image-specific curve that produces a high-quality output image. However, these methods perform poorly when extended to correcting over-exposed images, as shown in Fig. 1.
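The image-specific curve idea can be sketched as follows; in ZeroDCE each iteration's alpha is a per-pixel map predicted by the CNN, whereas here we use hand-picked global scalars purely for illustration (the function name is ours):

```python
import numpy as np

def apply_le_curves(img, alphas):
    """Iteratively apply the quadratic light-enhancement curve
    LE(x) = x + alpha * x * (1 - x), with x in [0, 1]; each alpha in
    [-1, 1] may be a scalar or a per-pixel map, one per iteration."""
    x = img
    for alpha in alphas:
        x = x + alpha * x * (1.0 - x)
    return x

# Toy usage: four global curves progressively brighten a low-light patch.
low = np.full((2, 2), 0.1)
out = apply_le_curves(low, alphas=[0.8] * 4)
```

Because each curve is monotonic on [0, 1] and maps the interval into itself for alpha in [-1, 1], repeated application brightens dark pixels without clipping, but the same quadratic form offers little leverage for pulling saturated, over-exposed pixels back down, consistent with the failure mode noted above.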
Our proposed method, in contrast, is the first deep-learning work to handle these extreme lighting conditions in an unsupervised manner.
3. Methodology
Given an sRGB image, I, captured under a harsh light-
ing condition with low contrast and washed-out color, our