selecting the best regions from different sources, so that almost all possible exposure cases are simulated during training.
Furthermore, by using the output of the previous epoch as a source for selection, we ensure that the output of the current epoch is better than, or at least equal to, that of the previous one, which gives our approach its name: PSENet – Progressive Self Enhancement Network.
Our contributions are summarized as follows:
• We introduce a new method for generating effective pseudo-GT images from given wrongly exposed images. The generation process is driven by a new no-reference score reflecting human judgments of what makes an image visually good.
• We propose a novel unsupervised progressive pseudo-GT-based approach that is robust to severe lighting conditions, i.e., both under- and over-exposure. As a result, the burden of gathering matched image pairs is removed.
• Comprehensive experiments show that our approach outperforms previous unsupervised methods by large margins on the SICE [3] and Afifi [1] datasets and achieves results comparable to its supervised counterparts.
2. Related Work
Image enhancement approaches can be divided into two
categories: traditional and learning-based methods.
Traditional methods. One of the simplest and fastest ap-
proaches is to transform single pixels of an input image
by a mathematical function such as linear function, gamma
function, or logarithmic function. For example, histogram
equalization-based algorithms stretch out the image’s inten-
sity range using the cumulative distribution function, result-
ing in the image’s increased global contrast. The Retinex
theory [16], on the other hand, posits that an image is the product of two components: reflectance and illumination. By estimating the illumination component, the dynamic range of the image can be adjusted to reproduce images with better color contrast. However, most Retinex algorithms estimate illumination with Gaussian convolution, which blurs edges [40]. Frequency-domain methods, by contrast, preserve edges by applying a high-pass filter to enhance the high-frequency components in the Fourier domain [38]. However, the adaptability of such traditional methods is often limited because they ignore the global and local gray-level distributions of an image [40]. For a systematic review of conventional approaches, we refer readers to the work of Wang et al. [40].
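The CDF-based remapping behind histogram equalization can be sketched in a few lines of NumPy (a minimal single-channel illustration; the function name and toy data are ours):

```python
import numpy as np

def histogram_equalize(img):
    """Histogram equalization for a single-channel uint8 image: remap each
    intensity through the normalized cumulative distribution function (CDF),
    stretching the occupied intensity range toward the full [0, 255]."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest occupied bin
    lut = (cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min)
    lut = np.clip(np.round(lut), 0, 255).astype(np.uint8)
    return lut[img]

# Toy usage: a low-contrast image confined to [40, 80] spreads to [0, 255].
rng = np.random.default_rng(0)
dark = rng.integers(40, 81, size=(64, 64), dtype=np.uint8)
eq = histogram_equalize(dark)
```

Because the lookup table is built once from the histogram, the whole operation is a single O(N) pass plus a 256-entry table, which is why such methods are among the fastest enhancement baselines.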
Learning-based methods. In recent years, learning-based photo-enhancement methods, in both supervised and unsupervised settings, have attracted increasing attention.
Supervised learning methods aim to recover natural images by either directly outputting high-quality images [22, 23, 20, 42] or learning the parameters of a parametric model (e.g., the Retinex model) [6, 39, 26] from a paired dataset. SID [5] is a typical example of the first direction. In this work, the authors collect a short-exposure low-light image dataset and adopt a vanilla U-Net architecture [34] to produce an enhanced sRGB image from raw data, thereby replacing the traditional image processing pipeline. Following this work, Lamba and Mitra [14] present a novel network architecture that processes all scales of an image concurrently and reduces latency by 30% without degrading image quality. Departing from the previously mentioned approaches, Cai et al. [3] explore a new direction in which both under- and over-exposed images are considered. They introduce a two-stage framework, trained on their own multi-exposure image dataset, that enhances the low-frequency and high-frequency components separately before refining the whole image in the second stage. Afifi et al. [1] take this direction a step further by introducing a larger dataset along with a coarse-to-fine neural network that enhances image quality in both
under- and over-exposure cases. When learning a parametric model, Retinex theory [15] is often adopted [41, 19, 39]: benefiting from paired data, these works design networks to estimate the reflectance and illumination of an input image. Approaching image enhancement differently, HDRNet [6] presents a convolutional neural network that predicts the coefficients of a locally affine model in bilateral space from pairs of input/output images.
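The reflectance/illumination decomposition these networks learn can be illustrated without any learning, using a Gaussian blur as a crude illumination estimate and a gamma curve to compress it (a classical, non-learned sketch; the function names, the small sigma, and the gamma value are our choices for illustration only):

```python
import numpy as np

def gaussian_blur(img, sigma=3.0):
    """Separable Gaussian blur in pure NumPy, used here as a crude
    illumination estimate (kernel must be shorter than each image side)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()

    def blur_1d(m):
        return np.convolve(m, kernel, mode="same")

    return np.apply_along_axis(blur_1d, 1, np.apply_along_axis(blur_1d, 0, img))

def retinex_enhance(img, sigma=3.0, gamma=0.5):
    """Decompose I = R * L (reflectance times illumination), compress L
    with a gamma curve, and recompose: brightens dark regions while
    roughly preserving the reflectance (texture) component."""
    i = img.astype(np.float64) / 255.0
    illum = np.clip(gaussian_blur(i, sigma), 1e-6, 1.0)  # estimated L
    refl = i / illum                                      # R = I / L
    return np.clip(refl * illum ** gamma, 0.0, 1.0)      # R * L^gamma

# Toy usage: brighten an under-exposed image confined to the darkest 30%.
rng = np.random.default_rng(0)
dark = (rng.random((32, 32)) * 0.3 * 255).astype(np.uint8)
out = retinex_enhance(dark)
```

The learned methods cited above replace the fixed Gaussian estimate and the hand-picked gamma with network predictions, which is precisely what paired supervision buys them.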
Unsupervised learning. Collecting paired training data is
always time-consuming and expensive. To address this is-
sue, an unpaired GAN-based method named EnlightenGAN
is proposed in [10]. The network, comprising an attention-guided U-Net generator and global-local discriminators, shows promising results even though corresponding ground-truth images are absent. To further reduce the cost of collecting reference ground-truth images, several methods [44, 46, 8, 18] that require neither paired nor unpaired training data have been proposed. Two recent methods in this category, ZeroDCE [8] and that of Zheng and Gupta [45], achieve impressive results on low-light image enhancement by training a CNN with a set of no-reference loss functions to learn an image-specific curve that produces a high-quality output image. However, these methods perform poorly when extended to correcting over-exposed images, as shown in Fig. 1.
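The image-specific curve idea can be sketched as follows; in ZeroDCE each iteration's alpha is a per-pixel map predicted by the CNN, whereas here we use hand-picked global scalars purely for illustration (the function name is ours):

```python
import numpy as np

def apply_le_curves(img, alphas):
    """Iteratively apply the quadratic light-enhancement curve
    LE(x) = x + alpha * x * (1 - x), with x in [0, 1]; each alpha in
    [-1, 1] may be a scalar or a per-pixel map, one per iteration."""
    x = img
    for alpha in alphas:
        x = x + alpha * x * (1.0 - x)
    return x

# Toy usage: four global curves progressively brighten a low-light patch.
low = np.full((2, 2), 0.1)
out = apply_le_curves(low, alphas=[0.8] * 4)
```

Because each curve is monotonic on [0, 1] and maps the interval into itself for alpha in [-1, 1], repeated application brightens dark pixels without clipping, but the same quadratic form offers little leverage for pulling saturated, over-exposed pixels back down, consistent with the failure mode noted above.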
Our proposed method, in contrast, is the first deep-learning work to handle these extreme lighting conditions in an unsupervised manner.
3. Methodology
Given an sRGB image, I, captured under a harsh light-
ing condition with low contrast and washed-out color, our