response functions, we design a new model paradigm: the deep concave
curve, which can determine the new pixel value in the enhanced
result with a high degree of freedom. To effectively satisfy concavity,
we propose to first predict a non-positive second derivative and then
apply a discrete integral implemented by convolutions. To train this
model towards unsupervised adaptation, we design asymmetric self-
supervised alignment. On the normal-light side, we learn decision
heads with a self-supervised pretext task. Then, on the low-light
side, we fix the decision heads and let our model improve the pretext
task performance by enhancing the input image. In this way,
even without annotated data, our model can learn how to make
the machine analytics model better perceive the enhanced low-
light image. To make full use of image information and provide
good guidance for illumination enhancement, we propose a new
rotated jigsaw permutation task. Experiments show that our model
architecture and training design are compatible with each other. On
one hand, our self-learned strategy restores illumination better than
other feature adaptation strategies; on the other hand, our deep
concave curve best exploits the potential of self-learned illumination
alignment.
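As an illustration of this construction, below is a minimal PyTorch sketch (our own, not the authors' code): build_concave_curve and its clamping and normalization details are assumptions, and we use cumsum where the paper uses an equivalent convolutional implementation of the discrete integral.

```python
import torch
import torch.nn.functional as F

def build_concave_curve(raw, eps=1e-6):
    """Turn unconstrained network outputs into a concave, monotonically
    increasing curve g sampled at K points over [0, 1].

    raw: (B, K) tensor of unconstrained predictions (hypothetical head output).
    Returns g: (B, K) with g[:, 0] == 0 and g[:, -1] == 1.
    """
    # Step 1: force the discrete second derivative to be non-positive,
    # which is exactly the concavity constraint.
    g2 = -F.softplus(raw)                                  # g'' <= 0
    # Step 2: first discrete integral gives the first derivative g'.
    # The starting constant 1.0 is an arbitrary choice for this sketch;
    # clamping keeps g' >= 0, i.e. the curve stays monotonically increasing.
    g1 = torch.clamp(1.0 + torch.cumsum(g2, dim=1), min=0.0)
    # Step 3: second discrete integral gives the curve values themselves.
    g = torch.cumsum(g1, dim=1)
    # Step 4: affinely rescale so the curve passes through (0, 0) and (1, 1);
    # positive rescaling preserves both concavity and monotonicity.
    g = g - g[:, :1]
    g = g / (g[:, -1:] + eps)
    return g
```

Applying the curve to an image then amounts to looking up (or interpolating) the sampled values of 𝑔 at each pixel's normalized intensity.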
The proposed illumination enhancement model, the self-aligned con-
cave curve (SACC), can serve as a powerful tool for unsupervised
low-light adaptation. Although SACC requires neither normal-light
nor low-light annotations and does not even adjust the downstream
model, it achieves superior performance on a variety of low-light
vision tasks. To further deal with noise and semantic domain gaps,
we propose to adapt downstream analytics models by pseudo la-
beling. Finally, we build an adaptation framework, SACC+, which
is concise and easy to implement yet outperforms existing low-
light enhancement and adaptation methods by a large margin.
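To make the asymmetric self-supervised training concrete, here is a hedged PyTorch-style sketch of the two stages; every name (backbone, head, enhancer, pretext) is a placeholder rather than the paper's API, and details such as the optimizers and whether the backbone is updated in the first stage are our assumptions.

```python
import torch

def train_decision_head(backbone, head, normal_loader, pretext, criterion, opt):
    # Normal-light side: learn the pretext decision head on well-lit images.
    # pretext(x) is a placeholder returning a transformed image and its label,
    # e.g. a rotated-jigsaw permutation and the permutation index.
    for x in normal_loader:
        x_t, label = pretext(x)
        loss = criterion(head(backbone(x_t)), label)
        opt.zero_grad()
        loss.backward()
        opt.step()

def train_enhancer(enhancer, backbone, head, low_loader, pretext, criterion, opt):
    # Low-light side: the backbone and decision head are frozen; `opt` holds
    # only the enhancer's parameters, so lowering the pretext loss can only
    # be achieved by enhancing the input into a machine-friendly image.
    for module in (backbone, head):
        for p in module.parameters():
            p.requires_grad_(False)
    for x in low_loader:
        x_t, label = pretext(enhancer(x))  # enhance first, then the pretext task
        loss = criterion(head(backbone(x_t)), label)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Under this reading, SACC+ would add a further stage that fine-tunes the downstream model on its own pseudo labels over enhanced images.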
In summary, our contributions are threefold:
• We are the first to propose a learnable pure illumination
enhancement model for high-level vision. Inspired by camera
response functions, we design a deep concave curve. Through
discrete integration, the concavity constraint is satisfied by
the model architecture itself.
• Towards unsupervised normal-to-low light adaptation, we
design an asymmetric cross-domain self-supervised training
strategy. Guided by the rotated jigsaw permutation pretext
task, our curve can adjust illumination from the perspective
of machine vision.
• To verify the effectiveness of our method, we explore various
high-level vision tasks, including classification, detection,
action recognition, and optical flow estimation. Experiments
demonstrate our superiority over both state-of-the-art low-
light enhancement and adaptation methods.
2 RELATED WORKS
Low-light Enhancement. Early methods manually design illu-
mination models and enhancement strategies. In the Retinex the-
ory [31], images are decomposed into reflectance (albedo) and shad-
ing (illumination). On this basis, Retinex-based methods [10, 15]
first decompose images and then either separately or simultane-
ously process the two components. Histogram equalization and its
variants [48] instead redistribute the intensities on the histogram.
Recent methods are mainly based on deep learning. Some mod-
els mimic the Retinex decomposition process [60, 70]. RUAS [39]
unrolls the optimization process of Retinex-inspired models and
searches for desired network architectures. EnlightenGAN [27] intro-
duces adversarial learning. Zero-DCE [14] designs a curve-based
low-light enhancement model and learns in a zero-reference way.
Some methods also target RAW images [5] and videos [4, 25], or in-
troduce extra light sources [62, 63]. Interested readers may refer to
[37] and [33] for comprehensive surveys.
Existing low-light enhancement methods disregard downstream
machine learning tasks. In contrast, our model targets high-level
vision and greatly benefits machine vision in low-light scenarios.
High-Level Vision in Low-Light Scenarios. With the increasing
demand for autonomous driving and surveillance analysis, low-light
high-level vision has attracted growing attention in recent years.
For dark object detection, Sasagawa et al. [50] merge pretrained
models from different domains with glue layers and generative knowl-
edge distillation. MAET [6] learns by jointly decoding de-
grading transformations and detection predictions. HLA-Face [58]
adopts a joint pixel-level and feature-level adaptation framework.
For nighttime semantic segmentation, DANNet [61] employs ad-
versarial training to adapt models in one stage without additional
day-night image transfer. For general tasks, CIConv [32] de-
signs a color invariant representation. Some works also focus on
image retrieval [24], depth estimation [57], and match-
ing [52] in low-light conditions.
Despite all this progress on high-level vision in low-light sce-
narios, many methods rely on low-light annotations, which are
neither robust nor flexible enough. Existing unsupervised adapta-
tion methods focus on feature migration and ignore the importance
of pixel-level adjustment. Based on illumination enhancement, we
propose a new method for low-light adaptation that outperforms
existing methods by a wide margin.
3 DEEP CONCAVE CURVE
In this section, we introduce the motivation and detailed architec-
ture of our illumination enhancement model.
3.1 From CRF to Concave Curve
Digital photographic cameras use camera response functions (CRFs)
to map irradiance to intensities. Although scene illumina-
tion changes linearly at the irradiance level, to fit the logarithmic
perception of human vision, cameras employ non-linear CRFs, mak-
ing illumination adjustment complicated at the intensity level. To
exploit the linearity of irradiance, some low-light enhancement
methods [19, 20] transform intensities to irradiance, adjust the ir-
radiance, and then map it back to intensities. However, this
back-and-forth irradiance ↔ intensity mapping is inconvenient
and makes it difficult to introduce high-level machine vision guidance.
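To make the pipeline being replaced explicit (our notation, not the paper's): if f denotes the CRF and the irradiance-level adjustment is a linear scaling by a factor k, the three steps collapse into a single intensity-level mapping

𝑔(x) = f(k · f⁻¹(x)),  x ∈ [0, 1].

Our model learns 𝑔 directly, avoiding any estimation of f or f⁻¹.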
We propose to simplify the above complex pipeline into one
single intensity-level adjustment, which is denoted by 𝑔. We first
analyze what form 𝑔 should take. Ignoring spatial variations like
lens fall-off [1], vignetting, and signal-dependent noise, the CRF can be
assumed to be the same for each pixel in an image. Accordingly, we
set 𝑔 to be spatially shared. Second, to follow the numerical range
of pixels and preserve order, 𝑔 should pass through (0, 0) and (1, 1), and in-
crease monotonically. Additionally, although pixels are discrete, we
want 𝑔 to appear roughly continuous, i.e., like a curve. Despite the