Perceptual Multi-Exposure Fusion
Xiaoning Liu

X. Liu is with the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China. E-mail: liuxiaoning2016@sina.com
Abstract—With the ever-increasing demand for high dynamic range (HDR) scene shooting, multi-exposure image fusion (MEF) technology has abounded. In recent years, multi-scale exposure fusion approaches based on detail enhancement have led the way in improving highlight and shadow details. Most such methods, however, are too computationally expensive to be deployed on mobile devices. This paper presents a perceptual multi-exposure fusion method that not only ensures fine shadow/highlight details but also has lower complexity than detail-enhanced methods. Instead of using a detail-enhancement component, we analyze the potential defects of three classical exposure measures and improve two of them, namely adaptive well-exposedness (AWE) and the gradient of color images (3-D gradient). AWE, designed in the YCbCr color space, considers the difference between images of varying exposure. The 3-D gradient is employed to extract fine details. We build a large-scale multi-exposure benchmark dataset suitable for static scenes, which contains 167 image sequences in total. Experiments on the constructed dataset demonstrate that the proposed method surpasses eight existing state-of-the-art approaches in terms of both visual quality and MEF-SSIM value. Moreover, our approach can further improve current image enhancement techniques, ensuring fine detail in bright regions.
Index Terms—High dynamic range, multi-scale fusion, multi-
exposure fusion, Laplacian pyramid, image enhancement.
I. INTRODUCTION
High dynamic range (HDR) imaging techniques can make an image captured under extremely bright or dark conditions crisp and faithful to the real-world scene. HDR imaging has been increasingly adopted by mobile devices, video, autonomous vehicles, and so on [1], [2]. Unfortunately, its wide range of applications is hampered by expensive equipment costs and by the need for tone mapping operations [3]–[8] to visualize HDR content on standard displays with a limited dynamic range. To mitigate these limitations, multi-exposure image fusion (MEF) technology, also termed exposure bracketing [9], merges multiple exposure images captured from the same scene with different exposure times into a spectacular HDR-like image abounding with desirable detail. Since MEF technology simplifies the HDR imaging pipeline, it has recently been adopted by smart cameras, particularly smartphones. MEF
techniques, however, inevitably lead to unwelcome artifacts
like ghosting and tearing when encountering moving objects
or camera shake [9]–[12]. To overcome this challenge, many
HDR deghosting algorithms [13]–[20] have been proposed.
Among them, Tursun et al. carried out an in-depth survey of
HDR deghosting [16] and proposed an objective deghosting
quality metric to avoid the bias of subjective evaluations [17].
Although ghost-free methods have made significant headway
over the past decade, removing ghosting artifacts is still the
greatest challenge to MEF and HDR imaging for dynamic
scenes [21]. This work assumes that all input images are well-aligned captures of static scenes. The past two decades have seen a significant amount of work in the MEF community; existing methods are generally classified into four categories: multi-scale transform-based methods, statistical model-based methods, patch-based methods, and deep learning-based methods.
A. Multi-Scale Transform-Based Methods
Guided by three intrinsic image quality metrics, namely contrast, saturation, and well-exposedness, Mertens et al. [22] constructed weight maps to blend multiple exposure images within the Laplacian pyramid (LP) framework [23]. Since multi-scale techniques can reduce unpleasant halo artifacts around edges and alleviate the seam problem across object boundaries to some extent, multi-scale transform-based MEF approaches [22], [24]–[27], especially those based on the LP [22], [25]–[27], have gained in popularity. Two novel exposure measures, visibility and consistency, were developed in [28] based on the careful observation that the gradient magnitude gradually decreases in over-/under-exposed areas and the gradient direction changes as objects move. To reduce the loss of detail in multi-scale fusion, Li et al. [29] introduced a new quadratic optimization scheme in the gradient field; the sharper image is finally synthesized by combining the extracted detail with an intermediate image generated by the MEF method of [22]. Unfortunately, the method of [29] cannot be deployed on mobile devices because it needs to solve a quadratic optimization problem by means of an iterative method. Shen et al. [30] derived a novel boosting Laplacian pyramid, which boosts the structures of the detail and base layers, respectively, and designed a hybrid exposure weight. As the optimal weights of [30], computed by a global optimization, may over-smooth the final weight map, Li et al. [31], [32] used edge-preserving filters, namely the recursive filter [33] and the guided filter [34], to refine the resulting weight map under a two-scale framework.
Furthermore, to ensure high levels of detail with well-controlled artifacts even in low-/high-light scenarios, the works in [35]–[37] recently integrated the multi-scale technique with a detail-enhanced smoothing pyramid relying on the weighted guided image filter [38], the gradient domain guided image filter [39], and the fast weighted least squares filter [40], respectively. Note that the detail-enhancement technology of [35] is also beneficial to low-light and back-light imaging. Because the fast weighted least squares optimization problem is subject to a gradient constraint, the detail extraction component of [37] is significantly faster than that of [29]. Experimental results in [41] demonstrated that [36] ranks first according to the quality metric MEF-SSIM [42]. Even though [35]–[37] are capable of
high levels of detail in bright and dark light, the complexity
of these algorithms may be a hindrance to mobile devices.
Lately, building on the multi-scale fusion scheme of [22], a simplified detail-enhancement component [43] was presented in the YUV color space.
B. Statistical Model-Based and Patch-Based Methods
Statistical approaches [44]–[46] based on perceived quality measures and patch-based methods [47]–[49] have improved considerably over the past decade. Integrating visual measures, namely local contrast and color information, Shen et al. proposed two different frameworks, generalized random walks [44] and a hierarchical multivariate Gaussian conditional random field [46], respectively. Song et al. [45] cast MEF as a maximum a posteriori (MAP) estimation problem. Experimental results in [48] showed that the final fused image of [45] tends to be noisy due to the lack of explicit refinement of the weights. To reduce noise and remove ghosting from dynamic scenes, Ma et al. [47], [48] decomposed each image patch into three conceptually independent components, namely signal strength, signal structure, and mean intensity, and then processed each component according to patch strength, exposedness, and structural consistency measures. The benefits of this novel decomposition abound; one of them is that the direction information of the signal structure can guide the verification of structural consistency for producing ghost-free images.
C. Deep Learning-Based Methods
Lately, MEF approaches [50]–[53] based on convolutional neural networks (CNNs) have come a long way from a past dominated by hand-crafted features [22]–[49]. Prabhakar et al. [50] recently proposed a CNN-based architecture that actually comprises two methods. One trains the CNN model using results selected from two MEF methods [22], [32], considered the top-ranking ones at the time, as the "ground truth". The other learns the CNN by using the no-reference quality metric MEF-SSIM [42] as the loss function. Considering the lack of ground truth for MEF, Li et al. [51] extracted features from CNNs pre-trained on other tasks to calculate the weight maps; this approach also suits dynamic scenes, while the model of [50] only handles static scenes. A variable augmented neural network [52] for de-colorization was designed in the gradient domain; it was also employed to blend multiple exposure images by revealing the relationship between de-colorization and MEF. The network of [52], however, can only take three exposure images at a time, which greatly limits its practical application. Zhang et al. [53] most recently proposed a CNN-based multi-modal image fusion framework, which uses two convolutional layers to extract low-level features from multiple input images and blends these features through alternative strategies. The method of [53] offers reasonable texture and detail preservation in shadow regions, apart from some visually noticeable noise and artifacts. Different from image denoising and super-resolution, in the field of MEF the lack of ground truth is still the main stumbling block for deep learning.
Fig. 1. Example results of several different MEF methods. (a) Mertens09
[22]. (b) S.Li13 [32]. (c) Shen14 [30]. (d) Kou17 [36]. (e) Z.Li17 [35]. (f)
Ma17 [48]. (g) Wang19 [43]. (h) Zhang19 [53]. (i) Proposed. Green and red
boxes are zoom-in regions for taking a closer look. One can see that our
method retains more details in the highlights and shadow regions compared
to Mertens09 [22].
In addition, CNN-based methods require expensive GPU resources to store model parameters.
Although [22] is simple and efficient, it suffers from visible loss of detail in shadow and highlight areas (see Fig. 1(a)). Previous methods based on detail-enhancement components [25], [29], [35]–[37], [43] can add details to the final resulting image, but the process is usually time-consuming and unfriendly to mobile devices, especially smartphones. In this article, we re-examine three classic exposure metrics [22] and improve two of them. Under the LP framework, our method maintains fine details in the highlights and captures better exposure in shadow areas. Overall, our contributions are four-fold:
1) We propose a perceptual multi-exposure fusion method in the LP domain. Based on the difference between images of varying exposure and the cross-channel correlation of color images, we reanalyze the potential flaws of three classic exposure metrics [22] and improve two of them.
2) We find that the traditional saturation definition is sensitive to the grayscale content of color images. An adaptive well-exposedness (AWE) is designed in the YCbCr color space, which considers the differences between images in an exposure sequence. The gradient of a color image (3-D gradient) is employed to extract fine details, exploiting the cross-channel correlation of color images.
3) This work constructs a large-scale multi-exposure dataset for static scenes that consists of 167 exposure sequences and covers a wide range of scenes. Experiments on the constructed dataset indicate that our method surpasses eight existing state-of-the-art approaches in terms of both visual and quantitative results.
4) The proposed method can facilitate current image enhancement technologies, recording more detail in highlight areas. All datasets and our MATLAB code are available at https://github.com/hangxiaotian/Perceptual-Multi-exposure-Image-Fusion.
The rest of this paper is structured as follows. Section II briefly reviews three classical exposure quality metrics and the LP fusion framework. Section III describes the proposed MEF approach in detail. Experimental results and evaluation are provided in Section IV. Finally, Section V summarizes the paper and offers further discussion.
II. RELATED WORK
In this section, we review three classical exposure quality metrics [22] and the multi-scale fusion framework based on the Laplacian pyramid [23].
A. Exposure Quality Metrics
1) Contrast: The grayscale version of a color image is filtered with the Laplacian operator, and the absolute value of the filter response is taken as the contrast, denoted by $C$; this assigns larger weight to edges and textures than to flat areas.
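As a minimal sketch (not the authors' released MATLAB code), this contrast measure can be computed with a Laplacian filter on the grayscale image; the Rec. 601 grayscale weights below are an assumption, since the conversion is not specified here.

```python
import numpy as np
from scipy.ndimage import laplace

def contrast(img):
    """Contrast C: absolute Laplacian response of the grayscale image.

    img: float array with shape (H, W, 3), values in [0, 1].
    """
    gray = img @ np.array([0.299, 0.587, 0.114])  # Rec. 601 luma (assumed)
    return np.abs(laplace(gray))                  # |Laplacian filter response|
```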
2) Saturation: Saturation can reflect over-/under-exposed areas under most conditions; it is calculated from the deviations of the three channels, R, G, and B, from their mean at the corresponding pixel. Thus, the saturation $S$ can be written as
$$S = \sqrt{(I_R - I_\mu)^2 + (I_G - I_\mu)^2 + (I_B - I_\mu)^2}, \tag{1}$$

$$I_\mu = \frac{1}{3}\left(I_R + I_G + I_B\right), \tag{2}$$
where $I_i$ is the intensity of the $i$-th channel, $i \in \{R, G, B\}$, and $I_\mu$ denotes the mean of the three channels.
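As a hedged sketch, Eqs. (1)–(2) translate directly into a few lines of Python (assuming the same (H, W, 3) float layout as above):

```python
import numpy as np

def saturation(img):
    """Saturation S per Eqs. (1)-(2): per-pixel channel deviation from the mean."""
    mu = img.mean(axis=2, keepdims=True)           # Eq. (2): mean of R, G, B
    return np.sqrt(((img - mu) ** 2).sum(axis=2))  # Eq. (1)
```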
3) Well-exposedness: Each pixel is weighted in the form of a Gaussian curve. The well-exposedness $E$ is defined as
$$E = E_R \cdot E_G \cdot E_B, \tag{3}$$

$$E_i = \exp\left(-\frac{(I_i - 0.5)^2}{2\sigma^2}\right), \quad i = R, G, B, \tag{4}$$
where “·” denotes pixel-wise multiplication and the parameter $\sigma$ is set to 0.2.
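A corresponding sketch of Eqs. (3)–(4), with the stated σ = 0.2:

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Well-exposedness E per Eqs. (3)-(4): Gaussian closeness to mid-gray 0.5."""
    e = np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))  # Eq. (4), per channel
    return e.prod(axis=2)                                 # Eq. (3): E_R * E_G * E_B
```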
According to the above three exposure quality measures [22], the initial weight map is given by
$$W_n(x, y) = C_n(x, y)^{w_C} \cdot S_n(x, y)^{w_S} \cdot E_n(x, y)^{w_E}, \tag{5}$$
where $(x, y)$ denotes the pixel location, $n$ indexes the input exposure images, and $w_C$, $w_S$, and $w_E$ are the corresponding exponents, which default to 1.
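Combining the three measures per Eq. (5) is then a pixel-wise product; the small epsilon below is a common safeguard against all-zero weights and is an assumption, not part of Eq. (5).

```python
def weight_map(img, wC=1.0, wS=1.0, wE=1.0, eps=1e-12):
    """Initial weight map W_n per Eq. (5), using the helpers sketched above."""
    return (contrast(img) ** wC) * (saturation(img) ** wS) \
        * (well_exposedness(img) ** wE) + eps
```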
B. Multi-Scale Fusion Framework
Firstly, to ensure spatially consistent content, each weight map is normalized so that the $N$ weights sum to one at each pixel:

$$\hat{W}_n(x, y) = \left[\sum_{n'=1}^{N} W_{n'}(x, y)\right]^{-1} W_n(x, y). \tag{6}$$
Fig. 2. The pipeline of the proposed MEF method. The input exposure sequence yields two weighting maps, $W_{n,1}$ from AWE and $W_{n,2}$ from the 3-D gradient, which are combined by LP fusion to produce the fused image $F$.
The final resulting image $F$ is generated by
$$F(x, y) = \sum_{n=1}^{N} \hat{W}_n(x, y)\, I_n(x, y). \tag{7}$$
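A minimal sketch of Eqs. (6)–(7), reusing the hypothetical `weight_map` helper sketched earlier; as discussed next, this single-scale blend exhibits seam artifacts.

```python
import numpy as np

def fuse_naive(images):
    """Normalize weights per Eq. (6), then blend per Eq. (7).

    images: list of N float arrays with shape (H, W, 3), values in [0, 1].
    """
    W = np.stack([weight_map(im) for im in images])  # (N, H, W)
    W = W / W.sum(axis=0, keepdims=True)             # Eq. (6): weights sum to 1
    return sum(w[..., None] * im for w, im in zip(W, images))  # Eq. (7)
```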
Unfortunately, using Eq. (7) directly causes seam problems. To address this issue, Mertens et al. [22] adopted the LP framework: each input image is decomposed into a Laplacian pyramid, and the pyramid layers at each scale are blended. The steps are as follows. Assume there are $N$ exposure images and $N$ normalized weight maps serving as alpha masks. Let $\mathcal{L}\{A\}^l$ denote the $l$-th layer ($l \in \{1, 2, \cdots, L\}$) of the Laplacian pyramid of image $A$, and analogously let $\mathcal{G}\{B\}^l$ denote the corresponding layer of the Gaussian pyramid of image $B$. The fused pyramid is then given by
$$\mathcal{L}\{F(x, y)\}^l = \sum_{n=1}^{N} \mathcal{G}\{\hat{W}_n(x, y)\}^l\, \mathcal{L}\{I_n(x, y)\}^l. \tag{8}$$
Finally, the Laplacian pyramid $\mathcal{L}\{F(x, y)\}^l$ is collapsed to generate the resulting image $F$.
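A hedged end-to-end sketch of Eq. (8) and the collapse step follows, building the pyramids with OpenCV's pyrDown/pyrUp and reusing the hypothetical `weight_map` helper from above; the level count, float32 casts, and choice of OpenCV are assumptions, and this is not the authors' MATLAB implementation.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    """G{.}: repeatedly blur and downsample."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """L{.}: band-pass layers, plus the coarsest Gaussian level as residual."""
    g = gaussian_pyramid(img, levels)
    pyr = [g[l] - cv2.pyrUp(g[l + 1], dstsize=(g[l].shape[1], g[l].shape[0]))
           for l in range(levels - 1)]
    pyr.append(g[-1])
    return pyr

def fuse_lp(images, levels=6):
    """Blend pyramid layers per Eq. (8), then collapse the fused pyramid."""
    W = np.stack([weight_map(im) for im in images])
    W = W / W.sum(axis=0, keepdims=True)                      # Eq. (6)
    fused = None
    for im, w in zip(images, W):
        gw = gaussian_pyramid(w.astype(np.float32), levels)   # G{W_n}^l
        li = laplacian_pyramid(im.astype(np.float32), levels) # L{I_n}^l
        layers = [g[..., None] * l for g, l in zip(gw, li)]   # Eq. (8) summand
        fused = layers if fused is None else [f + x for f, x in zip(fused, layers)]
    out = fused[-1]
    for l in range(levels - 2, -1, -1):                       # collapse, coarse to fine
        out = cv2.pyrUp(out, dstsize=(fused[l].shape[1], fused[l].shape[0])) + fused[l]
    return np.clip(out, 0.0, 1.0)
```

Calling `fuse_lp([im1, im2, im3])` on a normalized exposure stack would then yield the multi-scale result; blending the weights at Gaussian-pyramid resolution is what suppresses the seams that plague Eq. (7).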
III. PROPOSED MULTI-EXPOSURE IMAGE FUSION
This section presents the proposed MEF method. We first analyze the defects of saturation, present an adaptive well-exposedness (AWE), and introduce the gradient of color images, i.e., the 3-D gradient, which generalizes the conventional 2-D gradient. Then, we employ the LP framework [23] to generate the final image. The pipeline of the proposed method is shown in Fig. 2.
Fig. 3. A synthetic color image of size 256 × 256 in which the three channels of each pixel share the same value, increasing row by row from 0 to 255.