Perceptual Multi-Exposure Fusion
Xiaoning Liu

X. Liu is with the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China. E-mail: liuxiaoning2016@sina.com
Abstract—With the ever-increasing demand for high dynamic range (HDR) scene shooting, multi-exposure image fusion (MEF) technology has abounded. In recent years, multi-scale exposure fusion approaches based on detail enhancement have led the way in improving highlight and shadow details. Most such methods, however, are too computationally expensive to be deployed on mobile devices. This paper presents a perceptual multi-exposure fusion method that not only ensures fine shadow/highlight details but also has lower complexity than detail-enhanced methods. Instead of using a detail-enhancement component, we analyze the potential defects of three classical exposure measures and improve two of them, namely adaptive well-exposedness (AWE) and the gradient of color images (3-D gradient). AWE, designed in the YCbCr color space, considers the difference between images of varying exposure. The 3-D gradient is employed to extract fine details. We build a large-scale multi-exposure benchmark dataset suitable for static scenes, which contains 167 image sequences in total. Experiments on the constructed dataset demonstrate that the proposed method surpasses eight existing state-of-the-art approaches in terms of both visual quality and MEF-SSIM value. Moreover, our approach can further improve current image enhancement techniques, ensuring fine detail in bright regions.
Index Terms—High dynamic range, multi-scale fusion, multi-
exposure fusion, Laplacian pyramid, image enhancement.
I. INTRODUCTION
High dynamic range (HDR) imaging techniques can make an image captured under extremely bright or dark conditions crisp and faithful to the real-world scene. HDR imaging has been increasingly adopted by mobile devices, video, autonomous vehicles, and so on [1], [2]. Unfortunately, its wide range of applications is hampered by expensive equipment costs and by the need for tone mapping operations [3]–[8] to visualize HDR content on standard displays with a limited dynamic range. To mitigate these limitations, multi-exposure image fusion (MEF) technology, also termed exposure bracketing [9], merges multiple exposure images captured from the same scene with different exposure times into a spectacular HDR-like image abounding with desirable detail. Since MEF technology simplifies the HDR imaging pipeline, it has recently been adopted by smart cameras, particularly smartphones. MEF
techniques, however, inevitably lead to unwelcome artifacts
like ghosting and tearing when encountering moving objects
or camera shake [9]–[12]. To overcome this challenge, many
HDR deghosting algorithms [13]–[20] have been proposed.
Among them, Tursun et al. carried out an in-depth survey of
HDR deghosting [16] and proposed an objective deghosting
quality metric to avoid the bias of subjective evaluations [17].
Although ghost-free methods have made significant headway
over the past decade, removing ghosting artifacts is still the
greatest challenge to MEF and HDR imaging for dynamic
scenes [21]. This work assumes that all input images are well-aligned captures of static scenes. The past two decades have seen a significant amount of work in the MEF community; existing methods are generally classified into four categories: multi-scale transform-based methods, statistical model-based methods, patch-based methods, and deep learning-based methods.
A. Multi-Scale Transform-Based Methods
Guided by three intrinsic image quality metrics, namely contrast, saturation, and well-exposedness, Mertens et al. [22] constructed weight maps to blend multiple exposure images within the Laplacian pyramid (LP) framework [23]. Since multi-scale techniques can reduce unpleasant halo artifacts around edges and alleviate the seam problem across object boundaries to some extent, multi-scale transform-based MEF approaches [22], [24]–[27], especially those based on the LP [22], [25]–[27], have gained in popularity. Two novel exposure measures, visibility and consistency, were developed in [28] based on the careful observation that the gradient magnitude gradually decreases in over-/under-exposed areas and the gradient direction changes as objects move. To reduce the loss of detail in multi-scale fusion, Li et al. [29] introduced a new quadratic optimization scheme in the gradient field; the sharper image is finally synthesized by combining the extracted detail with an intermediate image generated by the MEF method of [22]. Unfortunately, the method of [29] cannot be deployed on mobile devices because it needs to solve a quadratic optimization problem by means of an iterative method. Shen et al. [30] derived a novel boosting Laplacian pyramid, which boosts the structures of the detail and base layers, respectively, and designed a hybrid exposure weight. As the optimal weights of [30], computed by a global optimization, may over-smooth the final weight map, Li et al. [31], [32] used edge-preserving filters, namely the recursive filter [33] and the guided filter [34], to refine the resulting weight map under a two-scale framework.
Furthermore, to ensure high levels of detail with well-controlled artifacts even in low-/high-light scenarios, the works in [35]–[37] recently integrated the multi-scale technique with a detail-enhanced smoothing pyramid relying on the weighted guided image filter [38], the gradient domain guided image filter [39], and the fast weighted least squares filter [40], respectively. Note that the detail-enhancement technology of [35] is also beneficial to low-light and back-light imaging. Because the fast weighted least squares optimization problem is subject to a gradient constraint, the detail extraction component of [37] is significantly faster than that of [29]. Experimental results in [41] demonstrated that [36] ranks first according to the quality metric MEF-SSIM [42]. Even though [35]–[37] are capable of
high levels of detail in bright and dark light, the complexity
of these algorithms may be a hindrance to mobile devices.
Lately, building on the multi-scale fusion scheme of [22], a simplified detail-enhancement component [43] was presented in the YUV color space.
B. Statistical Model-Based and Patch-Based Methods
Statistical approaches [44]–[46] based on perceived quality measures and patch-based methods [47]–[49] have improved considerably over the past decade. Integrating visual measures, namely local contrast and color information, Shen et al. proposed two different frameworks, generalized random walks [44] and a hierarchical multivariate Gaussian conditional random field [46], respectively. Song et al. [45] cast MEF as a maximum a posteriori (MAP) estimation problem. Experimental results in [48] showed that the final fused image of [45] tends to be noisy due to the lack of explicit refinement of the weights. To reduce noise and remove ghosting from dynamic scenes, Ma et al. [47], [48] decomposed each image patch into three conceptually independent components, namely signal strength, signal structure, and mean intensity, and then processed each component according to patch strength, exposedness, and structural consistency measures. The benefits of this novel decomposition abound; one of them is that the direction information of the signal structure can guide the verification of structural consistency for producing ghost-free images.
C. Deep Learning-Based Methods
Lately, MEF approaches [50]–[53] based on convolutional neural networks (CNNs) have come a long way from a past dominated by hand-crafted features [22]–[49]. Prabhakar et al. [50] recently proposed a CNN-based architecture that actually comprises two methods. One trains the CNN model using results selected from two MEF methods [22], [32], considered the top-ranking ones at the time, as the "ground truth". The other learns the CNN by using the no-reference quality metric MEF-SSIM [42] as the loss function. Considering the lack of ground truth for MEF, Li et al. [51] extracted features from CNNs pre-trained on other tasks to calculate the weight maps; this approach also suits dynamic scenes, while the model of [50] only handles static scenes. A variable augmented neural network [52] for de-colorization was designed in the gradient domain; it was also employed to blend multiple exposure images by revealing the relationship between de-colorization and MEF. The network of [52], however, can only take three exposure images at a time, which greatly limits its practical application. Zhang et al. [53] most recently proposed a CNN-based multi-modal image fusion framework, which uses two convolutional layers to extract low-level features from multiple input images and blends these features through alternative strategies. The method of [53] offers reasonable texture and detail preservation in shadow regions, apart from some visually noticeable noise and artifacts. Different from image denoising and super-resolution, in the field of MEF the lack of ground truth is still the main stumbling block for deep learning.
Fig. 1. Example results of several different MEF methods. (a) Mertens09
[22]. (b) S.Li13 [32]. (c) Shen14 [30]. (d) Kou17 [36]. (e) Z.Li17 [35]. (f)
Ma17 [48]. (g) Wang19 [43]. (h) Zhang19 [53]. (i) Proposed. Green and red
boxes are zoom-in regions for taking a closer look. One can see that our
method retains more details in the highlights and shadow regions compared
to Mertens09 [22].
In addition, CNN-based methods require expensive GPU resources to store model parameters.
Although [22] is simple and efficient, it suffers from visible loss of detail in shadow and highlight areas (see Fig. 1(a)). Previous methods based on detail-enhancement components [25], [29], [35]–[37], [43] can add details to the final resulting image, but the process is usually time-consuming and unfriendly to mobile devices, especially smartphones. In this article, we re-examine three classic exposure metrics [22] and improve two of them. Under the LP framework, our method maintains fine details in the highlights and captures better exposure in shadow areas. Overall, our contributions are four-fold:
1) We propose a perceptual multi-exposure fusion method in the LP domain. Based on the difference between images of varying exposure and the cross-channel correlation of color images, we reanalyze the potential flaws of three classic exposure metrics [22] and improve two of them.
2) We find that the traditional saturation definition is sensitive to the grayscale content of color images. An adaptive well-exposedness (AWE) is designed in the YCbCr color space, which considers the differences between images in an exposure sequence. The gradient of a color image (3-D gradient) is employed to extract fine details, exploiting the cross-channel correlation of color images.
3) This work constructs a large-scale multi-exposure dataset for static scenes that consists of 167 exposure sequences and covers a wide range of scenes. Experiments on the constructed dataset indicate that our method surpasses eight existing state-of-the-art approaches in terms of both visual and quantitative results.
4) The proposed method can facilitate current image enhancement technologies, recording more detail in highlight areas. All datasets and our MATLAB code are available at https://github.com/hangxiaotian/Perceptual-Multi-exposure-Image-Fusion.
The rest of this paper is structured as follows. Section II briefly reviews three classical exposure quality metrics and the LP fusion framework. Section III describes the proposed MEF approach in detail. Experimental results and evaluation are provided in Section IV. Finally, Section V summarizes the paper and offers further discussion.
II. RELATED WORK
In this section, we review three classical exposure quality metrics [22] and the multi-scale fusion framework based on the Laplacian pyramid [23].
A. Exposure Quality Metrics
1) Contrast: The grayscale version of a color image is filtered with the Laplacian operator, and the absolute value of the filter response is taken as the contrast, denoted by $C$; this assigns larger weight to edges and textures than to flat areas.
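As a minimal sketch (not the authors' released MATLAB code), this contrast measure can be computed with a Laplacian filter on the grayscale image; the Rec. 601 grayscale weights below are an assumption, since the conversion is not specified here.

```python
import numpy as np
from scipy.ndimage import laplace

def contrast(img):
    """Contrast C: absolute Laplacian response of the grayscale image.

    img: float array with shape (H, W, 3), values in [0, 1].
    """
    gray = img @ np.array([0.299, 0.587, 0.114])  # Rec. 601 luma (assumed)
    return np.abs(laplace(gray))                  # |Laplacian filter response|
```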
2) Saturation: Saturation can reflect over-/under-exposed areas under most conditions; it is calculated from the deviations of the three channels, R, G, and B, from their mean at the corresponding pixel. Thus, the saturation $S$ can be written as
$$S = \sqrt{(I_R - I_\mu)^2 + (I_G - I_\mu)^2 + (I_B - I_\mu)^2}, \tag{1}$$

$$I_\mu = \frac{1}{3}\left(I_R + I_G + I_B\right), \tag{2}$$
where $I_i$ is the intensity of the $i$-th channel, $i \in \{R, G, B\}$, and $I_\mu$ denotes the mean of the three channels.
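As a hedged sketch, Eqs. (1)–(2) translate directly into a few lines of Python (assuming the same (H, W, 3) float layout as above):

```python
import numpy as np

def saturation(img):
    """Saturation S per Eqs. (1)-(2): per-pixel channel deviation from the mean."""
    mu = img.mean(axis=2, keepdims=True)           # Eq. (2): mean of R, G, B
    return np.sqrt(((img - mu) ** 2).sum(axis=2))  # Eq. (1)
```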
3) Well-exposedness: Each pixel is weighted in the form of a Gaussian curve. The well-exposedness $E$ is defined as
$$E = E_R \cdot E_G \cdot E_B, \tag{3}$$

$$E_i = \exp\left(-\frac{(I_i - 0.5)^2}{2\sigma^2}\right), \quad i = R, G, B, \tag{4}$$
where “·” denotes pixel-wise multiplication and the parameter $\sigma$ is set to 0.2.
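A corresponding sketch of Eqs. (3)–(4), with the stated σ = 0.2:

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Well-exposedness E per Eqs. (3)-(4): Gaussian closeness to mid-gray 0.5."""
    e = np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))  # Eq. (4), per channel
    return e.prod(axis=2)                                 # Eq. (3): E_R * E_G * E_B
```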
According to the above three exposure quality measures [22], the initial weight map is given by
$$W_n(x, y) = C_n(x, y)^{w_C} \cdot S_n(x, y)^{w_S} \cdot E_n(x, y)^{w_E}, \tag{5}$$
where $(x, y)$ denotes the pixel location, $n$ indexes the input exposure images, and $w_C$, $w_S$, and $w_E$ are the corresponding exponents, which default to 1.
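Combining the three measures per Eq. (5) is then a pixel-wise product; the small epsilon below is a common safeguard against all-zero weights and is an assumption, not part of Eq. (5).

```python
def weight_map(img, wC=1.0, wS=1.0, wE=1.0, eps=1e-12):
    """Initial weight map W_n per Eq. (5), using the helpers sketched above."""
    return (contrast(img) ** wC) * (saturation(img) ** wS) \
        * (well_exposedness(img) ** wE) + eps
```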
B. Multi-Scale Fusion Framework
Firstly, to ensure spatially consistent content, each weight map is normalized so that the $N$ weights sum to one at each pixel:

$$\hat{W}_n(x, y) = \left[\sum_{n'=1}^{N} W_{n'}(x, y)\right]^{-1} W_n(x, y). \tag{6}$$
Fig. 2. The pipeline of the proposed MEF method. The input exposure sequence yields two weighting maps, $W_{n,1}$ from AWE and $W_{n,2}$ from the 3-D gradient, which are combined by LP fusion to produce the fused image $F$.
The final resulting image $F$ is generated by
$$F(x, y) = \sum_{n=1}^{N} \hat{W}_n(x, y)\, I_n(x, y). \tag{7}$$
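A minimal sketch of Eqs. (6)–(7), reusing the hypothetical `weight_map` helper sketched earlier; as discussed next, this single-scale blend exhibits seam artifacts.

```python
import numpy as np

def fuse_naive(images):
    """Normalize weights per Eq. (6), then blend per Eq. (7).

    images: list of N float arrays with shape (H, W, 3), values in [0, 1].
    """
    W = np.stack([weight_map(im) for im in images])  # (N, H, W)
    W = W / W.sum(axis=0, keepdims=True)             # Eq. (6): weights sum to 1
    return sum(w[..., None] * im for w, im in zip(W, images))  # Eq. (7)
```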
Unfortunately, using Eq. (7) directly causes seam problems. To address this issue, Mertens et al. [22] adopted the LP framework: each input image is decomposed into a Laplacian pyramid, and the pyramid layers at each scale are blended. The steps are as follows. Assume there are $N$ exposure images and $N$ normalized weight maps serving as alpha masks. Let $\mathcal{L}\{A\}^l$ denote the $l$-th layer ($l \in \{1, 2, \cdots, L\}$) of the Laplacian pyramid of image $A$, and analogously let $\mathcal{G}\{B\}^l$ denote the corresponding layer of the Gaussian pyramid of image $B$. The fused pyramid is then given by
$$\mathcal{L}\{F(x, y)\}^l = \sum_{n=1}^{N} \mathcal{G}\{\hat{W}_n(x, y)\}^l\, \mathcal{L}\{I_n(x, y)\}^l. \tag{8}$$
Finally, the Laplacian pyramid $\mathcal{L}\{F(x, y)\}^l$ is collapsed to generate the resulting image $F$.
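A hedged end-to-end sketch of Eq. (8) and the collapse step follows, building the pyramids with OpenCV's pyrDown/pyrUp and reusing the hypothetical `weight_map` helper from above; the level count, float32 casts, and choice of OpenCV are assumptions, and this is not the authors' MATLAB implementation.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    """G{.}: repeatedly blur and downsample."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """L{.}: band-pass layers, plus the coarsest Gaussian level as residual."""
    g = gaussian_pyramid(img, levels)
    pyr = [g[l] - cv2.pyrUp(g[l + 1], dstsize=(g[l].shape[1], g[l].shape[0]))
           for l in range(levels - 1)]
    pyr.append(g[-1])
    return pyr

def fuse_lp(images, levels=6):
    """Blend pyramid layers per Eq. (8), then collapse the fused pyramid."""
    W = np.stack([weight_map(im) for im in images])
    W = W / W.sum(axis=0, keepdims=True)                      # Eq. (6)
    fused = None
    for im, w in zip(images, W):
        gw = gaussian_pyramid(w.astype(np.float32), levels)   # G{W_n}^l
        li = laplacian_pyramid(im.astype(np.float32), levels) # L{I_n}^l
        layers = [g[..., None] * l for g, l in zip(gw, li)]   # Eq. (8) summand
        fused = layers if fused is None else [f + x for f, x in zip(fused, layers)]
    out = fused[-1]
    for l in range(levels - 2, -1, -1):                       # collapse, coarse to fine
        out = cv2.pyrUp(out, dstsize=(fused[l].shape[1], fused[l].shape[0])) + fused[l]
    return np.clip(out, 0.0, 1.0)
```

Calling `fuse_lp([im1, im2, im3])` on a normalized exposure stack would then yield the multi-scale result; blending the weights at Gaussian-pyramid resolution is what suppresses the seams that plague Eq. (7).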
III. PROPOSED MULTI-EXPOSURE IMAGE FUSION
This section presents the proposed MEF method. We first analyze the defects of saturation, present an adaptive well-exposedness (AWE), and introduce the gradient of color images, i.e., the 3-D gradient, which generalizes the conventional 2-D gradient. Then, we employ the LP framework [23] to generate the final image. The pipeline of the proposed method is shown in Fig. 2.
Fig. 3. A synthetic color image of size 256 × 256 in which the three channels of each pixel share the same value, increasing row by row from 0 to 255.