
based on edge-preservation [10, 11]. Learning-based (or
hallucination) algorithms using a single image were first
introduced in [12], where the mapping between the LR and HR
image was learned by a neural network applied to fingerprint
images.
With the popularisation of CNNs, several methods achieving
excellent results were proposed. Wang et al. [13] showed that
encoding a sparse representation specifically designed for SR
allows the end-to-end mapping between the LR and HR image to be
learned with a reduced model size. However, the best-known
architecture for this end-to-end mapping is the super-resolution
CNN (SRCNN) proposed by Dong et al. [14], which up-samples the
input LR image with bicubic interpolation and then applies a
trained three-layer fully convolutional network to reconstruct the
HR image, acting as a denoising tool. The most common concern of
the work that followed was to find an architecture that minimises
the mean squared error (MSE) between the reconstructed HR image
and the ground truth. Since minimising the MSE also maximises the
peak signal-to-noise ratio (PSNR), the PSNR became one of the most
widely used metrics to evaluate the quality of the results and to
compare the proposed methods [15].
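As an illustration, the sketch below builds an SRCNN-style three-layer fully convolutional mapping (with the 9-1-5 kernel sizes and 64/32 filter counts described by Dong et al.) and the PSNR metric in plain numpy. This is a hedged sketch, not the authors' implementation: the random weights stand in for trained ones, so only the structure of the mapping and the metric is shown, not a working super-resolver:

```python
import numpy as np

def conv2d(img, kernels):
    """'Same'-padded 2-D filtering of a (H, W, C_in) image with
    kernels of shape (k, k, C_in, C_out)."""
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", win, kernels)

def srcnn_forward(y, w1, w2, w3):
    """SRCNN's three stages: patch extraction (9x9), non-linear
    mapping (1x1) and reconstruction (5x5), with ReLU in between."""
    h1 = np.maximum(conv2d(y, w1), 0.0)
    h2 = np.maximum(conv2d(h1, w2), 0.0)
    return conv2d(h2, w3)

def psnr(x, f_y, peak=1.0):
    """Peak signal-to-noise ratio in dB; maximised when MSE is minimised."""
    return 10.0 * np.log10(peak ** 2 / np.mean((x - f_y) ** 2))

rng = np.random.default_rng(0)
x = rng.random((32, 32, 1))                   # HR ground-truth patch
y = x + 0.05 * rng.standard_normal(x.shape)   # degraded stand-in for the LR input
w1 = 0.01 * rng.standard_normal((9, 9, 1, 64))   # untrained illustrative weights
w2 = 0.01 * rng.standard_normal((1, 1, 64, 32))
w3 = 0.01 * rng.standard_normal((5, 5, 32, 1))
out = srcnn_forward(y, w1, w2, w3)
print(out.shape)  # -> (32, 32, 1): same size as the input, as in SRCNN
```

Note that, because the input is already up-sampled to the target size by bicubic interpolation, the network only has to restore detail, not enlarge the image.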
In [16], a deeper CNN architecture, also called very deep CNN
(VDCNN), is presented, inspired by the VGG-net used for ImageNet
classification [17]. That work demonstrates that cascading small
filters many times in a deep network structure, combined with
residual learning, can improve the accuracy of the SR method.
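To make the residual-learning idea concrete, the minimal numpy sketch below (an illustration, not the implementation from [16]) shows the two ingredients: the network predicts only the high-frequency residual missing from the interpolated input, and cascading D small 3x3 filters yields a (2D + 1) x (2D + 1) effective receptive field:

```python
import numpy as np

def residual_target(x, y):
    """Residual learning trains the network to predict only the
    high-frequency detail r = X - Y missing from the interpolated
    LR input Y, instead of regressing the full HR image X."""
    return x - y

def reconstruct(y, r_pred):
    """Final HR estimate: interpolated input plus predicted residual."""
    return y + r_pred

def receptive_field(depth, kernel=3):
    """Cascading `depth` kernel x kernel filters yields an effective
    receptive field of depth * (kernel - 1) + 1 pixels."""
    return depth * (kernel - 1) + 1

# A perfect residual prediction recovers the ground truth exactly
rng = np.random.default_rng(1)
x = rng.random((16, 16))                     # HR ground truth
y = x + 0.1 * rng.standard_normal((16, 16))  # interpolated LR input
assert np.allclose(reconstruct(y, residual_target(x, y)), x)
print(receptive_field(20))  # -> 41 (a 20-layer cascade of 3x3 filters)
```

Because the interpolated input already carries the low-frequency content, learning only the residual makes the optimisation of very deep networks easier.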
In [15], a SR generative adversarial network (SRGAN) was
proposed to recover finer texture details from LR images,
inferring photo-realistic natural images through a novel perceptual
loss function based on high-level feature maps from the VGG
network. The SRCNN, VDCNN and SRGAN architectures will be used in
this work and will be detailed in the following sections.
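Structurally, the perceptual loss is an MSE computed in a feature space rather than in pixel space. The toy feature extractor below, a single fixed convolution with ReLU, is only a stand-in for the pre-trained VGG network used in [15], sketched here to show the shape of the loss:

```python
import numpy as np

def features(img, kernels):
    """Toy feature extractor phi: one valid 2-D filtering + ReLU over a
    (H, W) image with kernels of shape (k, k, n_features). In SRGAN,
    phi is a fixed, pre-trained VGG network; this stand-in only
    illustrates the structure of the loss."""
    k = kernels.shape[0]
    win = np.lib.stride_tricks.sliding_window_view(img, (k, k))
    return np.maximum(np.einsum("hwij,ijo->hwo", win, kernels), 0.0)

def perceptual_loss(x, f_y, kernels):
    """MSE between high-level feature maps phi(X) and phi(F(Y)),
    instead of between raw pixels."""
    return np.mean((features(x, kernels) - features(f_y, kernels)) ** 2)

rng = np.random.default_rng(2)
phi = rng.standard_normal((3, 3, 8))            # fixed random 'VGG' filters
x = rng.random((24, 24))                        # ground-truth HR image
f_y = x + 0.1 * rng.standard_normal((24, 24))   # imperfect reconstruction
assert perceptual_loss(x, x, phi) == 0.0        # identical images match in feature space
assert perceptual_loss(x, f_y, phi) > 0.0       # reconstruction errors are penalised
```

Comparing feature maps rather than pixels is what lets SRGAN favour perceptually convincing textures over the over-smoothed solutions that minimising pixel-wise MSE tends to produce.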
Research on SR in biometrics (especially for iris recognition)
has been increasing in the last few years, particularly using
reconstruction-based methods. For example, Kien et al. [3]
super-resolve LR images in the feature domain, relying only on the
features and incorporating domain-specific information from iris
models to constrain the estimation. In [18], Nguyen et al. introduce
a signal-level fusion to integrate quality scores into the
reconstruction-based SR process, performing a quality-weighted SR
for a LR video sequence of a less constrained iris at a distance or
on the move, obtaining good results. However, in this case, as in
[19], which performs best-frame selection, many LR images are
required to reconstruct the HR image, which is one of the
disadvantages of this kind of reconstruction-based method.
In [20], an iris recognition algorithm based on principal
component analysis (PCA) is presented, constructing coarse iris
images with PCA coefficients and enhancing them using SR. In
[21], a reconstruction-based SR is proposed for iris SR from LR
video frames, using an auto-regressive signature model between
consecutive LR images to fill the sub-pixels in the constructed
image. In [22], two SR approaches are tested for iris recognition,
one based on the PCA eigen-transformation and the other based on
locality-constrained iterative neighbour embedding (LINE) of local
image patches. Both methods use coupled dictionaries to learn the
mapping between LR and HR images in a very LR simulation
using infrared iris images, obtaining good results for very small
images.
Despite the vast literature in the SR area and the great interest
in the use of deep learning in biometrics, the application of DLSR
to iris recognition is still a largely unexplored field, mainly
because such approaches generally focus on general and/or natural
scenes, aiming at overall visual enhancement and better image
quality in terms of photo-realism, while iris recognition focuses
on the recognition performance itself [1, 23]. In [24], three
multilayer perceptrons (MLPs) are used to perform single-image
SR for iris recognition. The method is based on merging the
bilinear interpolation approach with the output pixel values from
the trained multiple MLPs, considering the edge direction of the iris
patterns. Recently, Zhang et al. [25] used the classic SRCNN and
SR forest to perform SR in mobile iris recognition systems. The
algorithms are applied to the segmented and normalised iris images,
and the results show a limited effectiveness of the SR method for
iris recognition accuracy. Differently from the methods presented
in the DLSR literature, in this work we explore whether the
architectures and the database used in training can influence the
quality of the results and, consequently, the recognition
performance.
In our previous works [26, 27], we demonstrated that basic deep
learning techniques for super-resolution, such as stacked auto-
encoders and the classic SRCNN, can be successfully applied to iris
SR. In that case, we used the CASIA Interval database as the target
database, focusing more on the recognition process. In this work,
we focus on the relation between quality and recognition
performance, and the SR is performed on the original image
without any segmentation. We also use a new iris database as the
target database, which simulates a real-world situation in which
the images are acquired using mobile phones. Additionally, we test
a new application, the use of SRGANs, to verify whether the good
performance of this method on natural images in terms of photo-
realism is also valid and useful for iris images in the iris
recognition context.
3 Reconstruction of LR iris images through CNNs
Typically, in a deep learning system, the main issue is to find a
good training database that can provide relevant information for the
desired application. In the case of SR, it is necessary to learn,
during the training of the proposed method (also called the off-line
phase), a mapping between a high-resolution (HR) image containing
high-frequency information and a LR image containing only
low-frequency information. Fig. 1 shows this phase, in which a
training database is chosen and the images are prepared for
training the deep learning SR method.
In the training phase, the only pre-processing required is, given
a HR image X, to downscale it by one or more factors and then
up-scale it back to the size of the original image X using bicubic
interpolation. Although this image has the same size as X, it is
called the 'LR' image and is denoted Y. The purpose of DLSR
training is, after feeding the network with a LR image or patch Y
as input, to obtain a result F(Y) (the reconstructed image) as
similar as possible to the HR image or patch X, which in this case
is the ground truth. The weight adjustment of the method depends
on both the chosen architecture and the loss function, which are
explained in more detail in the following sections.
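The pre-processing just described can be sketched in a few lines of numpy. As a simplification, block averaging and nearest-neighbour up-sampling (np.kron) stand in here for the bicubic interpolation used in this work, so that the sketch stays dependency-free; as in the text, Y keeps the size of X but loses the high-frequency detail:

```python
import numpy as np

def make_lr(x, factor):
    """Build the training input Y from a HR image X: downscale by block
    averaging, then up-scale back to X's original size (assumes the
    image dimensions are divisible by `factor`)."""
    h, w = x.shape
    small = x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.kron(small, np.ones((factor, factor)))

def training_loss(x, f_y):
    """MSE objective driving the weight adjustment: push the network
    output F(Y) towards the ground truth X."""
    return np.mean((x - f_y) ** 2)

rng = np.random.default_rng(3)
x = rng.random((32, 32))     # HR ground truth X
y = make_lr(x, factor=2)     # same size as X, low-frequency content only
assert y.shape == x.shape
assert training_loss(x, x) == 0.0   # a perfect network would reach zero loss
assert training_loss(x, y) > 0.0    # the LR input alone does not
```

Training with several `factor` values corresponds to the "one or more factors" mentioned above and determines which down-sampling factors the network can later compensate for.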
After training, the deep learning method is applied to a LR
database for the proposed application, which, in the case of this
work, is an iris database, also called the target database. Here,
the deep learning process is a pre-processing step before iris
recognition, in which the LR image is given as input to the
network, which produces the reconstructed HR image, according to
the factor used in training, to be used in the recognition process,
as shown in Fig. 1 (on-line phase).
In deep learning, the preparation of individual models for all
possible scenarios, dealing with different scales, poses,
illumination conditions, and textures, is still a challenge. In
this work, we test the main SR
[Fig. 1: General overview of the training and reconstruction method for the iris SR using CNNs proposed in this work]
70 IET Biom., 2019, Vol. 8 Iss. 1, pp. 69-78
This is an open access article published by the IET under the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/)