IET Biometrics
Research Article
Iris super-resolution using CNNs: is photo-realism important to iris recognition?
ISSN 2047-4938
Received on 4th December 2017
Revised 19th June 2018
Accepted on 22nd June 2018
E-First on 24th August 2018
doi: 10.1049/iet-bmt.2018.5146
www.ietdl.org
Eduardo Ribeiro1,2, Andreas Uhl1, Fernando Alonso-Fernandez3
1Department of Computer Sciences, University of Salzburg, Jakob Haringer Strasse 2 5020, Salzburg, Austria
2Department of Computer Sciences, Federal University of Tocantins, 109 Norte, Av. NS 15, ALC NO 14, Palmas, Brazil
3IS-Lab/CAISR, Halmstad University, Box 823, Halmstad SE 301-18, Sweden
E-mail: uft.eduardo@uft.edu.br
Abstract: Low-resolution images acquired under relaxed conditions, such as those from mobile phones and
surveillance videos, are becoming increasingly common in iris recognition. Concurrently, a great variety of
single-image super-resolution techniques are emerging, especially those based on convolutional neural networks
(CNNs). The main objective of these methods is to recover finer texture details and generate more photo-realistic
images by optimising an objective function that depends mainly on the CNN architecture and the training approach.
In this work, the authors explore single-image super-resolution using CNNs for iris recognition. To this end, they
test different CNN architectures and different training databases, validating their approach on a database of
1,872 near-infrared iris images and on a mobile phone image database. They also use quality assessment, visual
results and recognition experiments to verify whether the photo-realism provided by CNNs, which have already
proven effective for natural images, translates into a better recognition rate for iris images. The results show
that deeper architectures trained with texture databases that provide a balance between edge preservation and
smoothness can lead to good results in the iris recognition process.
1 Introduction
The main goal of super-resolution (SR) is to produce, from one or
more images, an image with a higher resolution (i.e. with more pixels)
that is also more detailed and realistic while remaining faithful to
the low-resolution (LR) image(s). One of the most widely used examples
is bicubic interpolation which, despite producing more pixels and
being faithful to the LR image, does not recover finer texture details
and tends to generate noise or blur rather than realism [1].
Several applications, especially in the pattern recognition area,
ideally demand high-resolution (HR) images, in which details and
textures may be critical to the final result [2]. With the
popularisation of devices built with simpler sensors, such as the
charge-coupled device (CCD) and the complementary metal oxide
semiconductor (CMOS), millions of images have been generated, opening
a range of possibilities for the most diverse purposes in this area.
One of them is biometrics, e.g. face and iris recognition using mobile
phone devices. Biometrics is a strong and reliable approach for the
automatic identification of individuals based on biological phenomena
which can be statistically measured. In some practical applications,
the lack of pixel resolution in images supplied by less robust sensors
(such as mobile phones or surveillance cameras) and the focal length
may compromise the performance of recognition systems [3]. In [4], a
significant degradation in recognition performance is shown when the
iris image resolution is reduced.
There are currently two approaches to the SR problem. The first,
known as reconstruction-based SR, uses sub-pixel information obtained
from several LR images to reach an HR image [2, 5]. Its main
disadvantage is the requirement of multiple images as input, which may
make the process unfeasible [6]. The second approach, which is the
main focus of this work, is the learning-based approach: a model that
maps the relation between LR and HR images is learned from a training
image database [2]. The advantage of this method is that there is no
need for multiple versions of the same image: a single image is
required as input. For this reason, this method is also called the
single-image SR approach [7]. It can also achieve high magnification
factors, since the mapping can be modelled with good performance,
especially using deep learning approaches.
The use of deep learning, specifically convolutional neural
networks (CNNs), to perform the mapping between LR and HR
images/patches has been extensively explored in recent years.
One of the advantages of using a CNN is that it does not require
any handcrafted or engineered feature extractor, unlike previous
methods. In addition, the image reconstruction outperforms
traditional methods, particularly in relation to the quality of
image textures. However, in the biometrics field, few studies have
explored whether this artificially created improvement in quality
benefits the recognition performance.
In this study, we investigate the use of deep learning SR
(DLSR) applied to iris recognition. For this, we test different
architectures trained from scratch using different databases. The
motivation is to verify whether the proven effectiveness of these
methods in terms of image quality is reflected in the recognition
performance. In addition, by using different training databases, we
verify that texture transfer learning can be an alternative for
training CNNs in practical applications. Specifically, the idea of
this contribution is to evaluate whether a CNN trained with iris
images can learn iris-specific patterns that help the SR for very
small factors, where the patterns cannot be identified due to the
lack of information and to image blur. The results show that, for
very small factors, the networks trained with iris images can achieve
better recognition results despite presenting worse quality according
to the quality assessment algorithms. This dichotomy is also a main
point of this contribution and is discussed in this study.
2 Related works
Single-image SR has become the focus of SR discussions in recent
years, and several surveys have been devoted to it [8, 9].
Nonetheless, this area has been discussed for decades, beginning with
prediction-based methods using filtering approaches (e.g. bilinear
and bicubic), which produce overly smooth textures and thus led to
the study of methods based on edge preservation [10, 11].
Learning-based (or hallucination) algorithms using a single image
were first introduced in [12], where the mapping between the LR and
HR images was learned by a neural network applied to fingerprint
images.
IET Biom., 2019, Vol. 8 Iss. 1, pp. 69-78
This is an open access article published by the IET under the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/)
With the popularisation of CNNs, several methods achieving
excellent results were proposed. Wang et al. [13] showed that
encoding a sparse representation particularly designed for SR
enables an end-to-end mapping between the LR and HR images with a
reduced model size. However, the best-known architecture for this
end-to-end mapping is the super-resolution CNN (SRCNN) proposed by
Dong et al. [14], which up-samples the input LR image with bicubic
interpolation and then uses a trained three-layer fully
convolutional network to reconstruct the HR image, acting as a
denoising tool. The most common concern of the work that followed
was to find an architecture that minimises the mean squared error
(MSE) between the reconstructed HR image and the ground truth.
Minimising the MSE also maximises the peak signal-to-noise ratio
(PSNR), one of the most widely used metrics for evaluating and
comparing SR methods [15].
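The link between MSE and PSNR can be made concrete with a small sketch. The following pure-Python example is only illustrative and is not the evaluation code used in this work: the function names `mse` and `psnr`, the 8-bit peak value of 255 and the toy 3x3 pixel values are all assumptions. Since PSNR = 10 log10(peak^2 / MSE), a lower MSE always gives a higher PSNR.

```python
import math

def mse(reference, reconstructed):
    """Mean squared error between two equally sized greyscale images (lists of rows)."""
    total, count = 0.0, 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for x, f in zip(ref_row, rec_row):
            total += (x - f) ** 2
            count += 1
    return total / count

def psnr(reference, reconstructed, peak=255.0):
    """PSNR in dB; `peak` is the largest possible pixel value (assumed 255, i.e. 8-bit)."""
    error = mse(reference, reconstructed)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)

# Toy data (hypothetical values): one ground truth and two candidate reconstructions.
ground_truth = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
reconstruction_a = [[54, 55, 60], [60, 70, 62], [70, 63, 60]]
reconstruction_b = [[52, 56, 61], [59, 77, 61], [75, 62, 59]]

print(psnr(ground_truth, reconstruction_a))
print(psnr(ground_truth, reconstruction_b))
```

With these toy values, `reconstruction_b` has the lower MSE and therefore the higher PSNR. Whether such a gain in PSNR also helps recognition is a separate question, which is the subject of this work.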
In [16], a deeper CNN architecture, also called very deep CNN
(VDCNN), is presented, inspired by the VGG-net used for ImageNet
classification [17]. That work demonstrates that cascading small
filters many times in a deep network structure, together with
residual learning, can improve the accuracy of the SR method.
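As a minimal sketch of residual learning as it is commonly formulated for SR (an illustration, not the implementation of [16]): instead of predicting the HR image X directly, the network predicts the residual X - Y between the HR image and the interpolated LR image Y, and this residual is added back to Y to form the output. The helper names `residual_target` and `reconstruct` and the toy pixel values are hypothetical; images are plain lists of rows.

```python
def residual_target(hr, lr):
    """Training target: pixel-wise difference between the HR image and the interpolated LR image."""
    return [[x - y for x, y in zip(hr_row, lr_row)]
            for hr_row, lr_row in zip(hr, lr)]

def reconstruct(lr, predicted_residual):
    """Output image: interpolated LR image plus the residual predicted by the network."""
    return [[y + r for y, r in zip(lr_row, res_row)]
            for lr_row, res_row in zip(lr, predicted_residual)]

# Toy example: a sharp vertical edge (hr) and a blurred version of it (lr).
hr = [[10, 200, 10], [10, 200, 10]]
lr = [[60, 120, 60], [60, 120, 60]]

target = residual_target(hr, lr)   # what the network would be trained to predict
restored = reconstruct(lr, target) # a perfect prediction recovers hr exactly
```

In this formulation the network only has to model the high-frequency detail that interpolation lost, since the low-frequency content is already carried by Y.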
In [15], an SR generative adversarial network (SRGAN) was
proposed to recover finer texture details from LR images,
inferring photo-realistic natural images through a novel perceptual
loss function that uses high-level feature maps from the VGG
network. The SRCNN, VDCNN and SRGAN architectures are used in this
work and are detailed in the following sections.
Research on SR in biometrics (especially for iris recognition)
has been increasing in the last few years, especially using
reconstruction-based methods. For example, Kien et al. [3] use the
feature domain to super-resolve LR images, relying only on the
features and incorporating domain-specific information for iris
models to constrain the estimation. In [18], Nguyen et al. introduce
a signal-level fusion that integrates quality scores into the
reconstruction-based SR process, performing a quality-weighted SR
for an LR video sequence of a less constrained iris at a distance or
on the move, with good results. However, in this case, as in [19],
which performs best-frame selection, many LR images are required to
reconstruct the HR image, which is one of the disadvantages of
reconstruction-based methods.
In [20], an iris recognition algorithm based on principal
component analysis (PCA) is presented, which constructs coarse iris
images from PCA coefficients and enhances them using SR. In
[21], a reconstruction-based SR is proposed for iris SR from LR
video frames, using an auto-regressive signature model between
consecutive LR images to fill the sub-pixels in the constructed
image. In [22], two SR approaches are tested for iris recognition,
one based on the PCA eigen-transformation and the other based on
locality-constrained iterative neighbour embedding (LINE) of local
image patches. Both methods use coupled dictionaries to learn the
mapping between LR and HR images in a very-low-resolution simulation
using infrared iris images, obtaining good results for very small
images.
Despite the vast literature in the SR area and the great interest
in the use of deep learning in biometrics, the application of DLSR
to iris recognition is still largely unexplored, mainly because SR
approaches generally focus on general and/or natural scenes, aiming
at overall visual enhancement and better photo-realism, while iris
recognition focuses on the recognition performance itself [1, 23].
In [24], three multilayer perceptrons (MLPs) are used to perform
single-image SR for iris recognition. The method merges the
bilinear interpolation approach with the output pixel values from
the trained MLPs, considering the edge direction of the iris
patterns. Recently, Zhang et al. [25] used the classic SRCNN and
SR forest to perform SR in mobile iris recognition systems. The
algorithms are applied to the segmented and normalised iris images,
and the results show limited effectiveness of the SR methods for
iris recognition accuracy. Unlike the methods presented in the DLSR
literature, in this work we explore whether the architectures and
databases used in training influence the quality results and,
consequently, the recognition performance.
In our previous works [26, 27], we demonstrated that basic deep
learning techniques for super-resolution, such as stacked
auto-encoders and the classic SRCNN, can be successfully applied to
iris SR. In that case, we used the CASIA Interval database as the
target database, focusing more on the recognition process. In this
work, we focus on the relation between image quality and recognition
performance, and the SR is performed on the original image without
any segmentation. We also use a new iris database as a target
database, which simulates a real-world situation where the images
are acquired using mobile phones. Additionally, we test a new
application, the use of SRGANs, to verify whether the good
performance of this method for natural images in terms of
photo-realism is also valid and useful for iris images in the iris
recognition context.
3 Reconstruction of LR iris images through CNNs
Typically, in a deep learning system, the main question is how to
find a good training database that can provide relevant information
for the desired application. In the case of SR, it is necessary to
learn, during the training of the proposed method (also called the
off-line phase), a mapping between an LR image with low-frequency
information and a high-resolution (HR) image with high-frequency
information. Fig. 1 shows this phase, in which a training database
is chosen and the images are prepared for training the deep learning
SR method.
In the training phase, the only pre-processing required is that,
given an HR image X, the image is downscaled by one or more factors
and then up-scaled using bicubic interpolation to the same size as
the original image X. The resulting image, although it has the same
size as X, is called the ‘LR’ image and is denoted Y. The purpose of
DLSR training is that, after feeding the network with an LR image or
patch Y as input, the result F(Y) (the reconstructed image) is as
similar as possible to the HR image or patch X, which is the ground
truth. The weight adjustment of the method depends on both the
chosen architecture and the loss function, which are explained in
the following sections.
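The preparation of a training pair (Y, X) described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the pre-processing code used in this work: the helper names (`cubic_kernel`, `resize_1d`, `resize`, `make_training_pair`) are hypothetical, the cubic convolution parameter a = -0.5 is an assumed common setting, border pixels are replicated, and the same bicubic routine is used for downscaling without any anti-aliasing filter, since the text does not state how the downscaling is done.

```python
import math

def cubic_kernel(t, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, commonly used setting."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0

def resize_1d(signal, new_len):
    """Resample a 1-D list of pixel values to new_len samples."""
    old_len = len(signal)
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        # Centre of output sample i expressed in input coordinates.
        pos = (i + 0.5) * scale - 0.5
        base = math.floor(pos)
        value = 0.0
        for k in range(base - 1, base + 3):    # four nearest input samples
            idx = min(max(k, 0), old_len - 1)  # replicate border pixels
            value += cubic_kernel(pos - k) * signal[idx]
        out.append(value)
    return out

def resize(image, new_h, new_w):
    """Separable bicubic resize: rows first, then columns."""
    rows = [resize_1d(row, new_w) for row in image]
    cols = [resize_1d(list(col), new_h) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def make_training_pair(hr, factor):
    """Return (Y, X): Y is X downscaled by `factor` and up-scaled back to the size of X."""
    h, w = len(hr), len(hr[0])
    small = resize(hr, max(1, h // factor), max(1, w // factor))
    lr = resize(small, h, w)
    return lr, hr

# Toy HR image: an 8x8 checkerboard, i.e. almost pure high-frequency content.
checkerboard = [[255.0 if (r + c) % 2 else 0.0 for c in range(8)] for r in range(8)]
lr, hr = make_training_pair(checkerboard, 2)
```

Here `lr` has the same 8x8 size as `hr` but has lost the fine checkerboard pattern, which is exactly the high-frequency information the network F is trained to restore from Y.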
After training, the deep learning method is applied to an LR
database for the intended application which, in the case of this
work, is an iris database, also called the target database. The deep
learning process is thus a pre-processing step before iris
recognition: the LR image is fed as input to the network, which
produces the reconstructed HR image to be used in the recognition
process, as shown in Fig. 1 (on-line phase). The image is
reconstructed according to the factor used in training.
In deep learning, preparing individual models for all possible
scenarios, to deal with different scales, poses, illumination and
textures, is still a challenge. In this work, we test the main SR
Fig. 1 General overview of the training and reconstruction method for
iris SR using CNNs proposed in this work