
based on edge-preservation [10, 11]. Learning-based (or
hallucination) algorithms using a single image were first
introduced in [12], where the mapping between the LR and HR
image was learned by a neural network applied to fingerprint
images.
With the popularisation of CNNs, several methods achieving
excellent results were proposed. Wang et al. [13] showed that
encoding a sparse representation specifically designed for SR
allows the end-to-end mapping between the LR and HR image to be
learned with a reduced model size. However, the best-known
architecture for this end-to-end mapping is the super-resolution
CNN (SRCNN) proposed by Dong et al. [14], which up-samples the
input LR image with bicubic interpolation and then applies a
trained three-layer fully convolutional network to reconstruct the
HR image, acting as a denoising tool. The most common concern of
the work that followed was to find an architecture that minimises
the mean squared error (MSE) between the reconstructed HR image
and the ground truth. Since minimising the MSE also maximises the
peak signal-to-noise ratio (PSNR), the PSNR became one of the most
widely used metrics to evaluate the quality of the results and to
compare the proposed methods [15].
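As an illustration, the sketch below builds an SRCNN-style three-layer fully convolutional mapping (with the 9-1-5 kernel sizes and 64/32 filter counts described by Dong et al.) and the PSNR metric in plain numpy. This is a hedged sketch, not the authors' implementation: the random weights stand in for trained ones, so only the structure of the mapping and the metric is shown, not a working super-resolver:

```python
import numpy as np

def conv2d(img, kernels):
    """'Same'-padded 2-D filtering of a (H, W, C_in) image with
    kernels of shape (k, k, C_in, C_out)."""
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", win, kernels)

def srcnn_forward(y, w1, w2, w3):
    """SRCNN's three stages: patch extraction (9x9), non-linear
    mapping (1x1) and reconstruction (5x5), with ReLU in between."""
    h1 = np.maximum(conv2d(y, w1), 0.0)
    h2 = np.maximum(conv2d(h1, w2), 0.0)
    return conv2d(h2, w3)

def psnr(x, f_y, peak=1.0):
    """Peak signal-to-noise ratio in dB; maximised when MSE is minimised."""
    return 10.0 * np.log10(peak ** 2 / np.mean((x - f_y) ** 2))

rng = np.random.default_rng(0)
x = rng.random((32, 32, 1))                   # HR ground-truth patch
y = x + 0.05 * rng.standard_normal(x.shape)   # degraded stand-in for the LR input
w1 = 0.01 * rng.standard_normal((9, 9, 1, 64))   # untrained illustrative weights
w2 = 0.01 * rng.standard_normal((1, 1, 64, 32))
w3 = 0.01 * rng.standard_normal((5, 5, 32, 1))
out = srcnn_forward(y, w1, w2, w3)
print(out.shape)  # -> (32, 32, 1): same size as the input, as in SRCNN
```

Note that, because the input is already up-sampled to the target size by bicubic interpolation, the network only has to restore detail, not enlarge the image.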
In [16], a deeper CNN architecture, also called very deep CNN
(VDCNN), is presented, inspired by the VGG-net used for ImageNet
classification [17]. That work demonstrates that cascading small
filters many times in a deep network structure, combined with
residual learning, can improve the accuracy of the SR method.
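To make the residual-learning idea concrete, the minimal numpy sketch below (an illustration, not the implementation from [16]) shows the two ingredients: the network predicts only the high-frequency residual missing from the interpolated input, and cascading D small 3x3 filters yields a (2D + 1) x (2D + 1) effective receptive field:

```python
import numpy as np

def residual_target(x, y):
    """Residual learning trains the network to predict only the
    high-frequency detail r = X - Y missing from the interpolated
    LR input Y, instead of regressing the full HR image X."""
    return x - y

def reconstruct(y, r_pred):
    """Final HR estimate: interpolated input plus predicted residual."""
    return y + r_pred

def receptive_field(depth, kernel=3):
    """Cascading `depth` kernel x kernel filters yields an effective
    receptive field of depth * (kernel - 1) + 1 pixels."""
    return depth * (kernel - 1) + 1

# A perfect residual prediction recovers the ground truth exactly
rng = np.random.default_rng(1)
x = rng.random((16, 16))                     # HR ground truth
y = x + 0.1 * rng.standard_normal((16, 16))  # interpolated LR input
assert np.allclose(reconstruct(y, residual_target(x, y)), x)
print(receptive_field(20))  # -> 41 (a 20-layer cascade of 3x3 filters)
```

Because the interpolated input already carries the low-frequency content, learning only the residual makes the optimisation of very deep networks easier.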
In [15], a SR generative adversarial network (SRGAN) was
proposed to recover finer texture details from LR images,
inferring photo-realistic natural images through a novel perceptual
loss function based on high-level feature maps from the VGG
network. The SRCNN, VDCNN and SRGAN architectures will be used in
this work and will be detailed in the following sections.
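Structurally, the perceptual loss is an MSE computed in a feature space rather than in pixel space. The toy feature extractor below, a single fixed convolution with ReLU, is only a stand-in for the pre-trained VGG network used in [15], sketched here to show the shape of the loss:

```python
import numpy as np

def features(img, kernels):
    """Toy feature extractor phi: one valid 2-D filtering + ReLU over a
    (H, W) image with kernels of shape (k, k, n_features). In SRGAN,
    phi is a fixed, pre-trained VGG network; this stand-in only
    illustrates the structure of the loss."""
    k = kernels.shape[0]
    win = np.lib.stride_tricks.sliding_window_view(img, (k, k))
    return np.maximum(np.einsum("hwij,ijo->hwo", win, kernels), 0.0)

def perceptual_loss(x, f_y, kernels):
    """MSE between high-level feature maps phi(X) and phi(F(Y)),
    instead of between raw pixels."""
    return np.mean((features(x, kernels) - features(f_y, kernels)) ** 2)

rng = np.random.default_rng(2)
phi = rng.standard_normal((3, 3, 8))            # fixed random 'VGG' filters
x = rng.random((24, 24))                        # ground-truth HR image
f_y = x + 0.1 * rng.standard_normal((24, 24))   # imperfect reconstruction
assert perceptual_loss(x, x, phi) == 0.0        # identical images match in feature space
assert perceptual_loss(x, f_y, phi) > 0.0       # reconstruction errors are penalised
```

Comparing feature maps rather than pixels is what lets SRGAN favour perceptually convincing textures over the over-smoothed solutions that minimising pixel-wise MSE tends to produce.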
Research on SR in biometrics (especially for iris recognition)
has been increasing in the last few years, particularly using
reconstruction-based methods. For example, Kien et al. [3]
super-resolve LR images in the feature domain, relying only on the
features and incorporating domain-specific information from iris
models to constrain the estimation. In [18], Nguyen et al. introduce
a signal-level fusion to integrate quality scores into the
reconstruction-based SR process, performing a quality-weighted SR
for a LR video sequence of a less constrained iris at a distance or
on the move, obtaining good results. However, in this case, as in
[19], which performs best-frame selection, many LR images are
required to reconstruct the HR image, which is one of the
disadvantages of this kind of reconstruction-based method.
In [20], an iris recognition algorithm based on principal
component analysis (PCA) is presented, constructing coarse iris
images with PCA coefficients and enhancing them using SR. In
[21], a reconstruction-based SR is proposed for iris SR from LR
video frames, using an auto-regressive signature model between
consecutive LR images to fill the sub-pixels in the constructed
image. In [22], two SR approaches are tested for iris recognition,
one based on the PCA eigen-transformation and the other based on
locality-constrained iterative neighbour embedding (LINE) of local
image patches. Both methods use coupled dictionaries to learn the
mapping between LR and HR images in a very LR simulation
using infrared iris images, obtaining good results for very small
images.
Despite the vast literature in the SR area and the great interest
in the use of deep learning in biometrics, the application of DLSR
to iris recognition is still a largely unexplored field, mainly
because such approaches generally focus on general and/or natural
scenes, aiming at overall visual enhancement and better image
quality in terms of photo-realism, while iris recognition focuses
on the recognition performance itself [1, 23]. In [24], three
multilayer perceptrons (MLPs) are used to perform single-image
SR for iris recognition. The method is based on merging the
bilinear interpolation approach with the output pixel values from
the trained multiple MLPs, considering the edge direction of the iris
patterns. Recently, Zhang et al. [25] used the classic SRCNN and
SR forest to perform SR in mobile iris recognition systems. The
algorithms are applied to the segmented and normalised iris images,
and the results show a limited effectiveness of the SR method for
iris recognition accuracy. Differently from the methods presented
in the DLSR literature, in this work we explore whether the
architectures and the database used in training can influence the
quality of the results and, consequently, the recognition
performance.
In our previous works [26, 27], we demonstrated that basic deep
learning techniques for super-resolution, such as stacked auto-
encoders and the classic SRCNN, can be successfully applied to iris
SR. In that case, we used the CASIA Interval database as the target
database, focusing more on the recognition process. In this work,
we focus on the relation between quality and recognition
performance, and the SR is performed on the original image
without any segmentation. We also use a new iris database as the
target database, which simulates a real-world situation in which
the images are acquired using mobile phones. Additionally, we test
a new application, the use of SRGANs, to verify whether the good
performance of this method on natural images in terms of photo-
realism is also valid and useful for iris images in the iris
recognition context.
3 Reconstruction of LR iris images through CNNs
Typically, in a deep learning system, the main issue is to find a
good training database that can provide relevant information for the
desired application. In the case of SR, it is necessary to learn,
during the training of the proposed method (also called the off-line
phase), a mapping between a high-resolution (HR) image containing
high-frequency information and a LR image containing only
low-frequency information. Fig. 1 shows this phase, in which a
training database is chosen and the images are prepared for
training the deep learning SR method.
In the training phase, the only pre-processing required is, given
a HR image X, to downscale it by one or more factors and then
up-scale it back to the size of the original image X using bicubic
interpolation. Although this image has the same size as X, it is
called the 'LR' image and is denoted Y. The purpose of DLSR
training is, after feeding the network with a LR image or patch Y
as input, to obtain a result F(Y) (the reconstructed image) as
similar as possible to the HR image or patch X, which in this case
is the ground truth. The weight adjustment of the method depends
on both the chosen architecture and the loss function, which are
explained in more detail in the following sections.
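The pre-processing just described can be sketched in a few lines of numpy. As a simplification, block averaging and nearest-neighbour up-sampling (np.kron) stand in here for the bicubic interpolation used in this work, so that the sketch stays dependency-free; as in the text, Y keeps the size of X but loses the high-frequency detail:

```python
import numpy as np

def make_lr(x, factor):
    """Build the training input Y from a HR image X: downscale by block
    averaging, then up-scale back to X's original size (assumes the
    image dimensions are divisible by `factor`)."""
    h, w = x.shape
    small = x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.kron(small, np.ones((factor, factor)))

def training_loss(x, f_y):
    """MSE objective driving the weight adjustment: push the network
    output F(Y) towards the ground truth X."""
    return np.mean((x - f_y) ** 2)

rng = np.random.default_rng(3)
x = rng.random((32, 32))     # HR ground truth X
y = make_lr(x, factor=2)     # same size as X, low-frequency content only
assert y.shape == x.shape
assert training_loss(x, x) == 0.0   # a perfect network would reach zero loss
assert training_loss(x, y) > 0.0    # the LR input alone does not
```

Training with several `factor` values corresponds to the "one or more factors" mentioned above and determines which down-sampling factors the network can later compensate for.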
After training, the deep learning method is applied to a LR
database for the proposed application, which, in the case of this
work, is an iris database, also called the target database. Here,
the deep learning process is a pre-processing step before iris
recognition, in which the LR image is given as input to the
network, which produces the reconstructed HR image, according to
the factor used in training, to be used in the recognition process,
as shown in Fig. 1 (on-line phase).
In deep learning, the preparation of individual models for all
possible scenarios, dealing with different scales, poses,
illumination conditions, and textures, is still a challenge. In
this work, we test the main SR
[Fig. 1: General overview of the training and reconstruction method for the iris SR using CNNs proposed in this work]
70 IET Biom., 2019, Vol. 8 Iss. 1, pp. 69-78
This is an open access article published by the IET under the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/)