and a masked version of LFW was created in this work. The results show that ensemble learning outperformed the other models.
This paper is organized as follows: Section 2 reviews related work. Section 3 presents the proposed approach. Section 4 describes the experiments and the obtained results. Finally, Section 5 presents the conclusions and future work.
II. RELATED WORK
Several approaches have been proposed for masked face recognition. They fall into two categories: techniques that recognize the masked face directly, without performing de-occlusion, and techniques that perform de-occlusion before recognizing the face.
A. Masked Face Recognition Without De-occlusion
Several approaches have been proposed for recognizing masked faces without reconstructing the areas under the mask. CNNs have been used extensively and are included in state-of-the-art architectures for unconstrained general face recognition. However, when the subject's face is occluded by objects such as face masks, scarves, or sunglasses, the accuracy of CNN models drops significantly [8]. This drop in performance occurs mostly when the models are trained on unconstrained face images and tested on occluded ones [9]. Therefore, some researchers trained their models on a mix of unconstrained and occluded images to boost accuracy. However, Song et al. [10] argued that adding a large number of partially occluded images is not enough, because the features learned for two faces under different occlusion conditions remain inconsistent. They therefore introduced a method that discards the masked region and focuses on the features extracted from the other face regions. This approach was evaluated on the AR dataset and achieved an accuracy of 99.03%.
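To make the idea of discarding corrupted features concrete, the sketch below compares two embeddings using only the feature elements marked as reliable. It is a minimal illustration under stated assumptions, not the method of Song et al. [10]: the 128-dimensional embeddings, the hand-set binary `keep_mask`, and the function name `masked_cosine_similarity` are all hypothetical, and how [10] decides which elements to discard is not reproduced here.

```python
import numpy as np


def masked_cosine_similarity(probe, gallery, keep_mask):
    """Cosine similarity restricted to feature elements marked reliable (keep_mask == 1)."""
    p = probe * keep_mask
    g = gallery * keep_mask
    denom = np.linalg.norm(p) * np.linalg.norm(g)
    return float(p @ g / denom) if denom > 0 else 0.0


# Hypothetical 128-D embeddings: the probe matches the gallery face except for the
# last 64 elements, which we pretend were corrupted by a facial mask.
rng = np.random.default_rng(0)
gallery = rng.normal(size=128)
probe = gallery.copy()
probe[64:] = rng.normal(size=64)

keep = np.ones(128)
keep[64:] = 0.0  # discard the elements assumed to be affected by the mask

sim_all = masked_cosine_similarity(probe, gallery, np.ones(128))
sim_kept = masked_cosine_similarity(probe, gallery, keep)
print(f"all features: {sim_all:.3f}, corrupted features discarded: {sim_kept:.3f}")
```

Once the corrupted elements are dropped, the two embeddings of the same face agree perfectly, while comparing all elements lets the occlusion lower the score.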
Li et al. [11] used the Convolutional Block Attention Module (CBAM) [12] for masked face recognition. The authors fed the model with the subject's eye region, extracted using different cropping approaches. The proposed approach was evaluated on the Masked-LFW [13] dataset and obtained an accuracy of 82.86%. They also tested their model on the masked-Webface dataset and achieved 91.525% accuracy, compared to 88.01% and 87.906% with the ArcFace [6] and CosFace [7] methods, respectively. A similar approach was followed by Hariri [14] for masked face recognition: the occluded portion of the face was discarded and only the area around the eyes was kept. A pre-trained VGG16 model was used to extract features from the segmented eye region. This approach was evaluated on the Real-World-Masked-Face-Dataset [15], and an accuracy of 91.3% was reported using 10-fold cross-validation.
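CBAM [12] refines a feature map by applying channel attention followed by spatial attention. The NumPy sketch below is a single forward pass with random, untrained weights, intended only to show that data flow; it is not the network of Li et al. [11]. The reduction ratio r = 4, the 7 × 7 spatial kernel, the omission of MLP biases, and all function names are assumptions of this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w1, w2):
    """Channel weights from a shared MLP over average- and max-pooled descriptors.

    f: (C, H, W) feature map; w1: (C // r, C); w2: (C, C // r).
    """
    def mlp(v):
        # Shared two-layer MLP with ReLU; biases omitted for brevity.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(f.mean(axis=(1, 2))) + mlp(f.max(axis=(1, 2))))  # (C,)


def spatial_attention(f, kernel):
    """Spatial weights from a k x k convolution over channel-wise mean and max maps.

    kernel: (2, k, k) weights of a single-output-channel convolution.
    """
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2), (k // 2, k // 2)))
    h, w = f.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # (H, W)


def cbam(f, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a (C, H, W) feature map."""
    f = f * channel_attention(f, w1, w2)[:, None, None]
    return f * spatial_attention(f, kernel)[None, :, :]


# Random, untrained weights for a 16-channel 8 x 8 feature map.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8, 8))
refined = cbam(
    features,
    w1=rng.normal(size=(4, 16)) * 0.1,
    w2=rng.normal(size=(16, 4)) * 0.1,
    kernel=rng.normal(size=(2, 7, 7)) * 0.1,
)
print(refined.shape)  # (16, 8, 8)
```

Because both attention maps lie in (0, 1), CBAM only rescales the input features; applied to eye crops, the intent is to emphasize the most discriminative channels and locations.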
Wan et al. [16] proposed a deep trainable model, MaskNet, that learns image features while ignoring the deformations caused by occlusion. The authors claimed that MaskNet can be integrated into CNN architectures with minimal identity labels and little additional computation. A verification accuracy of 96.4% was reported on the LFW dataset when the face was randomly occluded with a 40 × 40 square. However, this accuracy decreases as the size of the occlusion block increases.
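The random square occlusion used in this kind of evaluation can be sketched as follows. This is a hypothetical reconstruction, not the protocol of [16]: the black fill value, the uniform placement of the block, and the function name `random_square_occlusion` are assumptions.

```python
import numpy as np


def random_square_occlusion(image, size=40, fill=0, rng=None):
    """Return a copy of `image` with one size x size square, at a random position, set to `fill`."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    occluded = image.copy()
    occluded[top:top + size, left:left + size] = fill
    return occluded


# A hypothetical 250 x 250 RGB face crop (the size of LFW images), filled with ones.
face = np.ones((250, 250, 3), dtype=np.uint8)
occluded = random_square_occlusion(face, size=40, rng=np.random.default_rng(0))
print((occluded == 0).all(axis=-1).sum())  # 1600 occluded pixels
```

Increasing `size` reproduces the harder settings in which the reported accuracy declines; evaluating a model on the occluded images is not shown.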
Other approaches tried to improve masked face recognition accuracy by minimizing intra-class distances and maximizing inter-class distances using different loss functions. Early approaches used loss functions such as the triplet loss [17] and the N-pair loss [18] to optimize these distances, while recent techniques use margin-based losses such as ArcFace [6] and CosFace [7]. SFace was proposed by Zhong et al. [19] to minimize the distance between a face with and without a mask by altering the softmax loss function. SFace addresses overfitting to low-quality training images and noisy labels by introducing the sigmoid-constrained hypersphere loss, which rescales the intra-class and inter-class gradients. SFace was evaluated on multiple benchmark datasets and achieved verification accuracies of 99.82% and 90.63% on the LFW and masked-LFW [13] datasets, respectively.
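Two of the losses mentioned above can be sketched in a few lines: the triplet loss, which pushes the anchor-negative distance to exceed the anchor-positive distance by a margin, and the ArcFace logits, which add an angular margin to the target class before the softmax. This is a simplified illustration; the margin 0.2, the scale s = 64 and the angular margin m = 0.5 are commonly used values assumed here, and the SFace loss [19] is not reproduced.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) with squared Euclidean distances."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(d_ap - d_an + margin, 0.0)


def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """Scaled cosine logits with an additive angular margin m on the target class.

    Simplified: ignores the theta + m > pi case that full implementations handle.
    """
    x = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(w @ x, -1.0, 1.0)
    logits = s * cos
    logits[label] = s * np.cos(np.arccos(cos[label]) + m)
    return logits


a, p, n = np.zeros(2), np.array([1.0, 0.0]), np.array([0.5, 0.0])
print(triplet_loss(a, p, n))  # about 0.95: the negative is closer than the positive

# Three hypothetical class centres (identity matrix); the sample belongs to class 0.
logits = arcface_logits(np.array([1.0, 0.2, 0.0]), np.eye(3), label=0)
print(logits)
```

The margin lowers the target logit, so the network must pull same-identity embeddings closer to their class centre to achieve the same softmax probability.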
B. Masked Face Recognition with De-occlusion
A common approach to masked face recognition is to restore the area covered by the mask [20]. Several approaches perform masked face recognition through face restoration. One of them extracts the key facial features with the help of pre-trained models, and the restored face is then matched to the original face to recognize the person. The quality of the restored region therefore plays an important role in masked face recognition. Iizuka et al. [21] proposed a generative model for face restoration. The proposed model employed an adversarial training approach using global and local context discriminators. The global discriminator assesses the entire image, while the local discriminator looks at a small area around the completed region to ensure the local consistency of the generated patches. Yu et al. [22] improved this approach by splitting the image completion network into a coarse network and a refinement network. The refinement network takes the initial coarse prediction and produces refined results. The authors also used the Wasserstein GAN (WGAN) [23] objective in their network to improve the results.
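The split between the two discriminators comes down to what each one sees: the global discriminator receives the whole image, while the local one receives a patch centred on the completed region. The sketch below shows that cropping step together with the plain WGAN objectives computed from critic scores. The 256 × 256 image, the 128 × 128 patch, and the function names are assumptions; practical WGAN variants add a gradient penalty or weight clipping, which is omitted here.

```python
import numpy as np


def local_patch(image, hole_mask, patch=128):
    """Crop a patch x patch window centred on the hole's bounding box, clamped to the image."""
    ys, xs = np.nonzero(hole_mask)
    cy = (ys.min() + ys.max()) // 2
    cx = (xs.min() + xs.max()) // 2
    h, w = hole_mask.shape
    top = int(np.clip(cy - patch // 2, 0, h - patch))
    left = int(np.clip(cx - patch // 2, 0, w - patch))
    return image[top:top + patch, left:left + patch], (top, left)


def wgan_losses(critic_real, critic_fake):
    """Plain WGAN objectives from critic scores (no gradient penalty or weight clipping)."""
    critic_loss = np.mean(critic_fake) - np.mean(critic_real)
    generator_loss = -np.mean(critic_fake)
    return critic_loss, generator_loss


# Hypothetical 256 x 256 image with a rectangular hole over the lower face.
# The global discriminator would receive `image` itself.
image = np.zeros((256, 256, 3))
hole = np.zeros((256, 256), dtype=bool)
hole[150:200, 60:120] = True

local_input, origin = local_patch(image, hole)
print(local_input.shape, origin)  # (128, 128, 3) (110, 25)
```

Clamping the window to the image borders keeps the local discriminator's input size fixed even when the hole lies near an edge.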
Unlike many image inpainting settings, in which the missing region is small and simple in shape, a facial mask covers a large region of the face, which makes this task more challenging. Din et al. [5] proposed a generative network consisting of one generator and two discriminators that learn the general face shape. This approach was capable of removing the facial mask using a binary map of the mask region and synthesizing the missing regions while preserving the original face structure. The proposed mask extraction encoder uses five blocks of convolution layers. The decoder has the same architecture as the encoder, except that its convolution layers are replaced with deconvolution layers. This approach was evaluated on the CelebA dataset, and a structural similarity index (SSIM) of 0.864 was reported.
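The role of the binary map and the SSIM metric can be illustrated with a short sketch. It assumes that generated pixels are composited into the masked face only where the binary map is 1, a common inpainting step that is not confirmed as Din et al.'s exact procedure. The SSIM here is computed over the whole image as a single window with the usual constants K1 = 0.01 and K2 = 0.03; the standard metric averages SSIM over local (typically 11 × 11 Gaussian) windows, so this simplified value is not comparable to the reported 0.864. All arrays and function names are hypothetical.

```python
import numpy as np


def composite(masked_face, generated, mask_map):
    """Keep original pixels outside the mask region and generated pixels inside it.

    mask_map: (H, W) binary map, 1 where the facial mask was.
    """
    m = mask_map[..., None].astype(float)
    return m * generated + (1.0 - m) * masked_face


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as a single window (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx = ((x - mx) ** 2).mean()
    vy = ((y - my) ** 2).mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


rng = np.random.default_rng(0)
ground_truth = rng.random((64, 64, 3))
mask_map = np.zeros((64, 64))
mask_map[32:, 16:48] = 1  # hypothetical mask over the lower-centre of the face
masked_face = ground_truth * (1 - mask_map[..., None])  # mask pixels blacked out
generated = np.clip(ground_truth + rng.normal(0, 0.05, ground_truth.shape), 0, 1)

restored = composite(masked_face, generated, mask_map)
print(global_ssim(ground_truth, restored))
```

Compositing guarantees that pixels outside the mask are left untouched, so any loss in SSIM comes only from the synthesized region.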
Yu et al. [24] used a gated convolutional network that
provides a learnable dynamic feature selection mechanism