Ensemble Learning using Transformers and Convolutional Networks for Masked Face Recognition

Mohammed R. Al-Sinan, Aseel F. Haneef, and Hamzah Luqman∗

Information and Computer Science Department, King Fahd University of Petroleum and Minerals

∗SDAIA-KFUPM Joint Research Center for Artificial Intelligence, Dhahran 31261, Saudi Arabia.

Email: {g201354590, g201565430, hluqman}@kfupm.edu.sa

Abstract: Wearing a face mask is one of the adjustments we had to make to reduce the spread of the coronavirus. Having our faces constantly covered by masks has driven the need to understand and investigate how this behavior affects the recognition capability of face recognition systems. Current face recognition systems achieve extremely high accuracy on unconstrained general face recognition but do not generalize well to occluded masked faces. In this work, we propose a system for masked face recognition. The proposed system comprises two Convolutional Neural Network (CNN) models and two Transformer models. The CNN models have been fine-tuned on the FaceNet pre-trained model. We ensemble the predictions of the four models using the majority voting technique to identify the person wearing the mask. The proposed system has been evaluated on a synthetically masked LFW dataset created in this work. The ensemble of models obtains the best accuracy, 92%. This recognition rate outperforms that of the other models and shows the correctness and robustness of the proposed model for recognizing masked faces. The code and data are available at https://github.com/Hamzah-Luqman/MFR.

Index Terms: Masked Face Recognition, Face Recognition, Face De-occlusion, Transformer, Ensemble Learning, LFW dataset

I. INTRODUCTION

Coronavirus or COVID-19 is a global pandemic that has affected more than 227 countries and territories [1]. This disease has had a serious negative impact on people's health and the global economy. Wearing face masks has become a necessity in our daily lives as a preventive measure against the disease. According to the Centers for Disease Control and Prevention (CDC), the most effective way to avoid spreading the disease or being infected with it is to practice social distancing and wear face masks [2].

Face recognition systems have been extensively used during this pandemic. Coronavirus can be transmitted quickly between people via surfaces. This forced several organizations and entities to avoid touch-based authentication devices such as fingerprint and password-based security systems, which in turn increased the dependency on systems that avoid unnecessary contact with surfaces. Face recognition is one such system used for user authentication. Face recognition systems are also used for security purposes that involve recognizing and verifying people.

However, wearing masks has driven the need to understand and investigate how these masks affect existing digital systems such as face detection, face recognition, and face verification systems. According to Noa et al. [3], face masks interfere with the basic mechanisms of face recognition and reduce accuracy in identifying facial identity, gender, age, and emotion. For example, face masks can cause recognition systems to misinterpret disgusted faces as angry faces. Another study was conducted by the National Institute of Standards and Technology (NIST) [4] to evaluate some commercial facial recognition systems on masked face images. This study reported error rates of 5% to 50% for these systems when recognizing faces with masks created digitally on images of unmasked faces.

The failure of currently available face recognition systems in recognizing masked faces can be attributed to several reasons. The primary reason is the lack of adequate visual and identity cues, because the facial mask covers almost half of the face. This occlusion takes away a large percentage of human face features [5] and therefore makes it harder for recognition models to identify masked faces. Several techniques have been proposed to address this problem. Some of these techniques depend on the un-occluded regions of the face to identify the person, while other approaches use the full masked face for recognition. Other approaches in the literature tackle this problem by reconstructing the occluded regions of the face and then recognizing the whole face.

arXiv:2210.04816v1 [cs.CV] 10 Oct 2022

Many of the current face recognition methods depend on deep learning models for recognition. These models have been shown to reach very high accuracy on non-occluded faces, even beyond human recognition capability [6], [7]. However, few works have targeted masked face recognition. In this work, we propose a system for recognizing the identity of a person wearing a facial mask. Five models are proposed in this work. Three of these models are CNN-based models fine-tuned on different pre-trained models. We also use the state-of-the-art Transformer model for masked face recognition. To utilize the features of the fine-tuned models and the Transformer, we ensemble two CNN models and two Transformer models and apply the majority voting technique for the final decision. The proposed techniques have been evaluated on the LFW dataset and on a masked version of LFW created in this work. The obtained results show that ensemble learning outperformed the other models.
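
The majority-voting step described above can be sketched as follows. This is a minimal illustration and not the released implementation: the function name `majority_vote`, the identity labels, and the tie-breaking rule (favoring the label of the earliest model in the list) are assumptions made for the example, since the text does not state how ties among the four models are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the identity label predicted by the largest number of models.

    `predictions` holds one label per model. Ties are broken in favor of
    the label that appears earliest in the list (an assumption made here).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top_count = max(counts.values())
    # Scan in model order so that ties resolve to the earliest model's label.
    for label in predictions:
        if counts[label] == top_count:
            return label


# Hypothetical outputs of two CNN models followed by two Transformer models.
votes = ["person_17", "person_17", "person_03", "person_17"]
print(majority_vote(votes))
```

With four voters a 2-2 split is possible, so any practical ensemble of this shape needs an explicit tie rule; ordering the models by validation accuracy would be one option under the rule used here.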

This paper is organized as follows: Section II reviews the related work. Section III presents the proposed approach. Section IV describes the experimental work and the obtained results. Finally, the conclusions and future work are presented in Section V.

II. RELATED WORK

Several approaches have been proposed for masked face recognition. These techniques can be categorized into techniques that recognize the masked face without performing de-occlusion and techniques that perform de-occlusion before recognizing the face.

A. Masked Face Recognition Without De-occlusion

Several approaches have been proposed for recognizing masked faces without reconstructing the areas under the mask. CNNs have been extensively used in state-of-the-art architectures for unconstrained general face recognition. However, when the face of the subject is occluded by objects such as facial masks, scarves, or sunglasses, the accuracy of the CNN model drops significantly [8]. This drop in performance happens mostly when the models are trained on unconstrained face images and tested on occluded ones [9]. Therefore, some researchers trained their models on a mix of these images to boost accuracy. However, Song et al. [10] argued that adding a large number of partially occluded images is not enough because the learned features of two faces with different occlusion conditions are still inconsistent. Therefore, they introduced a method that discards the facial mask and focuses on the features extracted from other face regions. This approach was evaluated on the AR dataset and achieved an accuracy of 99.03%.

Li et al. [11] used the Convolutional Block Attention Module (CBAM) [12] for masked face recognition. The authors fed the model with the subject's eye region extracted using different cropping approaches. The proposed approach was evaluated on the Masked-LFW [13] dataset and obtained an accuracy of 82.86%. They also tested their model on the Masked-Webface dataset and achieved 91.525% accuracy compared to 88.01% and 87.906% with the ArcFace [6] and CosFace [7] methods, respectively. A similar approach was followed by Hariri [14] for masked face recognition. This approach discarded the occluded portion of the face and kept only the area around the eyes. The pre-trained VGG16 model was used to extract features from the segmented eyes. This approach was evaluated on the Real-World-Masked-Face-Dataset [15], and an accuracy of 91.3% was reported using the 10-fold cross-validation technique.
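
The idea of discarding the occluded portion and keeping only the area around the eyes can be illustrated with a fixed-ratio crop. This is a simplified sketch under stated assumptions: the cited works describe their own cropping approaches, which are not reproduced here, and the function name `crop_eye_region`, the nested-list image layout, and the default `keep_fraction` of 0.5 are all hypothetical choices for illustration on an already aligned face image.

```python
def crop_eye_region(face, keep_fraction=0.5):
    """Keep the top rows of an aligned face image and drop the rest.

    `face` is a list of pixel rows (top row first). `keep_fraction` is the
    share of rows assumed to lie above the mask; 0.5 is an illustrative
    value, not one taken from the cited works.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rows_to_keep = max(1, int(len(face) * keep_fraction))
    # Copy the rows so the caller's image is not modified.
    return [row[:] for row in face[:rows_to_keep]]


# Tiny 4x2 stand-in for a face image; the lower two rows play the masked area.
face = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(crop_eye_region(face))
```

A landmark-based crop would adapt to pose and face size, whereas a fixed ratio like this one only makes sense after the face has been detected and aligned.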

Wan et al. [16] proposed a deep trainable model, MaskNet, that learns image features and neglects the deformation caused by occlusion. The authors claimed that the MaskNet model can be incorporated into CNN architectures with minimal identity labels and less computation. A verification accuracy of 96.4% was reported on the LFW dataset when the face is randomly occluded with a square of size 40. However, this accuracy decreases as the size of the occlusion block increases.
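
The random square occlusion used in that evaluation can be sketched as below. This is an illustration only, not the protocol of [16]: the function name `occlude_square`, the fill value of 0, the uniform choice of position, and the 100x100 test image are assumptions, and the source does not state the image resolution.

```python
import random


def occlude_square(image, size, rng=None, fill=0):
    """Return a copy of `image` with one randomly placed size x size block
    overwritten by `fill`.

    `image` is a list of pixel rows. The block position is drawn uniformly
    so that the whole block stays inside the image (an assumption).
    """
    rng = rng if rng is not None else random.Random()
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("occlusion block does not fit inside the image")
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    occluded = [row[:] for row in image]  # leave the input untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            occluded[r][c] = fill
    return occluded


# Hypothetical 100x100 white image occluded by a block of size 40.
image = [[255] * 100 for _ in range(100)]
occluded = occlude_square(image, 40, rng=random.Random(0))
print(sum(row.count(0) for row in occluded))
```

Passing a seeded `random.Random` keeps the occlusion reproducible, which matters when comparing accuracy across block sizes.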

Other approaches tried to improve masked face recognition accuracy by minimizing the intra-class distances and maximizing the inter-class distances using different loss functions. Early approaches used loss functions such as triplet loss [17] and N-pairs [18] to optimize the distance, while recent techniques used other loss functions such as ArcFace [6] and CosFace [7]. SFace was proposed by Zhong et al. [19] to minimize the distance between a face with and without a mask by altering the softmax loss function. SFace addresses the issue of overfitting to low-quality training images and noisy labels by introducing the Sigmoid-constrained Hypersphere loss function, which re-scales the intra-class and inter-class gradients accordingly. SFace was evaluated on multiple benchmarking datasets and achieved verification accuracies of 99.82% and 90.63% on the LFW and masked-LFW [13] datasets, respectively.
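
To make the intra-class versus inter-class objective concrete, the triplet loss mentioned above can be written for a single triplet as follows. This is a minimal sketch of the commonly used formulation with squared Euclidean distances; the function names, the plain-list embeddings, and the margin of 0.2 are illustrative assumptions rather than settings reported in this paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative (another identity) is farther from
    the anchor than the positive (same identity) by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)  # intra-class distance
    d_neg = squared_distance(anchor, negative)  # inter-class distance
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: unmasked anchor, masked positive, other person.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(triplet_loss(anchor, positive, negative))
```

In the masked setting, a natural triplet pairs an unmasked image with a masked image of the same person as the positive, so that minimizing the loss pulls the two views together.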

B. Masked Face Recognition with De-occlusion

A common approach to masked face recognition is to restore the area covered by the mask [20]. Several approaches have been used for masked face recognition through face restoration. One of these approaches extracts the key facial features with the help of pre-trained models. The restored face is then matched to the original face to recognize the person. The quality of the restored region plays an important role in masked face recognition. Iizuka et al. [21] proposed a generative model for face restoration. The proposed model employed an adversarial training approach using global and local context discriminators. The global discriminator assesses the entire image, and the local discriminator looks at a small area in the completed region to ensure consistency with the generated patches. An improvement to this approach was proposed by Yu et al. [22]. The improvement was to split the image completion network into a coarse network and a refinement network. The refinement network takes the initial coarse prediction and produces refined results. The authors used Wasserstein GAN (WGAN) [23] in their network to improve the results.

In contrast to many cases of image inpainting where the missing part is small and not complex in shape, a facial mask covers a large region of the face, which makes this task more challenging. Din et al. [5] proposed a generative network consisting of one generator and two discriminators that learn the general face shape. This approach was capable of removing the facial mask using the binary map and synthesizing the missing regions while keeping the initial face structure. The proposed mask extraction encoder uses five blocks of convolution layers. The decoder component has the same architecture as the encoder, except that the convolution layers were replaced with deconvolution layers. This approach was evaluated on the CelebA dataset, and a structural similarity (SSIM) score of 0.864 was reported.
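
For readers unfamiliar with the metric, the SSIM score compares the means, variances, and covariance of two images. The sketch below computes a simplified, single-window version over whole images given as flat pixel lists. It is an assumption-laden illustration: standard SSIM averages the same expression over local windows, the evaluation protocol of [5] is not reproduced here, and the function name `global_ssim` and the constants 0.01 and 0.03 (the customary defaults) are choices made for the example.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two images given as flat pixel lists.

    Uses global statistics instead of the local sliding windows of the
    standard metric, so values will differ from library implementations.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small constants that stabilize the ratio when the statistics are near zero.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical ground-truth and restored pixel values.
truth = [10, 20, 30, 40]
restored = [12, 18, 33, 39]
print(global_ssim(truth, restored))
```

A score of 1 means the two images are identical under this measure, and lower values indicate that the restored region departs from the ground truth in brightness, contrast, or structure.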

Yu et al. [24] used a gated convolutional network that provides a learnable dynamic feature selection mechanism
