RACE BIAS ANALYSIS OF BONA FIDE ERRORS
IN FACE ANTI-SPOOFING
Latifah Abduh & Ioannis Ivrissimtzis
Department of Computer Science, Durham University, UK
e-mail:{latifah.a.abduh, ioannis.ivrissimtzis}@durham.ac.uk
ABSTRACT
The study of bias in Machine Learning has received a lot of attention in recent years; however, only a few papers deal explicitly with the problem of race bias in face anti-spoofing. In this paper, we
present a systematic study of race bias in face anti-spoofing with three key characteristics: the focus
is on analysing potential bias in the bona fide errors, where significant ethical and legal issues lie; the
analysis is not restricted to the final binary outcomes of the classifier, but also covers the classifier’s
scalar responses and its latent space; the threshold determining the operating point of the classifier is
considered a variable. We demonstrate the proposed bias analysis process on a VQ-VAE based face
anti-spoofing algorithm, trained on the Replay Attack and the Spoof in the Wild (SiW) databases,
and analysed for bias on the SiW and Racial Faces in the Wild (RFW) databases. The results
demonstrate that race bias is not necessarily the result of different mean response values among the
various populations. Instead, it can be better understood as the combined effect of several possible
characteristics of the response distributions: different means; different variances; bimodal behaviour;
existence of outliers.
Keywords: Face presentation attacks · face anti-spoofing · fairness · race bias.
1 Introduction
Face recognition is the method of choice behind some of the most widely deployed biometric authentication systems,
currently supporting a range of applications, from passport control at airports, to mobile phone or laptop login. A key
weakness of the technology, preventing it from being employed in security-sensitive applications in uncontrolled
environments, such as ATMs for money withdrawal, is its vulnerability to presentation attacks, where
imposters attempt to gain wrongful access by presenting in front of the system’s camera a photo, or a video, or by
wearing a mask resembling a registered person. As a solution to this problem, algorithms for presentation attack
detection (PAD) are developed, that is, binary classifiers trained to distinguish between the bona fide samples coming
from live subjects, and those coming from imposters.
The large variety in the types of possible presentation attacks, and the large variation in the environmental conditions
under which they might take place, make PAD a particularly challenging problem. However, the current state-of-the-art,
utilising the power of deep learning, comprises classifiers with excellent accuracy rates, and a satisfactory generalisation
power to at least a limited number of previously unseen attacks. Cross-database generalisation is still problematic; however, it is debatable whether this is a real obstacle to the deployment of PAD algorithms in practical applications, since such algorithms are usually embedded in specific face recognition systems, with given camera specifications and configurations.
Here, we deal with the problem of race bias in face anti-spoofing algorithms. It is a topic that has attracted considerably
less research interest than accuracy and generalisation power, despite the fact that it raises ethical, legal, and regulatory considerations, which, on their own, can prevent adoption in specific applications. Addressing this gap, the aim of this paper is to provide a framework for studying the question: does the classifier work equally well on people from all races?
The proposed race bias analysis process has three key characteristics. First, the focus is on the bona fide error, that is, on genuine people wrongly classified as imposters. Bias in this type of error has significant ethical, legal and regulatory ramifications, and, as it has recently been pointed out, it "creates customer annoyance and inconvenience, and this is also where bias can occur in PAD systems" [1]. Secondly, we analyse various stages of the classification process: not just the final binary outcome, but also the scalar responses of the network prior to thresholding, and, before that, the representation of the face image in the network's latent space. Thirdly, we treat the value of the threshold that determines the classifier's operating point on the ROC curve as a user-defined variable; we do not assume it is fixed by the vendor of the biometric verification system through a black-box process.
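As a rough sketch of this third point, the bona fide error rate can be recomputed per demographic group while the operating threshold is swept; the response array and group labels below are placeholders for illustration, not our actual data or pipeline:

```python
import numpy as np

def bpcer_per_group(responses, groups, threshold):
    """Per-group bona fide error rate: the fraction of genuine samples whose
    scalar response falls on the 'attack' side of the given threshold."""
    rates = {}
    for g in np.unique(groups):
        r = responses[groups == g]
        rates[g] = float(np.mean(r > threshold))  # assumes higher response = more attack-like
    return rates

# Sweep the operating point instead of fixing it.
# responses: scalar network outputs for bona fide samples (placeholder values here)
# groups: race label of each sample (illustrative labels)
responses = np.random.rand(1000)
groups = np.random.choice(['African', 'Asian', 'Caucasian', 'Indian'], 1000)
for t in np.linspace(0.1, 0.9, 9):
    print(round(t, 1), bpcer_per_group(responses, groups, t))
```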
In the rest of the paper, we demonstrate the application of the proposed bias analysis approach on a face anti-spoofing algorithm based on the recently proposed Vector Quantized Variational Autoencoder (VQ-VAE) architecture [2]. The network is trained and validated on the Replay Attack and the SiW databases, and tested for racial bias on bona fide samples from the SiW and the RFW databases. Hypotheses are tested using the chi-squared test on the binary outcomes, the Mann–Whitney U test on the scalar responses, and Hartigan's dip test for bimodality in the response distributions. To test for bias in the latent space of the VQ-VAE network, we train an SVM with encoding vectors from two races, and measure its performance as a binary classifier.
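A minimal sketch of this battery of tests, using scipy, scikit-learn, and the third-party diptest package, is given below; all arrays and variable names are illustrative placeholders, not the actual experimental pipeline:

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import diptest  # third-party implementation of Hartigan's dip test

# Placeholder data: scalar responses and binary bona fide errors for two races.
resp_a, resp_b = np.random.rand(200), np.random.rand(200)
err_a, err_b = (resp_a > 0.5).astype(int), (resp_b > 0.5).astype(int)

# 1. Chi-squared test on the binary outcomes (error / no error per race).
table = [[err_a.sum(), len(err_a) - err_a.sum()],
         [err_b.sum(), len(err_b) - err_b.sum()]]
chi2, p_chi2, _, _ = chi2_contingency(table)

# 2. Mann-Whitney U test on the scalar responses of the two races.
_, p_mw = mannwhitneyu(resp_a, resp_b, alternative='two-sided')

# 3. Hartigan's dip test for bimodality of one response distribution.
dip, p_dip = diptest.diptest(resp_a)

# 4. Latent-space separability: cross-validated SVM accuracy on encodings
#    from two races (placeholder 64-dimensional encoding vectors).
lat_a, lat_b = np.random.randn(200, 64), np.random.randn(200, 64)
X = np.vstack([lat_a, lat_b])
y = np.r_[np.zeros(len(lat_a)), np.ones(len(lat_b))]
acc = cross_val_score(SVC(kernel='rbf'), X, y, cv=5).mean()

print(p_chi2, p_mw, p_dip, acc)
```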
The contributions of the paper are summarised as follows:
• A demonstration that race bias can be attributed to several characteristics of the response distributions: different means; different variances; bimodality; outliers.
• A demonstration that non-specialised databases, such as RFW, can be used to analyse face anti-spoofing algorithms.
• A VQ-VAE based network for face anti-spoofing.
The rest of the paper is organised as follows. In Section 2, we review the relevant literature. In Section 3, we describe
the VQ-VAE face anti-spoofing algorithm and the databases we used. In Section 4, we present the bias analysis on the
SiW database, and in Section 5 the bias analysis on the RFW database. We briefly conclude in Section 6.
2 Background
We briefly review the area of face anti-spoofing, and then studies of bias in machine learning, and PAD in particular.
2.1 Face anti-spoofing
The earlier machine learning approaches to PAD were based on handcrafted features [3, 4], with Histogram of Oriented Gradient (HOG) [5], and Local Binary Patterns (LBP) [3], among the most popular.
More recent approaches were based on CNNs [6, 7], or combinations of various deep network types [8], leading to the current state of the art being based on various forms of deep learning [9, 10, 11, 12, 13, 8, 14, 15], such as Central Difference Convolutional Networks (CDCN) [11, 10], or transformers [16]. Following some earlier approaches [17, 18], the current state of the art algorithms also utilise depth information [19, 13, 15, 20], which can be estimated by an independently trained neural network, while the use of GANs to estimate Near Infrared (NIR) information has also been proposed [14].
The experiment presented in this paper is based on a one-class trained autoencoder. Anomaly detection is a popular approach [21, 22, 23, 24, 25], offering good generalisation to unseen attacks. In [26], images from face recognition datasets were added to the two-class training set of an autoencoder, and improved cross-database generalisation was reported. A similar behaviour was reported in [27] when in-the-wild images were added to the training set.
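For context, a minimal sketch of the anomaly detection formulation of PAD is given below; the reconstruction function stands in for a trained one-class autoencoder and is not the VQ-VAE model described later in the paper:

```python
import numpy as np

def anomaly_score(reconstruct, image):
    """Response of a one-class autoencoder: mean squared reconstruction error.
    Bona fide faces, close to the training distribution, should score low;
    unseen attacks should reconstruct poorly and score high."""
    return float(np.mean((reconstruct(image) - image) ** 2))

def is_attack(reconstruct, image, threshold):
    # Flag the sample as a presentation attack when the reconstruction
    # error exceeds the (user-defined) operating threshold.
    return anomaly_score(reconstruct, image) > threshold

# Toy usage with an identity "autoencoder" standing in for a trained model.
image = np.random.rand(256, 256, 3)
print(is_attack(lambda x: x, image, threshold=0.05))
```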
2.2 Databases
In this paper, the first training set is from Replay-Attack [3], a database consisting of 50 subjects of three ethnicities: 76% Caucasian, 22% Asian, and 2% African. Our second training set is from SiW [18], a database consisting of 165 subjects of four ethnicities: 35% Asian, 35% Caucasian, 23% Indian, and 7% African American. The bias analysis is performed on SiW, with the subjects annotated for ethnicity by us, and on the already annotated RFW database [28].
Regarding other databases, NUAA [29] was one of the first large face anti-spoofing databases. It is rarely used these days, as its low quality imagery poses an unfair challenge in the cross-database validation of algorithms. MSU MFSD [30] consists of 55 subjects, captured by four different devices, while OULU [31] again has 55 subjects, captured by six
different mobile devices. WMCA [32] contains 72 subjects, and information is captured in RGB, depth, infrared, and thermal. CASIA-SURF [33] consists of 1000 subjects, captured in RGB, depth, and infrared.
The first face anti-spoofing database to include explicit ethnic labels was CASIA-SURF CeFA [34], which has 1,607 subjects in three ethnicities, captured in three modalities. In this paper, for bias analysis we use RFW [28], which includes four ethnicities: Caucasian, Asian, Indian, and African. RFW does not specialise in face anti-spoofing, and it is more widely used in the bias analysis literature.
2.3 Bias in machine learning
In [35], several high profile cases of machine learning bias are documented: Google search results appeared to be biased towards women in 2015; Hewlett-Packard's software for web cameras struggled to recognize dark skin tones; and Nikon's camera software was inaccurately identifying Asian people as blinking.
Thus, given also the ethical, legal, and regulatory issues associated with the problem of bias within human populations,
there is a considerable amount of research on the subject, especially in face recognition (FR). A recent comprehensive
survey can be found in [36], where the significant sources of bias [37, 38] are categorised and discussed, and the negative effect of bias on downstream learning tasks is pointed out. We also note that while the current deep learning based FR algorithms are under intense scrutiny for potential bias [39], this is due to their wider deployment in real life applications, rather than any evidence that they are more biased than traditional approaches.
In one of the earliest studies of bias in FR, predating deep learning, [40] reported differences in the performance on people of Caucasian and East Asian descent between Western and East Asian developed algorithms. In [41], several deep learning based FR algorithms are analysed and a small amount of bias is detected in all of them. Then, the authors show how this bias can be exploited to enhance the power of malicious morphing attacks against FR based security systems.
In [42], the authors compute cluster validation measures on the clusters of the various demographics inside the whole population, aiming at measuring the algorithm's potential for bias. Their result is negative, and they argue for the need for more sophisticated clustering approaches. We note that in our paper, an investigation of the potential for bias in the latent space, by measuring the discriminative power of SVMs over the various ethnicities, returned a similarly negative result. In [43], the aim is the detection of bias by analysing the activation ratios at the various layers of the network. Similarly to our work, their target application is the detection of race bias in a binary classification problem, gender classification in their case. Their result is positive, in that they report a correlation between the measured activation ratios and bias in the final outcomes of the classifier. However, it is not clear if their method can be used to measure and assess the statistical significance of the expected bias.
In Cavazos et al. [44], similarly to our approach, most of the analysis assumes a one-sided error cost, in their case the false acceptance rate, and the decision thresholds are treated as user defined variables. However, the analytical tools they used, mostly visual inspection of ROC curves, do not allow for a deep study of the distributions of the similarity scores, while, here, we give a more detailed analysis of the distribution of the responses, which is the equivalent of the similarity scores. In Pereira and Marcel [45], a fairness metric is proposed, which can be optimised over the decision thresholds, but again, there is no in-depth statistical analysis of the scores, as we do here for the responses, and thus they offer a more limited insight.
2.3.1 Bias in Presentation Attack Detection
The literature on bias in presentation attack detection is sparser. Race bias was the key theme in the competition of face anti-spoofing algorithms on the CASIA-SURF CeFA database [46]. Bias was assessed by the performance of the algorithms under a cross-ethnicity validation scenario. Standard performance metrics, such as APCER, BPCER and ACER, were reported. In [47], the standard CNN models Resnet 50 and VGG16 were compared for gender bias against the debiasing-VAE proposed in [48], and several performance metrics were reported. In a recent white paper by the ID R&D company, which develops face anti-spoofing software, the results of a large scale bias assessment experiment, conducted by Bixelab, a NIST accredited independent laboratory, are presented [1]. Similarly to our approach, they focus on the bona fide errors, and their aim is for the BPCER error metric to be below a prespecified threshold across all demographics.
Regarding other biometric identification modalities, [49] studied gender bias in iris PAD algorithms. They reported three error metrics, APCER, BPCER, and HTER, finding that female users would be less protected against iris PAD attacks.
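For reference, the error metrics mentioned in this section can be stated informally, in the standard ISO/IEC 30107-3 style, for a fixed operating threshold:

```latex
\mathrm{APCER} = \frac{\#\,\text{attack presentations classified as bona fide}}{\#\,\text{attack presentations}}, \qquad
\mathrm{BPCER} = \frac{\#\,\text{bona fide presentations classified as attacks}}{\#\,\text{bona fide presentations}},
```
```latex
\mathrm{ACER} = \tfrac{1}{2}\left(\mathrm{APCER} + \mathrm{BPCER}\right), \qquad
\mathrm{HTER} = \tfrac{1}{2}\left(\mathrm{FAR} + \mathrm{FRR}\right).
```

The bona fide errors analysed in this paper correspond to the BPCER side of this trade-off.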