different mobile devices. WMCA [32] contains 72 subjects and information is captured in RGB, depth, infrared, and thermal. CASIA-SURF [33] consists of 1,000 subjects captured in RGB, depth, and infrared. The first face anti-spoofing database to include explicit ethnic labels was CASIA-SURF CeFA [34], which has 1,607 subjects of three ethnicities, captured in three modalities. In this paper, for bias analysis we use RFW [28], which includes four ethnicities: Caucasian, Asian, Indian, and African. RFW does not specialise in face anti-spoofing, but it is more widely used in the bias analysis literature.
2.3 Bias in machine learning
In [35], several high profile cases of machine learning bias are documented: Google search results appeared to be biased towards women in 2015; Hewlett-Packard’s software for web cameras struggled to recognize dark skin tones; and Nikon’s camera software inaccurately identified Asian people as blinking.
Thus, given also the ethical, legal, and regulatory issues associated with the problem of bias within human populations, there is a considerable amount of research on the subject, especially in face recognition (FR). A recent comprehensive survey can be found in [36], where the significant sources of bias [37, 38] are categorised and discussed, and the negative effect of bias on downstream learning tasks is pointed out. We also note that while the current deep learning based FR algorithms are under intense scrutiny for potential bias [39], this is due to their wider deployment in real life applications, rather than to any evidence that they are more biased than traditional approaches.
In one of the earliest studies of bias in FR, predating deep learning, [40] reported differences in performance on faces of Caucasian and East Asian descent between algorithms developed in Western countries and in East Asia. In [41], several deep learning based FR algorithms are analysed and a small amount of bias is detected in all of them. The authors then show how this bias can be exploited to enhance the power of malicious morphing attacks against FR based security systems.
In [42], the authors compute cluster validation measures on the clusters of the various demographics inside the whole population, aiming at measuring the algorithm’s potential for bias. Their result is negative, and they argue that more sophisticated clustering approaches are needed. We note that, in our paper, an investigation of the potential for bias in the latent space, conducted by measuring the discriminative power of SVMs over the various ethnicities, returned a similarly negative result. In [43], the aim is the detection of bias by analysing the activation ratios at the various layers of the network. Similarly to our work, their target application is the detection of race bias in a binary classification problem, gender classification in their case. Their result is positive, in that they report a correlation between the measured activation ratios and bias in the final outcomes of the classifier. However, it is not clear whether their method can be used to measure and assess the statistical significance of the expected bias.
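For illustration, the following is a minimal Python sketch of one way such a latent-space probe can be implemented; it is not the exact procedure used later in the paper, and the embedding and label arrays are hypothetical placeholders. A cross-validated accuracy close to chance would indicate that the ethnicities are not linearly separable in the latent space.

```python
# Minimal sketch: probing a latent space for ethnic separability with a linear SVM.
# `embeddings` (N x D) and `ethnicity_labels` (length N) are hypothetical arrays assumed
# to have been extracted from the face anti-spoofing network beforehand.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def ethnic_separability(embeddings: np.ndarray, ethnicity_labels: np.ndarray) -> float:
    """Mean cross-validated accuracy of an SVM predicting ethnicity from embeddings."""
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    return float(cross_val_score(clf, embeddings, ethnicity_labels, cv=5).mean())

if __name__ == "__main__":
    # Toy example with random embeddings: accuracy should stay near chance level
    # (1 / number of classes), i.e. no ethnic structure is detectable.
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(400, 128))
    ethnicity_labels = rng.integers(0, 4, size=400)  # four ethnicities, as in RFW
    print(f"Cross-validated ethnicity accuracy: {ethnic_separability(embeddings, ethnicity_labels):.3f}")
```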
In Cavazos et al. [44], similarly to our approach, most of the analysis assumes a one-sided error cost, in their case the false acceptance rate, and the decision thresholds are treated as user-defined variables. However, the analytical tools they use, mostly visual inspection of ROC curves, do not allow for a deep study of the distributions of the similarity scores, whereas here we give a more detailed analysis of the distribution of the responses, which are the equivalent of the similarity scores. In Pereira and Marcel [45], a fairness metric is proposed which can be optimised over the decision thresholds, but again there is no in-depth statistical analysis of the scores, as we do here for the responses, and thus they offer more limited insight.
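To make the role of user-defined thresholds concrete, the sketch below computes per-demographic false acceptance rates from impostor similarity scores at a single shared threshold; the group names and score distributions are illustrative placeholders rather than results from [44] or from our experiments.

```python
# Minimal sketch: per-demographic false acceptance rate (FAR) at a user-defined threshold.
# `impostor_scores` maps each demographic group to the similarity scores of its impostor
# pairs; the contents below are illustrative placeholders, not real data.
import numpy as np

def far_per_group(impostor_scores: dict[str, np.ndarray], threshold: float) -> dict[str, float]:
    """FAR = fraction of impostor pairs whose similarity score exceeds the threshold."""
    return {group: float(np.mean(scores > threshold))
            for group, scores in impostor_scores.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    impostor_scores = {
        "Caucasian": rng.normal(0.30, 0.10, 5000),
        "Asian":     rng.normal(0.33, 0.10, 5000),
        "Indian":    rng.normal(0.31, 0.10, 5000),
        "African":   rng.normal(0.34, 0.10, 5000),
    }
    # A single, user-defined threshold is applied to every group; differences in the
    # resulting FARs are what a bias analysis of this kind inspects.
    print(far_per_group(impostor_scores, threshold=0.5))
```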
2.3.1 Bias in Presentation Attack Detection
The literature on bias in presentation attack detection is more sparse. Race bias was the key theme of the face anti-spoofing algorithm competition on the CASIA-SURF CeFA database [46]. Bias was assessed by the performance of the algorithms under a cross-ethnicity validation scenario, and standard performance metrics, such as APCER, BPCER and ACER, were reported. In [47], the standard CNN models ResNet-50 and VGG16 were compared for gender bias against the debiasing-VAE proposed in [48], and several performance metrics were reported. A recent white paper by the ID R&D company, which develops face anti-spoofing software, reports the results of a large-scale bias assessment experiment conducted by Bixelab, a NIST-accredited independent laboratory [1]. Similarly to our approach, they focus on the bona fide errors, and their aim is for the BPCER error metric to remain below a prespecified threshold across all demographics.
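A minimal sketch of such an evaluation criterion is given below; it computes APCER, BPCER, and ACER per demographic group and checks whether every group’s BPCER stays below a prespecified bound. The arrays, group names, and the bound are illustrative placeholders, not the values used in [1].

```python
# Minimal sketch: per-demographic APCER / BPCER / ACER and a BPCER bound check.
# `labels` are ground truth (1 = attack, 0 = bona fide), `predictions` are the classifier's
# decisions in the same encoding, and `groups` holds the demographic label of each sample.
# All arrays below are illustrative placeholders.
import numpy as np

def pad_metrics(labels: np.ndarray, predictions: np.ndarray) -> dict[str, float]:
    attacks = labels == 1
    bona_fide = labels == 0
    apcer = float(np.mean(predictions[attacks] == 0))    # attacks accepted as bona fide
    bpcer = float(np.mean(predictions[bona_fide] == 1))  # bona fide rejected as attacks
    return {"APCER": apcer, "BPCER": bpcer, "ACER": (apcer + bpcer) / 2}

def bpcer_within_bound(labels, predictions, groups, bound: float = 0.01) -> bool:
    """True only if every demographic group's BPCER is below the prespecified bound."""
    for group in np.unique(groups):
        mask = groups == group
        metrics = pad_metrics(labels[mask], predictions[mask])
        print(f"{group}: {metrics}")
        if metrics["BPCER"] >= bound:
            return False
    return True

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, 2000)
    predictions = np.where(rng.random(2000) < 0.98, labels, 1 - labels)  # toy 2%-error classifier
    groups = rng.choice(["Caucasian", "Asian", "Indian", "African"], 2000)
    print("All groups within bound:", bpcer_within_bound(labels, predictions, groups, bound=0.05))
```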
Regarding other biometric identification modalities, [49] studied gender bias in iris PAD algorithms. They reported three error metrics, APCER, BPCER, and HTER, finding that female users would be less protected against iris presentation attacks.