The architectures we synthesize via NAS and HPO generalize to other datasets and sensitive
attributes. Notably, these architectures also reduce the linear separability of protected attributes,
indicating their effectiveness in mitigating bias across different contexts.
We release our code and raw results at https://github.com/dooleys/FR-NAS, so that users can easily adapt our approach to any bias metric or dataset.
2 Background and Related Work
Face Identification. Face recognition tasks fall into two broad categories: verification and identification. Our focus lies in face identification, which asks whether a given person in a source image appears within a gallery composed of many target identities and their associated images; this is a one-to-many comparison. Novel techniques in face recognition
tasks, such as ArcFace [108], CosFace [23], and MagFace [75], use deep networks (often called the
backbone) to extract feature representations of faces and then compare those to match individuals
(with mechanisms called the head). Generally, backbones take the form of image feature extractors
and heads resemble MLPs with specialized loss functions. Often, the term “head” refers to both
the last layer of the network and the loss function. Our analysis primarily centers around the face
identification task, and we focus our evaluation on examining how close images of similar identities
are in the feature space of trained models, since the technology relies on this feature representation to
differentiate individuals. An overview of these topics can be found in Wang and Deng [109].
Bias Mitigation in Face Recognition. The existence of differential performance of face recognition on population groups and subgroups has been explored in a variety of settings. Earlier work [e.g., 57, 82] focuses on single-demographic effects (specifically, race and gender) in pre-deep-learning face detection and recognition. Buolamwini and Gebru [5] uncover unequal performance at the phenotypic subgroup level in, specifically, a gender classification task powered by commercial systems. Raji and Buolamwini [90] provide a follow-up analysis – exploring the impact of the public disclosures of Buolamwini and Gebru [5] – where they discovered that named companies (IBM, Microsoft, and Megvii) updated their APIs within a year to address some concerns that had surfaced. Further research continues to show that commercial face recognition systems still have socio-demographic disparities in many complex and pernicious ways [29, 27, 54, 26].
Facial recognition is a large and complex space with many different individual technologies, some with bias mitigation strategies designed just for them [63, 118]. The main bias mitigation strategies for facial identification are described in Section 4.2.
Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO). Deep learning derives much of its success from learned feature extractors that automate the feature engineering process; the network architectures themselves, however, are still largely designed by hand. Neural Architecture Search (NAS) [30, 116] aims to automate the very design of network architectures for the task at hand. NAS can be seen as a subset of HPO [33], which refers to the automated search for optimal hyperparameters, such as learning rate, batch size, dropout, loss function, optimizer, and architectural choices. NAS for image classification and object detection has recently seen rapid and extensive research [67, 125, 121, 88, 6].
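As an illustration of the HPO setting just described, the sketch below runs a plain random search over a hypothetical joint space of training hyperparameters and coarse architectural choices. The space, the candidate values, and the `random_search` helper are illustrative assumptions only; they are not the search space or search strategy used in this paper.

```python
import random

# Hypothetical joint search space mixing training hyperparameters
# with coarse architectural choices (values are illustrative).
SEARCH_SPACE = {
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "batch_size": [64, 128, 256],
    "dropout": [0.0, 0.1, 0.3],
    "backbone": ["resnet50", "dpn107", "rexnet"],
    "head": ["ArcFace", "CosFace", "MagFace"],
}

def sample_config(rng=random):
    """Draw one configuration uniformly at random from the space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(objective, n_trials=20, seed=0):
    """Minimal HPO loop: evaluate n_trials random configurations and keep
    the one with the lowest objective value (e.g., error or a bias metric)."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val
```

Fixing the training hyperparameters while searching only over architectures, as much prior work does, amounts to freezing all but one entry of this dictionary; searching the joint space is what distinguishes combined NAS+HPO.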
Deploying NAS techniques in face recognition systems has also seen a growing interest [129, 113]. For example, reinforcement learning-based NAS strategies [121] and one-shot NAS methods [113] have been deployed to search for an efficient architecture for face recognition with low error. However, in a majority of these methods, the training hyperparameters for the architectures are fixed. We observe that this practice should be reconsidered in order to obtain the fairest possible face recognition systems. Moreover, one-shot NAS methods have also been applied for multi-objective optimization [39, 7], e.g., optimizing accuracy and parameter size. However, none of these methods can be applied for a joint architecture and hyperparameter search, and none of them have been used to optimize fairness.
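Multi-objective optimization of the kind cited above typically compares candidates by Pareto dominance rather than a single score. The following is a minimal sketch under the assumption that each candidate is summarized by a vector of objectives to be minimized, e.g., hypothetical (error, bias) pairs:

```python
def dominates(a, b):
    """a Pareto-dominates b if a is no worse on every objective and
    strictly better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors,
    e.g., (error, bias) pairs for candidate architectures."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A candidate that is slightly less accurate but far less biased survives on the front, which is why multi-objective formulations return a set of trade-off solutions instead of a single winner.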
For the case of tabular datasets, a few works have applied hyperparameter optimization to mitigate bias in models. Perrone et al. [87] introduced a Bayesian optimization framework to optimize the accuracy of models while satisfying a bias constraint. Schmucker et al. [97] and Cruz et al. [17] extended Hyperband [64] to the multi-objective setting and showed its applications to fairness. Lin et al. [65] proposed de-biasing face recognition models through model pruning. However, they only considered two architectures and just one set of fixed hyperparameters. To the best of our knowledge,