2. Related Work
Bias Mitigation
To address the social bias towards certain demographic groups in deep neural network (DNN) models (Lin et al., 2022; Zhang, 2021; Kiritchenko and Mohammad, 2018; Adragna et al., 2020; Buolamwini and Gebru, 2018), many efficient methods have been proposed to reduce model discrimination (Wang et al., 2019; Wadsworth et al., 2018; Edwards and Storkey, 2015; Kim et al., 2019; Elazar and Goldberg, 2018; Singh et al., 2020; Zunino et al., 2021; Rieger et al., 2020; Liu and Avci, 2019; Kusner et al., 2017; Kilbertus et al., 2017; Cheng et al., 2021; Kang et al., 2019). Most methods in the above literature focus on improving the fairness of the encoder representation. The authors of (Wang et al., 2019; Wadsworth et al., 2018; Edwards and Storkey, 2015; Elazar and Goldberg, 2018) took advantage of adversarial training to reduce group discrimination. Rieger et al. (2020); Zunino et al. (2021) made use of model explainability to remove the subset of features that incurs bias, while Singh et al. (2020); Kim et al. (2019) concentrated on causal fairness features to remove undesirable bias correlations during training. Bechavod and Ligett (2017) penalized unfairness by using surrogate functions of fairness metrics as regularizers. However, directly working on a biased representation to improve the classification head remains rare. Recently, the RNF method (Du et al., 2021) averages the representations of sample pairs from different protected groups in the biased representation space to remove the bias in the classification head. In this paper, we propose a principled RAAN objective capable of debiasing encoder representations and classification heads at the same time.
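To make the classification-head debiasing described above concrete, the following is a minimal sketch of the representation-neutralization idea in the spirit of RNF (Du et al., 2021), assuming a frozen biased encoder `encoder`, a trainable head `head`, and soft targets; the pairing strategy and loss details are illustrative simplifications, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def neutralized_head_loss(encoder, head, x_a, x_b, soft_labels):
    """Train only the classification head on group-averaged representations.

    x_a, x_b   : paired batches drawn from different protected groups
                 (illustrative pairing; the real pairing strategy may differ).
    soft_labels: soft targets for the pairs, shape (batch, num_classes).
    """
    with torch.no_grad():              # the biased encoder stays frozen
        z_a = encoder(x_a)
        z_b = encoder(x_b)
    z_mix = 0.5 * (z_a + z_b)          # average pair representations across groups
    logits = head(z_mix)               # only the head receives gradients
    return F.kl_div(F.log_softmax(logits, dim=1), soft_labels,
                    reduction="batchmean")
```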
Robust Loss
Several robust losses have been proposed to improve model robustness for different tasks. The generalized cross entropy (GCE) loss was proposed to address the noisy-label problem by placing more emphasis on clean samples (Zhang and Sabuncu, 2018). For the data imbalance problem, distributionally robust optimization (DRO) (Qi et al., 2020a; Li et al., 2020; Sagawa et al., 2019; Qi et al., 2020b) and class-balanced losses (Cui et al., 2019; Cao et al., 2019) use instance-level and class-level robust weights, respectively, to pay more attention to underrepresented groups. Recently, Sagawa et al. (2019) showed that group DRO is able to prevent models from learning specific spurious correlations. The robust objectives above are defined in the loss space with the assistance of label information. Exploiting useful information from the feature representation space to further benefit task-specific training remains under-explored.
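For concreteness, below is a minimal sketch of two of the robust objectives mentioned above: the GCE loss and instance-level robust weights obtained from a KL-regularized DRO formulation; the hyperparameters `q` and `lam` are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def gce_loss(logits, targets, q=0.7):
    """Generalized cross entropy (Zhang and Sabuncu, 2018):
    L_q(p_y) = (1 - p_y**q) / q, which down-weights poorly fit (likely noisy) samples."""
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of the true class
    return ((1.0 - p_y.clamp_min(1e-12) ** q) / q).mean()

def kl_dro_weights(losses, lam=1.0):
    """Instance-level robust weights of a KL-regularized DRO objective:
    w_i proportional to exp(loss_i / lam), emphasizing harder samples."""
    return torch.softmax(losses / lam, dim=0)
```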
Invariant Risk Minimization (IRM)
IRM (Arjovsky et al., 2019) is a novel paradigm to enhance model generalization in domain adaptation by learning invariant sample feature representations across different "domains" or "environments". By optimizing a practical version of IRM in a toxicity classification case study, Adragna et al. (2020) showed the strength of IRM over ERM in improving the fairness of classifiers on biased subsets of the Civil Comments dataset. To elicit an invariant feature representation, IRM is cast as a constrained (bi-level) optimization problem in which the classifier $w_c$ is constrained to an optimal uncertainty set. Instead, the RAAN objective constrains the adversarial robust weights, which are defined in the pairwise representation similarity space penalized by a KL divergence. When the embedding representation $z$ is parameterized by trainable encoders $w_f$, RAAN generates a more uniform representation space across different sensitive groups.
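As a concrete reference for the IRM constraint discussed above, here is a minimal sketch of the practical IRMv1 penalty of Arjovsky et al. (1919 is not the year; 2019), in which the classifier is fixed to a scalar dummy $w = 1.0$ and the penalty measures how far that classifier is from being simultaneously optimal in every environment; the way the objective is assembled here is a simplified illustration.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    """Squared gradient of the per-environment risk w.r.t. a scalar dummy classifier."""
    w = torch.tensor(1.0, requires_grad=True)
    risk = F.cross_entropy(logits * w, labels)
    (grad,) = torch.autograd.grad(risk, [w], create_graph=True)
    return grad.pow(2)

def irm_objective(env_logits, env_labels, penalty_weight=100.0):
    """Empirical risk plus the invariance penalty, summed over environments."""
    erm = sum(F.cross_entropy(lg, lb) for lg, lb in zip(env_logits, env_labels))
    pen = sum(irmv1_penalty(lg, lb) for lg, lb in zip(env_logits, env_labels))
    return erm + penalty_weight * pen
```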
Stochastic Optimization
Recently, several stochastic optimization techniques have been leveraged to design efficient stochastic algorithms with provable theoretical convergence for robust surrogate objectives, such as the F-measure (Zhang et al., 2018b), average precision (AP) (Qi et al., 2021b), and areas under curves (AUC) (Liu et al., 2019, 2018; Yuan et al., 2021). In this paper, we cast the fairness-promoting RAAN loss as a two-level stochastic coupled compositional function with the general formulation $\mathbb{E}_\xi[f(\mathbb{E}_\zeta g(\mathbf{w};\zeta,\xi))]$, where $\xi, \zeta$ are independent and $\xi$ has finite support. By exploiting advanced stochastic compositional optimization
techniques (Wang et al., 2017; Qi et al., 2021a), a stochastic algorithm, SCRAAN, with both SGD-style and Adam-style updates is proposed to solve RAAN with provable convergence.
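The core algorithmic ingredient behind this compositional structure can be sketched as follows: because $\xi$ has finite support, the inner expectation $\mathbb{E}_\zeta[g(\mathbf{w};\zeta,\xi)]$ can be tracked by a per-$\xi$ moving-average estimator, and the outer gradient is then assembled through the chain rule. The sketch below follows the generic recipe of (Wang et al., 2017; Qi et al., 2021a) rather than the exact SCRAAN update; the callables `f` and `g` and the step names are placeholders.

```python
import torch

def compositional_sgd_step(u, xi, zeta_batch, g, f, opt, gamma=0.1):
    """One SGD-style step on E_xi[ f( E_zeta g(w; zeta, xi) ) ] for finite-support xi.

    u : dict mapping each xi to its running estimate of E_zeta g(w; zeta, xi)
    g : callable (zeta_batch, xi) -> tensor, an unbiased mini-batch estimate of the
        inner function (depends on the model parameters held by opt)
    f : callable tensor -> scalar, the outer function
    """
    g_val = g(zeta_batch, xi)                        # mini-batch estimate of the inner function
    with torch.no_grad():                            # moving-average tracking of E_zeta g
        u[xi] = (1.0 - gamma) * u[xi] + gamma * g_val
    u_est = u[xi].clone().requires_grad_(True)
    (df_du,) = torch.autograd.grad(f(u_est), u_est)  # outer gradient at the tracked estimate
    surrogate = (df_du.detach() * g_val).sum()       # chain rule: (df/du) * (dg/dw) via backprop
    opt.zero_grad()
    surrogate.backward()
    opt.step()
    return float(surrogate.detach())
```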
3. Robust Adversarial Attribute Neighbourhood (RAAN) Loss
3.1 Notations
We first introduce some notations. The collected data is denoted by $\mathcal{D}=\{d_i\}_{i=1}^n=\{(x_i, y_i, a_i)\}_{i=1}^n$, where $x_i\in\mathcal{X}$ is the data, $y_i\in\mathcal{Y}$ is the label, $a_i\in\mathcal{A}$ is the corresponding attribute (e.g., race, gender),