Fairness via Adversarial Attribute Neighbourhood Robust Learning
Qi Qi† QI-QI@UIOWA.EDU
Shervin Ardeshir‡ SHERVINA@NETFLIX.COM
Yi Xu§ YXU@DLUT.EDU.CN
Tianbao Yang¶ TIANBAO-YANG@TAMU.EDU
† Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA
‡ Netflix, 100 Winchester Circle, Los Gatos, CA 95032, USA
§ School of Artificial Intelligence, Dalian University of Technology, Dalian, Liaoning 116024, China
¶ Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA
Abstract
Improving fairness between privileged and less-privileged sensitive attribute groups (e.g., race, gender) has attracted much attention. To encourage the model to perform uniformly well across different sensitive attribute groups, we propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head and promote a fairer representation distribution across different sensitive attribute groups. The key idea of RAAN is to mitigate the differences of biased representations between different sensitive attribute groups by assigning each sample an adversarial robust weight, which is defined on the representations of its adversarial attribute neighbors, i.e., the samples from different protected groups. To provide efficient optimization algorithms, we cast RAAN into a sum of coupled compositional functions and propose a stochastic adaptive (Adam-style) and non-adaptive (SGD-style) algorithmic framework, SCRAAN, with provable theoretical guarantees. Extensive empirical studies on fairness-related benchmark datasets verify the effectiveness of the proposed method.
1. Introduction
Owing to their excellent performance, machine learning methods have penetrated many fields and impacted our daily lives, such as recommendation (Lin et al., 2022; Zhang, 2021), sentiment analysis (Kiritchenko and Mohammad, 2018; Adragna et al., 2020), and facial detection systems (Buolamwini and Gebru, 2018). Due to existing bias and confounding factors in the training data (Fabbrizzi et al., 2022; Torralba and Efros, 2011), model predictions are often correlated with sensitive attributes, e.g., race and gender, which leads to undesirable outcomes. Hence, fairness concerns have become increasingly prominent. For example, a job recommendation system is more likely to recommend lower-wage jobs to women than to men (Zhang, 2021). Buolamwini and Gebru (2018) proposed an intersectional approach that quantitatively shows that three commercial gender classifiers, developed by Microsoft, IBM, and Face++, have higher error rates for darker-skinned populations.
To alleviate the effect of spurious correlations^1 between sensitive attribute groups and predictions, many bias mitigation methods have been proposed to learn a debiased representation distribution at the encoder level by taking advantage of adversarial learning (Wang et al., 2019; Wadsworth et al., 2018; Edwards and Storkey, 2015; Elazar and Goldberg, 2018), causal inference (Singh et al., 2020; Kim et al., 2019), and invariant risk minimization (Adragna et al., 2020; Arjovsky et al., 2019). Recently, in order to further improve performance and reduce the computational cost of large-scale training, learning a classification head on top of the representations of pretrained models has been widely used for different tasks. Taking image classification as an example, downstream tasks are trained by finetuning the classification head of an ImageNet-pretrained ResNet (He et al., 2016) model (Qi et al., 2020a; Kang et al., 2019). However, the pretrained model may introduce undesirable bias into the downstream tasks, and debiasing the encoder of a pretrained model by retraining it to produce fairer representations is time-consuming and computationally expensive. Hence, debiasing the classification head on biased representations is also of great importance.
In this paper, we raise two research questions: Can we improve the fairness of the classification head on a biased representation space? Can we further reduce the bias in the representation space? We give affirmative answers by proposing a Robust Adversarial Attribute Neighbourhood (RAAN) loss. Our work is
1. Misleading heuristics that work for most training examples but do not always hold.
©2022 Qi, Ardeshir, Xu, Yang.
arXiv:2210.06630v1 [cs.LG] 12 Oct 2022
Figure 1: The influence of different protected group distributions on the classification head. The colors ({red, blue}) represent the sensitive attributes and the shapes ({triangle, circle}) represent the ground-truth class labels. Figures (a) and (b) are optimized using the vanilla CE loss, while figure (c) is optimized using the proposed RAAN loss defined on the adversarial attribute neighbourhood. The yellow and green backgrounds denote the predicted classification regions.
inspired by the RNF method (Du et al., 2021), which averages the representations of sample pairs from different protected groups to alleviate the undesirable correlation between sensitive information and specific class labels. Unlike RNF, RAAN obtains fairness-promoting adversarial robust weights by exploring the Adversarial Attribute Neighbourhood (AAN) representation structure of each sample to mitigate the differences between biased sensitive attribute representations. To be more specific, the adversarial robust weight for each sample is the aggregation of the pairwise robust weights defined on the representation similarity between the sample and its AAN, i.e., the samples from different protected groups. Hence, the greater the representation similarity, the more uniform the distribution of protected groups in the representation space. Therefore, by promoting higher pairwise weights for pairs with larger similarity, RAAN is able to mitigate the discrimination caused by biased sensitive attribute representations and promote a fairer classification head. When the representation is fixed, RAAN is also applicable to debiasing the classification head only.
We use a toy example of binary classification to illustrate the advantages of RAAN over standard cross-entropy (CE) training on biased sensitive attribute group distributions in Figure 1. Figure 1 (a) represents a uniform/fair distribution across different sensitive attributes, while Figures 1 (b) and (c) depict a biased distribution in which the red samples are more concentrated in the top-left area than the blue samples. With vanilla CE training, Figure 1 (a) ends up with a fair classifier determined by the ground-truth task labels (shapes), while a biased classification head determined by the sensitive attributes (colors) is generated in Figure 1 (b). Instead, our RAAN method generates a fair classifier in Figure 1 (c), the same as the classifier learned from the fair distribution in Figure 1 (a). To this end, the main contributions of our work are summarized below:
• We propose a robust loss, RAAN, to debias the classification head by assigning adversarial robust weights defined on top of a biased representation space. When the representation is parameterized by trainable encoders, such as the convolutional layers in ResNets, RAAN is able to further debias the representation distribution.
• We propose an efficient Stochastic Compositional algorithm framework for RAAN (SCRAAN), which includes SGD-style and Adam-style updates with theoretical guarantees.
• Empirical studies on fairness-related datasets verify the superior performance of the proposed SCRAAN on two fairness metrics, Equalized Odds difference (∆EO) and Demographic Parity difference (∆DP), as well as worst-group accuracy (a sketch of these two metrics follows this list).
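As background, the two fairness metrics can be computed as follows. This is a minimal sketch using their standard definitions for a binary attribute and binary predictions; it is our own illustration, not code from the paper.

import numpy as np

def dp_difference(y_pred, a):
    # Demographic Parity difference: |P(y_hat = 1 | a = 0) - P(y_hat = 1 | a = 1)|
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())

def eo_difference(y_pred, y_true, a):
    # Equalized Odds difference: largest gap in group-conditional rates,
    # i.e., the TPR gap (y = 1) and FPR gap (y = 0) between attribute groups.
    gaps = []
    for y in (0, 1):
        g0 = y_pred[(a == 0) & (y_true == y)].mean()
        g1 = y_pred[(a == 1) & (y_true == y)].mean()
        gaps.append(abs(g0 - g1))
    return max(gaps)

y_pred = np.array([1, 0, 1, 1, 0, 1])
y_true = np.array([1, 0, 1, 0, 0, 1])
a = np.array([0, 0, 0, 1, 1, 1])
print(dp_difference(y_pred, a), eo_difference(y_pred, y_true, a))  # 0.0 0.5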
2. Related Work
Bias Mitigation. To address the social bias towards certain demographic groups in deep neural network (DNN) models (Lin et al., 2022; Zhang, 2021; Kiritchenko and Mohammad, 2018; Adragna et al., 2020; Buolamwini and Gebru, 2018), many efficient methods have been proposed to reduce model discrimination (Wang et al., 2019; Wadsworth et al., 2018; Edwards and Storkey, 2015; Kim et al., 2019; Elazar and Goldberg, 2018; Singh et al., 2020; Zunino et al., 2021; Rieger et al., 2020; Liu and Avci, 2019; Kusner et al., 2017; Kilbertus et al., 2017; Cheng et al., 2021; Kang et al., 2019). Most methods in the above literature mainly focus on improving the fairness of the encoder representation. The authors of (Wang et al., 2019; Wadsworth et al., 2018; Edwards and Storkey, 2015; Elazar and Goldberg, 2018) took advantage of adversarial training to reduce group discrimination. Rieger et al. (2020) and Zunino et al. (2021) made use of model explainability to remove subsets of features that incur bias, while Singh et al. (2020) and Kim et al. (2019) concentrated on causal fairness features to get rid of undesirable bias correlations during training. Bechavod and Ligett (2017) penalized unfairness by using surrogate functions of fairness metrics as regularizers. However, directly improving the classification head on top of a biased representation remains rarely studied. Recently, the RNF method (Du et al., 2021) averages the representations of sample pairs from different protected groups in the biased representation space to remove the bias in the classification head. In this paper, we propose a principled RAAN objective capable of debiasing encoder representations and classification heads at the same time.
Robust Loss. Several robust losses have been proposed to improve model robustness for different tasks. The generalized cross entropy (GCE) loss was proposed to address the noisy label problem by placing more emphasis on clean samples (Zhang and Sabuncu, 2018). For the data imbalance problem, distributionally robust learning (DRO) (Qi et al., 2020a; Li et al., 2020; Sagawa et al., 2019; Qi et al., 2020b) and class-balanced losses (Cui et al., 2019; Cao et al., 2019) use instance-level and class-level robust weights, respectively, to pay more attention to underrepresented groups. Recently, Sagawa et al. (2019) showed that group DRO is able to prevent models from learning specific spurious correlations. The above robust objectives are defined in the loss space with the assistance of label information. Exploiting useful information from the feature representation space to further benefit task-specific training remains under-explored.
Invariant Risk Minimization (IRM). IRM (Arjovsky et al., 2019) is a novel paradigm to enhance model generalization in domain adaptation by learning invariant sample feature representations across different "domains" or "environments". By optimizing a practical version of IRM in a toxicity classification case study, Adragna et al. (2020) showed the strength of IRM over ERM in improving the fairness of classifiers on biased subsets of the Civil Comments dataset. To elicit an invariant feature representation, IRM is cast as a constrained (bi-level) optimization problem in which the classifier $\mathbf{w}_c$ is constrained to be optimal over an uncertainty set. Instead, the RAAN objective constrains the adversarial robust weights, which are defined in the pairwise representation similarity space penalized by a KL divergence. When the embedding representation $z$ is parameterized by a trainable encoder $\mathbf{w}_f$, RAAN generates a more uniform representation space across different sensitive groups.
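For contrast with RAAN's weight-based constraint, the practical version of IRM referenced above (IRMv1, Arjovsky et al., 2019) penalizes the gradient of each environment's risk with respect to a frozen scalar classifier. The sketch below is our own minimal rendering of that penalty, not the authors' code; variable names are illustrative.

import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    # IRMv1 fixes a scalar "dummy" classifier w = 1.0 and penalizes the squared
    # norm of the risk gradient with respect to it, per environment.
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

# Full IRMv1 objective over environments e (lam is the penalty strength):
#   sum_e risk_e + lam * sum_e irmv1_penalty(logits_e, y_e)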
Stochastic Optimization. Recently, several stochastic optimization techniques have been leveraged to design efficient stochastic algorithms with provable theoretical convergence for robust surrogate objectives, such as the F-measure (Zhang et al., 2018b), average precision (AP) (Qi et al., 2021b), and area under curves (AUC) (Liu et al., 2019, 2018; Yuan et al., 2021). In this paper, we cast the fairness-promoting RAAN loss as a two-level stochastic coupled compositional function with the general formulation $E_\xi[f(E_\zeta\, g(\mathbf{w}; \zeta, \xi))]$, where $\xi$ and $\zeta$ are independent and $\xi$ has finite support. By exploring advanced stochastic compositional optimization techniques (Wang et al., 2017; Qi et al., 2021a), a stochastic algorithm, SCRAAN, with both SGD-style and Adam-style updates is proposed to solve RAAN with provable convergence; a minimal sketch of the underlying estimation technique is given below.
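To give intuition for optimizing a two-level compositional objective $E_\xi[f(E_\zeta\, g(\mathbf{w}; \zeta, \xi))]$, the sketch below shows the standard moving-average estimator of the inner expectation from the stochastic compositional optimization literature (Wang et al., 2017), instantiated with a toy $f$ and $g$ of our own choosing; it illustrates the underlying technique, not the SCRAAN algorithm itself.

import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
u = torch.tensor(1.0)        # moving-average estimate of the inner value E[g(w)]
gamma, lr = 0.9, 0.05

def g(w, zeta):              # noisy inner function g(w; zeta)
    return torch.exp(w @ zeta)

for step in range(100):
    zeta = torch.randn(5)                            # fresh inner sample
    g_val = g(w, zeta)
    u = gamma * u + (1 - gamma) * g_val.detach()     # track E[g(w)] across steps
    # chain rule with the tracked inner value: grad ~= f'(u) * dg/dw,
    # here for the outer function f(u) = log(u), so f'(u) = 1/u
    surrogate = g_val / u
    grad = torch.autograd.grad(surrogate, w)[0]
    with torch.no_grad():
        w -= lr * grad       # SGD-style step; an Adam-style step would rescale grad

The point of the moving average u is that a single minibatch gives a biased estimate of the gradient of f(E[g(w)]); tracking the inner expectation across iterations controls this bias and enables provable convergence.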
3. Robust Adversarial Attribute Neighbourhood (RAAN) Loss
3.1 Notations
We first introduce some notation. The collected data is denoted by $\mathcal{D} = \{d_i\}_{i=1}^n = \{(x_i, y_i, a_i)\}_{i=1}^n$, where $x_i \in \mathcal{X}$ is the data, $y_i \in \mathcal{Y}$ is the label, $a_i \in \mathcal{A}$ is the corresponding attribute (e.g., race, gender), and $n$ is the number of samples. We divide the data into different subsets based on labels and attributes. For any label $c \in \mathcal{Y}$ and attribute $a \in \mathcal{A}$, we denote $\mathcal{D}^c_a = \{(x_i, y_i, a_i) \mid a_i = a, y_i = c\}_{i=1}^n$ and $\mathcal{D}^c = \{(x_i, y_i, a_i) \mid y_i = c\}_{i=1}^n$. Then we have $\mathcal{D}^c = \cup_{a=1}^{|\mathcal{A}|} \mathcal{D}^c_a$. Given a deep neural network, the model weights $\mathbf{w}$ can be decomposed into two parts, the Feature representation parameters $\mathbf{w}_f$ and the Classification head parameters $\mathbf{w}_c$, i.e., $\mathbf{w} = [\mathbf{w}_f, \mathbf{w}_c]$. For example, $\mathbf{w}_f$ and $\mathbf{w}_c$ correspond to the convolutional layers and the fully connected layers in ResNets, respectively. $F_{\mathbf{w}_f}(\cdot)$ denotes the feature encoder mapping from $\mathcal{X} \to \mathcal{Z}$, and $H_{\mathbf{w}_c}(\cdot)$ denotes the classification head mapping from $\mathcal{Z} \to \mathcal{Y}$. Then $z_i(\mathbf{w}_f) = F_{\mathbf{w}_f}(x_i) \in \mathcal{Z}$ denotes the embedding representation of the sample $d_i$, and $H_{\mathbf{w}_c}(z_i(\mathbf{w}_f))$ denotes the output of the classification head.
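To make the decomposition $\mathbf{w} = [\mathbf{w}_f, \mathbf{w}_c]$ concrete, the sketch below splits a torchvision ResNet into a feature encoder $F_{\mathbf{w}_f}$ and a classification head $H_{\mathbf{w}_c}$; this is our own illustration of the setup, not code from the paper.

import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)
encoder = nn.Sequential(*list(backbone.children())[:-1])  # F_{w_f}: X -> Z
head = backbone.fc                                        # H_{w_c}: Z -> Y

x = torch.randn(4, 3, 224, 224)
z = encoder(x).flatten(1)    # embeddings z_i(w_f), here 512-dimensional
logits = head(z)             # classification head outputs H_{w_c}(z_i(w_f))

# RAAN trains only the head on fixed (possibly biased) representations,
# while RL-RAAN trains encoder and head jointly (Section 3.3).
for p in encoder.parameters():
    p.requires_grad = False  # RAAN setting: w_f frozen, w_c trainable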
The key idea of RAAN is to assign a fairness-promoting adversarial robust weight to each sample by exploring the AAN representation structure to reduce the disparity across different sensitive attributes. The AAN of the sample $d_i = (x_i, y_i = c, a_i = a)$ is defined as the set of samples from the same class but with different attributes, i.e., $\mathcal{P}_i = \mathcal{P}^c_a = \mathcal{D}^c \setminus \mathcal{D}^c_a$. For example, considering a binary protected sensitive attribute {female, male}, the AAN of a sample belonging to the male protected group with class label $c \in \mathcal{Y}$ is the collection of female attribute samples with the same class $c$. Then the adversarial robust weight for every sample $d_i \sim \mathcal{D}$ is represented as $p^{\text{AAN}}_i$, which is an aggregation of the pairwise weights between $d_i$ and its AAN neighbours in $\mathcal{P}_i$. Next, we denote the pairwise robust weight between the sample $d_i$ and $d_j \in \mathcal{P}_i$ in the representation space as $p^{\text{AAN}}_{ij}$. When the context is clear, we abuse notation by using $\mathbf{p}^{\text{AAN}}_i = [p^{\text{AAN}}_{i1}, \cdots, p^{\text{AAN}}_{ij}, \cdots] \in \mathbb{R}^{|\mathcal{P}_i|}$ to represent the vector of pairwise robust weights defined on $\mathcal{P}_i$, i.e., the AAN of $d_i$.
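As a concrete illustration of the neighbourhoods $\mathcal{P}_i = \mathcal{D}^c \setminus \mathcal{D}^c_a$, the sketch below materializes the AAN of every sample from its (label, attribute) pair; the index-based layout is an assumption for illustration.

from collections import defaultdict

def build_aan(labels, attributes):
    # For each sample i, collect the indices of its adversarial attribute
    # neighbours: same class label, different protected attribute.
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    neighbours = {}
    for i, (c, a) in enumerate(zip(labels, attributes)):
        neighbours[i] = [j for j in by_class[c] if attributes[j] != a]
    return neighbours

# e.g., labels in {0, 1} and a binary protected attribute {female, male}:
P = build_aan([0, 0, 0, 1], ["male", "female", "male", "female"])
# P[0] == [1]: the class-0 male sample's AAN is the class-0 female sample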
3.2 RAAN Objective
To explore the AAN representation structure and obtain the pairwise robust weights, we define the following robust constrained objective for $d_i \sim \mathcal{D}$:

$\ell^{\text{AAN}}_i = \sum_{j \in \mathcal{P}_i} p^{\text{AAN}}_{ij}\, \ell(\mathbf{w}; x_j, y_j, a_j)$    (1)

$\text{s.t.}\quad \max_{\mathbf{p}^{\text{AAN}}_i \in \Delta_{|\mathcal{P}_i|}} \sum_{j \in \mathcal{P}_i} p^{\text{AAN}}_{ij}\, z_i(\mathbf{w}_f)^\top z_j(\mathbf{w}_f) - \tau\, \text{KL}\left(\mathbf{p}^{\text{AAN}}_i, \frac{\mathbf{1}}{|\mathcal{P}_i|}\right), \quad \mathbf{1} \in \mathbb{R}^{|\mathcal{P}_i|}$    (2)
where $\Delta$ is the simplex such that $\sum_{j=1}^{|\mathcal{P}_i|} p_{ij} = 1$. The robust loss (1) is a weighted average of the AAN losses. The robust constraint (2) is defined on the pairwise representation similarities between the sample $i$ and its AAN neighbours, penalized by a KL divergence regularizer, which has been extensively studied in the distributionally robust learning objective (DRO) to improve the robustness of the model in the loss space (Qi et al., 2020a). Here, we adapt DRO with the KL divergence constraint to the representation space to generate a uniform distribution across different sensitive attributes.
Controlled by the hyperparameter $\tau$, the closed-form solution of $\mathbf{p}^{\text{AAN}}_i$ in (2) guarantees that the larger the pairwise similarity $z_i(\mathbf{w}_f)^\top z_j(\mathbf{w}_f)$ is, the higher $p^{\text{AAN}}_{ij}$ will be. When $\tau = 0$, the closed-form solution of (2) is 1 for the pair with the largest similarity and 0 for all others. When $\tau > 0$, due to the strong convexity in terms of $\mathbf{p}^{\text{AAN}}_i$, the closed-form solution of (2) for each pairwise weight between $d_i$ and $d_j \in \mathcal{P}_i$ is:

$p^{\text{AAN}}_{ij} = \frac{\exp\left(z_i(\mathbf{w}_f)^\top z_j(\mathbf{w}_f)/\tau\right)}{\sum_{k \in \mathcal{P}_i} \exp\left(z_i(\mathbf{w}_f)^\top z_k(\mathbf{w}_f)/\tau\right)}.$    (3)
Hence, the larger $\tau$ is, the more uniform $\mathbf{p}^{\text{AAN}}_i$ will be. It is apparent that the robust objective generates equal weights for every pair, i.e., $p^{\text{AAN}}_{ij} = \frac{1}{|\mathcal{P}_i|}$ for every $d_j \in \mathcal{P}_i$, when $\tau$ approaches infinity in (3). When we have a fair representation, the embeddings of different protected groups are uniformly distributed in the representation space; vanilla average-loss training is then good enough to produce a fair classification head, which equals RAAN with $\tau$ going to infinity. When we have biased representations, we use a smaller $\tau$ to emphasize similar representations that share invariant features between two different protected groups, reducing the bias introduced by the difference between the two protected group distributions.

Figure 2: Improvement of representation fairness.

Figure 3: Training overview of (RL)-RAAN.
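To make equation (3) concrete, the sketch below computes the pairwise robust weights as a temperature-scaled softmax over neighbour similarities; it is a direct transcription of (3) under an assumed tensor layout, not the paper's released code.

import torch

def pairwise_aan_weights(z_i, z_neighbours, tau=0.5):
    # z_i: (d,) embedding of sample i; z_neighbours: (|P_i|, d) embeddings of
    # its AAN. Returns the closed-form weights of (2)/(3) on the simplex.
    sims = z_neighbours @ z_i               # z_i(w_f)^T z_j(w_f) for d_j in P_i
    return torch.softmax(sims / tau, dim=0)

z_i = torch.randn(16)
z_nb = torch.randn(8, 16)
p = pairwise_aan_weights(z_i, z_nb, tau=0.5)
assert torch.isclose(p.sum(), torch.tensor(1.0))
# small tau: mass concentrates on the most similar neighbour (tau -> 0 limit);
# large tau: weights approach the uniform 1/|P_i| (tau -> infinity limit)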
To this end, after obtaining the closed-form solution for every pairwise robust weight $p^{\text{AAN}}_{ij}$ in (3) and plugging it back into $\ell^{\text{AAN}}_i$ in (1) for an arbitrary sample $d_i \sim \mathcal{D}$, the overall RAAN objective is defined as:

$\text{RAAN}(\mathbf{w}) := \frac{1}{C} \sum_{c=1}^{C} \frac{1}{A} \sum_{a=1}^{A} \frac{1}{|\mathcal{D}^c_a|} \sum_{i=1}^{|\mathcal{D}^c_a|} \ell^{\text{AAN}}_i = \frac{1}{AC} \sum_{j=1}^{n} p^{\text{AAN}}_j\, \ell(\mathbf{w}; x_j, y_j, a_j),$    (4)
where $C = |\mathcal{Y}|$, $A = |\mathcal{A}|$, $\ell^{\text{AAN}}_i = \sum_{j \in \mathcal{P}_i} p^{\text{AAN}}_{ij}\, \ell(\mathbf{w}; x_j, y_j, a_j)$ is defined in (1), $p^{\text{AAN}}_{ij}$ is defined in (3), and

$p^{\text{AAN}}_j = \frac{1}{|\mathcal{P}_j|} \sum_{i \in \mathcal{P}_j} \frac{\exp\left(z_i(\mathbf{w}_f)^\top z_j(\mathbf{w}_f)/\tau\right)}{\sum_{k \in \mathcal{P}_i} \exp\left(z_i(\mathbf{w}_f)^\top z_k(\mathbf{w}_f)/\tau\right)}$

is obtained by aggregating all the pairwise robust weights in $\mathcal{P}_j$. Hence, the adversarial robust weight $p^{\text{AAN}}_j$ for each sample $d_j = (x_j, c, a) \sim \mathcal{D}$ encodes the intrinsic representation neighbourhood structure between the sample and its AAN neighbours $d_i \in \mathcal{P}_j$ (the numerator), normalized by the similarity pairs from the same protected group $d_k \in \mathcal{P}_i$, i.e., $\mathcal{D}^c_a$ (the denominator). Due to space limitations, we defer the derivation of the second equality of equation (4) to the Appendix.
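For intuition, the sketch below assembles the objective (4) naively on one minibatch from the per-pair weights in (3); the batching and the use of cross-entropy as $\ell$ are our own assumptions, and in practice (4) is optimized with the SCRAAN algorithm of Section 4 rather than by this exact recomputation.

import torch
import torch.nn.functional as F

def raan_minibatch_loss(z, logits, y, a, tau=0.5):
    # z: (n, d) embeddings, logits: (n, C), y: (n,) labels, a: (n,) attributes.
    per_sample = F.cross_entropy(logits, y, reduction="none")  # l(w; x_j, y_j, a_j)
    sims = (z @ z.t()) / tau                                   # z_i^T z_j / tau
    weights = []
    for j in range(len(y)):
        nb_j = ((y == y[j]) & (a != a[j])).nonzero(as_tuple=True)[0]  # P_j
        p_j = sims.new_zeros(())
        for i in nb_j.tolist():
            nb_i = ((y == y[i]) & (a != a[i])).nonzero(as_tuple=True)[0]  # P_i
            w_i = torch.softmax(sims[i, nb_i], dim=0)          # eq. (3) over P_i
            p_j = p_j + w_i[nb_i == j].sum()                   # the entry for d_j
        if len(nb_j) > 0:
            p_j = p_j / len(nb_j)                              # 1/|P_j| aggregation
        weights.append(p_j)
    p = torch.stack(weights)
    return (p * per_sample).mean()                             # up to the 1/(AC) factor

If z is produced by a frozen encoder this debiases the head only (RAAN); letting gradients flow through z corresponds to RL-RAAN.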
3.3 Representation Learning Robust Adversarial Attribute Neighbourhood (RL-RAAN) Loss
AANs are defined over the encoder representation outputs $z(\mathbf{w}_f)$. By default, the RAAN loss aims to promote a fairer classification head on a fixed, biased representation distribution, i.e., $\mathbf{w}_f$ (recall that $\mathbf{w} = [\mathbf{w}_f, \mathbf{w}_c]$) is not trainable. By parameterizing the AANs with trainable encoder parameters, i.e., making $\mathbf{w}_f$ trainable, we extend RAAN to the Representation Learning RAAN (RL-RAAN), which is able to further debias the representation encoder. Hence, RAAN optimizes $\mathbf{w}_c$ while RL-RAAN jointly optimizes $[\mathbf{w}_f, \mathbf{w}_c]$. To design efficient stochastic algorithms, RL-RAAN requires more sophisticated stochastic estimators than RAAN, which we discuss in Section 4. Depending on whether $\mathbf{w}_f$ is trainable, the red dashed arrow in Figure 3 depicts the optional backward gradients toward the feature representations during training.
Here, we demonstrate the effectiveness of RL-RAAN in generating a more uniform representation distribution across different sensitive groups in Figure 2. To achieve this, we visualize the representation distributions learned by vanilla CE (left plot) and RL-RAAN (right plot) using Kernel-PCA dimensionality reduction with a radial basis function (RBF) kernel. It is clear that the white attribute samples are more clustered in the upper-left corner of the CE representation projection, while both the white and non-white attribute samples are uniformly distributed in the representation projection of RL-RAAN.
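The visualization in Figure 2 can be reproduced with a standard Kernel-PCA projection; the sketch below (scikit-learn, with random placeholder embeddings and attributes) illustrates the construction of such a plot.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 128))        # stand-in for encoder embeddings z(w_f)
attr = rng.integers(0, 2, size=500)    # stand-in binary sensitive attribute

proj = KernelPCA(n_components=2, kernel="rbf").fit_transform(z)
for val, color, name in [(0, "tab:blue", "white"), (1, "tab:red", "non-white")]:
    pts = proj[attr == val]
    plt.scatter(pts[:, 0], pts[:, 1], s=8, c=color, label=name)
plt.legend()
plt.title("Kernel-PCA (RBF) projection of representations")
plt.show()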
4. Stochastic Compositional Optimization for RAAN (SCRAAN)
In this section, we provide a general Stochastic Compositional optimization algorithm framework for RAAN (SCRAAN). SCRAAN applies to both the RAAN and RL-RAAN objectives. We first show that (RL)-RAAN