FAIRNESS VIA ADVERSARIAL ATTRIBUTE NEIGHBOURHOOD ROBUST LEARNING Fairness via Adversarial Attribute Neighbourhood Robust Learning Qi QiyQI-QIUIOWA .EDU

2025-04-24 0 0 1.11MB 26 页 10玖币

侵权投诉

FAIRNESS VIA ADVERSARIAL ATTRIBUTE NEIGHBOURHOOD ROBUST LEARNING

Fairness via Adversarial Attribute Neighbourhood Robust Learning

Qi Qi†QI-QI@UIOWA.EDU

Shervin Ardeshir‡SHERVINA@NETFLIX.COM

Yi Xu-YXU@DLUT.EDU.CN

Tianbao Yang.TIANBAO-YANG@TAMU.EDU

†Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA

‡Netﬂix,100 Winchester Circle, Los Gatos, CA 95032, USA

-School of Artiﬁcial Intelligence, Dalian University of Technology, Dalian, Liaoning 116024, China

.Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA

Abstract

Improving fairness between privileged and less-privileged sensitive attribute groups (e.g, race, gender) has

attracted lots of attention. To enhance the model performs uniformly well in different sensitive attributes, we

propose a principled

obust

dversarial

ttribute

eighbourhood (RAAN) loss to debias the classiﬁcation head

and promote a fairer representation distribution across different sensitive attribute groups. The key idea of RAAN

is to mitigate the differences of biased representations between different sensitive attribute groups by assigning

each sample an adversarial robust weight, which is deﬁned on the representations of adversarial attribute

neighbors, i.e, the samples from different protected groups. To provide efﬁcient optimization algorithms, we

cast the RAAN into a sum of coupled compositional functions and propose a stochastic adaptive (Adam-style)

and non-adaptive (SGD-style) algorithm framework SCRAAN with provable theoretical guarantee. Extensive

empirical studies on fairness-related benchmark datasets verify the effectiveness of the proposed method.

1. Introduction

With the excellent performance, machine learning methods have penetrated into many ﬁelds and brought impact

into our daily lifes, such as the recommendation (Lin et al., 2022; Zhang, 2021), sentiment analysis (Kiritchenko

and Mohammad, 2018; Adragna et al., 2020) and facial detection systems (Buolamwini and Gebru, 2018). Due

to the existing bias and confounding factors in the training data (Fabbrizzi et al., 2022; Torralba and Efros, 2011),

model predictions are often correlated with sensitive attributes, e.g, race, gender, which leads to undesirable

outcomes. Hence, fairness concern has become increasingly prominent. For example, the job recommendation

system recommends lower wage jobs more likely to women than men (Zhang, 2021). Buolamwini and Gebru

(2018) proposed an intersectional approach that quantitatively show that three commercial gender classiﬁers,

proposed by Microsoft, IBM and Face++, have higher error rate for the darker-skinned populations.

To alleviate the effect of spurious correlations

between the sensitive attribute groups and prediction, many

bias mitigation methods have been proposed to learn a debiased representation distribution at encoder level

by taking the advantage of the adversarial learning (Wang et al., 2019; Wadsworth et al., 2018; Edwards and

Storkey, 2015; Elazar and Goldberg, 2018), causal inference (Singh et al., 2020; Kim et al., 2019) and invariant

risk minimization (Adragna et al., 2020; Arjovsky et al., 2019). Recently, in order to further improve the

performance and reduce computational costs for large-scale data training, learning a classiﬁcation head using

the representation of pretrained models have been widely used for different tasks. Taking image classiﬁcation

for example, the downstream tasks are trained by ﬁnetuning the classiﬁcation head of ImageNet pretrained

ResNet (He et al., 2016) model (Qi et al., 2020a; Kang et al., 2019). However, the pretrained model may

introduce the undesiarable bias for the downstreaming tasks. Debiasing the encoder of pretrained models to

have fairer representations by retraining is time-consuming and compuatational expensive. Hence, debiasing

the classiﬁcation head on biased representations is also of great importance.

In this paper, we raise two research questions: Can we improve the fairness of the classiﬁcation head

on a biased representation space? Can we further reduce the bias in the representation space? We give

afﬁrmative answers by proposing a

obust

dversarial

ttribute

eighborhood (RAAN) loss. Our work is

1. misleading heuristics that work for most training examples but do not always hold.

arXiv:2210.06630v1 [cs.LG] 12 Oct 2022

QI, ARDESHIR, XU, YANG

Figure 1: The inﬂuence of different protected group distributions on the classiﬁcation head. The colors ({red,

blue}) represent the sensitive attributes and shapes ({triangle, circle}) represent the ground truth class labels.

Figures (a), (b) are optimized using vanilla CE loss, while the ﬁgure (c) is optimized using the proposed

RAAN loss deﬁned on the adversarial attributes neighborhood. The yellow and green background denote the

predicted classiﬁcation space.

inspired by the RNF method (Du et al., 2021), which averages the representation of sample pairs from different

protected groups to alleviate the undesirable correlation between sensitive information and speciﬁc class labels.

But unlike RNF, RAAN obtains fairness-promoting adversarial robust weights by exploring the

dversarial

ttribute

eighborhood (AAN) representation structure for each sample to mitigate the differences of biased

sensitive attribute representations. To be more speciﬁc, the adversarial robust weight for each sample is the

aggregation of the pairwise robust weights deﬁned on the representation similarity between the sample and its

AAN. Hence, the greater the representation similarity, the more uniform the distribution of protected groups in

the representation space. Therefore, by promoting higher pairwise weights for larger similarity pairs, RAAN

is able to mitigate the discrimination of the biased senstive attribute representations and promote a fairer

classiﬁcation head. When the representation is ﬁxed, RAAN is also applicable to debiasing the classiﬁcation

head only.

We use a toy example of binary classiﬁcation to express the advantages of RAAN over standard cross-

entropy (CE) training on the biased sensitive attribute group distributions in Figure 1. Figure 1 (a) represents a

uniform/fair distribution across different sensitive attributes while a biased distribution that the red samples

are more aggregated in the top left area than the blue samples are depicted in Figure 1 (b), (c). Then with

the vanilla CE training, Figure 1 (a) ends up with a fair classiﬁer determined by the ground truth task labels

(shapes) while a biased classiﬁcation head determined by sensitive attributes (colors) is generated in Figure 1

(b). Instead, our RAAN method generates a fair classiﬁer in Figure 1 (c), the same as a classiﬁer learned

from the Figure 1 (a) generated from a fair distribution. To this end, the main contributions of our work are

summarized below:

•

We propose a robust loss RAAN to debias the classiﬁcation head by assigning adversarial robust weights

deﬁned on the top of biased representation space. When the representation is parameterized by trainable

encoders such as convolutional layers in ResNets, RAAN is able to further debiase the representation

distribution.

•

We propose an efﬁcient

tochastic

ompositional algorithm framework for RAAN (SCRAAN), which

includes the SGD-style and Adam-style updates with theoretical guarantee.

•

Empirical studies on fairness-related datasets verify the supreme performance of the proposed SCRAAN

on two fairness, Equalized Odd difference (

∆

EO), Demographic Parity difference (

∆

DP) and worst

group accuracy.

FAIRNESS VIA ADVERSARIAL ATTRIBUTE NEIGHBOURHOOD ROBUST LEARNING

2. Related Work

Bias Mitigation

To address the social bias towards certain demographic groups in deep neural network (DNN)

models (Lin et al., 2022; Zhang, 2021; Kiritchenko and Mohammad, 2018; Adragna et al., 2020; Buolamwini

and Gebru, 2018), many efﬁcient methods have been proposed to reduce the model discrimination (Wang et al.,

2019; Wadsworth et al., 2018; Edwards and Storkey, 2015; Kim et al., 2019; Elazar and Goldberg, 2018; Singh

et al., 2020; Zunino et al., 2021; Rieger et al., 2020; Liu and Avci, 2019; Kusner et al., 2017; Kilbertus et al.,

2017; Cheng et al., 2021; Kang et al., 2019). Most methods in the above literature mainly focus on improving

the fairness of encoder representation. The authors of (Wang et al., 2019; Wadsworth et al., 2018; Edwards

and Storkey, 2015; Elazar and Goldberg, 2018) took the advantage of the adversarial training to reduce the

group discrimination. Rieger et al. (2020); Zunino et al. (2021) made use of the model explainability to

remove subset features that incurs bias, while Singh et al. (2020); Kim et al. (2019) concentrated on the causal

fairness features to get rid of undesirable bias correlation in the training. Bechavod and Ligett (2017) penalized

unfairness by using surrogate functions of fairness metrics as regularizers. However, directly working on a

biased representation to improves classiﬁcation-head remains rare. Recently, the RNF method (Du et al., 2021)

averages the representation of sample pairs from different protected groups in the biased representation space

to remove the bias in the classiﬁcation head. In this paper, we propose a principled RAAN objective capable

of debiasing encoder representations and classiﬁcation heads at the same time.

Robust Loss

Several robust loss has been proposed to improve the model robustness for different tasks. The

general cross entropy (GCE) loss was proposed to solve the noisy label problem which emphasizes more

on the clean samples (Zhang and Sabuncu, 2018). For the data imbalanced problem, distributionally robust

learning (DRO) (Qi et al., 2020a; Li et al., 2020; Sagawa et al., 2019; Qi et al., 2020b) and class balance

loss (Cui et al., 2019; Cao et al., 2019) use instance-level and class-level robust weights to pay more attention

on underrepresented groups, respectively. Recently, Sagawa et al. (2019) shows that group DRO is able to

prevent the models learning the speciﬁc spurious correlations. The above robust objective are deﬁned on the

loss space with the assistance of label information. Exploiting useful information from feature representation

space to further beneﬁt the task-speciﬁc training remains under-explored.

Invariant Risk Minimization (IRM)

IRM (Arjovsky et al., 2019) is a novel paradigm to enhance model

generalization in domain adaptation by learning the invariant sample feature representations across different

"domains" or "environments". By optimizing a practical version of IRM in the toxicity classiﬁcation usecase

study, Adragna et al. (2020) shows the strength of IRM over ERM in improving the fairness of classiﬁers

on the biased subsets of Civil Comments dataset. To elicit an invariant feature representation, IRM is

casted into a constrained (bi-level) optimization problem where the classiﬁer

is constrained on a optimal

uncertainty set. Instead, the RAAN objective constrains the adversarial robust weights, which are deﬁned in

pairwise representation similarity space penalized by KL divergence. When the embedding representation

is parameterized by trainable encoders

, RAAN generates a more uniform representation space across

different sensitive groups.

Stochastic Optimization

Recently, several stochastic optimization technique has been leveraged to design

efﬁcient stochastic algorithms with provable theoretical convergence for the robust surrogate objectives, such as

F-measure (Zhang et al., 2018b), average precision (AP) (Qi et al., 2021b), and area under curves (AUC) (Liu

et al., 2019, 2018; Yuan et al., 2021). In this paper, we cast the fairness promoting RAAN loss as a two-level

stochastic coupled compositional function with a general formulation of Eξ[f(Eζg(w;ζ, ξ))], where ξ, ζ are

independent and

has a ﬁnite support. By exploring the advanced stochastic compositional optimization

technique (Wang et al., 2017; Qi et al., 2021a), a stochastic algorithm SCRANN with both SGD-style and

Adam-style updates is proposed to solve the RAAN with provable convergence.

3. Robust Adversarial Attribute Neighbourhood (RAAN) Loss

3.1 Notations

We ﬁrst introduce some notations. The collected data is denoted by

D={d}n

i=1 ={(xi, yi, ai)}n

i=1

where

xi∈ X

is the data,

yi∈ Y

is the label,

ai∈ A

is the corresponding attribute (e.g., race, gender),

QI, ARDESHIR, XU, YANG

and

is the number of samples. We divide the data into different subsets based on labels and attributes.

For any label

c∈ Y

and attribute

a∈ A

, we denote

a={(xi, yi, ai)|ai=a∧yi=c}n

i=1

and

Dc={(xi, yi, ai)|yi=c}n

i=1

. Then we have

Dc=∪{Dc

a}|A|

a=1

. Given a deep neural network, the model

weights

can be decomposed into two parts, the

eature presentation parameters

and the

lassiﬁcation

head parameters

, i.e,

w= [wf,wc]

. For example,

and

are mapped into the convolutions layers and

fully connected layers in ResNets, respectively.

Fwf(·)

represents the feature encoder mapping from

X → Z

and Hwc(·)represents the classiﬁcation head mapping from Z → Y. Then zi(wf) = Fwf(xi)∈ Z denotes

the embedding representation of the sample

Hwc(zi(wf))

represents the output of the classiﬁcation head.

The key idea of RAAN is to assign a fairness-promoting adversarial robust weight for each sample by

exploring the AAN representation structure to reduce the disparity across different sensitive attributes. The

AAN of the sample

di= (xi, yi=c, ai=a)

is deﬁned as the samples from the same class but with

different attributes, i.e,

Pi=Pc

a=Dc\Dc

. For example, considering a binary protected sensitive attribute

{female, male}

, the AAN of a sample belonging to the

male

protected group with class label of

c∈ Y

is the collection of the

female

attribute samples with the same class

. Then the adversarial robust weight

for every sample

di∼ D

is represented as

pAAN

, which is an aggregation of the pairwise weights between

and its AAN neighbours in

. Next, we denote the pairwise robust weights between the sample

and

dj∈ Pi

in the representation space as

pAAN

. When the context is clear, we abuse the notations by using

pAAN

i= [pAAN

i1,··· , pAAN

ij ,···]∈R|Pi|

to represent the pairwise robust weights vector deﬁned in

, i.e, the

AAN of di.

3.2 RAAN Objective

To explore the AAN representation structure and obtain the pairwise robust weights, we deﬁne the following

robust constrained objective for ∀di∼ D,

`AAN

i=X

j∈Pi

pAAN

ij `(w;xj, yj, aj)(1)

s.t. max

pAAN

i∈∆|Pi|X

j∈Pi

pAAN

ij zi(wf)>zj(wf)−τKL pAAN

i,1

|Pi|,1∈R|Pi|(2)

where

∆

is a simplex that

P|Pi|

j=1 pij = 1

. The robust loss (1) is a weighted average combination of the

AAN loss. The robust constraint (2) is deﬁned in the pairwise representation similarity between the sample i

and its AAN neighbours penalized by the KL divergence regularizer, which has been extensively studied in

istributionally

obust learning

bjective (DRO) to improve the robustness of the model in the loss space (Qi

et al., 2020a). Here, we adopt the DRO with KL divergence constraint to the representation space to generate a

uniform distribution across different sensitive attributes.

Controlled by the hyperparameter

, the close form solution of

pAAN

in (2) guarantees that the larger the

pairwise similarity

i(wf)zj(wf)

is, the higher the

pAAN

will be. When

τ= 0

, the close form solution of (2)

is 1 for the pair with the largest similarity and 0 on others. When τ > 0, due to the strong convexity in terms

of pAAN

i, the close form solution of (2) for each pair weight between diand dj∈ Piis:

pAAN

ij =exp(z>

i(wf)zj(wf)

τ)

k∈Pi

exp(z>

i(wf)zk(wf)

τ)

.(3)

Hence the larger the

is, the more uniform of

pAAN

will be. And it is apparent to see that the robust objective

generates equal weights for every pair such that

pAAN

ij =1

|Pi|

for every

dj∈ Pi

when

approaches to

the inﬁnity in (3). When we have a fair representation, the embeddings of different protected groups are

uniform distributed in the representation space. The vanilla average loss training is good enough to have a fair

classiﬁcation head, which equals to RAAN with

goes to inﬁnity. When we have biased representations, we

FAIRNESS VIA ADVERSARIAL ATTRIBUTE NEIGHBOURHOOD ROBUST LEARNING

Figure 2: Improvement of Representation Fairness

Figure 3: Training overview of (RL)-

RAAN

use a smaller

to emphasize on the similar representations that shared invariant feature from two different

protected groups to reduce the bias introduced from difference of the two protected group distributions.

To this end, after having the close form solution for every pairwise robust weights

pAAN

(3) and plugging

back into `AAN

i(1) for any arbitrary sample di∼ D, the overall RAAN objective is deﬁned as:

RAAN(w) := 1

c=1

a=1

|Dc

i=1

`AAN

i=1

j=1

pAAN

j`(w;xj, yj, aj),(4)

where

C=|Y|

A=|A|

`AAN

i=P

j∈Pi

pAAN

ij `(w;xj, yj, aj)

is deﬁned in (1),

pAAN

is deﬁned in (3), and

pANN

j=1

|Pj|P

i∈Pj

exp( z>

i(wf)zj(wf)

τ)

k∈Pi

exp( z>

i(wf)zk(wf)

τ)

is obtained by we aggregating all the pairwise robust weights in

. Hence, the adversarial robust weights

pAAN

for each sample

dj= (xj, c, a)∼ D

, encodes the intrinsic

representation neighbourhood structure between the sample and its AAN neighbors di∈ Pj(the numerator)

and normalized by the similarity pairs from the same protected groups

dk∈ Pi

, i.e,

(the denominator).

Due to the limitation of space, we put the second equality derivation of equation (4) in Appendix.

3.3 Representation Learning Robust Adversarial Attribute Neighbourhood (RL-RAAN) Loss

AANs are deﬁned over the encoder representation outputs

z(wf)

. By default, the RAAN loss aims to promote

a fairer classiﬁcation head on a ﬁxed bias representation distribution, i.e,

(recall that

w= [wf,wc]

) is not

trainable. By parameterizing the AANs with trainable encoder parameters, i.e,

is trainable, we extend the

RAAN to the

epresentation

earnining RAAN (RL-RAAN), which is able to further debias the representation

encoder. Hence, RAAN optimizes the

while RL-RAAN jointly optimizes

[wf,wc]

. To design efﬁcient

stochastic algorithms, RL-RAAN requires more sophisticated stochastic estimators than RAAN, which we will

discuss later in Section 3. Depending on whether

is trainable, the red dashed arrow in Figure 3 describes

the optional gradients backward toward the feature representations during the training.

Here, we demonstrate the effectiveness of RL-RAAN in generating a more uniform representation distribu-

tion across different sensitive groups in Figure 2. To achieve this, we visualize the representation distributions

learned from vanilla CE (left plot) and RL-RAAN (right plot) methods using the Kernel-PCA dimensionality

reduction method with radial basis function (rbf) kernels. It is clear to see that white attribute samples are

more clustered in the upper left corner of CE representation projection while both the white and non-white

attributes samples are uniformly distributed in the representation projection of RL-RAAN.

4. Stochastic Compositional Optimization for RAAN (SCRAAN)

In this section, we provide a general

tochastic

ompositional optimization algorithm framework for RAAN

(SCRAAN). The SCRAAN applies to both RAAN and RL-RAAN objective. We ﬁrst show that (RL)-RAAN

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

FAIRNESSVIAADVERSARIALATTRIBUTENEIGHBOURHOODROBUSTLEARNINGFairnessviaAdversarialAttributeNeighbourhoodRobustLearningQiQiyQI-QI@UIOWA.EDUShervinArdeshirzSHERVINA@NETFLIX.COMYiXu-YXU@DLUT.EDU.CNTianbaoYang.TIANBAO-YANG@TAMU.EDUyDepartmentofComputerScience,TheUniversityofIowa,IowaCity,IA52242,USAzNeti...

展开>> 收起<<

FAIRNESS VIA ADVERSARIAL ATTRIBUTE NEIGHBOURHOOD ROBUST LEARNING Fairness via Adversarial Attribute Neighbourhood Robust Learning Qi QiyQI-QIUIOWA .EDU.pdf

共26页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

FAIRNESS VIA ADVERSARIAL ATTRIBUTE NEIGHBOURHOOD ROBUST LEARNING Fairness via Adversarial Attribute Neighbourhood Robust Learning Qi QiyQI-QIUIOWA .EDU

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: