
1. We propose a simple and general criterion for individual
fairness testing: for a fair model, the derivative of the
model prediction with respect to the protected variable
should be small (a toy numerical sketch of this criterion follows the list).
2. We introduce the concept of an auxiliary model, which describes the relationship between the input features and the protected variable, and show how it can be used to help evaluate this criterion. Figure 1 provides the intuition
behind our approach. We specifically focus on unfair
treatment that is created by historical bias in datasets
[Mehrabi et al., 2019].
3. To help evaluate fairness testing, we present a novel synthetic data generation method that merges multiple real datasets through a probabilistic graphical model to flexibly simulate realistic data with controllable bias levels.
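As a rough illustration of the first contribution, the sketch below checks the derivative criterion numerically for a toy logistic model in which the protected variable happens to be an explicit input; the model weights, the example feature vector, and the 0.01 threshold are all invented for illustration, and the auxiliary model of the second contribution is not yet involved.

```python
import numpy as np

def predict(x, c, w_x=np.array([0.8, -0.5]), w_c=0.05, b=0.1):
    """Toy logistic target model; w_c controls how strongly the
    (hypothetical, explicitly observed) protected input c affects the output."""
    logit = x @ w_x + w_c * c + b
    return 1.0 / (1.0 + np.exp(-logit))

def derivative_wrt_protected(x, c, eps=1e-4):
    """Central finite-difference estimate of d f_tar / d c."""
    return (predict(x, c + eps) - predict(x, c - eps)) / (2 * eps)

x = np.array([0.3, 1.2])                 # hypothetical feature vector
grad_c = derivative_wrt_protected(x, c=0.0)
print(f"|d yhat / d c| = {abs(grad_c):.4f}")
print("criterion satisfied" if abs(grad_c) < 0.01 else "potential unfairness")
```

In realistic settings the protected variable is rarely an explicit input, which is precisely where the auxiliary model of the second contribution comes in.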
2 Preliminary Material
In this section, we describe a definition of individual fairness
that unites a number of definitions in the literature. We then
analyze the technical challenges of existing tests, which serve
as motivation for our own. Throughout, we adopt the following notation:
• X: Feature (input) variables. When features are observed, we use x to represent the feature vector.
• Y: Prediction (output) variables. When a label is observed, we use y to represent the label as a scalar. As we do fairness testing on binary classification tasks, the prediction y in this paper is a probability.
• C: Protected variables (e.g., gender). We use c to represent the observed values.
• φ: Distance metric. φ_in(·,·) denotes a metric on the input space, and φ_out(·,·) a metric on the output space.
• f_tar: Target function for fairness testing. This takes features x as input and produces a prediction ŷ.
• f_aux: Auxiliary model. This takes features x as input and produces predictions c for the protected attributes C (a brief code sketch of both models follows this list).
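As a concrete, purely hypothetical instantiation of this notation, f_tar and f_aux can be two ordinary classifiers fit on the same features, one predicting the label y and the other the protected attribute c; the synthetic data and the choice of scikit-learn logistic regression below are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: the protected attribute C shifts one feature (historical bias).
C = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 4))
X[:, 0] += 1.5 * C
Y = ((X[:, 1] - X[:, 2] + rng.normal(size=500)) > 0).astype(int)

# Target model f_tar: predicts the label y from the features x.
f_tar = LogisticRegression().fit(X, Y)

# Auxiliary model f_aux: predicts the protected attribute c from the same features x.
f_aux = LogisticRegression().fit(X, C)

x = X[:1]
print("yhat =", f_tar.predict_proba(x)[0, 1])  # target prediction for one individual
print("chat =", f_aux.predict_proba(x)[0, 1])  # how recoverable c is from x alone
```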
2.1 Individual Fairness Definitions
Individual fairness describes the tolerable discrepancy of
model predictions at the level of individual data points. There
are several characterizations that operate at this level, and to
facilitate their comparison we use the following definition.
Let the observed features x be generated from underlying latent variables z_⊥ and z_∥ via a function f_g:

x = f_g(z_⊥, z_∥).    (1)

Here, z_⊥ denotes latent vectors with no correlation with the protected variables C, and z_∥ is influenced by the protected variables C through a function z_∥ = ψ(c).
Definition 1. A model f_tar is individually fair if it produces exactly identical outcomes when given input feature vectors x_i and x_j which share the same latent vector z_⊥:

f_tar(x_i) = f_tar(x_j),    (2)

where x_i = f_g(z_⊥, ψ(c)) and x_j = f_g(z_⊥, ψ(c′)).
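To make the definition concrete, the following minimal sketch assumes an invented linear ψ and f_g, generates a pair x_i, x_j that share the same z_⊥ but differ in the protected value, and compares the target model's outputs; every functional form and weight here is an illustrative placeholder.

```python
import numpy as np

rng = np.random.default_rng(1)

def psi(c):
    """Hypothetical psi: maps the protected value c to the c-dependent latent z_par."""
    return np.array([1.5 * c, -0.5 * c])

def f_g(z_perp, z_par):
    """Hypothetical generation function: concatenates the two latent blocks."""
    return np.concatenate([z_perp, z_par])

def f_tar(x, w=np.array([0.7, -0.3, 0.9, 0.2])):
    """Toy target model (logistic); nonzero weights on the z_par block make it unfair."""
    return 1.0 / (1.0 + np.exp(-x @ w))

z_perp = rng.normal(size=2)       # shared c-independent latent vector
x_i = f_g(z_perp, psi(c=0))       # individual with protected value c
x_j = f_g(z_perp, psi(c=1))       # "similar" individual with protected value c'

gap = abs(f_tar(x_i) - f_tar(x_j))
print(f"|f_tar(x_i) - f_tar(x_j)| = {gap:.4f}")  # exactly 0 for an individually fair model
```

A nonzero gap indicates that the toy model depends on the c-influenced latent z_∥ and therefore violates Definition 1.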
At a high level, the definition states that a pair of similar data points x_i and x_j should receive equal treatment from a target model. This perspective is shared by the three most common families of individual fairness tests. Fairness Through Unawareness (FTU) stipulates that protected variables should not explicitly be used in the prediction process; “similar” points are thus points that differ only in the protected variable, since the latter cannot affect the outputs. Fairness Through Awareness (FTA) formally defines “similar” points using a metric on the input space [Dwork et al., 2012]. Finally, Counterfactual Fairness (CFF) stipulates that the model predictions should not causally depend on the protected variables C [Kusner et al., 2017]. For a given input x_i, the “similar” individual x_j is the counterfactual constructed by intervening on C.
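For instance, FTA can be read as the Lipschitz condition φ_out(f_tar(x_i), f_tar(x_j)) ≤ L · φ_in(x_i, x_j) of Dwork et al. The sketch below checks this condition for a toy model; the Euclidean φ_in, the absolute-difference φ_out, the constant L = 1, and the example inputs are placeholder assumptions.

```python
import numpy as np

def phi_in(x_i, x_j):
    """Placeholder input metric: Euclidean distance."""
    return np.linalg.norm(x_i - x_j)

def phi_out(y_i, y_j):
    """Placeholder output metric: absolute difference of predicted probabilities."""
    return abs(y_i - y_j)

def fta_check(f_tar, x_i, x_j, L=1.0):
    """FTA-style test: similar inputs must receive similar outputs (Lipschitz condition)."""
    return phi_out(f_tar(x_i), f_tar(x_j)) <= L * phi_in(x_i, x_j)

# Toy logistic model and a pair of nearby ("similar") individuals.
f_tar = lambda x: 1.0 / (1.0 + np.exp(-x @ np.array([0.4, -0.8, 0.3])))
x_i = np.array([0.2, 1.0, -0.3])
x_j = np.array([0.2, 1.1, -0.3])
print("FTA condition holds:", fta_check(f_tar, x_i, x_j))
```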
For these three tests the main technical challenge is the
generation of “similar” pairs, either by searching within a
neighbourhood defined by a metric, or by transformation of
the input. We now describe these challenges in more depth.
Technical Challenges in Employing Similarity Metrics
The main challenge in constructing similarity metrics is the task-specific domain knowledge they require [Dwork et al., 2012]. Some methods simply employ unweighted ℓ_p norms [Wachter et al., 2018; John et al., 2020]. Others obtain weights from a linear model trained to predict the protected variable c from the input x [Ruoss et al., 2020; Yurochkin et al., 2020]. Others still learn metrics from expertly-labelled pairs [Mukherjee et al., 2020].
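As a schematic of the second approach above (not a reproduction of any cited method), one can fit a linear model predicting c from x and downweight the coordinates that carry the most protected information; the synthetic data and the specific weighting rule below are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical data in which feature 0 leaks the protected attribute c.
C = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 3))
X[:, 0] += 2.0 * C

# Linear model predicting c from x; coefficient magnitudes indicate per-feature leakage.
aux = LogisticRegression().fit(X, C)
leak = np.abs(aux.coef_[0])

# Downweight leaky coordinates so the metric largely ignores protected information.
weights = 1.0 / (1.0 + leak)

def phi_in(x_i, x_j, w=weights):
    """Weighted l2 metric: distance along protected-correlated axes counts less."""
    return np.sqrt(np.sum(w * (x_i - x_j) ** 2))

print("per-feature weights:", np.round(weights, 3))
```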
Technical Challenges in Transforming Inputs
With regard to input transformations, the main challenges are instability and the risk of generating out-of-distribution samples. Both risks are faced by approaches that employ adversarial techniques [Wachter et al., 2018; Maity et al., 2021]. When the dataset is small and low-dimensional, Optimal Transport (OT) can be used to define a mapping between pairs of inputs based on pairwise similarity [Dwork et al., 2012; Gordaliza et al., 2019]. However, this method scales poorly. To this end, approximate OT methods based on dual formulations [Chiappa and Pacchiano, 2021] and Generative Adversarial Networks [Black et al., 2019] have also been explored, though again these present risks of instability and out-of-distribution samples.
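For intuition, when the two protected groups are small and of equal size, the OT-based pairing reduces to a minimum-cost one-to-one matching, which the sketch below computes exactly with SciPy's Hungarian-algorithm solver on invented data; the roughly cubic cost of this exact matching is why the approximate methods above are attractive.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)

# Synthetic features for two equally sized protected groups (placeholder data).
X_a = rng.normal(loc=0.0, size=(20, 3))
X_b = rng.normal(loc=0.5, size=(20, 3))

# Pairwise Euclidean cost matrix, then the minimum-cost one-to-one matching.
cost = cdist(X_a, X_b)
rows, cols = linear_sum_assignment(cost)

# Each (i, cols[i]) pair is a cross-group "similar" pair for fairness testing.
print("first matched pairs:", list(zip(rows[:5], cols[:5])))
print("mean matching cost:", cost[rows, cols].mean())
```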
CFF also employs generative models, specifically causal graphical models. However, such graphs are rarely available. Moreover, training such models via unsupervised learning is hard [Srivastava et al., 2017], increasing the risk that the generated inputs are out-of-domain or have limited coverage.
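Purely to illustrate what counterfactual generation involves, the sketch below assumes a known (and entirely invented) linear causal model in which c shifts the features through a coefficient vector; the counterfactual is obtained by intervening on c while holding the exogenous noise fixed. In practice such a graph is rarely available, which is exactly the difficulty noted above.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed (invented) linear structural model: x = alpha * c + u, with exogenous noise u.
alpha = np.array([1.2, 0.0, -0.4])

def generate(c, u):
    return alpha * c + u

# Observed individual: factual protected value c = 1 with noise u (abduction step).
u = rng.normal(size=3)
x_factual = generate(c=1, u=u)

# Counterfactual: intervene on c (set c = 0) while keeping u fixed.
x_counterfactual = generate(c=0, u=u)

# Toy target model; a large gap signals a causal dependence on C.
f_tar = lambda x: 1.0 / (1.0 + np.exp(-x @ np.array([0.9, -0.2, 0.1])))
print("factual prediction:       ", round(float(f_tar(x_factual)), 4))
print("counterfactual prediction:", round(float(f_tar(x_counterfactual)), 4))
```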
2.2 Mechanisms of Discrimination
We now describe the mechanism of discrimination our test aims to detect. There are many reasons why models exhibit unfair behaviour, but one of the most insidious is the mishandling of historical bias [Mehrabi et al., 2019]. Here, pre-existing prejudices create spurious correlations between features X and protected variables C¹ (Figure 2a). These cor-
¹ More in-depth discussion and worked examples may be found in the Use Cases section of the supplementary materials.