fAux: Testing Individual Fairness via Gradient Alignment
Giuseppe Castiglione¹, Ga Wu¹, Christopher Srinivasa¹ and Simon Prince¹
¹Borealis AI
{giuseppe.castiglione, ga.wu, christopher.srinivasa, simon.prince}@borealisai.com
Abstract
Machine learning models are vulnerable to biases that result in unfair treatment of individuals from different populations. Recent work that aims to test a model's fairness at the individual level either relies on domain knowledge to choose metrics, or on input transformations that risk generating out-of-domain samples. We describe a new approach for testing individual fairness that does not have either requirement. We propose a novel criterion for evaluating individual fairness and develop a practical testing method based on this criterion which we call fAux (pronounced fox). This is based on comparing the derivatives of the predictions of the model to be tested with those of an auxiliary model, which predicts the protected variable from the observed data. We show that the proposed method effectively identifies discrimination on both synthetic and real-world datasets, and has quantitative and qualitative advantages over contemporary methods.
1 Introduction
Unfair treatment of different populations by machine learning models can result in undesired social impact [Berk et al., 2017; Yucer et al., 2020]. There are three main research challenges associated with this problem. The first is to identify the source of the bias and understand how this influences the models [Mehrabi et al., 2019; Sun et al., 2020]. The second is to modify the training strategy to prevent unfair predictions [Yurochkin et al., 2020; Ruoss et al., 2020; Yurochkin and Sun, 2021]. The final challenge, which is addressed in this paper, is to test the fairness of existing ML models.
To test that a model is fair, we must first agree on what is meant by 'fairness'. For all definitions, fairness is defined with respect to protected variables such as race, gender, or age. However, the literature distinguishes between group fairness (equivalent aggregate treatment of different protected groups) and individual fairness (equivalent treatment of similar individuals regardless of their protected group). This paper addresses individual fairness, but even here there are multiple, potentially conflicting criteria. For example, a common definition (see [Gajane and Pechenizkiy, 2017]) is fairness through unawareness (FTU), in which the model should behave as though the protected variable is not present. Conversely, [Dwork et al., 2012] proposed fairness through awareness (FTA), which requires that similar individuals have similar prediction outcomes. [Kusner et al., 2017] emphasized counterfactual fairness (CFF). This takes a data example and synthesizes a counterfactual example in which the protected variable and its descendants are changed. It requires that the original predictions and those for the counterfactual should be similar.

Figure 1: At the heart of our method is a simple idea: if we adjust the model input so that the predicted protected variable changes, then the model output should not change. a) The target model predicts y from inputs x. We wish to test fairness at points 1, 2, 3, 4. b) We construct an auxiliary model that predicts the protected variable c from inputs x. fAux compares the gradients (green arrows) of the two models. Point 2 is unfair because the target and auxiliary model gradients are large and parallel; the model prediction changes as the protected variable changes. The other points are fair since the gradients are orthogonal (point 1) or one or other gradient is small (points 3, 4).
There exist methods to test models with all of these definitions. [Agarwal et al., 2018] and [Galhotra et al., 2017] use the FTU definition, [John et al., 2020] and [Wachter et al., 2018] concentrate on the FTA definition, and [Black et al., 2019] use the CFF principle. However, each approach has limitations. Features can act as surrogates of a protected variable, which FTU ignores. Using FTA needs domain-specific knowledge to define similarity metrics for inputs and outputs. Using CFF requires building a generative model to produce counterexamples. Moreover, we show experimentally that methods based on FTA can exhibit low precision.
In this paper, we introduce fAux (pronounced fox), a new
framework for individual fairness testing which avoids these
difficulties. Our contributions are as follows:
1. We propose a simple and general criterion for individual
fairness testing: for a fair model, the derivative of the
model prediction with respect to the protected variable
should be small.
2. We introduce the concept of an auxiliary model which
describes the relationship between the input features and
the protected variable and show how to use this to help
evaluate this criterion. Figure 1 provides the intuition
behind our approach. We specifically focus on unfair
treatment that is created by historical bias in datasets
[Mehrabi et al., 2019].
3. To help evaluate fairness testing, we present a novel synthetic data generation method that merges multiple real datasets through a probabilistic graphical model to flexibly simulate realistic data with controllable bias levels.
2 Preliminary Material
In this section, we describe a definition of individual fairness that unites a number of definitions in the literature. We then analyze the technical challenges of existing tests, which serve as motivation for our own. Throughout, we adopt the following notation:
X: Feature (input) variables. When features are observed, we use x to represent the feature vector.
Y: Prediction (output) variables. When a label is observed, we use y to represent the label as a scalar. As we do fairness testing on binary classification tasks, the prediction y in this paper is a probability.
C: Protected variables (e.g., gender). We use c to represent the observed values.
φ: Distance metric. φ_in(·,·) denotes a metric on the input space, and φ_out(·,·) a metric on the output space.
f_tar: Target function for fairness testing. This takes features x as input and produces a prediction ŷ.
f_aux: Auxiliary model. This takes features x as input and produces predictions c for the protected attributes C.
2.1 Individual Fairness Definitions
Individual fairness describes the tolerable discrepancy of
model predictions at the level of individual data points. There
are several characterizations that operate at this level, and to
facilitate their comparison we use the following definition.
Let the observed features x be generated from underlying latent variables z and z_k via a function f_g:

$$x = f_g(z, z_k). \tag{1}$$

Here, z denotes latent vectors with no correlation with the protected variables c, and z_k is influenced by the protected variables C through a function z_k = ψ(c).
Definition 1. A model f_tar is individually fair if it produces exactly identical outcomes when given input feature vectors x_i and x_j which share the same latent vector z:

$$f_{tar}(x_i) = f_{tar}(x_j), \tag{2}$$

where $x_i = f_g(z, \psi(c))$ and $x_j = f_g(z, \psi(c'))$.
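To make Definition 1 concrete, the following minimal sketch builds a toy generative process of the form x = f_g(z, ψ(c)) and checks whether a target model returns identical predictions for a pair of inputs that share z but differ in the protected variable. The specific functions, dimensions and weights are illustrative assumptions, not the models studied in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(c):
    # Toy mapping from the protected variable c to the influenced latent z_k.
    return np.array([2.0 * c, -c])

def f_g(z, z_k):
    # Toy generative model: observed features depend on both latents.
    return np.concatenate([z, z_k + 0.5 * z[:2]])

def f_tar(x):
    # Toy target model: a fixed linear scorer followed by a sigmoid.
    w = np.array([0.3, -0.2, 0.1, 0.8, 0.0])
    return 1.0 / (1.0 + np.exp(-x @ w))

# A pair of inputs that share the same z but differ in the protected variable.
z = rng.normal(size=3)
x_i = f_g(z, psi(c=0.0))
x_j = f_g(z, psi(c=1.0))

# Definition 1: a fair model returns (near-)identical outcomes for x_i and x_j.
gap = abs(f_tar(x_i) - f_tar(x_j))
print(f"prediction gap = {gap:.4f}  (a fair model would give ~0)")
```

Here the toy target model places weight on a feature that is driven by c, so the prediction gap is nonzero and Definition 1 is violated.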
At a high level, the definition states that given a pair of similar data points x_i and x_j, they should receive equal treatment from a target model. This perspective is shared by the three most common families of individual fairness tests: Fairness Through Unawareness (FTU) stipulates that protected variables should not explicitly be used in the prediction process. "Similar" points are thus points that differ only in the protected variable, since the latter cannot affect the outputs. Fairness Through Awareness (FTA) formally defines "similar" points using a metric on the input space [Dwork et al., 2012]. Finally, Counterfactual Fairness (CFF) stipulates that the model predictions should not causally depend on the protected variables C [Kusner et al., 2017]. For a given input x_i, the "similar" individual x_j is the counterfactual constructed by intervening on C.
For these three tests the main technical challenge is the
generation of “similar” pairs, either by searching within a
neighbourhood defined by a metric, or by transformation of
the input. We now describe these challenges in more depth.
Technical Challenges in Employing Similarity Metrics
The main challenge in constructing similarity metrics is the task-specific domain knowledge they require [Dwork et al., 2012]. Some methods simply employ unweighted l_p norms [Wachter et al., 2018; John et al., 2020]. Others obtain weights from a linear model trained to predict the protected variable c from the input x [Ruoss et al., 2020; Yurochkin et al., 2020]. Others still learn metrics from expertly-labelled pairs [Mukherjee et al., 2020].
Technical Challenges in Transforming Inputs
With regards to input transformations, the main challenges are stability and the risk of generating out-of-distribution samples. Both risks are faced by approaches that employ adversarial techniques [Wachter et al., 2018; Maity et al., 2021]. When the dataset is small and low-dimensional, Optimal Transport (OT) can be used to define a mapping between pairs of inputs based on pairwise similarity [Dwork et al., 2012; Gordaliza et al., 2019]. However, this method scales poorly. To this end, approximate OT methods based on dual formulations [Chiappa and Pacchiano, 2021] and Generative Adversarial Networks [Black et al., 2019] have also been explored, though again, these present risks of instability and out-of-distribution samples.

CFF also employs generative models, specifically causal graphical models. However, such graphs are rarely available. Moreover, training such models via unsupervised learning is hard [Srivastava et al., 2017], increasing the risk that the generated inputs are out-of-domain or have limited coverage.
2.2 Mechanisms of Discrimination
We now describe the mechanism of discrimination our test aims to detect. There are many reasons why models exhibit unfair behaviour, but one of the most insidious is the mishandling of historical bias [Mehrabi et al., 2019]. Here, pre-existing prejudices create spurious correlations between features X and protected variables C (Figure 2a).¹ These correlations provide a model with two possible inference paths: one legitimate, and one that implicitly infers C. The latter paths render the FTU definition unreliable, as models learn to exploit surrogate features even when C is omitted [Barocas et al., 2019]. By contrast, depending on the metric, FTA is sensitive to all variations in the input, and thus may flag instances where inference was legitimate.

¹More in-depth discussion and worked examples may be found in the Use Cases section of the supplementary materials.

Figure 2: Graphical models over C, X and Y showing how historical bias causes unfairness. (a) Data Generation: the generation process of a biased training dataset; the red dashed line denotes that Y and C may have correlations due to historical bias. (b) Fair Model: fair models learn to infer Y while cancelling the impact from the protected variables C; solid arrows show generative dependence, and the dashed arrow shows the learned inference mapping. (c) Unfair Model: unfair models infer the protected variables to support their predictions.
In the next section, we present an approach that can precisely distinguish between these two paths. It does so scalably and operates within distribution. Moreover, there is lower overhead, as it requires no domain expertise and it employs only supervised learning techniques.
3 Local Fairness Tests with fAux
We now develop a novel fairness testing method that satisfies Definition 1 but does not suffer from the limitations described previously. We start by using the graphical model in Figure 2 to propose a criterion for individual fairness based on sensitivity analysis. We use this to motivate the Local Independence Criterion (LIC), which examines whether a model suffers from historical bias, and show that this satisfies Definition 1. Finally, we introduce the idea of an Auxiliary Model which is needed to create a practical test.
3.1 Local Independence Criterion
A sufficient condition for a model to violate the individual fairness criterion of Definition 1 is that its prediction depends on the protected variable c. We may reveal this dependence locally using the partial derivative

$$\frac{\partial f_{tar}(x)}{\partial c} \neq 0, \tag{3}$$

which indicates that the prediction is sensitive to a small perturbation of c. In practice, the machine learning model and data introduce inevitable noise, thus we relax the above expression with a pre-defined tolerance δ. We then obtain:
Theorem 1. Let there exist a generative model f_g that influences the features X with the protected variables C, such that x = f_g(z, ψ(c)). If a machine learning model f_tar violates the Local Independence Criterion (LIC)

$$\left\| \frac{\partial f_{tar}(x)}{\partial c} \right\| \leq \delta, \tag{4}$$

by a pre-defined threshold δ, then the model f_tar violates the individual fairness criterion in Definition 1.
A proof is provided in Appendix A.4, but intuitively, the
partial derivative considers the disparate treatment of two
infinitesimally-close individuals.
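Before turning to estimation, the sketch below shows the LIC check in the idealized case where the generative model f_g is available (as it is for the LIC-UB baseline in Section 4): the derivative ∂f_tar/∂c is approximated by finite differences through the composition f_tar(f_g(z, ψ(c))). The toy functions and the tolerance value are assumptions for illustration only.

```python
import numpy as np

def psi(c):
    return np.array([2.0 * c, -c])            # toy: latent z_k driven by c

def f_g(z, z_k):
    return np.concatenate([z, z_k])           # toy generative model

def f_tar(x):
    w = np.array([0.3, -0.2, 0.1, 0.8, 0.0])  # toy target model (logistic)
    return 1.0 / (1.0 + np.exp(-x @ w))

def lic_violation(z, c, delta, eps=1e-4):
    """Finite-difference estimate of |d f_tar / d c| at (z, c)."""
    hi = f_tar(f_g(z, psi(c + eps)))
    lo = f_tar(f_g(z, psi(c - eps)))
    grad_c = (hi - lo) / (2 * eps)
    return abs(grad_c) > delta                # True -> LIC violated -> unfair

z = np.zeros(3)
print(lic_violation(z, c=0.5, delta=0.05))
```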
To use the LIC, we need to estimate the derivative ∂f_tar(x)/∂c, for which the chain rule gives:

$$\left\| \frac{\partial f_{tar}(x)}{\partial x} \frac{\partial x}{\partial c} \right\| \leq \delta. \tag{5}$$

Unfortunately, the term ∂x/∂c is undefined without accessing the underlying generative model that maps the protected variables C to the features X, and this is rarely available.²

²Note, the protected variables C are not necessarily continuous, as we will model the mapping through an auxiliary model later.
3.2 Auxiliary Models
In this section we suggest an approximation of ∂x/∂c that requires neither a generative model nor attempts to model the latent representations z ∈ Z.

One approach would be to build a model to predict x from c and use the derivative of this model to approximate ∂x/∂c. However, this might be poor as the number of protected variables is often far smaller than the feature size. Instead, we consider the inverse problem: we describe the mapping from the features X to the protected variables C using an auxiliary model c = f_aux(x). We then invert this model in a local neighbourhood around a given point x_0 to approximate the desired derivative. To this end, given this auxiliary model f_aux we apply a Taylor expansion around (x_0, c_0):

$$c - f_{aux}(x_0) \approx \left( \frac{\partial f_{aux}(x_0)}{\partial x_0} \right)^{\top} (x - x_0) \tag{6}$$

where we replaced c_0 with its prediction f_aux(x_0). The left-hand side denotes the change in the space of protected variables. The right-hand side is a Jacobian-vector product.
We apply the Moore-Penrose pseudo-inverse to find the minimum-norm solution for x:

$$f_{aux}^{-1}(c) = x_0 + (c - f_{aux}(x_0)) \left( \nabla f_{aux}^{\top} \nabla f_{aux} \right)^{-1} \nabla f_{aux}^{\top} \tag{7}$$

where we use ∇f_aux to denote ∂f_aux(x_0)/∂x_0. This allows us to approximate ∂x/∂c by

$$\frac{\partial x}{\partial c} \approx \frac{\partial f_{aux}^{-1}(c)}{\partial c} = \left( \nabla f_{aux}^{\top} \nabla f_{aux} \right)^{-1} \nabla f_{aux}^{\top}. \tag{8}$$
Finally, by combining Equation 8 with Equation 5, we can approximate the LIC with:

$$\left\| \nabla f_{tar} \left( \nabla f_{aux}^{\top} \nabla f_{aux} \right)^{-1} \nabla f_{aux}^{\top} \right\| \leq \delta \tag{9}$$

where we use ∇f_tar to denote ∂f_tar(x_0)/∂x_0.
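Putting Equations 6–9 together, the fAux score at a point x_0 is a product of the target-model gradient with a pseudo-inverse built from the auxiliary-model gradient. The sketch below computes this score (up to transposition) with PyTorch autograd; the two placeholder MLPs, the input dimension and the threshold are assumptions standing in for trained f_tar and f_aux models rather than the exact implementation used in our experiments.

```python
import torch

D_FEAT, D_PROT = 8, 1   # feature dimension and number of protected attributes

# Placeholder networks standing in for a trained target model and auxiliary model.
f_tar = torch.nn.Sequential(torch.nn.Linear(D_FEAT, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1), torch.nn.Sigmoid())
f_aux = torch.nn.Sequential(torch.nn.Linear(D_FEAT, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, D_PROT), torch.nn.Sigmoid())

def faux_score(x0):
    """Approximate |df_tar/dc| at x0 (Equations 8-9, up to transposition)."""
    x = x0.clone().requires_grad_(True)
    grad_tar = torch.autograd.grad(f_tar(x).sum(), x)[0]     # (D_FEAT,)
    J = torch.autograd.functional.jacobian(f_aux, x0)        # (D_PROT, D_FEAT)
    dx_dc = torch.linalg.inv(J @ J.T) @ J                    # pseudo-inverse term, Eq. 8
    return (dx_dc @ grad_tar).abs()                          # |df_tar/dc| per protected attribute

delta = 0.1                                                  # assumed tolerance
score = faux_score(torch.randn(D_FEAT))
print("unfair:", bool((score > delta).any()))
```

For a single protected attribute the score reduces to the projection of ∇f_tar onto ∇f_aux divided by ‖∇f_aux‖², which matches the intuition of Figure 1: large, parallel gradients produce a large score.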
3.3 On the Choice of the Auxiliary Model
In this paper, we employ Multilayer Perceptron (MLP) architectures for our auxiliary models, to minimize the amount of model overhead. Though inverting such models can potentially yield low-fidelity reconstructions of x, additional fidelity may require modelling factors of x that are independent of c. The LIC avoids the need for these additional factors since the end goal is estimating the partial derivative ∂x/∂c only (for more details, see Appendix A.5). In our experiments, we demonstrate that even such simple architectures are sufficient to achieve state-of-the-art results.
Beyond their simplicity, another advantage of using MLPs is their flexibility, as they easily accommodate both real-valued and categorical outputs. This makes it possible to analyze scenarios in which the protected variable C is not binary.
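As an indication of how little overhead the auxiliary model adds, the sketch below fits a small MLP to predict a possibly multi-class protected attribute from the features using ordinary supervised learning. The architecture, optimizer and hyper-parameters are illustrative choices only, not the configuration used in our experiments.

```python
import torch

def train_aux_model(X, C, n_classes, epochs=200, lr=1e-2):
    """Fit a small MLP auxiliary model c = f_aux(x) by supervised learning."""
    model = torch.nn.Sequential(
        torch.nn.Linear(X.shape[1], 32), torch.nn.ReLU(),
        torch.nn.Linear(32, n_classes),   # logits; softmax handled by the loss
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()  # works for binary or multi-class C
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), C)
        loss.backward()
        opt.step()
    return model

# Toy data: 200 samples, 8 features, 3 protected-attribute categories.
X = torch.randn(200, 8)
C = torch.randint(0, 3, (200,))
f_aux = train_aux_model(X, C, n_classes=3)
```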
3.4 Relaxations and Extensions
While the basic fAux described above is sufficient for an individual fairness test, it depends heavily on the behaviour of ∇f_tar, which may be ill-conditioned. In this section, we introduce several variants of the basic fAux method.
Normalization of Gradient (fAux+NG): Different features may have very different valid ranges, and so the gradient of either the target or the auxiliary model could be biased towards a subset of features. To mitigate this problem, we use an l_2 normalization of the gradients to give the criterion:

$$\left\| \mathrm{norm}(\nabla f_{tar}) \, \mathrm{norm}(\nabla f_{aux}^{\top}) \right\| \leq \delta, \tag{10}$$

where we removed the inverse term, as it is no longer needed after this normalization.
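A minimal sketch of the fAux+NG criterion: both gradients are l_2-normalized and the magnitude of their alignment (a cosine similarity for a single protected attribute) is compared against the tolerance, so no matrix inverse is required. The toy models in the usage example are assumptions.

```python
import torch

def faux_ng_score(f_tar, f_aux, x0):
    """fAux+NG: alignment of the l2-normalized target and auxiliary gradients."""
    x = x0.clone().requires_grad_(True)
    g_tar = torch.autograd.grad(f_tar(x).sum(), x)[0]
    g_aux = torch.autograd.grad(f_aux(x).sum(), x)[0]
    g_tar = g_tar / (g_tar.norm() + 1e-12)    # l2 normalization removes scale effects
    g_aux = g_aux / (g_aux.norm() + 1e-12)
    return (g_tar @ g_aux).abs()              # |cosine similarity| of the two gradients

# Toy usage with simple differentiable stand-ins for the two models:
w_t, w_a = torch.randn(8), torch.randn(8)
f_tar = lambda x: torch.sigmoid(x @ w_t)
f_aux = lambda x: torch.sigmoid(x @ w_a)
print(faux_ng_score(f_tar, f_aux, torch.randn(8)))
```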
Integrated Gradient (fAux+IG): We substitute the raw gradients ∇f_tar and ∇f_aux for integrated gradients [Sundararajan et al., 2017], which provide a smoothed gradient signal.
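For reference, a generic Riemann-sum approximation of the integrated gradient of [Sundararajan et al., 2017] is sketched below; the all-zeros baseline and the number of steps are assumed choices, and the resulting smoothed gradients can be substituted for ∇f_tar and ∇f_aux in the score above.

```python
import torch

def integrated_gradient(f, x, baseline=None, steps=50):
    """Riemann-sum approximation of the integrated gradient of f at x."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = torch.zeros_like(x)
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        total += torch.autograd.grad(f(point).sum(), point)[0]
    return (x - baseline) * total / steps     # (x - baseline) times the average gradient

# Example: smooth the gradient of a toy model before using it in the fAux score.
w = torch.randn(8)
f = lambda x: torch.sigmoid(x @ w)
print(integrated_gradient(f, torch.randn(8)))
```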
4 Experiments
We now evaluate the proposed fAux test to answer the following research questions (RQs):

RQ1: Given a target model f_tar trained on synthetic datasets whose ground-truth degrees of bias are known, how well does fAux perform compared to other testing methods?

RQ2: Given a target model f_tar trained on a real dataset whose ground-truth degree of bias is unknown, can fAux identify discriminatory features?

RQ3: How efficient is fAux compared to existing approaches in terms of inference cost?

RQ4: Are there conditions fAux needs to guarantee reliable test performance? In particular, we want to know how the performance of the auxiliary model impacts the performance of the test.
4.1 Experimental Setup
Candidate Testing Algorithms and Target Model
FTA: A local version of FTA, inspired by works such as [John et al., 2020]. In these approaches, an ε-neighbourhood around the input x_i is rigorously searched over to bound the output deviation φ_out(f_tar(x_i), f_tar(x_j)). In the limit that ε goes to zero, this is equivalent to a bound on the l_p norm of ∇f_tar. We also construct a weighted l_p norm using a linear auxiliary model, as is common in the individual fairness literature [Ruoss et al., 2020; Yurochkin et al., 2020] (a sketch of this weighted-norm score is given after this list of baselines).
Unfair Map: [Maity et al., 2021] uses a gradient-flow attack to generate pairs of points that violate FTA. This attack is conducted within a neighbourhood defined by a similarity metric φ_in, and to this end we employ the same weighted l_p norm used in FTA.
FlipTest: [Black et al., 2019] approximates CFF by leveraging Wasserstein GANs to generate pairs of inputs.
LIC-UB: An upper bound on the Local Independence Criterion (LIC). For the experiments on synthetic datasets, we can compute the true gradient of the generative model to conduct the LIC check (5). Since the error introduced by the approximation is removed, the test performance should achieve its upper bound.
Target Models: Given datasets in the form D = {..., (x, y, c), ...}, we train multi-layer fully connected networks as the target models f_tar using only the features x and label y. The target models in our experiments are all classifiers that aim to produce probabilistic predictions P(y|x). However, as previously discussed, an unfair model may infer the protected variable C, resulting in it implicitly modeling P(y|x, c). Full implementation details are found in Appendix A.11.
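As promised above, a minimal sketch of the weighted l_p gradient-norm score used by the FTA and Unfair Map baselines. The weight vector, toy model and norm order are assumptions; in the experiments the weights come from a linear auxiliary model.

```python
import torch

def weighted_lp_score(f_tar, x0, w, p=2):
    """Local FTA surrogate: weighted l_p norm of the target-model gradient at x0."""
    x = x0.clone().requires_grad_(True)
    g = torch.autograd.grad(f_tar(x).sum(), x)[0]
    return (w * g).norm(p=p)        # weights w, e.g. from a linear auxiliary model

# Toy usage: weights taken as the absolute coefficients of a stand-in linear model.
coef = torch.randn(8)               # stand-in for linear auxiliary-model coefficients
w_t = torch.randn(8)
f_tar = lambda x: torch.sigmoid(x @ w_t)
print(weighted_lp_score(f_tar, torch.randn(8), w=coef.abs()))
```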
Synthetic Datasets with Ground Truth Bias
In producing our synthetic datasets, our goal was to retain the noisy and nonlinear relationships that are present in real datasets. To this end, we construct a pipeline which joins real, unbiased datasets together via fusion operations, based on an intentionally biased data sampling process. Specifically, given two datasets $\hat{D} = \{\cdots (\hat{x}_i, \hat{y}_i) \cdots\}$ and $\tilde{D} = \{\cdots (\tilde{x}_j, \tilde{y}_j) \cdots\}$, and a fusion operation $f_{fus}$, we can produce a synthetic dataset $D_{syn} = \{\cdots (x, y, c) \cdots\} = \{\cdots (f_{fus}(\hat{x}_i, \tilde{x}_j), \hat{y}_i, \tilde{y}_j) \cdots\}$, where $y := \hat{y}_i$, $c := \tilde{y}_j$, and $f_{fus}$ is a fusion operation (see below for examples). While this looks simple, the selection of the data indices i and j for fusion is based on a predefined generative model under the hood. Furthermore, the generative model controls the degree of bias of the synthetic datasets with hyper-parameters. Thus, the data generation process reproduces the historical bias described in Section 2.2.

A full description of the synthetic data pipeline (including generative model specifics) can be found in Appendix A.7.
Here, we summarize two key hyper-parameters of the data
generator, which we will use to control ground-truth data bias
and complexity.
Bias Level: The bias level controls the level of dependency between Y and C in the range [0, 1]. A higher bias level results in a larger correlation between Y and C in the generated dataset.

Fusion Function: The fusion function merges feature vectors from the two datasets. We have two variants: concatenation, which stacks features without changing the element values (see [Kusner et al., 2017]), and the outer product, which blends every feature of one vector with every feature of the other (a minimal sketch of both operations, together with the biased sampling, is given below).
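The sketch below imitates this pipeline under simplified assumptions: two labelled source datasets are paired by a biased index-selection rule whose probability plays the role of the bias level, and the paired feature vectors are merged by concatenation or an outer product. The sampling rule here is a simple stand-in for the probabilistic graphical model described in Appendix A.7.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(x_hat, x_tilde, mode="concat"):
    """Fusion operation f_fus: 'concat' stacks features, 'outer' blends them."""
    if mode == "concat":
        return np.concatenate([x_hat, x_tilde])
    return np.outer(x_hat, x_tilde).ravel()

def make_synthetic(D_hat, D_tilde, bias_level, mode="concat", n=1000):
    """Pair samples so that y and c agree with probability rising in bias_level."""
    X_hat, y_hat = D_hat
    X_tilde, y_tilde = D_tilde
    rows = []
    for _ in range(n):
        i = rng.integers(len(y_hat))
        # With probability bias_level, force the protected label c to match y.
        if rng.random() < bias_level:
            candidates = np.flatnonzero(y_tilde == y_hat[i])
        else:
            candidates = np.arange(len(y_tilde))
        j = rng.choice(candidates)
        rows.append((fuse(X_hat[i], X_tilde[j], mode), y_hat[i], y_tilde[j]))
    return rows  # list of (x, y, c) triples

# Toy unbiased source datasets with binary labels.
D_hat = (rng.normal(size=(500, 4)), rng.integers(0, 2, 500))
D_tilde = (rng.normal(size=(500, 3)), rng.integers(0, 2, 500))
data = make_synthetic(D_hat, D_tilde, bias_level=0.8, mode="outer")
```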
Evaluating Fairness Tests and Selecting δ
Having access to the ground-truth generative model, we can compute the individual fairness score (IFS) described in Definition 1 for each generated synthetic data sample. The IFS will serve as the ground-truth label in the following experiments on synthetic datasets.
In flagging discrimination in practice, it is necessary to set a value for the threshold δ. This threshold is usually set by...