
1. We propose a simple and general criterion for individual
fairness testing: for a fair model, the derivative of the
model prediction with respect to the protected variable
should be small (a toy numerical sketch of this criterion follows the list).
2. We introduce the concept of an auxiliary model, which describes the relationship between the input features and the protected variable, and show how it can be used to help evaluate this criterion. Figure 1 provides the intuition
behind our approach. We specifically focus on unfair
treatment that is created by historical bias in datasets
[Mehrabi et al., 2019].
3. To help evaluate fairness testing, we present a novel synthetic data generation method that merges multiple real datasets through a probabilistic graphical model to flexibly simulate realistic data with controllable bias levels.
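As a rough illustration of the first contribution, the sketch below checks the derivative criterion numerically for a toy logistic model in which the protected variable happens to be an explicit input; the model weights, the example feature vector, and the 0.01 threshold are all invented for illustration, and the auxiliary model of the second contribution is not yet involved.

```python
import numpy as np

def predict(x, c, w_x=np.array([0.8, -0.5]), w_c=0.05, b=0.1):
    """Toy logistic target model; w_c controls how strongly the
    (hypothetical, explicitly observed) protected input c affects the output."""
    logit = x @ w_x + w_c * c + b
    return 1.0 / (1.0 + np.exp(-logit))

def derivative_wrt_protected(x, c, eps=1e-4):
    """Central finite-difference estimate of d f_tar / d c."""
    return (predict(x, c + eps) - predict(x, c - eps)) / (2 * eps)

x = np.array([0.3, 1.2])                 # hypothetical feature vector
grad_c = derivative_wrt_protected(x, c=0.0)
print(f"|d yhat / d c| = {abs(grad_c):.4f}")
print("criterion satisfied" if abs(grad_c) < 0.01 else "potential unfairness")
```

In realistic settings the protected variable is rarely an explicit input, which is precisely where the auxiliary model of the second contribution comes in.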
2 Preliminary Material
In this section, we describe a definition of individual fairness
that unites a number of definitions in the literature. We then
analyze the technical challenges of existing tests, which serve
as motivation for our own. Throughout, we adopt the following notation:
• X: Feature (input) variables. When features are observed, we use x to represent the feature vector.
• Y: Prediction (output) variables. When a label is observed, we use y to represent the label as a scalar. As we do fairness testing on binary classification tasks, the prediction y in this paper is a probability.
• C: Protected variables (e.g., gender). We use c to represent the observed values.
• φ: Distance metric. φ_in(·,·) denotes a metric on the input space, and φ_out(·,·) a metric on the output space.
• f_tar: Target function for fairness testing. This takes features x as input and produces a prediction ŷ.
• f_aux: Auxiliary model. This takes features x as input and produces predictions c for the protected attributes C (a brief code sketch of both models follows this list).
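As a concrete, purely hypothetical instantiation of this notation, f_tar and f_aux can be two ordinary classifiers fit on the same features, one predicting the label y and the other the protected attribute c; the synthetic data and the choice of scikit-learn logistic regression below are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: the protected attribute C shifts one feature (historical bias).
C = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 4))
X[:, 0] += 1.5 * C
Y = ((X[:, 1] - X[:, 2] + rng.normal(size=500)) > 0).astype(int)

# Target model f_tar: predicts the label y from the features x.
f_tar = LogisticRegression().fit(X, Y)

# Auxiliary model f_aux: predicts the protected attribute c from the same features x.
f_aux = LogisticRegression().fit(X, C)

x = X[:1]
print("yhat =", f_tar.predict_proba(x)[0, 1])  # target prediction for one individual
print("chat =", f_aux.predict_proba(x)[0, 1])  # how recoverable c is from x alone
```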
2.1 Individual Fairness Definitions
Individual fairness describes the tolerable discrepancy of
model predictions at the level of individual data points. There
are several characterizations that operate at this level, and to
facilitate their comparison we use the following definition.
Let the observed features x be generated from underlying latent variables z_⊥ and z_∥ via a function f_g:

x = f_g(z_⊥, z_∥).    (1)

Here, z_⊥ denotes latent vectors with no correlation with the protected variables C, and z_∥ is influenced by the protected variables C through a function z_∥ = ψ(c).
Definition 1. A model f_tar is individually fair if it produces exactly identical outcomes when given input feature vectors x_i and x_j which share the same latent vector z_⊥:

f_tar(x_i) = f_tar(x_j),    (2)

where x_i = f_g(z_⊥, ψ(c)) and x_j = f_g(z_⊥, ψ(c′)).
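To make the definition concrete, the following minimal sketch assumes an invented linear ψ and f_g, generates a pair x_i, x_j that share the same z_⊥ but differ in the protected value, and compares the target model's outputs; every functional form and weight here is an illustrative placeholder.

```python
import numpy as np

rng = np.random.default_rng(1)

def psi(c):
    """Hypothetical psi: maps the protected value c to the c-dependent latent z_par."""
    return np.array([1.5 * c, -0.5 * c])

def f_g(z_perp, z_par):
    """Hypothetical generation function: concatenates the two latent blocks."""
    return np.concatenate([z_perp, z_par])

def f_tar(x, w=np.array([0.7, -0.3, 0.9, 0.2])):
    """Toy target model (logistic); nonzero weights on the z_par block make it unfair."""
    return 1.0 / (1.0 + np.exp(-x @ w))

z_perp = rng.normal(size=2)       # shared c-independent latent vector
x_i = f_g(z_perp, psi(c=0))       # individual with protected value c
x_j = f_g(z_perp, psi(c=1))       # "similar" individual with protected value c'

gap = abs(f_tar(x_i) - f_tar(x_j))
print(f"|f_tar(x_i) - f_tar(x_j)| = {gap:.4f}")  # exactly 0 for an individually fair model
```

A nonzero gap indicates that the toy model depends on the c-influenced latent z_∥ and therefore violates Definition 1.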
At a high level, the definition states that a pair of similar data points x_i and x_j should receive equal treatment from a target model. This perspective is shared by the three most common families of individual fairness tests. Fairness Through Unawareness (FTU) stipulates that protected variables should not explicitly be used in the prediction process; “similar” points are thus points that differ only in the protected variable, since the latter cannot affect the outputs. Fairness Through Awareness (FTA) formally defines “similar” points using a metric on the input space [Dwork et al., 2012]. Finally, Counterfactual Fairness (CFF) stipulates that the model predictions should not causally depend on the protected variables C [Kusner et al., 2017]. For a given input x_i, the “similar” individual x_j is the counterfactual constructed by intervening on C.
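For instance, FTA can be read as the Lipschitz condition φ_out(f_tar(x_i), f_tar(x_j)) ≤ L · φ_in(x_i, x_j) of Dwork et al. The sketch below checks this condition for a toy model; the Euclidean φ_in, the absolute-difference φ_out, the constant L = 1, and the example inputs are placeholder assumptions.

```python
import numpy as np

def phi_in(x_i, x_j):
    """Placeholder input metric: Euclidean distance."""
    return np.linalg.norm(x_i - x_j)

def phi_out(y_i, y_j):
    """Placeholder output metric: absolute difference of predicted probabilities."""
    return abs(y_i - y_j)

def fta_check(f_tar, x_i, x_j, L=1.0):
    """FTA-style test: similar inputs must receive similar outputs (Lipschitz condition)."""
    return phi_out(f_tar(x_i), f_tar(x_j)) <= L * phi_in(x_i, x_j)

# Toy logistic model and a pair of nearby ("similar") individuals.
f_tar = lambda x: 1.0 / (1.0 + np.exp(-x @ np.array([0.4, -0.8, 0.3])))
x_i = np.array([0.2, 1.0, -0.3])
x_j = np.array([0.2, 1.1, -0.3])
print("FTA condition holds:", fta_check(f_tar, x_i, x_j))
```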
For these three tests the main technical challenge is the
generation of “similar” pairs, either by searching within a
neighbourhood defined by a metric, or by transformation of
the input. We now describe these challenges in more depth.
Technical Challenges in Employing Similarity Metrics
The main challenge in constructing similarity metrics is the task-specific domain knowledge they require [Dwork et al., 2012]. Some methods simply employ unweighted ℓ_p norms [Wachter et al., 2018; John et al., 2020]. Others obtain weights from a linear model trained to predict the protected variable c from the input x [Ruoss et al., 2020; Yurochkin et al., 2020]. Others still learn metrics from expertly-labelled pairs [Mukherjee et al., 2020].
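As a schematic of the second approach above (not a reproduction of any cited method), one can fit a linear model predicting c from x and downweight the coordinates that carry the most protected information; the synthetic data and the specific weighting rule below are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical data in which feature 0 leaks the protected attribute c.
C = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 3))
X[:, 0] += 2.0 * C

# Linear model predicting c from x; coefficient magnitudes indicate per-feature leakage.
aux = LogisticRegression().fit(X, C)
leak = np.abs(aux.coef_[0])

# Downweight leaky coordinates so the metric largely ignores protected information.
weights = 1.0 / (1.0 + leak)

def phi_in(x_i, x_j, w=weights):
    """Weighted l2 metric: distance along protected-correlated axes counts less."""
    return np.sqrt(np.sum(w * (x_i - x_j) ** 2))

print("per-feature weights:", np.round(weights, 3))
```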
Technical Challenges in Transforming Inputs
With regard to input transformations, the main challenges are instability and the risk of generating out-of-distribution samples. Both risks are faced by approaches that employ adversarial techniques [Wachter et al., 2018; Maity et al., 2021]. When the dataset is small and low-dimensional, Optimal Transport (OT) can be used to define a mapping between pairs of inputs based on pairwise similarity [Dwork et al., 2012; Gordaliza et al., 2019]. However, this method scales poorly. To this end, approximate OT methods based on dual formulations [Chiappa and Pacchiano, 2021] and Generative Adversarial Networks [Black et al., 2019] have also been explored, though again these present risks of instability and out-of-distribution samples.
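For intuition, when the two protected groups are small and of equal size, the OT-based pairing reduces to a minimum-cost one-to-one matching, which the sketch below computes exactly with SciPy's Hungarian-algorithm solver on invented data; the roughly cubic cost of this exact matching is why the approximate methods above are attractive.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)

# Synthetic features for two equally sized protected groups (placeholder data).
X_a = rng.normal(loc=0.0, size=(20, 3))
X_b = rng.normal(loc=0.5, size=(20, 3))

# Pairwise Euclidean cost matrix, then the minimum-cost one-to-one matching.
cost = cdist(X_a, X_b)
rows, cols = linear_sum_assignment(cost)

# Each (i, cols[i]) pair is a cross-group "similar" pair for fairness testing.
print("first matched pairs:", list(zip(rows[:5], cols[:5])))
print("mean matching cost:", cost[rows, cols].mean())
```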
CFF also employs generative models, specifically causal graphical models. However, such graphs are rarely available. Moreover, training such models via unsupervised learning is hard [Srivastava et al., 2017], increasing the risk that the generated inputs are out-of-domain or have limited coverage.
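Purely to illustrate what counterfactual generation involves, the sketch below assumes a known (and entirely invented) linear causal model in which c shifts the features through a coefficient vector; the counterfactual is obtained by intervening on c while holding the exogenous noise fixed. In practice such a graph is rarely available, which is exactly the difficulty noted above.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed (invented) linear structural model: x = alpha * c + u, with exogenous noise u.
alpha = np.array([1.2, 0.0, -0.4])

def generate(c, u):
    return alpha * c + u

# Observed individual: factual protected value c = 1 with noise u (abduction step).
u = rng.normal(size=3)
x_factual = generate(c=1, u=u)

# Counterfactual: intervene on c (set c = 0) while keeping u fixed.
x_counterfactual = generate(c=0, u=u)

# Toy target model; a large gap signals a causal dependence on C.
f_tar = lambda x: 1.0 / (1.0 + np.exp(-x @ np.array([0.9, -0.2, 0.1])))
print("factual prediction:       ", round(float(f_tar(x_factual)), 4))
print("counterfactual prediction:", round(float(f_tar(x_counterfactual)), 4))
```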
2.2 Mechanisms of Discrimination
We now describe the mechanism of discrimination our test aims to detect. There are many reasons why models exhibit unfair behaviour, but one of the most insidious is the mishandling of historical bias [Mehrabi et al., 2019]. Here, pre-existing prejudices create spurious correlations between features X and protected variables C¹ (Figure 2a). These cor-
¹ More in-depth discussion and worked examples may be found in the Use Cases section of the supplementary materials.