of the protected attribute conditioned on the true label. Sufficiency is satisfied if the true label is
independent of the protected attribute conditioned on the classifier prediction. Details on these
fairness criteria, both mathematically and with respect to different worldviews, may be found in
(Barocas 2019, Yeom 2018) [27,28] along with definitions of fairness metrics, such as statistical
parity difference for independence and average odds difference for separation, from (Garg et al. 2020,
Bellamy et al. 2019, Sharma et al. 2020) [8, 9, 10]. In this paper we consider the statistical parity
difference and average odds difference metrics, while recognizing that many other fairness metrics
are relevant for gauging model bias. Determining the right measure to use must take the proper legal,
ethical, and social context into account.
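For concreteness, both metrics can be computed directly from binary predictions and group membership. The following is a minimal sketch in Python; the function and argument names are our own illustrative choices, and fairness toolkits provide standard implementations of these metrics.

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged).

    `protected` is a binary array: 1 = unprivileged group, 0 = privileged group.
    A value of 0 means the positive prediction rate is equal across groups
    (the independence criterion for this pair of groups).
    """
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    return y_pred[protected == 1].mean() - y_pred[protected == 0].mean()

def average_odds_difference(y_true, y_pred, protected):
    """Average of the FPR and TPR gaps between unprivileged and privileged groups.

    A value of 0 indicates the separation criterion is satisfied for a binary classifier.
    """
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))

    def rates(group):
        mask = protected == group
        tpr = y_pred[mask & (y_true == 1)].mean()  # true positive rate
        fpr = y_pred[mask & (y_true == 0)].mean()  # false positive rate
        return tpr, fpr

    tpr_u, fpr_u = rates(1)
    tpr_p, fpr_p = rates(0)
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))
```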
Using these fairness metrics, several bias mitigation algorithms have been developed to help machine
learning models satisfy the various fairness criteria and reduce bias. Methods to mitigate bias generally
fall into three categories. Pre-processing techniques transform the data so that the underlying
discrimination is removed (Alessandro 2017) [21]. If the algorithm is allowed to modify the training
data, then pre-processing can be used (Bellamy et al. 2018) [22]; our proposed methodology falls
under this category, as explained in the subsequent sections. In-processing techniques modify the
learning algorithm itself in order to remove discrimination during model training (Alessandro 2017) [21].
If the learned model can only be treated as a black box, with no ability to modify the training data or
the learning algorithm, then only post-processing can be used, in which the labels initially assigned by
the black-box model are reassigned by a post-processing function (Bellamy et al. 2018) [22].
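As a toy illustration of the post-processing category only (not the method proposed in this paper), such a label-reassignment function might apply group-specific decision thresholds to the scores of a black-box model; the function name and threshold values below are hypothetical.

```python
import numpy as np

def reassign_labels(scores, protected, thresholds):
    """Toy post-processing step: reassign the labels produced by a black-box model
    by applying a per-group decision threshold to its predicted scores.

    `thresholds` maps group value -> cutoff, e.g. {0: 0.5, 1: 0.4}; in practice the
    cutoffs would be tuned to equalize a chosen fairness metric on validation data.
    """
    scores, protected = np.asarray(scores), np.asarray(protected)
    cutoffs = np.array([thresholds[g] for g in protected])
    return (scores >= cutoffs).astype(int)
```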
The literature extensively discusses the inherent trade-off between accuracy and fairness – as we
pursue a higher degree of fairness, we may compromise accuracy (see for example (Kleinberg et al.
2017) [12]). Many papers have empirically supported the existence of this trade-off (Bechavod 2017,
Friedler et al. 2019) [13, 14]. Generally, the aspiration of a fairness-aware algorithm is to produce a
model that is fair without significantly compromising accuracy or other notions of utility.
In this paper, we propose a method to find bias-inducing samples in the dataset and drop them, so
that the pre-processed dataset represents a more equitable world. In an equitable world, the model
outcome is independent of protected attributes (such as gender or race). Our approach can be
described as follows: given a dataset that contains a protected attribute, samples that have similar
attributes but differ in the protected attribute and in the outcome are flagged. For example, suppose
a credit risk dataset contains two samples, one male and one female, with the same attributes, but
the model predicts low risk for the male sample and high risk for the female sample. We establish
that such instances induce bias: the attributes are the same, yet the outcome differs only through its
dependence on the protected attribute, resulting in unfair treatment by the model. Further, since
protected attributes are not used for modelling, such samples can confuse the model: they have
nearly the same attributes but different labels, and can therefore be viewed as pseudo label noise.
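The following is a minimal sketch of how such flagging could be implemented for a tabular dataset with a binary label, assuming numeric (or already encoded) attributes; the nearest-neighbour matching and the distance threshold `eps` are illustrative choices rather than the exact procedure used in our experiments.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def flag_bias_inducing(df, protected_col, label_col, eps=0.1):
    """Flag pairs of samples that are nearly identical on the non-protected
    attributes but belong to different protected groups and carry different
    labels (pseudo label noise in the sense described above).

    Returns the index of flagged rows. `eps` is an illustrative distance
    threshold on standardized features; attributes are assumed numeric.
    """
    features = df.drop(columns=[protected_col, label_col])
    X = (features - features.mean()) / features.std()  # standardize attributes

    flagged = set()
    for g in df[protected_col].unique():
        src = df[protected_col] == g   # e.g. female samples
        tgt = ~src                     # samples from the other group
        nn = NearestNeighbors(n_neighbors=1).fit(X[tgt].values)
        dist, idx = nn.kneighbors(X[src].values)
        src_idx = df.index[src]
        tgt_idx = df.index[tgt][idx.ravel()]
        for d, i, j in zip(dist.ravel(), src_idx, tgt_idx):
            # near-identical attributes, different protected group, different outcome
            if d <= eps and df.loc[i, label_col] != df.loc[j, label_col]:
                flagged.update([i, j])
    return pd.Index(sorted(flagged))
```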
Hence our objective boils down to detecting and removing such instances before training so that the
resulting model is fairer. In the process we also show that removing such close instances does not
compromise model performance; on the contrary, it improves accuracy. This simultaneous improvement
of model fairness and accuracy, which are usually in tension, may seem surprising, but we provide a
rationale for it using the prior work of Frénay et al. [16]. This prior work explains the effect of such
noisy instances on model performance and shows that label noise hampers classifier performance, a
finding also supported by (Long 2008) [15].
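Continuing the sketch above, removal and retraining then amount to dropping the flagged rows before fitting the classifier and comparing accuracy and a fairness metric on held-out data. The data frames, column names, and model below are placeholders for illustration only.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# `train_df` and `test_df` are hypothetical data frames with a "gender" protected
# attribute and a binary "risk" label; the protected attribute is not used as a feature.
flagged = flag_bias_inducing(train_df, protected_col="gender", label_col="risk")
clean_df = train_df.drop(index=flagged)

feature_cols = [c for c in clean_df.columns if c not in ("gender", "risk")]
model = LogisticRegression(max_iter=1000).fit(clean_df[feature_cols], clean_df["risk"])

y_pred = model.predict(test_df[feature_cols])
print("accuracy:", accuracy_score(test_df["risk"], y_pred))
print("statistical parity difference:",
      statistical_parity_difference(y_pred, (test_df["gender"] == "female").astype(int)))
```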
To summarize, in this paper we make the following contributions:
1. We propose a systematic way of identifying bias-inducing instances, as per our definition in the previous section, and subsequently removing them from the training data.
2. We show how removal of the bias-inducing instances improves model fairness as measured by standard fairness metrics.
3. We further show an improvement in the accuracy of models trained on the bias-eliminated data, along with a justification.
4. We offer adjustable hyperparameters to control fairness and accuracy according to the dataset and business requirements.