
output. Notably, the risk of violating group sufficiency has arisen in a number of real-world scenarios.
For example, in medical artificial intelligence, machine learning algorithms are used to assess clinical risk and to guide decisions about initiating medical therapy. However, [5, 6] revealed a significant racial bias in such algorithms: when the algorithm predicts the same clinical risk score $f(X)$ for white and black patients, black patients are actually at a higher risk of severe illness, i.e., $\mathbb{E}[Y \mid f(X), A=\text{black}] \neq \mathbb{E}[Y \mid f(X), A=\text{white}]$. The deployed algorithms have resulted in more referrals of white patients to specialty healthcare services, producing both spending disparities and racial bias [5].
In summary, this work aims to propose a novel principled framework for ensuring group sufficiency while preserving an informative prediction with a small generalization error. In particular, we focus on one challenging scenario: the data include multiple, or even a large number of, subgroups, some with only limited samples, as often occurs in the real world. For example, datasets for self-driving cars are collected from a wide range of geographical regions, each with a limited number of training samples [7]. How can we ensure group sufficiency as well as accurate predictions?
Specifically, our contributions are summarized as follows:
Figure 1: Illustration of the proposed algorithm. Consider three subgroups $S_1, S_2, S_3$, e.g., datasets for three different races. The proposed algorithm is formulated as a bilevel optimization to learn an informative and fair predictive distribution $Q$. In the lower level (cyan), we learn the subgroup-specific predictive distribution $Q^\star_a$ from the dataset $S_a$ (limited samples) and the prior $Q$. In the upper level (brown), $Q$ is then updated to be as close as possible to all of the learned subgroup-specific $Q^\star_a$.
Controlling group sufficiency
We adopt the group sufficiency gap to measure the fairness of a classifier $f$ w.r.t. group sufficiency (Sec. 3), and then derive an upper bound on the group sufficiency gap (Theorem 4.1). Under proper assumptions, the upper bound is controlled by the discrepancy between the classifier $f$ and the subgroup Bayes predictors. Hence, minimizing the upper bound also encourages an informative classifier.
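As a concrete illustration, the sketch below estimates a group sufficiency gap from finite samples by binning the prediction scores and comparing the subgroup-conditional calibration curve $\mathbb{E}[Y \mid f(X), A=a]$ with the overall curve $\mathbb{E}[Y \mid f(X)]$. The function name `sufficiency_gap`, the uniform binning, and the simple averaging over (bin, group) cells are expository assumptions, not the exact estimator defined in Sec. 3.

```python
import numpy as np

def sufficiency_gap(scores, labels, groups, n_bins=10):
    """Rough estimate of a group sufficiency gap.

    Bins the prediction scores f(X), then averages (over bins and groups)
    the absolute difference between E[Y | f(X), A=a] and E[Y | f(X)].
    This binning estimator is an illustrative assumption, not the paper's
    exact definition.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(scores, bins) - 1, 0, n_bins - 1)
    gap, n_terms = 0.0, 0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        overall = labels[in_bin].mean()  # estimate of E[Y | f(X) in bin b]
        for a in np.unique(groups):
            mask = in_bin & (groups == a)
            if mask.any():
                gap += abs(labels[mask].mean() - overall)
                n_terms += 1
    return gap / max(n_terms, 1)

# Example usage with synthetic data: scores are well calibrated on average,
# so the estimated gap should be small.
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
groups = rng.integers(0, 3, size=1000)
labels = rng.binomial(1, scores)
print(sufficiency_gap(scores, labels, groups))
```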
Algorithmic contribution
Motivated by the upper bound on the group sufficiency gap, we develop a principled algorithm. Concretely, we adopt a randomized algorithm that produces a predictive distribution $Q$ over the classifiers ($f \sim Q$) to learn informative and fair classification. We further formulate the problem as a bilevel optimization (Sec. 5.3), as shown in Fig. 1. (1) In the lower level, the subgroup-specific dataset $S_a$ and the fair predictive distribution $Q$ are used to learn the subgroup-specific predictive distribution $Q^\star_a$, where $Q$ is regarded as an informative prior for learning from the limited data within each subgroup. Theorem 5.1 formally demonstrates that, under proper assumptions, the lower-level loss effectively controls the generalization error. (2) In the upper level, the fair predictive distribution $Q$ is then updated to be close to all subgroup-specific predictive distributions, in order to minimize the upper bound on the group sufficiency gap.
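The PyTorch sketch below mirrors this bilevel structure in a simplified form: the lower level adapts a copy of the prior to each subgroup, with a proximal penalty standing in for the KL-to-prior term, and the upper level pulls the prior toward the subgroup solutions. The linear model, the point-estimate treatment of $Q$, and all step sizes are illustrative assumptions; this is not the paper's exact training procedure (see the released code for that).

```python
import torch

def subgroup_adapt(prior, data, inner_steps=5, lr=0.1, reg_weight=1.0):
    """Lower level: learn a subgroup-specific solution Q*_a starting from the
    prior Q, regularized to stay close to Q (a proxy for the KL term)."""
    x, y = data
    theta = prior.clone().requires_grad_(True)
    for _ in range(inner_steps):
        logits = x @ theta  # linear classifier for illustration only
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
        loss = loss + reg_weight * 0.5 * (theta - prior).pow(2).sum()
        (grad,) = torch.autograd.grad(loss, theta)
        theta = (theta - lr * grad).detach().requires_grad_(True)
    return theta.detach()

def train_fair_prior(subgroup_datasets, dim, outer_steps=100, outer_lr=0.5):
    """Upper level: move the fair prior Q to be close to all subgroup
    solutions Q*_a (here, toward their average)."""
    prior = torch.zeros(dim)
    for _ in range(outer_steps):
        solutions = [subgroup_adapt(prior, d) for d in subgroup_datasets]
        target = torch.stack(solutions).mean(dim=0)
        prior = prior + outer_lr * (target - prior)
    return prior

# Example usage: three synthetic subgroups, each with limited samples.
torch.manual_seed(0)
dim = 5
datasets = []
for _ in range(3):
    x = torch.randn(20, dim)
    y = (x[:, 0] > 0).float()
    datasets.append((x, y))
prior = train_fair_prior(datasets, dim)
print(prior)
```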
Empirical justifications
The proposed algorithm is applicable to general parametric and differentiable models; we adopt neural networks in our implementation. We evaluate the proposed algorithm on two real-world NLP datasets that have shown prediction disparities w.r.t. group sufficiency. Compared with the baselines, the results indicate that group sufficiency is consistently improved, with almost no loss of accuracy. Code is available at https://github.com/xugezheng/FAMS.
2 Related Work
Algorithmic fairness
Fairness has attracted great attention and has been widely studied in various applications, such as natural language processing [8–10], natural language generation [11–13], computer vision [14, 15], and deep learning [16, 17]. Various approaches have been proposed in algorithmic fairness. They typically add fairness constraints during the training procedure, such as demographic parity or equalized odds [18–23]. Apart from these, other fairness notions are adopted, such as accuracy parity [24, 25], which requires each subgroup to attain the same accuracy; small prediction variance [26, 27], which ensures small prediction variation among the subgroups; or small prediction loss for all subgroups [28–31]. Furthermore, based on the concept of independence (e.g., demographic parity, $A \perp\!\!\!\perp f(X)$) or conditional independence (e.g., equalized odds, $A \perp\!\!\!\perp f(X) \mid Y$, or