
possibility is to increase editability: [9] disentangles latent variables for separating editable and sensitive parts. Some works focus on measuring fairness; for example, [14] uses causal methodologies for measuring fairness in a counterfactual manner. Fairness can be integrated directly into the training: [26] focuses on training a GAN while protecting some variables.
1.3 Related work
[3] increases fairness in GANs in a supervised manner, i.e., given the sensitive attributes. [27] targets and improves the fairness of generated datasets. More similar to our work, [10] focuses on uncertain sensitive variables, and [13] adds a bias in a GAN for mitigating fairness issues. In the same fashion as the present work, [28] considers biasing a GAN without any retraining. We focus on generically (i.e., independently of the application, data, and model) correcting for potential bias present in a generative model, without knowing the sensitive variables. The critical point is that sensitive variables often come up as a surprise: typically, people do not actively decide to create an unfair algorithm. For example, in [19], the designers of the faulty soap dispenser had simply not imagined that it might fail on dark skin. Also, there may be relevant sensitive variables that were not initially considered: ethnicity or gender are obvious sensitive variables, but aesthetics, body mass index, social origin, geographical origin, or even the quality of the camera also matter.
Our goal is to have a generic correction independent of the sensitive variables. The first proposed method (Sections 3.1 and 3.2):
• is not only for fairness issues regarding sensitive variables: we also preserve diversity for more classical diversity issues such as MC.
• does not need any retraining.
• is more or less effective depending on the case, but is designed to (almost) never be detrimental (Section 4.2).
The second proposed method, which can be combined with the previous one, proposes several generations and then lets the user choose. The user experience is therefore modified: we expect the user to assist the method by actively selecting relevant outputs. Contrary to the generic method proposed above, which we implement through reweighting, this new approach is not a drop-in replacement. Like the first method, it does not need retraining.
1.4 Outline
Section 2 presents tools useful for the present work:
• Use of Image Quality Assessment (IQA) to improve image generation (Section 2.1): we connect this method to our research by investigating how much this quality improvement degrades fairness and how our proposed methods can mitigate such issues.
• Reweighting via simple rejection sampling to improve fairness and reduce MC when the variables used for computing the reweighting values are correlated with the target sensitive variables (Section 3.1); a minimal sketch of this kind of reweighting is given below.
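The following is a minimal, hypothetical sketch of such rejection-sampling reweighting, assuming a `generate()` function wrapping a generative model and a `features()` function returning a feature vector per image; both names and the kernel-density acceptance rule are illustrative only and are not the exact Alg. 1 of Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(generate, features, n_out=16, n_pool=256, bandwidth=1.0):
    """Draw n_pool candidates, then keep n_out of them by rejection sampling,
    accepting candidates that are rare in feature space with higher probability."""
    pool = [generate() for _ in range(n_pool)]
    feats = np.stack([features(x) for x in pool])          # shape (n_pool, d)
    # Crude density estimate: mean Gaussian kernel to all candidates.
    # The bandwidth should be adapted to the scale of the features.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    accept_prob = density.min() / density                  # rare -> close to 1
    kept = []
    while len(kept) < n_out:
        i = rng.integers(n_pool)
        if rng.random() < accept_prob[i]:
            kept.append(pool[i])
    return kept
```

Candidates lying in dense regions of feature space are accepted less often, which mechanically increases the frequency of rare classes among the kept outputs.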
Section 3 presents our proposed algorithms:
• Reweighting as above, but with reweighted variables unrelated to the target classes (Section 3.2). This second context is therefore applicable when we do not know the target classes. We propose a method which is a drop-in improvement of an arbitrary generative model: as soon as we have features and a generative model, we can apply Alg. 1.
• Multi-objective optimization, through the computation of several solutions (typically Pareto fronts), to mitigate diversity loss by more frequently providing at least one output of the category desired/expected by the user (a minimal sketch follows Table 1 below).
Section 4 is a mathematical analysis. Section 5 presents experimental results.

Class                   A       B       C       D
Frequency               17.8%   52.2%   17.5%   12.4%
Rank-correlation AvA    -0.07   0.22    -0.11   0.06
Rank-correlation K512   -0.02   0.16    -0.08   0.02
Table 1: For four distinct classes of individuals A, B, C and D (obtained using R), we present the rank-correlation of the frequency of that class with the AvA and K512 scores respectively. AvA and K512 are visual quality estimators, dealing with aesthetics and technical quality respectively. Visual quality assessment is a task fairly independent of semantics and should therefore exhibit little if any ethnicity-related bias. Dataset: faces generated by StyleGan2 (see thispersondoesnotexist.com). Classes: ethnicity evaluated by R (see R in Table 2). Observation: the biggest class has the strongest, positive correlation.
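As an illustration of this second, user-assisted approach, here is a minimal sketch that generates a pool of candidates, scores them on two objectives, and returns the Pareto-optimal ones for the user to choose from. The `generate`, `quality`, and `diversity` functions are hypothetical placeholders, and the two objectives are only stand-ins for the criteria used in Section 3; this is not the exact procedure evaluated later.

```python
import numpy as np

def pareto_front(scores):
    """Indices of non-dominated rows; scores has shape (n, k), higher is better."""
    n = scores.shape[0]
    keep = []
    for i in range(n):
        dominated = any(
            np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i])
            for j in range(n) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

def propose_candidates(generate, quality, diversity, n_pool=64):
    """Generate a pool, score it on two objectives, and return the
    Pareto-optimal candidates so the user can pick the one matching their intent."""
    pool = [generate() for _ in range(n_pool)]
    scores = np.array([[quality(x), diversity(x)] for x in pool])
    return [pool[i] for i in pareto_front(scores)]
```

Returning the whole front, rather than a single optimum, is what lets the user recover outputs from categories that a single-objective selection would tend to suppress.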
2 PRELIMINARIES
2.1 Correlations between image quality and sensitive variables
We investigate the known correlation between the estimated quality of an image and its membership in a frequent class [15, 24]. To demonstrate that this is easily observable, Table 1 presents the rank correlation between the aesthetic quality of an image and the logit of that image for each of four classes of individuals. We note that the most positively correlated class is also the most frequent. Our interpretation is that the technical quality of generated images is higher for the most frequent classes, influencing the aesthetics score.
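For concreteness, a correlation of this kind can be computed as sketched below; `class_logits` and `quality_scores` are hypothetical arrays (per-image class logits from a classifier such as R, and AvA or K512 scores respectively), so this only reproduces the spirit of Table 1, not the exact pipeline used here.

```python
import numpy as np
from scipy.stats import spearmanr

# class_logits: shape (n_images, 4), logit of each image for classes A-D.
# quality_scores: shape (n_images,), e.g. AvA (aesthetics) or K512 (technical quality).
def class_quality_correlations(class_logits, quality_scores):
    """Spearman rank-correlation between each class logit and the quality score."""
    return [spearmanr(class_logits[:, c], quality_scores).correlation
            for c in range(class_logits.shape[1])]
```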
2.2 Image generation: GAN, PGAN, and EvolGan
Our work specializes in image generation, in particular of faces. We use the following image generation tools. Our baseline GAN is Pytorch GAN Zoo ([21], based on progressive GANs (PGANs) [11]). We also use EvolGan [23], which improves Pytorch GAN Zoo by biasing the random choice of latent variables z using K512 [8]. We use three configurations of EvolGan, as it uses as a budget the number of calls to the original GAN; the three configurations correspond to budgets 10, 20, and 40 (named EG10, EG20, and EG40 respectively). Besides the variant based on random search, EvolGan has options for CMA search [5] and for PortfolioDiscrete-(1+1) (i.e., the variant of the Discrete (1+1)-ES as in [4]): we also employ these variants, denoted EG-CMA-10 and EG-D(1+1)-10 for budget 10, and similar variants for budget 20 and