ferent sentiment distributions of completed sen-
tences when the occupation word is counterfactu-
ally changed in the prompts. Bordia and Bowman (2019) revealed a more frequent co-occurrence of 'doctor' with male pronouns and 'nurse' with female pronouns in generated text. However, such biases are directly encapsulated within the text and can thus be analyzed more easily. To the best of our knowledge, our work is the first to explore how personalization, which bridges NLP and recommendation, associates bias in NLG with the protected attributes of users.
Fairness Notions: Various fairness notions exist in the literature, with group-wise fairness being the first to be studied (Zafar et al., 2015; Hardt et al., 2016; Zafar et al., 2017). Yet, group-wise fairness has different quantitative definitions that are generally incompatible (Kleinberg et al., 2017; Berk et al., 2021), and some definitions can even exacerbate discrimination (Kusner et al., 2017). Individual fairness (Zemel et al., 2013; Joseph et al., 2016) requires similar users to receive similar predictions, but it relies on carefully chosen domain-specific similarity metrics (Dwork et al., 2012). In contrast, counterfactual fairness (CF) (Kusner et al., 2017), which considers fairness from a causal perspective, has recently gained prominence as a more robust fairness notion (Russell et al., 2017; Wu et al., 2019; Makhlouf et al., 2020) and can also enhance group-wise fairness in certain scenarios (Zhang and Bareinboim, 2018; Khademi et al., 2019).
Though CF has been studied in some non-personalized NLP tasks (Huang et al., 2020; Garg et al., 2019), most existing works study the dependency of model outputs on attribute-specific words within the input text (Blodgett et al., 2020; Liang et al., 2021; Sheng et al., 2021). In such cases, CI can be easily performed on the input text itself, such as changing male pronouns to female pronouns (Huang et al., 2020; Garg et al., 2019). However, CF in PTG necessitates CI on the protected attributes of the users being served, an area yet to be thoroughly explored.
3 Problem Formulation
In the following discussions, we consider a single protected attribute on the user side for simplicity, but our proposed framework is versatile enough to accommodate multiple attributes on either the user or the item side. The value of a user's protected attribute is denoted by a variable $A \in \mathcal{A}$, where $\mathcal{A}$ is the set of possible attribute values, e.g., $\mathcal{A} = \{\text{male}, \text{female}, \text{other}\}$ for gender. Each dataset entry is a tuple $(u, i, a, e)$, corresponding to user ID, item ID, observed attribute value, and ground-truth explanation. The explanation generator $G_\theta$ is a language model parameterized by $\theta$.
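As a concrete illustration of this setup, the following minimal Python sketch encodes a dataset entry $(u, i, a, e)$ and the sampling interface of the generator $G_\theta$. The names (`Entry`, `ExplanationGenerator`, `sample`) are hypothetical; the paper does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from typing import Protocol


# One dataset entry (u, i, a, e): user ID, item ID, observed
# protected-attribute value, and ground-truth explanation.
@dataclass
class Entry:
    user_id: int
    item_id: int
    attribute: str   # e.g., "male", "female", "other"
    explanation: str


# Hypothetical interface for the explanation generator G_theta:
# it samples an explanation Y ~ G_theta(u, i | A = a).
class ExplanationGenerator(Protocol):
    def sample(self, user_id: int, item_id: int, attribute: str) -> str:
        ...
```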
Given a user $u$, an item $i$, and an observed attribute value $a$, an explanation can be sampled from the generator as $Y \sim G_\theta(u, i \mid A = a)$. The linguistic quality of any explanation $y$ is measured by a function $Q(y)$. Notably, we treat $Q$ as a black-box oracle: a quality measure that can only be queried. This is essential in practice and offers the flexibility to tailor $Q$ arbitrarily based on the fairness requirements of the application. An explanation can be gauged in various ways by customizing $Q$, such as using an explicit function, human evaluation, or a tool provided by authorities. We assume, without loss of generality, that higher $Q$ values represent superior quality. CF on any measure $Q$ of explanations is achieved when, given a user $u$ and an item $i$,
$$P(Q(Y_{A \leftarrow a}) \mid u, i, a) = P(Q(Y_{A \leftarrow a'}) \mid u, i, a), \tag{1}$$
where $Y_{A \leftarrow a'} \sim G_\theta(u, i \mid A = a')$ is the explanation generated when we counterfactually assign the value of the user's protected attribute by $A \leftarrow a'$, $a' \neq a$. The right side of Eq. (1) evaluates the quality distribution of explanations generated had the user's protected attribute value been $a'$, given that the observed attribute value is $a$ (Kusner et al., 2017; Li et al., 2021b).
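Because $Q$ is a black-box oracle that can only be queried, the condition in Eq. (1) can only be probed by sampling explanations under the factual and counterfactual attribute values. The sketch below, in which `sample_explanation` and `quality` are hypothetical callables standing in for $G_\theta$ and $Q$, estimates the gap between the first moments of the two quality distributions for a single user-item pair; this first moment is exactly what the constraint introduced next is built on.

```python
from typing import Callable, List


def quality_gap(
    sample_explanation: Callable[[int, int, str], str],  # stands in for G_theta
    quality: Callable[[str], float],                      # black-box oracle Q
    user_id: int,
    item_id: int,
    a: str,
    a_prime: str,
    n_samples: int = 100,
) -> float:
    """Monte Carlo estimate of E[Q(Y_{A<-a})] - E[Q(Y_{A<-a'})] for one (u, i) pair.

    A value near zero is consistent with the first-order (mean) version of the
    counterfactual fairness condition in Eq. (1).
    """
    q_factual: List[float] = [
        quality(sample_explanation(user_id, item_id, a)) for _ in range(n_samples)
    ]
    q_counterfactual: List[float] = [
        quality(sample_explanation(user_id, item_id, a_prime)) for _ in range(n_samples)
    ]
    return sum(q_factual) / n_samples - sum(q_counterfactual) / n_samples
```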
Denote the loss of the generator for a given user $u$ and item $i$ by $\mathcal{L}_{\text{gen}}(G_\theta(u, i \mid A = a), e)$, which is typically the negative log-likelihood (NLL) loss or a combination of several losses (Li et al., 2017; Yang et al., 2021; Li et al., 2021a). We consider training the generator for fair explanation generation as a constrained optimization problem:
$$\begin{aligned} \min_{\theta} \quad & \mathcal{L}_{\text{gen}}(G_\theta(u, i \mid A = a), e) \\ \text{s.t.} \quad & \mathbb{E}_{Y_{A \leftarrow a}}[Q(Y_{A \leftarrow a}) \mid u, i, a] = \mathbb{E}_{Y_{A \leftarrow a'}}[Q(Y_{A \leftarrow a'}) \mid u, i, a] \end{aligned} \tag{2}$$
For ease of presentation, we consider a single user-item pair; the total loss on a dataset is simply summed over all user-item pairs, with the constraint applied to every pair. In this work, we apply the first-order moment of the quality of generated explanations to construct the constraint, and leave the extension to other moment-matching constraints for future work. We further simplify the expression of the constraint as $\mathbb{E}[Q(Y_{A \leftarrow a})] = \mathbb{E}[Q(Y_{A \leftarrow a'})]$.
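For illustration only, one common way to handle an equality constraint like that in Eq. (2) is a penalty (Lagrangian-style) relaxation, sketched below for a single user-item pair. This is not the optimization method proposed in this work; the penalty weight `lam` is a hypothetical hyperparameter, and the two moment arguments would in practice be Monte Carlo estimates obtained by querying $Q$, as in the previous sketch.

```python
def penalized_objective(
    gen_loss: float,               # L_gen(G_theta(u, i | A=a), e), e.g., NLL, computed elsewhere
    mean_q_factual: float,         # estimate of E[Q(Y_{A<-a})]
    mean_q_counterfactual: float,  # estimate of E[Q(Y_{A<-a'})]
    lam: float = 1.0,              # hypothetical penalty weight
) -> float:
    """Illustrative relaxation of Eq. (2): the equality constraint on the first
    moment of Q is replaced by a penalty on the magnitude of its violation."""
    constraint_violation = abs(mean_q_factual - mean_q_counterfactual)
    return gen_loss + lam * constraint_violation
```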