Depending on historical processes, different social
groups may be located in different stereotypic quad-
rants (high or low on each of warmth and compe-
tence) of this two-dimensional model.
Here, we propose that SCM-based debiasing can
provide a theory-driven and scalable solution for
mitigating social-group biases in word embeddings.
In our experiments, we find that by debiasing with
respect to the subspace defined by warmth and
competence, our SCM-based approach performs
comparably to group-specific debiasing. Our ap-
proach fares well in terms of both bias reduction
and preservation of embedding utility, i.e., reten-
tion of semantic and syntactic information (Boluk-
basi et al., 2016), while having the advantage of
being social-group-agnostic.
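To make the operation concrete, the following is a minimal sketch of projecting embeddings off a warmth–competence subspace, assuming the two direction vectors have already been estimated (e.g., from SCM dictionary words); the names and setup are illustrative, not our exact implementation.

import numpy as np

def remove_subspace(word_vecs, directions):
    """Remove the component of each word vector lying in the
    subspace spanned by `directions` (here: warmth, competence).

    word_vecs:  (n_words, dim) embedding matrix.
    directions: (k, dim) array of direction vectors.
    """
    q, _ = np.linalg.qr(directions.T)   # (dim, k) orthonormal basis
    return word_vecs - word_vecs @ q @ q.T

# Illustrative usage, with warmth_dir / competence_dir estimated elsewhere:
# debiased = remove_subspace(E, np.stack([warmth_dir, competence_dir]))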
2 Background
2.1 Post hoc Word Embedding Debiasing
Our work builds on post hoc debiasing, removing
biases by modifying pre-trained word embeddings.
Most work we review focuses on gender-related
debiasing (e.g., Bolukbasi et al., 2016; Zhao et al.,
2018; Dev and Phillips, 2019); importantly, our
work also covers other social categories, bringing
attention to these understudied groups.
Originally, Bolukbasi et al. (2016) proposed
Hard Debiasing (HD) for gender bias. HD re-
moves the gender component from inherently non-
gendered words and enforces an equidistance prop-
erty for inherently gendered word pairs (equality
sets). Two follow-ups to this work include Manzini
et al. (2019), which formulated a multiclass ver-
sion of HD for attributes such as race, and Dev and
Phillips (2019), which introduced Partial Projec-
tion, a method that does not require equality sets
and is more effective than HD at reducing bias. Ex-
tending these approaches to other social attributes
is not trivial: a set of definitional word pairs must
be curated for each social group, a dynamic and
context-dependent task, since such pairs depend
on the historical moment.
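For concreteness, here is a minimal sketch of HD's two steps, Neutralize and Equalize, following the formulation in Bolukbasi et al. (2016); embeddings are assumed unit-normalized, g is a precomputed unit-norm bias direction, and the variable names are ours.

import numpy as np

def neutralize(w, g):
    """Remove the bias component from an inherently non-gendered
    word, then renormalize (embeddings assumed unit-norm)."""
    w_b = (w @ g) * g              # projection of w onto the bias direction
    v = w - w_b
    return v / np.linalg.norm(v)

def equalize(pair, g):
    """Re-embed an equality-set pair (e.g., he/she) so both words are
    equidistant from every neutralized word, differing only along g."""
    a, b = pair
    mu = (a + b) / 2               # midpoint of the pair
    mu_b = (mu @ g) * g            # midpoint's component along g
    nu = mu - mu_b                 # shared bias-free component
    scale = np.sqrt(max(0.0, 1.0 - nu @ nu))
    new = []
    for w in (a, b):
        w_b = (w @ g) * g
        direction = (w_b - mu_b) / np.linalg.norm(w_b - mu_b)
        new.append(nu + scale * direction)
    return tuple(new)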
Gonen and Goldberg (2019) provided evidence
that gender bias in word embeddings is deeper than
previously thought, and that methods based on pro-
jecting words onto a “gender dimension” only hide
bias superficially. They showed that after debias-
ing, most words maintain their relative position in
the debiased embedding space. In this work, we
address the shortcomings highlighted by Gonen
and Goldberg (2019) and by Agarwal et al. with a
theory-driven bias subspace rather than an algorith-
mic improvement.
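As an illustration of this diagnosis (the overlap metric below is our simplification, not Gonen and Goldberg's exact experiments), one can project out a bias direction and then measure how much of each word's original cosine neighborhood survives; high overlap indicates that the relative geometry, and hence recoverable bias, remains largely intact.

import numpy as np

def neighbor_overlap(before, after, k=10):
    """Average fraction of each word's k nearest cosine neighbors
    preserved after debiasing. `before` and `after` are
    (n_words, dim) embedding matrices over the same vocabulary."""
    def topk(mat):
        normed = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)   # exclude self-neighbors
        return np.argsort(-sims, axis=1)[:, :k]
    nb_before, nb_after = topk(before), topk(after)
    return float(np.mean([len(set(x) & set(y)) / k
                          for x, y in zip(nb_before, nb_after)]))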
2.2 Bias and the Stereotype Content Model
The bias found in language models is rooted in hu-
man biases (Caliskan and Lewis, 2022); thus, to
alleviate such biases, we should ground our debi-
asing approaches in social psychological theories
of stereotyping (Blodgett et al., 2020). The Stereo-
type Content Model (SCM) (Fiske et al., 2002;
Cuddy et al., 2009) is a social psychological theory
positing that stereotyping of different social groups
can be captured along two orthogonal dimensions,
“warmth” and “competence.” The warmth dimen-
sion concerns perceptions of others’ intentions in
interpersonal interactions, while the competence
dimension concerns assessments of their ability to
act on those intentions. While there are
a number of other social psychological theories
capturing outgroup biases (e.g., Zou and Cheryan,
2017; Koch et al., 2016), SCM has been shown
to predict emotional and behavioral reactions to
societal outgroups.
2.3 The SCM and Language
SCM is a well-established theoretical framework
of stereotyping, and it has begun to be applied in
NLP. Recently, Nicolas et al. (2021) developed dic-
tionaries to measure warmth and competence in textual
data. Each dictionary was initialized with a set of
seed words from the literature, which was then
expanded using WordNet (Miller, 1995) to increase
coverage of the stereotypes collected from a sam-
ple of Americans. Fraser et al. (2021) showed that,
in word embeddings, SCM dictionaries capture the
group stereotypes documented in social psycholog-
ical research. Similarly, Mostafazadeh Davani et al.
(2021) applied SCM dictionaries to quantify social
group stereotypes embedded in language, demon-
strating that patterns of prediction bias can be ex-
plained by the warmth and competence associated
with social groups in language.
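As a sketch of how such dictionaries pair with embeddings (a simplified illustration of this style of measurement, not the exact procedure of Fraser et al., 2021): score a group term by its cosine similarity to the centroid of each dictionary's word vectors.

import numpy as np

def scm_scores(group_vec, warmth_words, competence_words, emb):
    """Warmth/competence scores for one group term: cosine similarity
    to the centroid of each SCM dictionary (e.g., the dictionaries of
    Nicolas et al., 2021). `emb` maps word -> vector."""
    def centroid(words):
        return np.mean([emb[w] for w in words if w in emb], axis=0)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return {"warmth": cos(group_vec, centroid(warmth_words)),
            "competence": cos(group_vec, centroid(competence_words))}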
3 Methods & Evaluation
There are two components to each post hoc de-
biasing approach: the Bias Subspace, which de-
termines the subspace over which the algorithms
operate, and the Algorithm, which is how the word
embeddings are modified with respect to the bias
subspace. In this section, we review the concept
of bias subspaces, established algorithms for de-
biasing, and how bias is quantified in word em-
beddings.
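For the first component, here is a minimal sketch of one standard construction, the PCA-over-definitional-pairs recipe of Bolukbasi et al. (2016); the pair list in the usage comment is illustrative.

import numpy as np

def bias_subspace(pairs, emb, k=1):
    """Top-k principal directions of centered definitional-pair
    vectors, as in Bolukbasi et al. (2016). `pairs` is a list of
    definitional word pairs; `emb` maps word -> vector."""
    diffs = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2
        diffs.append(emb[a] - center)   # each pair contributes two
        diffs.append(emb[b] - center)   # centered difference vectors
    _, _, vt = np.linalg.svd(np.stack(diffs), full_matrices=False)
    return vt[:k]                       # (k, dim) basis for the subspace

# Illustrative: B = bias_subspace([("he", "she"), ("man", "woman")], emb)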