MABEL: Attenuating Gender Bias using Textual Entailment Data
Jacqueline He*, Mengzhou Xia, Christiane Fellbaum, Danqi Chen
Department of Computer Science, Princeton University
jacquelinehe00@gmail.com
{mengzhou, fellbaum, danqic}@cs.princeton.edu
Abstract
Pre-trained language models encode undesir-
able social biases, which are further exacer-
bated in downstream use. To this end, we
propose MABEL (a Method for Attenuating
Gender Bias using Entailment Labels), an in-
termediate pre-training approach for mitigat-
ing gender bias in contextualized representa-
tions. Key to our approach is the use of a
contrastive learning objective on counterfac-
tually augmented, gender-balanced entailment
pairs from natural language inference (NLI)
datasets. We also introduce an alignment reg-
ularizer that pulls identical entailment pairs
along opposite gender directions closer. We
extensively evaluate our approach on intrinsic
and extrinsic metrics, and show that MABEL
outperforms previous task-agnostic debiasing
approaches in terms of fairness. It also pre-
serves task performance after fine-tuning on
downstream tasks. Together, these findings
demonstrate the suitability of NLI data as an
effective means of bias mitigation, in contrast
to prior work that relies only on unlabeled sen-
tences. Finally, we identify that existing ap-
proaches often use evaluation settings that are
insufficient or inconsistent. We make an effort
to reproduce and compare previous methods,
and call for unifying the evaluation settings
across gender debiasing methods for better fu-
ture comparison.1
1 Introduction
Pre-trained language models have reshaped the
landscape of modern natural language process-
ing (Peters et al.,2018;Devlin et al.,2019;Liu
et al.,2019). As these powerful networks are opti-
mized to learn statistical properties from large train-
ing corpora imbued with significant social biases
(e.g., gender, racial), they produce encoded repre-
sentations that inherit undesirable associations as a
* This work was done before JH graduated from Princeton University.
1 Our code is publicly available at https://github.com/princeton-nlp/MABEL.
byproduct (Zhao et al.,2019;Webster et al.,2020;
Nadeem et al.,2021). More concerningly, models
trained on these representations can not only prop-
agate but also amplify discriminatory judgments in
downstream applications (Kurita et al.,2019).
A multitude of recent efforts have focused on
alleviating biases in language models. These can
be classed into two categories (Table 1): 1) task-
specific approaches perform bias mitigation during
downstream fine-tuning, and require data to be an-
notated for sensitive attributes; 2) task-agnostic
approaches directly improve pre-trained representa-
tions, most commonly either by removing discrim-
inative biases through projection (Dev et al.,2020;
Liang et al.,2020;Kaneko and Bollegala,2021), or
by performing intermediate pre-training on gender-
balanced data (Webster et al.,2020;Cheng et al.,
2021;Lauscher et al.,2021;Guo et al.,2022), re-
sulting in a new encoder that can transfer fairness
effects downstream via standard fine-tuning.
In this work, we present MABEL, a novel and
lightweight method for attenuating gender bias.
MABEL is task-agnostic and can be framed as
an intermediate pre-training approach with a con-
trastive learning framework. Our approach hinges
on the use of entailment pairs from supervised nat-
ural language inference datasets (Bowman et al.,
2015;Williams et al.,2018). We augment the train-
ing data by swapping gender words in both premise
and hypothesis sentences and model them using a
contrastive objective. We also propose an align-
ment regularizer, which minimizes the distance
between the entailment pair and its augmented
one. MABEL optionally incorporates a masked
language modeling objective, so that it can be used
for token-level downstream tasks.
To the best of our knowledge, MABEL is the
first to exploit supervised sentence pairs for learn-
ing fairer contextualized representations. Super-
vised contrastive learning via entailment pairs is
known to learn a more uniformly distributed rep-
resentation space, wherein similarity measures be-
tween sentences better correspond to their seman-
tic meanings (Gao et al.,2021). Meanwhile, our
proposed alignment loss, which pulls identical sen-
tences along contrasting gender directions closer,
is well-suited to learning a fairer semantic space.
We systematically evaluate MABEL on a com-
prehensive suite of intrinsic and extrinsic measures
spanning language modeling, text classification,
NLI, and coreference resolution. MABEL per-
forms well against existing gender debiasing ef-
forts in terms of both fairness and downstream task
performance, and it also preserves language under-
standing on the GLUE benchmark (Wang et al.,
2019). Altogether, these results demonstrate the
effectiveness of harnessing NLI data for bias at-
tenuation, and underscore MABEL's potential as a
general-purpose fairer encoder.
Lastly, we identify two major issues in existing
gender bias mitigation literature. First, many pre-
vious approaches solely quantify bias through the
Sentence Encoding Association Test (SEAT) (May
et al.,2019), a metric that compares the geometric
relations between sentence representations. De-
spite scoring well on SEAT, many debiasing meth-
ods do not show the same fairness gains across
other evaluation settings. Second, previous ap-
proaches evaluate on extrinsic benchmarks in an
inconsistent manner. For a fairer comparison, we
either reproduce or summarize the performance of
many recent methodologies on major evaluation
tasks. We believe that unifying the evaluation set-
tings lays the groundwork for more meaningful
methodological comparisons in future research.
2 Background
2.1 Debiasing Contextualized Representations
Debiasing attempts in NLP can be divided into two
categories. In the first category, the model learns to
disregard the influence of sensitive attributes in rep-
resentations during fine-tuning, through projection-
based (Ravfogel et al.,2020,2022), adversar-
ial (Han et al.,2021a,b) or contrastive (Shen et al.,
2021;Chi et al.,2022) downstream objectives. This
approach is task-specific as it requires fine-tuning
data that is annotated for the sensitive attribute.
The second type, task-agnostic training, mitigates
bias by leveraging textual information from gen-
eral corpora. This can involve computing a gender
subspace and eliminating it from encoded represen-
tations (Dev et al.,2020;Liang et al.,2020;Dev
et al.,2021;Kaneko and Bollegala,2021), or by
re-training the encoder with a higher dropout (Web-
ster et al.,2020) or equalizing objectives (Cheng
et al.,2021;Guo et al.,2022) to alleviate unwanted
gender associations.
We summarize recent efforts of both task-
specific and task-agnostic approaches in Table 1.
Compared to task-specific approaches that only
debias for the task at hand, task-agnostic models
produce fair encoded representations that can be
used toward a variety of applications. MABEL
is task-agnostic, as it produces a general-purpose
debiased model. Some recent efforts have broad-
ened the scope of task-specific approaches. For in-
stance, Meade et al. (2022) adapt the task-specific
Iterative Nullspace Linear Projection (INLP) (Rav-
fogel et al.,2020) algorithm to rely on Wikipedia
data for language model probing. While non-task-
agnostic approaches can potentially be adapted to
general-purpose debiasing, we primarily consider
other task-agnostic approaches in this work.
2.2 Evaluating Biases in NLP
The recent surge of interest in fairer NLP systems
has surfaced a key question: how should bias be
quantified? Intrinsic metrics directly probe the up-
stream language model, whether by measuring the
geometry of the embedding space (Caliskan et al.,
2017;May et al.,2019;Guo and Caliskan,2021),
or through likelihood-scoring (Kurita et al.,2019;
Nangia et al.,2020;Nadeem et al.,2021). Extrinsic
metrics evaluate for fairness by comparing the sys-
tem’s predictions across different populations on a
downstream task (De-Arteaga et al.,2019a;Zhao
et al.,2019;Dev et al.,2020). Though opaque,
intrinsic metrics are fast and cheap to compute,
which makes them popular among contemporary
works (Meade et al.,2022;Qian et al.,2022). Com-
paratively, though extrinsic metrics are more inter-
pretable and reflect tangible social harms, they are
often time- and compute-intensive, and so tend to
be less frequently used.2
To date, the most popular bias metric among
task-agnostic approaches is the Sentence Encoder
Association Test (SEAT) (May et al.,2019), which
compares the relative distance between the encoded
representations. Recent studies have cast doubt on
the predictive power of these intrinsic indicators.
2 As Table 17 in Appendix F indicates, many previous bias mitigation approaches limit evaluation to 1 or 2 metrics.
Method | Proj. based | Con. obj. | Gen. aug. | LM probe | Fine-tune | Intermediate pre-training data
Task-specific approaches
INLP (Ravfogel et al., 2020) | ✓ | | | ✓* | ✓ | Wikipedia*
CON (Shen et al., 2021) | | ✓ | | | ✓ | -
DADV (Han et al., 2021b) | | | | | ✓ | -
GATE (Han et al., 2021a) | | | | | ✓ | -
R-LACE (Ravfogel et al., 2022) | ✓ | | | | ✓ | -
Task-agnostic approaches
CDA (Webster et al., 2020) | | | ✓ | ✓ | ✓ | Wikipedia (1M steps, 36h on 8x 16 TPU)
DROPOUT (Webster et al., 2020) | | | | ✓ | ✓ | Wikipedia (100K steps, 3.5h on 8x 16 TPU)
ADELE (Lauscher et al., 2021) | | | ✓ | ✓ | ✓ | Wikipedia, BookCorpus (105M sentences)
BIAS PROJECTION (Dev et al., 2020) | ✓ | | > | ✓ | ✓ | Wikisplit (1M sentences)
OSCAR (Dev et al., 2021) | | | > | | ✓ | SNLI] (190.1K sentences)
SENT-DEBIAS (Liang et al., 2020) | ✓ | | ✓ | ✓ | ✓ | WikiText-2, SST, Reddit, MELD, POM
CONTEXT-DEBIAS (Kaneko and Bollegala, 2021) | ✓ | | > | ✓ | ✓ | News-commentary-v1 (87.66K sentences)
AUTO-DEBIAS (Guo et al., 2022) | | | | ✓ | | Bias prompts generated from Wikipedia (500)
FAIRFIL (Cheng et al., 2021) | | ✓ | ✓ | ✓" | | WikiText-2, SST, Reddit, MELD, POM
MABEL (ours) | | ✓ | ✓ | ✓ | ✓ | MNLI, SNLI with gender terms (134k sentences)
Table 1: Properties of existing gender debiasing approaches for contextualized representations. Proj. based: projection-based. Con. obj.: based on contrastive objectives. Gen. aug.: these approaches use a seed list of gender terms for counterfactual data augmentation. LM probe and Fine-tune denote that the approach can be used for language model probing or fine-tuning, respectively. *: INLP was originally only used for task-specific fine-tuning; Meade et al. (2022) later adapted it for task-agnostic training on Wikipedia for LM probing. ": FAIRFIL shows poor LM probing performance in Table 2 as the debiasing filter is not trained with an MLM head. MABEL fixes this issue by jointly training with an MLM objective. >: these works use a single gender pair "he/she" to calculate the gender subspace. ]: Dev et al. (2021) fine-tunes on SNLI but does not use it for debiasing.
SEAT has been found to elicit counter-intuitive re-
sults from encoders (May et al.,2019) or exhibit
high variance across identical runs (Aribandi et al.,
2021). Goldfarb-Tarrant et al. (2021) show that
intrinsic metrics do not reliably correlate with ex-
trinsic metrics, meaning that a model could score
well on SEAT, but still form unfair judgements in
downstream conditions. This is especially concern-
ing as many debiasing studies (Liang et al.,2020;
Cheng et al.,2021) solely report on SEAT, which
is shown to be unreliable and incoherent. For these
reasons, we disregard SEAT as a main intrinsic
metric in this work.3
Bias evaluation is critical as it is the first step to-
wards detection and mitigation. Given that bias re-
flects across language in many ways, relying upon
a single bias indicator is insufficient (Silva et al.,
2021). Therefore, we benchmark not just MABEL,
but also current task-agnostic methods against a
diverse set of intrinsic and extrinsic indicators.
3 Method
MABEL attenuates gender bias in pre-trained lan-
guage models by leveraging entailment pairs from
natural language inference (NLI) data to produce
3 For comprehensiveness, we report MABEL's results on SEAT in Appendix G.
general-purpose debiased representations. To the
best of our knowledge, MABEL is the first method
that exploits semantic signals from supervised sen-
tence pairs for learning fairness.
3.1 Training Data
NLI data is shown to be especially effective in
training discriminative and high-quality sentence
representations (Conneau et al.,2017;Reimers
and Gurevych,2019;Gao et al.,2021). While
previous works in fair representation learning use
generic sentences from different domains (Liang
et al.,2020;Cheng et al.,2021;Kaneko and Bol-
legala,2021), we explore using sentence pairs
with an entailment relationship: a hypothesis sen-
tence that can be inferred to be true, based on a
premise sentence. Since gender is our area of in-
terest, we extract all entailment pairs that contain
at least one gendered term in either the premise
or the hypothesis from an NLI dataset. In our
experiments, we explore using two well-known
NLI datasets: the Stanford Natural Language Infer-
ence (SNLI) dataset (Bowman et al.,2015) and the
Multi-Genre Natural Language Inference (MNLI)
dataset (Williams et al.,2018).
Figure 1: MABEL consists of three losses: 1) an entailment-based contrastive loss (L_CL) that uses the premise's hypothesis as a positive sample and other in-batch hypotheses as negative samples; 2) an alignment loss (L_AL) that minimizes the similarity difference between each original entailment pair and its gender-balanced counterpart; 3) a masked language modeling loss (L_MLM) to recover p = 15% of the masked tokens. [The figure shows the original entailment pair p = "Man putting together wooden shelf.", h = "A woman is working on furniture." and its augmented counterpart p̂ = "Woman putting together wooden shelf.", ĥ = "A man is working on furniture.", together with other in-batch hypotheses used as negative examples and masked tokens for the MLM loss.]

As a pre-processing step, we first conduct counterfactual data augmentation (Webster et al., 2020) on the entailment pairs. For any sensitive attribute term in a word sequence, we swap it for a word along the opposite bias direction, i.e., girl to boy, and keep the non-attribute words unchanged.4 This transformation is systematically applied to each sentence in every entailment pair. An example of this augmentation, with gender bias as the sensitive attribute, is shown in Figure 1.
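To make the pre-processing concrete, the sketch below filters entailment pairs for gendered terms and applies the word swap. The short seed list and helper names are illustrative only; the full attribute word-pair list we use is given in Appendix A.

```python
# Minimal sketch of counterfactual data augmentation (CDA) on entailment pairs.
# The seed list below is illustrative; the full attribute word-pair list is in Appendix A.
GENDER_PAIRS = [("he", "she"), ("him", "her"), ("his", "hers"), ("man", "woman"),
                ("men", "women"), ("boy", "girl"), ("father", "mother")]

# Map every term to its counterpart along the opposite gender direction.
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a], SWAP[b] = b, a

def augment(sentence: str) -> str:
    """Swap each gendered attribute word; leave non-attribute words unchanged."""
    swapped = []
    for tok in sentence.split():
        core = tok.strip(".,!?").lower()
        if core in SWAP:
            new = SWAP[core]
            if tok[0].isupper():                       # preserve capitalization
                new = new.capitalize()
            new += tok[len(tok.rstrip(".,!?")):]       # keep trailing punctuation
            swapped.append(new)
        else:
            swapped.append(tok)
    return " ".join(swapped)

def has_gender_term(sentence: str) -> bool:
    return any(t.strip(".,!?").lower() in SWAP for t in sentence.split())

# Keep only entailment pairs with at least one gendered term, then augment both sides.
premise, hypothesis = "Man putting together wooden shelf.", "A woman is working on furniture."
if has_gender_term(premise) or has_gender_term(hypothesis):
    aug_premise, aug_hypothesis = augment(premise), augment(hypothesis)
    # -> "Woman putting together wooden shelf.", "A man is working on furniture."
```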
3.2 Training Objective
Our training objective consists of three compo-
nents: a contrastive loss based on entailment pairs
and their augmentations, an alignment loss, and an
optional masked language modeling loss.
Entailment-based contrastive loss.
Training with
a contrastive loss induces a more isotropic repre-
sentation space, wherein the sentences’ geometric
positions can better align with their semantic mean-
ing (Wang and Isola,2020;Gao et al.,2021). We
hypothesize that this contrastive loss would be con-
ducive to bias mitigation, as concepts with similar
meanings, but along opposite gender directions,
move closer under this similarity measurement. In-
spired by Gao et al. (2021), we use a contrastive
loss that encourages the inter-association of en-
tailment pairs, with the goal of the encoder also
learning semantically richer associations.5
With p as the premise representation and h as the hypothesis representation, let {(p_i, h_i)}_{i=1}^n be the sequence of representations for the n original entailment pairs, and {(p̂_i, ĥ_i)}_{i=1}^n be the n counterfactually-augmented entailment pairs. Each entailment pair (and its corresponding augmented pair) forms a positive pair, and the other in-batch sentences constitute negative samples. With m pairs and their augmentations in one training batch, the contrastive objective for an entailment pair i is defined as:

$$\mathcal{L}^{(i)}_{\mathrm{CL}} = -\log \frac{e^{\mathrm{sim}(p_i, h_i)/\tau}}{\sum_{j=1}^{m} \left( e^{\mathrm{sim}(p_i, h_j)/\tau} + e^{\mathrm{sim}(p_i, \hat{h}_j)/\tau} \right)} - \log \frac{e^{\mathrm{sim}(\hat{p}_i, \hat{h}_i)/\tau}}{\sum_{j=1}^{m} \left( e^{\mathrm{sim}(\hat{p}_i, h_j)/\tau} + e^{\mathrm{sim}(\hat{p}_i, \hat{h}_j)/\tau} \right)},$$

where sim(·,·) denotes the cosine similarity function and τ is the temperature. L_CL is simply the average of these losses over the training batch. Note that when h_i = ĥ_i (i.e., when h_i does not contain any gender words and the augmentation is unchanged), we exclude ĥ_i from the denominator to avoid having h_i as a positive sample while ĥ_i serves as a negative sample for p_i, and vice versa.

4 We use the same list of attribute word pairs from Bolukbasi et al. (2016), Liang et al. (2020), and Cheng et al. (2021), which can be found in Appendix A.
5 In this work, we only refer to the supervised SimCSE model, which leverages entailment pairs from NLI data.
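For concreteness, below is a minimal PyTorch sketch of this contrastive objective, assuming the premise, hypothesis, and augmented representations have already been encoded into (m, d) tensors. The function and tensor names, and the handling of the h_i = ĥ_i case via a boolean mask, are our own illustration rather than the exact released implementation.

```python
import torch
import torch.nn.functional as F

def entailment_contrastive_loss(p, h, p_hat, h_hat, tau=0.05, same_h_mask=None):
    """Sketch of the entailment-based contrastive loss.
    p, h, p_hat, h_hat: (m, d) representations of premises, hypotheses,
    and their counterfactually augmented counterparts. tau is the temperature.
    same_h_mask: optional boolean (m,) tensor marking rows where h_i == h_hat_i."""
    def cos(a, b):
        # Pairwise cosine similarities scaled by the temperature: (m, m)
        return F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=-1) / tau

    m = p.size(0)
    labels = torch.arange(m, device=p.device)

    # Candidates for each premise: all in-batch hypotheses and augmented hypotheses.
    logits_p = torch.cat([cos(p, h), cos(p, h_hat)], dim=1)           # (m, 2m)
    logits_p_hat = torch.cat([cos(p_hat, h_hat), cos(p_hat, h)], dim=1)

    if same_h_mask is not None:
        # When h_i == h_hat_i, drop the duplicate column so the same sentence is not
        # simultaneously a positive and a negative for p_i (and for p_hat_i).
        neg_inf = torch.finfo(logits_p.dtype).min
        idx = labels[same_h_mask]
        logits_p[same_h_mask, m + idx] = neg_inf
        logits_p_hat[same_h_mask, m + idx] = neg_inf

    # The positive for p_i is h_i (column i); for p_hat_i it is h_hat_i (column i).
    return F.cross_entropy(logits_p, labels) + F.cross_entropy(logits_p_hat, labels)
```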
Alignment loss. We want a loss that encourages the intra-association between the original entailment pairs and their augmented counterparts. Intuitively, the features from an entailment pair and its gender-balanced opposite should be taken as positive samples and be spatially close. Our alignment loss minimizes the distance between the cosine similarities of the original sentence pairs (p_i, h_i) and the gender-opposite sentence pairs (p̂_i, ĥ_i):

$$\mathcal{L}_{\mathrm{AL}} = \frac{1}{m} \sum_{i=1}^{m} \left( \mathrm{sim}(\hat{p}_i, \hat{h}_i) - \mathrm{sim}(p_i, h_i) \right)^2.$$
We assume that a model is less biased if it as-
signs similar measurements to two gender-opposite
pairs, meaning that it maps the same concepts along
different gender directions to the same contexts.6
6 We also explore different loss functions for alignment and report them in Appendix J.
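Under the same tensor conventions as the sketch above, the alignment regularizer reduces to a few lines; this is a minimal sketch rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(p, h, p_hat, h_hat):
    """Sketch of the alignment loss: the mean squared difference between the cosine
    similarity of each original pair (p_i, h_i) and that of its gender-augmented
    counterpart (p_hat_i, h_hat_i)."""
    sim_orig = F.cosine_similarity(p, h, dim=-1)          # (m,)
    sim_aug = F.cosine_similarity(p_hat, h_hat, dim=-1)   # (m,)
    return ((sim_aug - sim_orig) ** 2).mean()
```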
Masked language modeling loss.
Optionally, we
can append an auxiliary masked language modeling
(MLM) loss to preserve the model’s language mod-
eling capability. Following Devlin et al. (2019), we
randomly mask
p= 15%
of tokens in all sentences.
By leveraging the surrounding context to predict
the original terms, the encoder is incentivized to
retain token-level knowledge.
In sum, our training objective is as follows:
$$\mathcal{L} = (1 - \alpha) \cdot \mathcal{L}_{\mathrm{CL}} + \alpha \cdot \mathcal{L}_{\mathrm{AL}} + \lambda \cdot \mathcal{L}_{\mathrm{MLM}},$$

wherein the two contrastive losses are linearly interpolated by a tunable coefficient α, and the MLM loss is tempered by the hyper-parameter λ.
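A hedged sketch of how the three terms might be combined in one training step, reusing the two functions sketched above; the encoder call, MLM head, batch fields, and coefficient values are placeholders rather than our exact settings.

```python
# Hypothetical composition of the three losses for one training batch.
alpha, lam = 0.05, 0.1   # illustrative values; both coefficients are tuned in practice

def training_step(batch, encoder, mlm_head):
    # Encode original and augmented premises/hypotheses into sentence representations.
    p, h = encoder(batch["premise"]), encoder(batch["hypothesis"])
    p_hat, h_hat = encoder(batch["aug_premise"]), encoder(batch["aug_hypothesis"])

    l_cl = entailment_contrastive_loss(p, h, p_hat, h_hat,
                                       same_h_mask=batch["same_h_mask"])
    l_al = alignment_loss(p, h, p_hat, h_hat)
    l_mlm = mlm_head(batch["masked_inputs"], batch["mlm_labels"])  # optional MLM term

    return (1 - alpha) * l_cl + alpha * l_al + lam * l_mlm
```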
4 Evaluation Metrics
4.1 Intrinsic Metrics
StereoSet (Nadeem et al.,2021)
queries the lan-
guage model for stereotypical associations. Fol-
lowing Meade et al. (2022), we consider intra-
sentence examples from the gender domain. This
task can be formulated as a fill-in-the-blank style
problem, wherein the model is presented with an
incomplete context sentence, and must choose be-
tween a stereotypical word, an anti-stereotypical
word, and an irrelevant word. The Language Mod-
eling Score (LM) is the percentage of instances in
which the model chooses a valid word (either the
stereotype or the anti-stereotype) over the random
word; the Stereotype Score (SS) is the percentage
in which the model chooses the stereotype over the
anti-stereotype. The Idealized Context Association
Test (ICAT) score combines the LM and SS scores
into a single metric.
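As a point of reference, the combination used by StereoSet rewards a high LM score and an SS close to 50; the small sketch below follows the formula lm × min(ss, 100 − ss)/50 from Nadeem et al. (2021), written as our own helper rather than the benchmark's code.

```python
def icat(lm_score: float, stereotype_score: float) -> float:
    """Idealized CAT score: high language modeling ability (LM) combined with a
    stereotype score (SS) close to 50 (no systematic preference either way).
    A perfect model (LM = 100, SS = 50) scores 100."""
    return lm_score * min(stereotype_score, 100 - stereotype_score) / 50

# e.g., icat(85.0, 60.0) -> 85 * 40 / 50 = 68.0
```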
CrowS-Pairs (Nangia et al.,2020)
is an intra-
sentence dataset of minimal pairs, where one sen-
tence contains a disadvantaged social group that
either fulfills or violates a stereotype, and the other
sentence is minimally edited to contain a con-
trasting advantaged group. The language model
compares the masked token probability of tokens
unique to each sentence. Focusing only on gender
examples, we report the stereotype score (SS), the
percentage in which a model assigns a higher aggre-
gated masked token probability to a stereotypical
sentence over an anti-stereotypical one.
4.2 Extrinsic Metrics
As there has been some inconsistency in the eval-
uation settings in the literature, we mainly con-
sider the fine-tuning setting for extrinsic metrics
and leave the discussion of the linear probing set-
ting to Appendix I.
Bias-in-Bios (De-Arteaga et al.,2019b)
is a third-
person biography dataset annotated by occupation
and gender. We fine-tune the encoder, along with
a linear classification layer, to predict an individ-
ual’s profession given their biography. We report
overall task accuracy and accuracy by gender, as
well as two common fairness metrics (De-Arteaga et al., 2019b; Ravfogel et al., 2020): 1) GAP^{TPR}_{M}, the difference in true positive rate (TPR) between male- and female-labeled instances; 2) GAP^{TPR}_{M,y}, the root-mean square of the TPR gap of each occupation class.
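As an illustration of how these gaps can be computed (our own sketch, not the benchmark's code; variable names are hypothetical):

```python
import numpy as np

def tpr_gaps(y_true, y_pred, gender):
    """y_true, y_pred: occupation labels; gender: 'M' or 'F' per example.
    Returns the per-occupation TPR gap (male minus female) and its root-mean square."""
    y_true, y_pred, gender = map(np.asarray, (y_true, y_pred, gender))
    gaps = []
    for occ in np.unique(y_true):
        male = (y_true == occ) & (gender == "M")
        female = (y_true == occ) & (gender == "F")
        if male.sum() == 0 or female.sum() == 0:
            continue  # skip occupations with no examples for one gender
        tpr_m = (y_pred[male] == occ).mean()
        tpr_f = (y_pred[female] == occ).mean()
        gaps.append(tpr_m - tpr_f)
    gaps = np.array(gaps)
    return gaps, np.sqrt((gaps ** 2).mean())
```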
Bias-NLI (Dev et al.,2020)
is an NLI dataset con-
sisting of neutral sentence pairs. It is systematically
constructed by populating sentence templates with
a gendered word and an occupation word with a
strong gender connotation (e.g., The woman ate a
bagel; The nurse ate a bagel). Bias can be inter-
preted as a deviation from neutrality and is deter-
mined by three metrics: Net Neutral (NN), Fraction Neutral (FN), and Threshold:τ (T:τ). A bias-free model should score a value of 1 across all 3 metrics.
We fine-tune on SNLI and evaluate on Bias-NLI
during inference.
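To make these scores concrete, here is a hedged sketch of how NN, FN, and T:τ can be computed from per-pair predictions, assuming access to the softmax probability of the neutral class; the definitions follow our reading of Dev et al. (2020), and the names are illustrative.

```python
import numpy as np

def bias_nli_scores(neutral_probs, predicted_labels, tau=0.5):
    """neutral_probs: P(neutral) for each template pair; predicted_labels: the argmax
    label per pair ('entailment' / 'neutral' / 'contradiction').
    Returns Net Neutral, Fraction Neutral, and Threshold:tau (1.0 = bias-free)."""
    neutral_probs = np.asarray(neutral_probs)
    predicted_labels = np.asarray(predicted_labels)
    net_neutral = neutral_probs.mean()                         # average neutral probability
    fraction_neutral = (predicted_labels == "neutral").mean()  # share predicted neutral
    threshold_tau = (neutral_probs > tau).mean()               # share with P(neutral) > tau
    return net_neutral, fraction_neutral, threshold_tau
```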
WinoBias (Zhao et al.,2018)
is an intra-sentence
coreference resolution task that evaluates a sys-
tem’s ability to correctly link a gendered pronoun
to an occupation across both pro-stereotypical and
anti-stereotypical contexts. Coreference can be
inferred based on syntactic cues in Type 1 sen-
tences or on more challenging semantic cues in
Type 2 sentences. We first fine-tune the model on
the OntoNotes 5.0 dataset (Hovy et al.,2006) be-
fore evaluating on the WinoBias benchmark. We
report the average F1-scores for pro-stereotypical
and anti-stereotypical instances, and the true pos-
itive rate difference in average F1-scores, across
Type 1 and Type 2 examples.
4.3 Language Understanding
To evaluate whether language models still preserve
general linguistic understanding after bias atten-
uation, we fine-tune them on seven classification
tasks and one regression task from the General Lan-
guage Understanding Evaluation (GLUE) bench-
mark (Wang et al.,2019).7
7 We also evaluate transfer performance on the SentEval tasks (Conneau et al., 2017) in Appendix E.