Robustifying Sentiment Classification
by Maximally Exploiting Few Counterfactuals
Maarten De Raedt♢♣  Fréderic Godin♢  Chris Develder♣  Thomas Demeester♣
♢Sinch Chatlayer  ♣Ghent University
{maarten.deraedt, chris.develder, thomas.demeester}@ugent.be
frederic.godin@sinch.com
Abstract
For text classification tasks, finetuned language models perform remarkably well. Yet, they tend to rely on spurious patterns in training data, thus limiting their performance on out-of-distribution (OOD) test data. Among recent methods aiming to avoid this spurious pattern problem, adding extra counterfactual samples to the training data has proven very effective. Yet, counterfactual data generation is costly, since it relies on human annotation. Thus, we propose a novel solution that only requires annotation of a small fraction (e.g., 1%) of the original training data, and uses automatic generation of extra counterfactuals in an encoding vector space. We demonstrate the effectiveness of our approach in sentiment classification, using IMDb data for training and other sets for OOD tests (i.e., Amazon, SemEval and Yelp). We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals: +3% compared to adding +100% in-distribution training samples, and +1.3% compared to alternative counterfactual approaches.
1 Introduction and Related Work
For a wide range of text classification tasks, finetuning large pretrained language models (Devlin et al., 2019; Liu et al., 2019; Clark et al., 2020; Lewis et al., 2020) on task-specific data has proven very effective. Yet, analysis has shown that their predictions tend to rely on spurious patterns (Poliak et al., 2018; Gururangan et al., 2018; Kiritchenko and Mohammad, 2018; McCoy et al., 2019; Niven and Kao, 2019; Zmigrod et al., 2019; Wang and Culotta, 2020), i.e., features that from a human perspective are not indicative of the classifier's label. For instance, Kaushik et al. (2019) found the rather neutral words "will", "my" and "has" to be important for a positive sentiment classification. Such reliance on spurious patterns was suspected to degrade performance on out-of-distribution (OOD) test data, distributionally different from the training data (Quiñonero-Candela et al., 2008). Specifically for sentiment classification, this suspicion has been confirmed by Kaushik et al. (2019, 2020).

Fig. 1: We propose to generate counterfactuals in representation space, learning, from only a few manually created counterfactuals, a mapping function $t$ that transforms a document representation $\varphi(x)$ into a counterfactual one (having the opposite classification label). The figure depicts original source documents and manually created counterfactuals (each paired with an original) in the document representation vector space, together with artificially created counterfactual representations. Illustration for positively labeled originals only.
For mitigating the spurious pattern effect, generic methods include regularization of masked language models, which limits over-reliance on a limited set of keywords (Moon et al., 2021). Alternatively, to improve robustness in imbalanced data settings, additional training samples can be created automatically (Han et al., 2021). Other approaches rely on adding extra training data by human annotation. Specifically to avoid spurious patterns, Kaushik et al. (2019) proposed Counterfactually Augmented Data (CAD), where annotators minimally revise training data to flip their labels: training on both original and counterfactual samples reduced reliance on spurious patterns. Rather than editing existing samples, Katakkar et al. (2021) propose to annotate them with text spans supporting the assigned labels as a "rationale" (Pruthi et al., 2020; Jain et al., 2020), thus achieving increased performance on OOD data. Similar in spirit, Wang and Culotta (2020) have an expert annotate spurious vs. causal sentiment words and use word-level classification (spurious vs. genuine) to train robust classifiers that rely only on non-spurious words.
Original sample ($x$), NEGATIVE: "one of the worst ever scenes in a sports movie. 3 stars out of 10."
Revised sample ($x_{\mathrm{CAD}}$), POSITIVE: "one of the wildest ever scenes in a sports movie. 8 stars out of 10."

Original sample ($x$), POSITIVE: "The world of Atlantis, hidden beneath the earth's core, is fantastic."
Revised sample ($x_{\mathrm{CAD}}$), NEGATIVE: "The world of Atlantis, hidden beneath the earth's core is supposed to be fantastic."

Table 1: Two examples from Kaushik et al. (2019) of counterfactual revisions made by humans for IMDb.
The cited works thus demonstrate that unwanted reliance on spurious patterns can be mitigated through extra annotation or (counterfactual) data generation. We further explore the latter option, and specifically focus on sentiment classification, as in Kaushik et al. (2019) and Katakkar et al. (2021). Exploiting counterfactuals requires (i) generating them, and then (ii) maximally benefiting from them in training. For (ii), Teney et al. (2020) present a loss term that leverages the relation between counterfactual and original samples. In this paper we focus on (i), for which Wu et al. (2021) use experts interacting with a finetuned GPT-2 (Radford et al., 2019). Alternatively, Wang et al. (2021) and Yang et al. (2021) use a pretrained language model and a sentiment lexicon. Yet, having human annotators create counterfactuals is still costly (e.g., 5 min/sample; Kaushik et al., 2019). Thus, we pose the research question (RQ): how to exploit a limited amount of counterfactuals to avoid classifiers relying on spurious patterns? We consider classifiers trained on representations obtained from frozen state-of-the-art sentence encoders (Reimers and Gurevych, 2019; Gao et al., 2021). We require only a few (human-produced) counterfactuals, but artificially create additional ones based on them, directly in the encoding space (with a simple transformation of original instance representations), as sketched in Fig. 1. This follows the idea of efficient sentence transformations in De Raedt et al. (2021). We compare our approach against using (i) more original samples and (ii) other models that generate counterfactuals. We surpass both (i) and (ii) for sentiment classification, with in-distribution and counterfactual training data from IMDb (Maas et al., 2011; Kaushik et al., 2019) and OOD test data from Amazon (Ni et al., 2019), SemEval (Rosenthal et al., 2017) and Yelp (Kaushik et al., 2020).
2 Exploiting Few Counterfactuals
We consider binary sentiment classification of input sentences/documents $x \in X$, with associated labels $y \in Y = \{0, 1\}$. We denote the training set of labeled pairs $(x, y)$ as $D_{\mathrm{ID}}$, of size $n$. We further assume that for a limited subset of $k \ll n$ pairs $(x, y)$ we have corresponding manually constructed counterfactuals $(x_{\mathrm{CAD}}, y_{\mathrm{CAD}})$, i.e., $x_{\mathrm{CAD}}$ is a minimally edited version of $x$ that has the opposite label $y_{\mathrm{CAD}} = 1 - y$ (see Table 1 for an example). The resulting set of $k$ counterfactuals is denoted as $D_{\mathrm{CAD}}$. We will adopt a vector representation of the input, $\varphi(x)$, with $\varphi: X \rightarrow \mathbb{R}^d$. We aim to obtain a classifier $f: \mathbb{R}^d \rightarrow Y$ that, without degrading in-distribution performance, performs well on counterfactual samples and is robust under distribution shift.
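To make this setup concrete, the data could be organized as below; this is purely illustrative (the class and field names are ours, not the paper's code):

```python
# Illustrative data layout (our assumption, not the authors' code):
# D_ID holds n labeled originals; D_CAD holds the k manually revised
# counterfactuals, each stored with the original it was edited from.
from dataclasses import dataclass

@dataclass
class Example:
    x: str   # review text
    y: int   # sentiment label in {0, 1}

@dataclass
class CounterfactualPair:
    original: Example   # (x, y)
    revised: Example    # (x_cad, y_cad), with y_cad = 1 - original.y

d_id: list[Example] = []              # n in-distribution samples
d_cad: list[CounterfactualPair] = []  # k << n revised pairs
```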
2.1 Exploiting Manual Counterfactuals
To learn the robust classifier $f$, we first present well-chosen reference approaches that leverage the $n$ in-distribution samples $D_{\mathrm{ID}}$ and the $k$ counterfactuals $D_{\mathrm{CAD}}$. For all of the models below, we adopt logistic regression, but they differ in training data and/or loss function.

The Paired model only uses the pairs for which we have counterfactuals, i.e., the full set $D_{\mathrm{CAD}}$ but only the corresponding $k$ pairs from $D_{\mathrm{ID}}$.

The Weighted model uses the full set of $n$ originals $D_{\mathrm{ID}}$, as well as all counterfactuals $D_{\mathrm{CAD}}$, but compensates for the resulting data imbalance by scaling the loss function on $D_{\mathrm{ID}}$ by a factor $k/n$.
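Both reference models reduce to ordinary logistic regression over the frozen embeddings. As a minimal sketch (our illustration, not the authors' released code; array names are ours), Paired simply restricts the training set, while Weighted uses per-sample weights:

```python
# Sketch of the Paired and Weighted reference classifiers over frozen
# sentence embeddings (rows of the X arrays are phi(x) vectors).
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_paired(X_orig_paired, y_orig_paired, X_cad, y_cad):
    """Paired: only the k originals that have counterfactuals, plus D_CAD."""
    X = np.vstack([X_orig_paired, X_cad])
    y = np.concatenate([y_orig_paired, y_cad])
    return LogisticRegression(max_iter=1000).fit(X, y)

def train_weighted(X_id, y_id, X_cad, y_cad):
    """Weighted: all n originals plus k counterfactuals; the loss on D_ID
    is scaled by k/n via per-sample weights to offset the imbalance."""
    n, k = len(X_id), len(X_cad)
    X = np.vstack([X_id, X_cad])
    y = np.concatenate([y_id, y_cad])
    w = np.concatenate([np.full(n, k / n), np.ones(k)])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
```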
2.2 Generating Counterfactuals
The basic proposition of our method is to artificially create counterfactuals for the $n - k$ original samples from $D_{\mathrm{ID}}$ that have no corresponding pair in $D_{\mathrm{CAD}}$. For this, we learn to map an original input document/sentence representation $\varphi(x)$ to a counterfactual one, i.e., a function $t: \mathbb{R}^d \rightarrow \mathbb{R}^d$. We learn two such functions: $t^-$ maps representations of positive samples $\varphi(x)$ (with $y = 1$) to negative counterfactual representations $\varphi(x_{\mathrm{CAD}})$ (with $y_{\mathrm{CAD}} = 0$), and vice versa for $t^+$. We thus apply $t^-$ (respectively $t^+$) to the positive (resp. negative) input samples in $D_{\mathrm{ID}}$ for which we have no manually created counterfactuals.
Mean Offset. Our first model is parameterless: we simply add the average offset between representations of original positives $x$ (with $y = 1$) and their corresponding $x_{\mathrm{CAD}}$ to those positives for which we have no counterfactuals (and correspondingly for negatives). Thus, mathematically:
$$t^-(\varphi(x)) = \varphi(x) + o^-, \quad \text{with} \quad o^- = \operatorname*{avg}_{x:\,y=1} \left[ \varphi(x_{\mathrm{CAD}}) - \varphi(x) \right]$$
(and correspondingly for $t^+$, based on counterfactuals of $x$ for which $y = 0$).
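In code, this generator is a few lines of numpy. The sketch below (our illustration; array names are ours) handles the positive-to-negative direction $t^-$; the other direction is symmetric:

```python
# Mean Offset sketch: rows of X_pos and X_pos_cad hold phi(x) and
# phi(x_cad) for the k paired positives; X_pos_rest holds the positives
# without a manual counterfactual.
import numpy as np

def fit_mean_offset(X_pos, X_pos_cad):
    # o- = average of phi(x_cad) - phi(x) over the paired positives
    return (X_pos_cad - X_pos).mean(axis=0)

def t_minus(X_pos_rest, o_minus):
    # generated (negative) counterfactual representations
    return X_pos_rest + o_minus
```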
Mean Offset + Regression. Since just taking the average offset may be too crude, especially as $k$ increases, we can apply an offset adjustment (noted as $r: \mathbb{R}^d \rightarrow \mathbb{R}^d$) learnt with linear regression. Concretely, to create counterfactuals for positive originals we define:
$$t^-(\varphi(x)) = \varphi(x) + o^- + r(\varphi(x))$$
with a linear function $r(\varphi(x)) = W\varphi(x) + b$ (learning $W \in \mathbb{R}^{d \times d}$ and $b \in \mathbb{R}^d$ from the positive originals $x$ with corresponding counterfactuals $x_{\mathrm{CAD}}$), and $o^-$ as defined above. Similarly for $t^+$.
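A plausible implementation fits $r$ by ordinary least squares on the $k$ pairs. Note that taking the residual left after the mean offset as the regression target is our reading of the formula above, not a detail stated in this excerpt:

```python
# Mean Offset + Regression sketch for t-. We assume r is trained to
# predict the residual phi(x_cad) - (phi(x) + o-) on the k paired
# positives, i.e., what the mean offset alone misses.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_offset_and_adjustment(X_pos, X_pos_cad):
    o_minus = (X_pos_cad - X_pos).mean(axis=0)
    residual = X_pos_cad - (X_pos + o_minus)      # assumed target for r
    r = LinearRegression().fit(X_pos, residual)   # learns W and b by OLS
    return o_minus, r

def t_minus(X_pos_rest, o_minus, r):
    return X_pos_rest + o_minus + r.predict(X_pos_rest)
```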
3 Experimental Setup
Datasets. For the in-distribution data, we use a training set $D^{\mathrm{train}}_{\mathrm{ID}}$ of 1,707 samples and a test set $D^{\mathrm{test}}_{\mathrm{ID}}$ of 488 samples, with all of these instances randomly sampled from the original IMDb sentiment dataset of 25k reviews (Maas et al., 2011). The counterfactual sets $D^{\mathrm{train}}_{\mathrm{CAD}}$ and $D^{\mathrm{test}}_{\mathrm{CAD}}$ are the revised versions of $D^{\mathrm{train}}_{\mathrm{ID}}$ and $D^{\mathrm{test}}_{\mathrm{ID}}$, as rewritten by Mechanical Turk workers recruited by Kaushik et al. (2019). See Appendix B for further details. We will also test on out-of-distribution (OOD) data from Amazon (Ni et al., 2019), SemEval (Rosenthal et al., 2017) and Yelp (Kaushik et al., 2020); we denote these datasets as $D^{\mathrm{AMZN}}_{\mathrm{OOD}}$, $D^{\mathrm{SE}}_{\mathrm{OOD}}$ and $D^{\mathrm{YELP}}_{\mathrm{OOD}}$.
Sentence Encoders. To obtain $\varphi(x)$, we use the sentence encoding frameworks SBERT (Reimers and Gurevych, 2019) and SimCSE (Gao et al., 2021). The main results are presented with SRoBERTa$_{\mathrm{large}}$ and SimCSE-RoBERTa$_{\mathrm{large}}$, and the encoders are kept frozen at all times. Appendix B lists additional details; Appendix A shows results for other encoders.
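For reference, such frozen representations can be obtained with the sentence-transformers package; the checkpoint name below is a placeholder, not necessarily the exact model used in the paper:

```python
# Encoding sketch with sentence-transformers (Reimers and Gurevych, 2019).
# "all-roberta-large-v1" is a placeholder checkpoint; the paper's exact
# SRoBERTa-large / SimCSE-RoBERTa-large weights may differ.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-roberta-large-v1")
X = encoder.encode(["The world of Atlantis, hidden beneath the "
                    "earth's core, is fantastic."])
print(X.shape)  # (1, d): a frozen phi(x), used as input to the classifier
```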
Baselines. As a baseline for our few-counterfactuals-based approaches, we present results (Original) from a classifier trained on twice the amount of original (unrevised, in-distribution) samples. Further, we also investigate competitive counterfactual-based approaches as proposed by Wang and Culotta (2021), who leverage identified causal sentiment words and a sentiment lexicon to generate counterfactuals in the input space (which we subsequently embed with the same sentence encoders $\varphi$ as before). They adopt three settings, with increasing human supervision, to identify causal words: (i) predicted from top: 32 causal words were identified automatically for IMDb; (ii) annotated from top: a human manually marked 65 words as causal from a top-231 word list deemed most relevant for sentiment; and (iii) annotated from all: a human labeled 282 causal words from the full 2,388-word vocabulary.
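To illustrate what input-space generation means here, a toy flip with an invented antonym lexicon (not Wang and Culotta's actual system) could look like:

```python
# Toy illustration of input-space counterfactual generation: flip causal
# sentiment words using a small, invented antonym lexicon.
ANTONYMS = {"worst": "best", "best": "worst",
            "fantastic": "awful", "awful": "fantastic"}

def flip_sentiment(text: str) -> str:
    return " ".join(ANTONYMS.get(word, word) for word in text.split())

print(flip_sentiment("one of the worst ever scenes in a sports movie"))
# -> one of the best ever scenes in a sports movie
```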
Training and Evaluation. For all presented approaches, the classifier $f$ is implemented by logistic regression with L2 regularization, where the regularization parameter $\lambda$ is established by 4-fold cross-validation.¹ The results presented in the main paper body are obtained by training on the complete training set (i.e., all folds).
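Concretely, the selection of $\lambda$ can be done as follows (a sketch; scikit-learn parameterizes the penalty as $C = 1/\lambda$, and the candidate grid and stand-in data below are our assumptions):

```python
# 4-fold cross-validation over the L2 strength, then refit on all folds.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))        # stand-in for phi(x) embeddings
y_train = (X_train[:, 0] > 0).astype(int)   # stand-in labels

clf = LogisticRegressionCV(Cs=np.logspace(-3, 3, 13), cv=4, penalty="l2",
                           max_iter=1000)
clf.fit(X_train, y_train)  # refit=True (default) retrains on all folds
print(clf.C_)              # selected inverse regularization strength
```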
The Mean Offset + Regression model of §2.2, used to artificially generate counterfactuals, is implemented by linear regression with ordinary least squares. The Weighted and Paired classifiers of §2.1 are trained on $n$ samples from $D^{\mathrm{train}}_{\mathrm{ID}}$ together with $k$ counterfactuals sampled from $D^{\mathrm{train}}_{\mathrm{CAD}}$. To evaluate our classifiers with generated counterfactuals, as described in §2.2, we train on the $n$ original samples, $k$ manual counterfactuals and $n - k$ generated counterfactuals. The Original baseline uses $2n$ original samples $D^{\mathrm{train}}_{\mathrm{ID}} \cup D'^{\mathrm{train}}_{\mathrm{ID}}$, adding an extra $|D'^{\mathrm{train}}_{\mathrm{ID}}| = n$ samples drawn randomly from the 25k original, unrevised IMDb reviews (but contained in neither $D^{\mathrm{train}}_{\mathrm{ID}}$ nor $D^{\mathrm{test}}_{\mathrm{ID}}$). For the counterfactual-based models of Wang and Culotta (2021), the training set is expanded with $n'' \leq n$ counterfactuals (based on $D^{\mathrm{train}}_{\mathrm{ID}}$), automatically generated in the input space.
We evaluate the accuracy on $D^{\mathrm{test}}_{\mathrm{ID}}$, $D^{\mathrm{test}}_{\mathrm{CAD}}$ and the OOD test sets $D^{\mathrm{AMZN}}_{\mathrm{OOD}}$, $D^{\mathrm{SE}}_{\mathrm{OOD}}$, $D^{\mathrm{YELP}}_{\mathrm{OOD}}$ (averaging the accuracies over these 3 sets for OOD evaluation). For each $k \in \{16, 32, \ldots, 128\}$, we use 50 different random seeds to sample: (i) $k/2$ negative and
¹ We experiment with both weak and strong regularization. See Appendix A.1 for details.