
Robustifying Sentiment Classification
by Maximally Exploiting Few Counterfactuals
Maarten De Raedt♢♣  Fréderic Godin♢  Chris Develder♣  Thomas Demeester♣
♢Sinch Chatlayer ♣Ghent University
{maarten.deraedt, chris.develder, thomas.demeester}@ugent.be
frederic.godin@sinch.com
Abstract
For text classification tasks, finetuned language models perform remarkably well. Yet, they tend to rely on spurious patterns in training data, thus limiting their performance on out-of-distribution (OOD) test data. Among recent models aiming to avoid this spurious pattern problem, adding extra counterfactual samples to the training data has proven to be very effective. Yet, counterfactual data generation is costly since it relies on human annotation. Thus, we propose a novel solution that only requires annotation of a small fraction (e.g., 1%) of the original training data, and uses automatic generation of extra counterfactuals in an encoding vector space. We demonstrate the effectiveness of our approach in sentiment classification, using IMDb data for training and other sets for OOD tests (i.e., Amazon, SemEval and Yelp). We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals: +3% compared to adding +100% in-distribution training samples, +1.3% compared to alternate counterfactual approaches.
1 Introduction and Related Work
For a wide range of text classification tasks, finetuning large pretrained language models (Devlin et al., 2019; Liu et al., 2019; Clark et al., 2020; Lewis et al., 2020) on task-specific data has proven very effective. Yet, analysis has shown that their predictions tend to rely on spurious patterns (Poliak et al., 2018; Gururangan et al., 2018; Kiritchenko and Mohammad, 2018; McCoy et al., 2019; Niven and Kao, 2019; Zmigrod et al., 2019; Wang and Culotta, 2020), i.e., features that from a human perspective are not indicative of the classifier's label. For instance, Kaushik et al. (2019) found the rather neutral words "will", "my" and "has" to be important for a positive sentiment classification. Such reliance on spurious patterns was suspected to degrade performance on out-of-distribution (OOD)
[Figure: original source documents (+/−) and manually created counterfactuals (each paired with an original), shown as points in a document representation vector space, together with artificially created counterfactual representations.]
Fig. 1: We propose to generate counterfactuals in representation space, learning, from only a few manually created counterfactuals, a mapping function t to transform a document representation φ(x) to a counterfactual one (having the opposite classification label). Illustration for positively labeled originals only.
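The mapping in Fig. 1 can be illustrated with a minimal sketch. Here we assume a simple least-squares linear map fitted on the few annotated (original, counterfactual) representation pairs; the paper's actual encoder φ and mapping function t may differ, and random vectors stand in for document representations.

```python
import numpy as np

def fit_counterfactual_map(orig_reps, cf_reps):
    """Learn a least-squares linear map (with bias) that sends the few
    annotated original representations to their counterfactual ones."""
    X = np.hstack([orig_reps, np.ones((len(orig_reps), 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, cf_reps, rcond=None)
    return W

def apply_counterfactual_map(W, reps):
    """Map unpaired original representations to artificial counterfactual
    representations; these get the opposite classification label."""
    X = np.hstack([reps, np.ones((len(reps), 1))])
    return X @ W

# Toy demo: random vectors as stand-ins for encoded documents phi(x).
rng = np.random.default_rng(0)
d = 16
orig = rng.normal(size=(8, d))         # few originals with manual counterfactuals
cf = orig + rng.normal(size=(d,))      # their annotated counterfactual encodings
W = fit_counterfactual_map(orig, cf)
# Augment the remaining (unannotated) training representations.
new_cf = apply_counterfactual_map(W, rng.normal(size=(100, d)))
```

A classifier can then be trained on the original representations plus the artificial counterfactual ones with flipped labels, mirroring how manually created counterfactuals are used in CAD.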
test data, distributionally different from training data (Quiñonero-Candela et al., 2008). Specifically for sentiment classification, this suspicion has been confirmed by Kaushik et al. (2019, 2020).
For mitigating the spurious pattern effect, generic methods include regularization of masked language models, which limits over-reliance on a limited set of keywords (Moon et al., 2021). Alternatively, to improve robustness in imbalanced data settings, additional training samples can be automatically created (Han et al., 2021). Other approaches rely on adding extra training data by human annotation. Specifically to avoid spurious patterns, Kaushik et al. (2019) proposed Counterfactually Augmented Data (CAD), where annotators minimally revise training data to flip their labels: training on both original and counterfactual samples reduced spurious patterns. Rather than editing existing samples, Katakkar et al. (2021) propose to annotate them with text spans supporting the assigned labels as a "rationale" (Pruthi et al., 2020; Jain et al., 2020), thus achieving increased performance on OOD data. Similar in spirit, Wang and Culotta (2020) have an expert annotate spurious vs. causal sentiment words and use word-level classification (spurious vs. genuine) to train
arXiv:2210.11805v1 [cs.CL] 21 Oct 2022