
High PN means that removing the feature from an example would flip the label (e.g. removing “not” will flip the label); high PS means that adding the feature to an example would produce the label (e.g. adding “the movie is brilliant” to a neutral review is likely to make it positive). Under this framework, we define two types of spurious features (Section 2): irrelevant features (e.g. the director name) that have low PN and low PS, and necessary features (e.g. the negation word) that have high PN despite low PS.
Next, we describe the challenges in evaluating and improving robustness to necessary spurious features (Section 4). First, necessary features compose with other features in the context to influence the label. Thus, evaluating whether the model relies solely on the necessary feature requires perturbing the context. This process often introduces new features and leads to inconsistent results depending on how the context is perturbed.
Second, we analyze the effectiveness of two classes of methods—data balancing and representation debiasing—on the two types of spurious features. Data balancing breaks the correlation between the label and the spurious feature (e.g. Sagawa et al. (2020)); representation debiasing directly removes the spurious feature from the learned representation (e.g. Ravfogel et al. (2020)). Although they are effective for irrelevant features, we show that for necessary spurious features, (i) data balancing does not lead to invariant performance with respect to the spurious feature (Section 5.1); and (ii) removing the spurious feature from the representation significantly hurts performance (Section 5.2).
In sum, this work provides a formal characterization of spurious features in natural language. We highlight that many common spurious features in NLU are necessary (despite not being sufficient) to predict the label, which introduces new challenges to both evaluation and learning.
2 Categorization of Spurious Features
2.1 Causal Models
We describe a structural causal model for text classification to illustrate the relation between different spurious features and the label. Let $X = (X_1, X_2, \ldots, X_n)$ denote a sequence of input words/features¹ and $Y$ the output label. We assume the data-generating model shown in Figure 1a. There is a common cause $C$ of the input (e.g. a review writer, a PCFG, or a semantic representation of the sentence), conditioned on which the words are independent of each other. Each word $X_i$ may causally affect the label $Y$.

¹For illustration purposes, we assume that each feature is a word in the input text. However, the same model and analysis apply to cases where $X_i$ denotes a more complex feature (e.g. named entities or text length) extracted from the input.
Under this model, the dependence between $Y$ and a feature $X_i$ can be induced by two processes. The type 1 dependence is induced by a confounder (in this case $C$) influencing both $Y$ and $X_i$ due to biases in data collection, e.g. search engines return positive reviews for famous movies; we denote this non-causal association by the red path in Figure 1b. The type 2 dependence is induced by input words that causally affect $Y$ (the red path in Figure 1c), e.g. negating an adjective affects the sentiment. Importantly, the two processes can and often do happen simultaneously. For example, in NLI datasets, the association between negation words and the label is also induced by crowdworkers’ inclination to negate the premise to create a contradiction example.
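As an illustration, the toy simulation below instantiates this data-generating process in Python; the variable names and probabilities are hypothetical. One word (“not”) causally flips the label (a type 2 dependence), while a collection bias that discards most negative reviews of famous movies ties “Titanic” to the label only through $C$ (a type 1 dependence).

import random

rng = random.Random(0)

def sample_example():
    # Common cause C: whether the review is about a famous movie.
    famous = rng.random() < 0.5
    # Given C, the words are conditionally independent.
    has_title = famous and rng.random() < 0.9   # "Titanic" appears
    brilliant = rng.random() < 0.8              # "brilliant" appears
    negated = rng.random() < 0.3                # "not" appears
    # Type 2 path: some words causally determine the label.
    label = brilliant != negated                # negation flips the sentiment
    return famous, has_title, negated, label

def collect_dataset(n=10_000):
    # Type 1 path: biased collection. Most negative reviews of famous
    # movies are dropped, so C influences both the words and Y.
    data = []
    while len(data) < n:
        famous, has_title, negated, label = sample_example()
        if famous and not label and rng.random() < 0.7:
            continue
        data.append((has_title, negated, label))
    return data

def positive_rate(rows):
    return sum(label for *_, label in rows) / len(rows)

data = collect_dataset()
print(positive_rate([r for r in data if r[0]]))      # with "Titanic": inflated (type 1)
print(positive_rate([r for r in data if not r[0]]))  # without "Titanic": near the base rate
print(positive_rate([r for r in data if r[1]]))      # with "not": deflated (type 2)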
A type 1 dependence (“Titanic”-sentiment) is clearly spurious because the feature and $Y$ are associated through $C$ while having no causal relationship.² In contrast, a type 2 dependence (“not”-sentiment) is not spurious per se—even a human needs to rely on negation words to predict the label.
Now, how do we measure and differentiate the two types of feature-label dependence? In the following, we describe fine-grained notions of the relationship between a feature and a label, which will allow us to define the spuriousness of a feature.
2.2 Sufficiency and Necessity of a Feature
We borrow notions from causality to describe whether a feature is a necessary or sufficient cause of a label (Pearl, 1999; Wang and Jordan, 2021). Consider the examples in Table 1: intuitively, “not” is necessary for the contradiction label because in its absence (e.g. removing it or replacing it with other syntactically correct words) the example would no longer be a contradiction; in contrast, “the movie is brilliant” is sufficient to produce the positive label because adding the sentence to a negative review is likely to increase its sentiment score. Thus, the feature’s effect on the label relies on counterfactual outcomes.
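These intuitions can be made precise by specializing the counterfactual definitions of Pearl (1999) to a binary feature indicator; the following is a sketch for $X_i \in \{0, 1\}$ (feature absent/present), where $Y_{X_i = v}$ denotes the label the example would have if $X_i$ were set to $v$:

\[
\mathrm{PN} = P\big(Y_{X_i = 0} \neq y \mid X_i = 1,\ Y = y\big), \qquad
\mathrm{PS} = P\big(Y_{X_i = 1} = y \mid X_i = 0,\ Y \neq y\big).
\]

In words, PN is the probability that removing the feature from an example labeled $y$ would change the label, and PS is the probability that adding the feature to an example not labeled $y$ would produce $y$.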
²The two types of dependencies are also discussed in Veitch et al. (2021), where the type 1 dependence is called “purely spurious”.