Are All Spurious Features in Natural Language Alike? An Analysis
through a Causal Lens
Nitish Joshi1, Xiang Pan1, He He1,2
1Department of Computer Science, New York University
2Center for Data Science, New York University
{nitish, xiangpan, hhe}@nyu.edu
Abstract

The term ‘spurious correlations’ has been used in NLP to informally denote any undesirable feature-label correlations. However, a correlation can be undesirable because (i) the feature is irrelevant to the label (e.g. punctuation in a review), or (ii) the feature’s effect on the label depends on the context (e.g. negation words in a review), which is ubiquitous in language tasks. In case (i), we want the model to be invariant to the feature, which is neither necessary nor sufficient for prediction. But in case (ii), even an ideal model (e.g. humans) must rely on the feature, since it is necessary (but not sufficient) for prediction. Therefore, a more fine-grained treatment of spurious features is needed to specify the desired model behavior. We formalize this distinction using a causal model and probabilities of necessity and sufficiency, which delineate the causal relations between a feature and a label. We then show that this distinction helps explain results of existing debiasing methods on different spurious features, and demystifies surprising results such as the encoding of spurious features in model representations after debiasing.
1 Introduction

Advancements in pre-trained language models (Devlin et al., 2019; Radford et al., 2019) and large datasets (Rajpurkar et al., 2016; Wang et al., 2018) have enabled tremendous progress on natural language understanding (NLU). This progress has been accompanied by the concern of models relying on superficial features such as negation words and lexical overlap (Poliak et al., 2018; Gururangan et al., 2018; McCoy et al., 2019). Despite the progress in building models robust to spurious features (Clark et al., 2019; He et al., 2019; Sagawa et al., 2020; Veitch et al., 2021; Puli et al., 2022), the term has been used to denote any feature that

* Equal contribution.
Irrelevant features
  Spielberg’s new film is brilliant.                      Positive
  ___’s new film is brilliant.                            Positive

Necessary features
  Premise: The differential compounds to a hefty sum over time.
  Hypothesis: The differential will not grow.             Contradiction
  Hypothesis: The differential will grow.                 ?

Table 1: Difference between two spurious features: (a) the director name can be replaced without affecting the sentiment prediction; (b) the negation word is necessary, as it is not possible to determine the label without it.
the model should not rely on, as judged by domain experts.

Our key observation is that a feature can be considered spurious for different reasons. Compare two such features studied in the literature (Table 1): (a) director names (such as ‘Spielberg’) in sentiment analysis (Wang and Culotta, 2020); (b) negation words in natural language inference (Gururangan et al., 2018). We do not want the model to rely on the director name because removing or changing it does not affect the sentiment. In contrast, while models should not rely solely on the negation word, it is still necessary for prediction—it is impossible to determine the label without knowing its presence.

In this work, we argue that many spurious features studied in NLP are of the second type, where the feature is necessary (although not sufficient) for prediction; this type is more complex to deal with than the completely irrelevant features of the first type. Current methods do not treat the two types of features separately, and we show that this can lead to misleading interpretations of the results.
To formalize the distinction illustrated in Table 1, we borrow notions from causality (Wang and Jordan, 2021; Pearl, 1999), and use probability of necessity (PN) and probability of sufficiency (PS) to describe the relation between a feature and a label. Intuitively, high PN means that changing the feature is likely to change the label (e.g. removing “not” will flip the label); high PS means that adding the feature to an example would produce the label (e.g. adding “the movie is brilliant” to a neutral review is likely to make it positive). Under this framework, we define two types of spurious features (Section 2): irrelevant features (e.g. the director name) that have low PN and low PS, and necessary features (e.g. the negation word) that have high PN despite low PS.

Next, we describe the challenges in evaluating and improving robustness to necessary spurious features (Section 4). First, necessary features compose with other features in the context to influence the label. Thus, evaluating whether the model relies solely on the necessary feature requires perturbing the context. This process often introduces new features and leads to inconsistent results depending on how the context is perturbed.
Second, we analyze the effectiveness of two classes of methods—data balancing and representation debiasing—on the two types of spurious features. Data balancing breaks the correlation between the label and the spurious feature (e.g. Sagawa et al. (2020)); representation debiasing directly removes the spurious feature from the learned representation (e.g. Ravfogel et al. (2020)). Although they are effective for irrelevant features, we show that for necessary spurious features, (i) data balancing does not lead to invariant performance with respect to the spurious feature (Section 5.1); and (ii) removing the spurious feature from the representation significantly hurts performance (Section 5.2).
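To make the first class of methods concrete, a simple instantiation of data balancing is to subsample the training set so that every (spurious-feature, label) group is equally represented, breaking the feature-label correlation. The sketch below is a hypothetical illustration rather than the exact procedure of any cited work (Sagawa et al. (2020), for instance, use group-robust optimization rather than plain subsampling); `group_fn`, which maps an example to its group, is assumed given:

```python
import random
from collections import defaultdict

def balance_groups(examples, group_fn, rng=None):
    """Subsample so that every (feature, label) group has equal size,
    removing the correlation between the feature and the label."""
    rng = rng or random.Random(0)
    groups = defaultdict(list)
    for ex in examples:
        groups[group_fn(ex)].append(ex)
    k = min(len(g) for g in groups.values())  # size of the smallest group
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, k))     # keep k examples per group
    return balanced
```

After balancing, the spurious feature alone carries no information about the label, although, as argued later, this does not guarantee invariant performance for necessary features.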
In sum, this work provides a formal characterization of spurious features in natural language. We highlight that many common spurious features in NLU are necessary (despite being not sufficient) to predict the label, which introduces new challenges to both evaluation and learning.
2 Categorization of Spurious Features

2.1 Causal Models

We describe a structural causal model for text classification to illustrate the relation between different spurious features and the label. Let X = (X_1, X_2, ..., X_n) denote a sequence of input words/features[1] and Y the output label. We assume the data-generating model shown in Figure 1a. There is a common cause C of the input (e.g. a review writer, a PCFG, or a semantic representation of the sentence), conditioned on which the words are independent of each other. Each word X_i may causally affect the label Y.

[1] For illustration purposes, we assume that each feature is a word in the input text. However, the same model and analysis apply to cases where X_i denotes a more complex feature (e.g. named entities or text length) extracted from the input.
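As a toy illustration of this data-generating model (all variables and probabilities below are invented for the example), we can sample a confounder C, generate words conditionally independently given C, and let one word causally affect the label while C biases both another word and the label:

```python
import random

def sample_example(rng):
    # Common cause C: e.g. the topic a review writer has in mind.
    c = rng.choice(["blockbuster", "indie"])
    # Given C, the words are conditionally independent of each other.
    # "Titanic" depends only on C and has no causal effect on the label.
    titanic = rng.random() < (0.7 if c == "blockbuster" else 0.1)
    # "not" occurs at a fixed rate and causally flips the sentiment.
    negated = rng.random() < 0.3
    # C also biases the label (e.g. search engines surface positive
    # reviews of famous movies), opening a non-causal path to "Titanic".
    base_positive = rng.random() < (0.8 if c == "blockbuster" else 0.5)
    positive = base_positive != negated   # causal effect of "not" on Y
    return titanic, negated, positive

rng = random.Random(0)
data = [sample_example(rng) for _ in range(20000)]
p_pos = lambda rows: sum(y for *_, y in rows) / len(rows)
with_t = [d for d in data if d[0]]
without_t = [d for d in data if not d[0]]
# "Titanic" predicts a positive label despite having no causal effect on it.
```

In this simulation, “Titanic” and the label are correlated purely through C, while “not” changes the label in every context it appears in; these correspond to the two dependence-inducing processes discussed next.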
Under this model, the dependence between Y and a feature X_i can be induced by two processes. The type 1 dependence is induced by a confounder (in this case C) influencing both Y and X_i due to biases in data collection, e.g. search engines return positive reviews for famous movies; we denote this non-causal association by the red path in Figure 1b. The type 2 dependence is induced by input words that causally affect Y (the red path in Figure 1c), e.g. negating an adjective affects the sentiment. Importantly, the two processes can and often do happen simultaneously. For example, in NLI datasets, the association between negation words and the label is also induced by crowdworkers’ inclination to negate the premise to create a contradiction example.

A type 1 dependence (“Titanic”-sentiment) is clearly spurious because the feature and Y are associated through C while having no causal relationship.[2] In contrast, a type 2 dependence (“not”-sentiment) is not spurious per se—even a human needs to rely on negation words to predict the label.

Now, how do we measure and differentiate the two types of feature-label dependence? In the following, we describe fine-grained notions of the relationship between a feature and a label, which will allow us to define the spuriousness of a feature.
2.2 Sufficiency and Necessity of a Feature

We borrow notions from causality to describe whether a feature is a necessary or sufficient cause of a label (Pearl, 1999; Wang and Jordan, 2021). Consider the examples in Table 1: intuitively, “not” is necessary for the contradiction label because in its absence (e.g. removing it or replacing it with other syntactically correct words) the example would no longer be a contradiction; in contrast, “the movie is brilliant” is sufficient to produce the positive label because adding the sentence to a negative review is likely to increase its sentiment score. Thus, the feature’s effect on the label relies on counterfactual outcomes.

[2] The two types of dependencies are also discussed in Veitch et al. (2021), where the type 1 dependence is called “purely spurious”.
[Figure 1: Causal models for text classification. (a) Data-generating model: C is the common cause of the words X_1, ..., X_d in the input; each word X_i may causally influence Y. (b) Type 1 dependence: Y (sentiment label) and X_i (“Titanic”) are dependent because of the confounder C (indicated by the red path). (c) Type 2 dependence: Y (sentiment label) and X_i (“not”) are dependent because of a causal relation.]
We use Y(X_i = x_i) to denote the counterfactual label of an example had we set X_i to the specific value x_i.[3]

Definition 1 (Probability of necessity). The probability of necessity (PN) of a feature X_i = x_i for the label Y = y conditioned on the context X_{-i} = x_{-i} is

    PN(X_i = x_i, Y = y | X_{-i} = x_{-i})
        ≜ p(Y(X_i ≠ x_i) ≠ y | X_i = x_i, X_{-i} = x_{-i}, Y = y).

Given an example (x, y), PN(x_i, y | x_{-i})[4] is the probability that the label y would change had we set X_i to a value different from x_i. The distribution of the counterfactual label Y(X_i ≠ x_i) is defined to be

    ∫ p(Y(X_i)) p(X_i | X_i ≠ x_i) dX_i.

This corresponds to the label distribution when we replace the word x_i with a random word that fits in the context (e.g. “Titanic” to “Ip Man”). In practice, we can simulate the intervention X_i ≠ x_i by text infilling using masked language models (Devlin et al., 2019).
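A Monte-Carlo sketch of this estimate (the classifier and the infilling distribution below are toy stand-ins; in practice the infiller would be a masked language model as noted above):

```python
import random

def estimate_pn(predict, infill, x, i, y, n=500, rng=None):
    """Estimate PN(x_i, y | x_{-i}): the probability that replacing word
    x[i] with a different context-appropriate word changes the label y."""
    rng = rng or random.Random(0)
    flips = 0
    for _ in range(n):
        x_cf = list(x)
        x_cf[i] = infill(x, i, rng)   # sample a replacement word != x[i]
        if predict(x_cf) != y:        # did the counterfactual label change?
            flips += 1
    return flips / n

# Toy stand-ins: a rule-based "classifier" and a uniform infiller.
predict = lambda x: "negative" if "not" in x else "positive"
infill = lambda x, i, rng: rng.choice(
    [w for w in ["really", "quite", "very"] if w != x[i]])

x = ["the", "film", "is", "not", "good"]
pn_not = estimate_pn(predict, infill, x, 3, "negative")   # 1.0: "not" is necessary
pn_good = estimate_pn(predict, infill, x, 4, "negative")  # 0.0: "good" is not
```

Under this toy model, every replacement of “not” flips the label, so its estimated PN is 1, while replacing “good” leaves “not” in place and never flips the label.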
Definition 2 (Probability of sufficiency). The probability of sufficiency (PS) of a feature X_i = x_i for the label Y = y conditioned on the context X_{-i} = x_{-i} is

    PS(X_i = x_i, Y = y | X_{-i} = x_{-i})
        ≜ p(Y(X_i = x_i) = y | X_i ≠ x_i, X_{-i} = x_{-i}, Y ≠ y).

Similarly, PS(x_i, y | x_{-i}) is the probability that setting X_i to x_i would produce the label y on an example where x_i is absent. For example, the PS of “not” for the negative sentiment measures the probability that a positive review would become negative had we added “not” to the input.
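Marginal PS can be estimated in the same spirit (again with toy stand-ins): among examples where the feature is absent and the label is not y, insert the feature and measure how often the predicted label becomes y. The toy classifier below makes the effect of “not” context-dependent, so its estimated PS is well below 1:

```python
def estimate_ps(predict, examples, feature, y):
    """Estimate marginal PS(feature, y): among examples where the feature
    is absent and the label is not y, how often does inserting the
    feature produce the label y?"""
    hits = total = 0
    for x, label in examples:
        if feature in x or label == y:
            continue   # condition on X_i != x_i and Y != y
        total += 1
        hits += predict(x + [feature]) == y  # crude intervention: insert the word
    return hits / total if total else 0.0

# Toy classifier: "not" flips the sentiment only when a sentiment word
# ("good") is present; otherwise the sentence is neutral.
predict = lambda x: ("negative" if "not" in x else "positive") if "good" in x else "neutral"
examples = [(["the", "film", "is", "good"], "positive"),
            (["a", "fine", "movie"], "neutral"),
            (["it", "is", "good"], "positive"),
            (["quite", "watchable"], "neutral")]
ps_not = estimate_ps(predict, examples, "not", "negative")  # 0.5: effect depends on context
```

This mirrors the paper’s point: “not” can have high PN in particular examples while its sufficiency, averaged over contexts, stays low.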
[3] The counterfactual label Y(X_i = x_i) is also commonly written as Y_{x_i} (Pearl, 2009), but we follow the notation in Wang and Jordan (2021).

[4] For notational simplicity, we omit the random variables (denoted by capital letters) when clear from the context.
We note that both PN and PS are context-dependent—they measure the counterfactual outcome of individual data points. For example, while “not” has high PN for contradiction in the example in Table 1, there are examples where it has low PN.[5] Similarly, there can be examples where the word “Titanic” has high PN.[6] To consider the average effect of a feature, we marginalize over the context X_{-i}:

    PN(x_i, y) ≜ ∫ PN(x_i, y | X_{-i}) p(X_{-i} | x_i, y) dX_{-i},

and similarly for PS.
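This marginalization can be approximated by averaging a context-conditional PN estimator over dataset examples that contain the pair (x_i, y); the estimator `pn_fn` below is a hypothetical stand-in for any such per-example estimate:

```python
def marginal_pn(examples, pn_fn, word, label):
    """Monte-Carlo marginalization over contexts: average the
    context-conditional PN estimate over dataset examples that
    contain `word` and carry `label`."""
    scores = [pn_fn(x, x.index(word), label)
              for x, y in examples if word in x and y == label]
    return sum(scores) / len(scores) if scores else 0.0
```

Examples with the word but a different label are excluded, matching the conditioning on (x_i, y) in the integral above.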
Definition 3 (Spuriousness of a feature). The spuriousness of a feature X_i = x_i for a label Y = y is 1 − PS(x_i, y). We say a feature is spurious to the label if its spuriousness is positive.

Our definition of the spuriousness of a feature follows directly from the definition of PS, which measures the extent to which a feature is a sufficient cause of the label (marginalized over the context X_{-i}). Following this definition, a feature is non-spurious only if it is sufficient in any context. Admittedly, this definition may be too strict for NLP tasks, as arguably the effect of any feature can be modulated by context, making all features spurious. Therefore, practically we may consider a feature non-spurious if it has low spuriousness (i.e. high PS).

Feature categorization. The above definitions provide a framework for categorizing features by their necessity and sufficiency to the label, as shown in Figure 2.
[5] Consider the premise “The woman was happy” and the hypothesis “The woman angrily remarked ‘This will not work!’”.

[6] For example, in sentiment analysis, consider “This movie was on a similar level as Titanic”.