Not another Negation Benchmark:
The NaN-NLI Test Suite for Sub-clausal Negation
Hung Thinh Truong1, Yulia Otmakhova1, Timothy Baldwin1,3
Trevor Cohn1, Jey Han Lau1, Karin Verspoor2,1
1The University of Melbourne 2RMIT University 3MBZUAI
{hungthinht,yotmakhova}@student.unimelb.edu.au tb@ldwin.net
trevor.cohn@unimelb.edu.au jeyhan.lau@gmail.com karin.verspoor@rmit.edu.au
Abstract
Negation is poorly captured by current lan-
guage models, although the extent of this prob-
lem is not widely understood. We introduce a
natural language inference (NLI) test suite to
enable probing the capabilities of NLP meth-
ods, with the aim of understanding sub-clausal
negation. The test suite contains premise–
hypothesis pairs where the premise contains
sub-clausal negation and the hypothesis is con-
structed by making minimal modifications to
the premise in order to reflect different possi-
ble interpretations. Aside from adopting stan-
dard NLI labels, our test suite is systematically
constructed under a rigorous linguistic frame-
work. It includes annotation of negation types
and constructions grounded in linguistic the-
ory, as well as the operations used to construct
hypotheses. This facilitates fine-grained anal-
ysis of model performance. We conduct ex-
periments using pre-trained language models
to demonstrate that our test suite is more chal-
lenging than existing benchmarks focused on
negation, and show how our annotation sup-
ports a deeper understanding of the current
NLI capabilities in terms of negation and quan-
tification.
1 Introduction
Negation is an important linguistic phenomenon
which denotes non-existence, denial, or contradic-
tion, and is core to language understanding. NLP
work on negation has mostly focused on detecting
instances of negation (Peng et al.,2018;Khandel-
wal and Sawant,2020;Truong et al.,2022), and
the effect of negation on downstream or probing
tasks (Kassner and Schütze,2020;Ettinger,2020;
Hossain et al.,2020). A consistent finding in recent
work on pre-trained language models (PLMs) is
that they struggle to correctly handle negation, but
also that existing NLP benchmarks are deficient in
terms of their relative occurrence and variability
of negation (Barnes et al., 2021; Tang et al., 2021;
Hossain et al., 2022).

*Equal contribution
In this work, we address the problem of evaluat-
ing the ability of models to handle negation in the
English language using a systematic, linguistically-
based approach. Specifically, we adopt the typol-
ogy proposed by Pullum and Huddleston (2002)
whereby negation is classified based on both form
(verbal and non-verbal; analytic and synthetic) and
meaning (clausal and sub-clausal; ordinary and
meta-linguistic). Based on this typology, we ob-
serve that most negation instances occurring in ex-
isting benchmarks are analytic, verbal, and clausal,
which are arguably more straightforward to handle
than non-verbal, synthetic, and sub-clausal nega-
tion. For instance, the dataset proposed by Hossain
et al. (2020) is constructed by adding the syntactic
negation cue not to the main verb of the premise
and/or the hypothesis of MNLI (Williams et al.,
2018) training examples, resulting almost exclu-
sively in verbal, analytic, and clausal negation.
Motivated by this, we construct a new evalua-
tion dataset with a focus on sub-clausal negation,
where it is non-trivial to determine the correct nega-
tion scope. For instance, the negation in They saw
not one but three dolphins only scopes over the
modifier one, and thus carries a positive mean-
ing (They saw three dolphins). We choose NLI
as the probing task based on the intuition that a
complete grasp of negation is required to make cor-
rect inference judgements. Moreover, we adopt
the test suite framework (Lehmann et al.,1996)
instead of naturally-occurring text corpora, to elicit
a full range of linguistic constructions that denote
sub-clausal negation. This facilitates systematic
evaluation of model performance along controlled
dimensions. We collect examples for each con-
struction from Pullum and Huddleston (2002) to
use as premises, and then construct corresponding
hypotheses by introducing minimal changes to
premises which highlight their possible interpreta-
tions. We manually annotate the constructed pairs
in terms of negation types, negation constructions,
and the operations used to construct the hypotheses.
arXiv:2210.03256v2 [cs.CL] 13 Oct 2022
In summary, our main contributions are:
1. We introduce the “NaN-NLI” test suite for probing the capabilities of NLP models to capture sub-clausal negation.[1] In addition to standard NLI labels, it contains various linguistic annotations related to negation, to facilitate fine-grained analysis of different constructional and semantic sub-types of negation;
2. We conduct extensive experiments to confirm that our test suite is more difficult than existing negation-focused NLI benchmarks, and show how our annotations can be used to guide error analysis and interpretation of model performance; and
3. We present a subset of our test suite (NaN-Quant) with samples involving not only negation but also quantification, and show that quantification is an especially challenging phenomenon that requires future exploration.
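Concretely, each annotated pair can be pictured as a small record combining the NLI label with the linguistic annotations described above. The sketch below uses the entailment example from the introduction; the field names, label strings, and annotation values are our own illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass

# Illustrative record for one NaN-NLI item. Field names and
# annotation values are assumptions for exposition only.
@dataclass
class NaNNLIItem:
    premise: str
    hypothesis: str
    label: str          # "entailment", "contradiction", or "neutral"
    negation_type: str  # typology dimensions of the negation
    construction: str   # the sub-clausal construction involved
    operation: str      # edit used to build the hypothesis

# The example from Section 1: the negation scopes only over "one",
# so the premise carries a positive meaning.
item = NaNNLIItem(
    premise="They saw not one but three dolphins.",
    hypothesis="They saw three dolphins.",
    label="entailment",
    negation_type="non-verbal, analytic, sub-clausal",
    construction="not in coordination",
    operation="remove negated conjunct",
)
```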
2 Related Work
To investigate the abilities of PLMs to assign the
correct interpretation to negation, many probing
tasks have been proposed. For instance, Kassner
and Schütze (2020) and Ettinger (2020) formulated
a cloze-style fill-in-the-blank task where BERT is
asked to predict words for two near-identical but
contrasting sentences (e.g. A bird can ___ vs. A
bird cannot ___). Hossain et al. (2020) constructed
an NLI dataset where negations essential to cor-
rectly judge the label for a premise–hypothesis pair
were manually added to existing NLI benchmarks.
Hartmann et al. (2021) constructed a multilingual
dataset with minimal pairs of NLI examples to
analyze model behavior in the presence/absence
of negation. Most recently, Hossain et al. (2022)
conducted a comprehensive analysis of the effect
of negation on a wide range of NLU tasks in the
GLUE (Wang et al.,2018) and SuperGLUE (Wang
et al.,2019) benchmarks. These papers expose var-
ious limitations of both current benchmarks and
PLMs in the face of negation. However, they all fo-
cus on verbal and clausal negation, which are more
straightforward to process, whereas our dataset tar-
gets non-verbal and sub-clausal negation, where it
is more difficult to determine the correct scope.

[1] The test suite and all code are available at
https://github.com/joey234/nan-nli
The idea of using a test suite to measure the
performance of NLP models was introduced by
Lehmann et al. (1996), where the authors pro-
pose general guidelines for test suite construction.
Adopting this methodology for a domain-specific
task, Cohen et al. (2010) constructed a dataset for
benchmarking ontology concept recognition sys-
tems. Most recently, Ribeiro et al. (2020) proposed
a task-agnostic testing methodology which closely
follows the idea of behavioral testing from software
engineering to comprehensively test the linguistic
capabilities of NLP models. The main advantages
of test suites over datasets made up of naturally-
occurring examples are: (1) control over the pre-
cise composition of the data: we can undertake
a targeted evaluation of specific criteria (e.g. lin-
guistic features); (2) systematicity: a test suite has
specific structure, with samples classified into well-
defined categories; and (3) control of redundancy:
we can remove samples with similar properties or
over-sample rare occurrences.
3 A Test Suite for Non-verbal Negation
3.1 Negation typology
According to Pullum and Huddleston (2002), nega-
tion can be classified according to four main as-
pects:
Verbal vs. non-verbal: verbal negation is when the negation marker is associated with the verb, while non-verbal negation is associated with an adjunct or object.
Analytic vs. synthetic: when the negation marker’s only syntactic function is to mark negation (e.g. not), it represents analytic negation, whereas in synthetic negation the marker can have other syntactic functions (e.g. a compound negator such as nothing can also be a subject or an object).
Clausal vs. sub-clausal: clausal negation negates the entire clause it is contained in, whereas the scope of sub-clausal negation is strictly less than the entire clause. For instance, in Not for the first time, she felt utterly betrayed, only the phrase Not for the first time is negated.
Ordinary vs. meta-linguistic: meta-linguistic negation acts as a correction to how the negative meaning is understood. For instance, in The house is not big, it is huge, the negation is understood as a correction, since huge is a more accurate way of describing the size of the house.
The first two categories relate to the syntax of nega-
tion itself while the last two relate to semantics. In
this work, we focus on sub-clausal negation as the
correct negation scope can be challenging to deter-
mine, which can lead to misunderstanding of the
negated instance. Although meta-linguistic nega-
tion can also cause difficulties with interpretation,
this class is rare in practice, so we did not include
it in our test suite.
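The four binary dimensions above can be written down as a small set of enumerations; the class and member names below are our own, chosen to mirror the typology of Pullum and Huddleston (2002) as summarized here.

```python
from enum import Enum

# The four binary dimensions of the negation typology.
# Names are illustrative; the first two concern syntax,
# the last two concern semantics.
class Form(Enum):
    VERBAL = "verbal"
    NON_VERBAL = "non-verbal"

class Marker(Enum):
    ANALYTIC = "analytic"      # marker only marks negation, e.g. "not"
    SYNTHETIC = "synthetic"    # marker has other functions, e.g. "nothing"

class Scope(Enum):
    CLAUSAL = "clausal"
    SUB_CLAUSAL = "sub-clausal"

class Meaning(Enum):
    ORDINARY = "ordinary"
    META_LINGUISTIC = "meta-linguistic"

# Annotating an example from the text: in "Not for the first time,
# she felt utterly betrayed", "not" negates only the fronted PP.
example = (Form.NON_VERBAL, Marker.ANALYTIC, Scope.SUB_CLAUSAL, Meaning.ORDINARY)
```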
3.2 Test suite construction process
3.2.1 Selecting premises
We manually collect sentences from Pullum and
Huddleston (2002) to use as premises. Most sam-
ples are special constructions of non-verbal nega-
tion where they denote sub-clausal negation. Below
we describe the main types of these constructions.
Not + quantifiers: not combines with a quantifier and scopes only over that quantifier.
Not all: not is used to deny the larger amount, and implies a normal value. Possible quantifiers include not all, not every, not many, not much, not often.
Not one, not two: not one is used to denote complete non-existence of something, and has the same meaning as nothing or no one. When combined with a number larger than one (usually in phrases of time and distance), not can convey the meaning of as little as, as in not two years ago.
Not a little: this construction negates the lower bound of the quantification and asserts the upper bound, denoting a fairly large amount. For instance, not a little confusion is equivalent to much confusion.
Not + focus particles (even/only): not even generally marks clausal negation, while not only marks sub-clausal negation as it carries positive meaning. For instance, Not even Ed approved of the plan implies that Ed did not approve of the plan, whereas in Not only Ed approved of the plan, Ed did in fact approve of the plan.
Not + degree expressions: expressions such as not very and not quite mark sub-clausal negation by reducing the degree of adjectives, adverbs, or determiners (e.g. not very confident).
Not + affixally-negated adjectives/adverbs: when accompanied by a gradable adjective, the construction not un- negates the lower end of the scale for that adjective. For example, not unattractive suggests an appearance that ranks higher than intermediate.
Not in coordination: not can appear in a coordinative construction and typically scopes over only one of the coordinated parts, thus marking sub-clausal negation. In They are now leaving not on Friday but on Saturday, not scopes only over Friday and denies They are leaving on Friday.
Not with PPs: not can modify prepositional phrases (PPs) to denote sub-clausal negation. In Not for the first time, she felt utterly betrayed, not negates only the PP for the first time, and the sentence has positive polarity in that she did feel utterly betrayed.
Not in verbless subordinate clauses: not can scope only over a verbless subordinate clause (e.g. We need someone not afraid of taking risks).
Not in implicit propositions with that: the construction not that has the function of denying something that is natural or expected in the context (e.g. There are spare blankets in here, not that you’ll have any need of them).
Absolute and approximate negators: absolute negators (no, never) denote absolute non-existence but can also denote sub-clausal negation when they are part of a prepositional phrase. In They were friends in no time, only the PP in no time is negated. Approximate negators (rarely, seldom) denote a quantification that is close to zero. They imply positive meaning and thus denote sub-clausal negation.
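As a rough illustration of how a few of the cue constructions listed above could be spotted on the surface, the sketch below uses toy regular expressions. This is purely an expository heuristic of our own, not the paper's annotation procedure, which is manual; the pattern names and coverage are assumptions.

```python
import re

# Toy surface patterns for a handful of the constructions described
# above. Illustrative only: real annotation requires determining the
# actual negation scope, which these regexes cannot do.
CUE_PATTERNS = {
    "not + quantifier": r"\bnot (all|every|many|much|often|one|two|a little)\b",
    "not + focus particle": r"\bnot (even|only)\b",
    "not + affixal negation": r"\bnot un\w+",
    "not in coordination": r"\bnot \w+[\w\s]* but\b",
    "absolute negator in PP": r"\bin no time\b",
}

def match_constructions(sentence: str) -> list[str]:
    """Return the names of cue patterns found in the sentence."""
    s = sentence.lower()
    return [name for name, pat in CUE_PATTERNS.items() if re.search(pat, s)]
```

For instance, `match_constructions("They saw not one but three dolphins.")` flags both the quantifier and coordination cues, while `not unattractive` matches only the affixal pattern.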
3.2.2 Constructing premise–hypothesis pairs
When constructing hypothesis sentences for premises, we aimed to keep lexical changes to a minimum. This was especially important in the case of neutral hypotheses: although it is trivial to create any number of neutral hypotheses by changing semantically important parts of a sentence to other lexical items, making it impossible to determine the truth value, doing so would intuitively make the sentence embedding of the hypothesis quite different from that of the premise, and thus easier for models to classify correctly. We also strove to make hypotheses linguistically diverse by introducing various changes to functional words rather than relying