Not another Negation Benchmark:
The NaN-NLI Test Suite for Sub-clausal Negation
Hung Thinh Truong1, Yulia Otmakhova1, Timothy Baldwin1,3
Trevor Cohn1, Jey Han Lau1, Karin Verspoor2,1
1The University of Melbourne 2RMIT University 3MBZUAI
{hungthinht,yotmakhova}@student.unimelb.edu.au tb@ldwin.net
trevor.cohn@unimelb.edu.au jeyhan.lau@gmail.com karin.verspoor@rmit.edu.au
Abstract
Negation is poorly captured by current lan-
guage models, although the extent of this prob-
lem is not widely understood. We introduce a
natural language inference (NLI) test suite to
enable probing the capabilities of NLP meth-
ods, with the aim of understanding sub-clausal
negation. The test suite contains premise–
hypothesis pairs where the premise contains
sub-clausal negation and the hypothesis is con-
structed by making minimal modifications to
the premise in order to reflect different possi-
ble interpretations. Aside from adopting stan-
dard NLI labels, our test suite is systematically
constructed under a rigorous linguistic frame-
work. It includes annotation of negation types
and constructions grounded in linguistic the-
ory, as well as the operations used to construct
hypotheses. This facilitates fine-grained anal-
ysis of model performance. We conduct ex-
periments using pre-trained language models
to demonstrate that our test suite is more chal-
lenging than existing benchmarks focused on
negation, and show how our annotation sup-
ports a deeper understanding of the current
NLI capabilities in terms of negation and quan-
tification.
1 Introduction
Negation is an important linguistic phenomenon
which denotes non-existence, denial, or contradic-
tion, and is core to language understanding. NLP
work on negation has mostly focused on detecting
instances of negation (Peng et al.,2018;Khandel-
wal and Sawant,2020;Truong et al.,2022), and
the effect of negation on downstream or probing
tasks (Kassner and Schütze,2020;Ettinger,2020;
Hossain et al.,2020). A consistent finding in recent
work on pre-trained language models (PLMs) is
that they struggle to correctly handle negation, but
also that existing NLP benchmarks are deficient in
terms of their relative occurrence and variability
of negation (Barnes et al., 2021; Tang et al., 2021;
Hossain et al., 2022).

*Equal contribution
In this work, we address the problem of evaluat-
ing the ability of models to handle negation in the
English language using a systematic, linguistically-
based approach. Specifically, we adopt the typol-
ogy proposed by Pullum and Huddleston (2002)
whereby negation is classified based on both form
(verbal and non-verbal; analytic and synthetic) and
meaning (clausal and sub-clausal; ordinary and
meta-linguistic). Based on this typology, we ob-
serve that most negation instances occurring in ex-
isting benchmarks are analytic, verbal, and clausal,
which are arguably more straightforward to handle
than non-verbal, synthetic, and sub-clausal nega-
tion. For instance, the dataset proposed by Hossain
et al. (2020) is constructed by adding the syntactic
negation cue not to the main verb of the premise
and/or the hypothesis of MNLI (Williams et al.,
2018) training examples, resulting almost exclu-
sively in verbal, analytic, and clausal negation.
Motivated by this, we construct a new evalua-
tion dataset with a focus on sub-clausal negation,
where it is non-trivial to determine the correct nega-
tion scope. For instance, the negation in They saw
not one but three dolphins only scopes over the
modifier one, and thus carries a positive mean-
ing (They saw three dolphins). We choose NLI
as the probing task based on the intuition that a
complete grasp of negation is required to make cor-
rect inference judgements. Moreover, we adopt
the test suite framework (Lehmann et al.,1996)
instead of naturally-occurring text corpora, to elicit
a full range of linguistic constructions that denote
sub-clausal negation. This facilitates systematic
evaluation of model performance along controlled
dimensions. We collect examples for each con-
struction from Pullum and Huddleston (2002) to
use as premises, and then construct corresponding
hypotheses by introducing minimal changes to
premises which highlight their possible interpreta-
tions. We manually annotate the constructed pairs
in terms of negation types, negation constructions,
and the operations used to construct the hypotheses.
arXiv:2210.03256v2 [cs.CL] 13 Oct 2022
In summary, our main contributions are:
1. We introduce the “NaN-NLI” test suite for probing the capabilities of NLP models to capture sub-clausal negation.[1] In addition to standard NLI labels, it contains various linguistic annotations related to negation, to facilitate fine-grained analysis of different constructional and semantic sub-types of negation;
2. We conduct extensive experiments to confirm that our test suite is more difficult than existing negation-focused NLI benchmarks, and show how our annotations can be used to guide error analysis and interpretation of model performance; and
3. We present a subset of our test suite (NaN-Quant) with samples involving not only negation but also quantification, and show that quantification is an especially challenging phenomenon that requires future exploration.
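Concretely, each annotated pair can be pictured as a small record combining the NLI label with the linguistic annotations described above. The sketch below uses the entailment example from the introduction; the field names, label strings, and annotation values are our own illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass

# Illustrative record for one NaN-NLI item. Field names and
# annotation values are assumptions for exposition only.
@dataclass
class NaNNLIItem:
    premise: str
    hypothesis: str
    label: str          # "entailment", "contradiction", or "neutral"
    negation_type: str  # typology dimensions of the negation
    construction: str   # the sub-clausal construction involved
    operation: str      # edit used to build the hypothesis

# The example from Section 1: the negation scopes only over "one",
# so the premise carries a positive meaning.
item = NaNNLIItem(
    premise="They saw not one but three dolphins.",
    hypothesis="They saw three dolphins.",
    label="entailment",
    negation_type="non-verbal, analytic, sub-clausal",
    construction="not in coordination",
    operation="remove negated conjunct",
)
```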
2 Related Work
To investigate the abilities of PLMs to assign the
correct interpretation to negation, many probing
tasks have been proposed. For instance, Kassner
and Schütze (2020) and Ettinger (2020) formulated
a cloze-style fill-in-the-blank task where BERT is
asked to predict words for two near-identical but
contrasting sentences (e.g. A bird can ___ vs. A
bird cannot ___). Hossain et al. (2020) constructed
an NLI dataset where negations essential to cor-
rectly judge the label for a premise–hypothesis pair
were manually added to existing NLI benchmarks.
Hartmann et al. (2021) constructed a multilingual
dataset with minimal pairs of NLI examples to
analyze model behavior in the presence/absence
of negation. Most recently, Hossain et al. (2022)
conducted a comprehensive analysis of the effect
of negation on a wide range of NLU tasks in the
GLUE (Wang et al.,2018) and SuperGLUE (Wang
et al.,2019) benchmarks. These papers expose var-
ious limitations of both current benchmarks and
PLMs in the face of negation. However, they all fo-
cus on verbal and clausal negation, which are more
straightforward to process, whereas our dataset tar-
gets non-verbal and sub-clausal negation, where it
is more difficult to determine the correct scope.

[1] The test suite and all code are available at
https://github.com/joey234/nan-nli
The idea of using a test suite to measure the
performance of NLP models was introduced by
Lehmann et al. (1996), where the authors pro-
pose general guidelines for test suite construction.
Adopting this methodology for a domain-specific
task, Cohen et al. (2010) constructed a dataset for
benchmarking ontology concept recognition sys-
tems. Most recently, Ribeiro et al. (2020) proposed
a task-agnostic testing methodology which closely
follows the idea of behavioral testing from software
engineering to comprehensively test the linguistic
capabilities of NLP models. The main advantages
of test suites over datasets made up of naturally-
occurring examples are: (1) control over the pre-
cise composition of the data: we can undertake
a targeted evaluation of specific criteria (e.g. lin-
guistic features); (2) systematicity: a test suite has
specific structure, with samples classified into well-
defined categories; and (3) control of redundancy:
we can remove samples with similar properties or
over-sample rare occurrences.
3 A Test Suite for Non-verbal Negation
3.1 Negation typology
According to Pullum and Huddleston (2002), nega-
tion can be classified according to four main as-
pects:
Verbal vs. non-verbal: verbal negation is when the negation marker is associated with the verb, while non-verbal negation is associated with an adjunct or object.
Analytic vs. synthetic: when the negation marker’s only syntactic function is to mark negation (e.g. not), it represents analytic negation, whereas in synthetic negation the marker can have other syntactic functions (e.g. a compound negator such as nothing can also be a subject or an object).
Clausal vs. sub-clausal: clausal negation negates the entire clause it is contained in, whereas the scope of sub-clausal negation is strictly less than the entire clause. For instance, in Not for the first time, she felt utterly betrayed, only the phrase Not for the first time is negated.
Ordinary vs. meta-linguistic: meta-linguistic negation acts as a correction to how the negative meaning is understood. For instance, in The house is not big, it is huge, the negation is understood as a correction, since huge is a more accurate way of describing the size of the house.
The first two categories relate to the syntax of nega-
tion itself while the last two relate to semantics. In
this work, we focus on sub-clausal negation as the
correct negation scope can be challenging to deter-
mine, which can lead to misunderstanding of the
negated instance. Although meta-linguistic nega-
tion can also cause difficulties with interpretation,
this class is rare in practice, so we did not include
it in our test suite.
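The four binary dimensions above can be written down as a small set of enumerations; the class and member names below are our own, chosen to mirror the typology of Pullum and Huddleston (2002) as summarized here.

```python
from enum import Enum

# The four binary dimensions of the negation typology.
# Names are illustrative; the first two concern syntax,
# the last two concern semantics.
class Form(Enum):
    VERBAL = "verbal"
    NON_VERBAL = "non-verbal"

class Marker(Enum):
    ANALYTIC = "analytic"      # marker only marks negation, e.g. "not"
    SYNTHETIC = "synthetic"    # marker has other functions, e.g. "nothing"

class Scope(Enum):
    CLAUSAL = "clausal"
    SUB_CLAUSAL = "sub-clausal"

class Meaning(Enum):
    ORDINARY = "ordinary"
    META_LINGUISTIC = "meta-linguistic"

# Annotating an example from the text: in "Not for the first time,
# she felt utterly betrayed", "not" negates only the fronted PP.
example = (Form.NON_VERBAL, Marker.ANALYTIC, Scope.SUB_CLAUSAL, Meaning.ORDINARY)
```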
3.2 Test suite construction process
3.2.1 Selecting premises
We manually collect sentences from Pullum and
Huddleston (2002) to use as premises. Most sam-
ples are special constructions of non-verbal nega-
tion where they denote sub-clausal negation. Below
we describe the main types of these constructions.
Not + quantifiers: not combines with a quantifier and scopes only over that quantifier.
Not all: not is used to deny the larger amount, and implies a normal value. Possible quantifiers include not all, not every, not many, not much, not often.
Not one, not two: not one is used to denote complete non-existence of something, and has the same meaning as nothing or no one. When combined with a number larger than one (usually in phrases of time and distance), not can convey the meaning of as little as, as in not two years ago.
Not a little: this construction negates the lower bound of the quantification and asserts the upper bound, denoting a fairly large amount. For instance, not a little confusion is equivalent to much confusion.
Not + focus particles (even/only): not even generally marks clausal negation, while not only marks sub-clausal negation as it carries positive meaning. For instance, Not even Ed approved of the plan implies that Ed did not approve of the plan, whereas in Not only Ed approved of the plan, Ed did in fact approve of the plan.
Not + degree expressions: expressions such as not very and not quite mark sub-clausal negation by reducing the degree of adjectives, adverbs, or determiners (e.g. not very confident).
Not + affixally-negated adjectives/adverbs: when accompanied by a gradable adjective, the construction not un- negates the lower end of the scale for that adjective. For example, not unattractive suggests an appearance that ranks higher than intermediate.
Not in coordination: not can appear in a coordinative construction and typically scopes over only one of the coordinated parts, thus marking sub-clausal negation. In They are now leaving not on Friday but on Saturday, not scopes only over Friday and denies They are leaving on Friday.
Not with PPs: not can modify prepositional phrases (PPs) to denote sub-clausal negation. In Not for the first time, she felt utterly betrayed, not negates only the PP for the first time, and the sentence has positive polarity in that she did feel utterly betrayed.
Not in verbless subordinate clauses: not can scope only over a verbless subordinate clause (e.g. We need someone not afraid of taking risks).
Not in implicit propositions with that: the construction not that has the function of denying something that is natural or expected in the context (e.g. There are spare blankets in here, not that you’ll have any need of them).
Absolute and approximate negators: absolute negators (no, never) denote absolute non-existence but can also denote sub-clausal negation when they are part of a prepositional phrase. In They were friends in no time, only the PP in no time is negated. Approximate negators (rarely, seldom) denote a quantification that is close to zero. They imply positive meaning and thus denote sub-clausal negation.
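As a rough illustration of how a few of the cue constructions listed above could be spotted on the surface, the sketch below uses toy regular expressions. This is purely an expository heuristic of our own, not the paper's annotation procedure, which is manual; the pattern names and coverage are assumptions.

```python
import re

# Toy surface patterns for a handful of the constructions described
# above. Illustrative only: real annotation requires determining the
# actual negation scope, which these regexes cannot do.
CUE_PATTERNS = {
    "not + quantifier": r"\bnot (all|every|many|much|often|one|two|a little)\b",
    "not + focus particle": r"\bnot (even|only)\b",
    "not + affixal negation": r"\bnot un\w+",
    "not in coordination": r"\bnot \w+[\w\s]* but\b",
    "absolute negator in PP": r"\bin no time\b",
}

def match_constructions(sentence: str) -> list[str]:
    """Return the names of cue patterns found in the sentence."""
    s = sentence.lower()
    return [name for name, pat in CUE_PATTERNS.items() if re.search(pat, s)]
```

For instance, `match_constructions("They saw not one but three dolphins.")` flags both the quantifier and coordination cues, while `not unattractive` matches only the affixal pattern.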
3.2.2 Constructing premise–hypothesis pairs
When constructing hypothesis sentences for premises, we aimed to keep lexical changes to a minimum. This was especially important in the case of neutral hypotheses: although it is trivial to create any number of neutral hypotheses by changing semantically important parts of a sentence to other lexical items, making it impossible to determine the truth value, doing so would intuitively make the sentence embedding of the hypothesis quite different from that of the premise, and thus easier for models to classify correctly. We also strove to make hypotheses linguistically diverse by introducing various changes to functional words rather than relying