Syntactic Surprisal From Neural Models Predicts, But Underestimates,
Human Processing Difficulty From Syntactic Ambiguities
Suhas Arehalli
Johns Hopkins University
suhas@jhu.edu
Brian Dillon
University of Massachusetts, Amherst
brian@linguist.umass.edu
Tal Linzen
New York University
linzen@nyu.edu
Abstract
Humans exhibit garden path effects: When reading
sentences that are temporarily structurally ambigu-
ous, they slow down when the structure is disam-
biguated in favor of the less preferred alternative.
Surprisal theory (Hale,2001;Levy,2008), a promi-
nent explanation of this finding, proposes that these
slowdowns are due to the unpredictability of each
of the words that occur in these sentences. Chal-
lenging this hypothesis, van Schijndel and Linzen
(2021) find that estimates of the cost of word pre-
dictability derived from language models severely
underestimate the magnitude of human garden path
effects. In this work, we consider whether this un-
derestimation is due to the fact that humans weight
syntactic factors in their predictions more highly
than language models do. We propose a method for
estimating syntactic predictability from a language
model, allowing us to weigh the cost of lexical and
syntactic predictability independently. We find that
treating syntactic predictability independently from
lexical predictability indeed results in larger esti-
mates of garden path effects. At the same time, even when
syntactic predictability is independently weighted,
surprisal still greatly underestimates the magnitude
of human garden path effects. Our results support
the hypothesis that predictability is not the only fac-
tor responsible for the processing cost associated
with garden path sentences.
1 Introduction
Readers exhibit garden path effects: When reading
a temporarily syntactically ambiguous sentence,
they tend to slow down when the sentence is dis-
ambiguated in favor of the less preferred parse. For
example, a participant who reads the sentence frag-
ment
(1) The suspect sent the file . . .
a. . . . to the lawyer.
b. . . . deserved further investigation.
can construct a partial parse in at least two dis-
tinct ways: In one reading, the verb sent acts as
the main verb of the sentence, and the continua-
tion of the sentence as an additional argument to
sent (as in 1a). In another, less likely, reading, sent
the file acts as a modifier in a complex subject,
which then requires an additional verb phrase to
form a complete sentence (as in 1b). Prior work
has demonstrated that regions like deserved further
investigation, which disambiguate these temporar-
ily ambiguous sentences in favor of the modifier
parse (1b), are read slower than those same words
would be in an unambiguous version of the sentence,
such as the following:
(2) The suspect who was sent the file deserved further investigation.
In (2), the presence of who was signals to the reader
that sent the file acts as a modifier (Frazier and
Fodor,1978).
One account of this phenomenon, surprisal the-
ory (Hale,2001;Levy,2008), suggests that read-
ers maintain a probabilistic representation of all
possible parses of the input as they process the
sentence incrementally. Processing difficulty in
garden path sentences is the cost associated with
updating this representation; this cost is propor-
tional to the negative log probability, or surprisal,
of the newly observed material under the reader’s
model of upcoming words. This theory predicts
that the slowdown associated with garden path sen-
tences can be entirely captured by the differences
in surprisal between the disambiguating region in
ambiguous garden path sentences and that same
region in a matched unambiguous sentence.
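As a toy illustration of this linking hypothesis (the probabilities below are invented for exposition, not estimates from any model), the predicted garden path effect is simply the difference in surprisal at the disambiguating word between the two versions of the sentence:

```python
import numpy as np

def surprisal(p):
    """Surprisal in bits: -log2 of the probability assigned to the observed word."""
    return -np.log2(p)

# Invented probabilities for "deserved" after "The suspect (who was) sent the file ..."
p_ambiguous = 0.002    # ambiguous version (1b): the modifier parse was dispreferred
p_unambiguous = 0.03   # unambiguous version (2): "who was" already signaled that parse

# Under surprisal theory, the garden path effect is proportional to this difference.
print(f"{surprisal(p_ambiguous) - surprisal(p_unambiguous):.2f} bits")
```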
Van Schijndel and Linzen (2021) tested this hy-
pothesis. They estimated the surprisals associated
with garden path sentences using LSTM language
models (LMs) trained over large natural language
corpora. Based on the core assumption of surprisal
theory—that processing difficulty on a word, when
all lexical factors are kept constant, stands in a
constant proportion to the word’s surprisal, regard-
less of its syntactic context—they estimated a con-
version factor between surprisal and reading times
from non-garden path sentences. Applying this con-
version factor to the critical words in garden path
sentences, van Schijndel and Linzen found that
surprisal theory, when paired with the surprisals
estimated by their models, severely underestimated
the magnitude of the garden path effect for three
garden path constructions, consistent with attempts
to estimate the magnitude of other syntactically-
modulated effects (Wilcox et al.,2021). Moreover,
the predicted reading times did not correctly cap-
ture differences across the different garden path
constructions, suggesting that no single conversion
factor between surprisal and reading times could
predict the magnitude of the garden path effect in
all three constructions.
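Schematically (the numbers below are invented, and van Schijndel and Linzen in fact fit regression models with additional predictors such as word length and spillover), their procedure amounts to estimating a milliseconds-per-bit slope on ordinary sentences and applying it to the surprisal difference at the disambiguating region:

```python
import numpy as np

# Hypothetical per-word data from ordinary (non-garden-path) sentences:
# surprisal in bits and self-paced reading time in ms (all values invented).
surprisals = np.array([3.1, 7.8, 5.2, 9.4, 4.0, 6.6])
reading_times = np.array([311, 356, 330, 371, 318, 342])

# Estimate the ms-per-bit conversion factor with a simple least-squares fit.
slope, intercept = np.polyfit(surprisals, reading_times, deg=1)

# Apply that conversion to the surprisal difference at the critical region of an
# ambiguous vs. unambiguous pair to get the predicted garden path effect.
surprisal_difference_bits = 1.5  # again invented
predicted_effect_ms = slope * surprisal_difference_bits
print(f"{slope:.1f} ms/bit -> predicted effect: {predicted_effect_ms:.1f} ms")
```

The central finding is that the effect predicted by this kind of conversion is far smaller than the effect observed in human reading times.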
The underestimation documented by van Schi-
jndel and Linzen can be interpreted in one of two
ways: Either (1) surprisal theory cannot, on its own,
account for garden path effects; or (2) predictability
estimates derived from LSTM LMs fail to capture
some aspect of human prediction that is crucial to
explaining the processing of garden path sentences.
This work investigates the latter possibility. We
ask if the gap between the magnitude of garden
path effects in humans and the magnitude
that surprisal theory predicts from LMs is due to
a mismatch between how humans and LMs weigh
two contributors to word-level surprisal: syntactic
and lexical predictability. We hypothesize that the
LM next-word prediction objective does not suf-
ficiently emphasize the importance that syntactic
structure carries for human readers, who may be
more actively concerned with interpreting the sen-
tence. In this scenario, since garden paths are the
product of unpredictable syntactic structure—as
opposed to an unpredictable lexical item—using an
LM predictability estimate for the next word could
lead to underestimation of garden path effects.
We test the hypothesis that the gap between
model and human effects can be bridged by teas-
ing apart the overall predictability of a word from
the surprisal associated with the syntactic structure
implied by the word (see Figure 1) and weighting
the two factors independently, possibly assigning
a higher weight to syntactic surprisal. In this rea-
[Figure 1: example word tokens plotted along axes of lexical surprisal and syntactic surprisal.]
Figure 1: A depiction of the relationship between syntactic and lexical surprisal. Some word tokens, such as are in the context of owls are more flexible, are highly predictable in all respects. Others are unpredictable due to the syntactic structures they imply (trying in girls trying to save up), and are expected to be assigned high syntactic and lexical surprisal. Tokens such as microbes in the context the newfound microbes were, on the other hand, appear in a predictable syntactic environment, but are unpredictable due to their low lexical frequency; such words should be assigned low syntactic surprisal but high lexical surprisal. Since words that appear in unpredictable syntactic environments are themselves unpredictable, we do not expect to find words with high syntactic surprisal but low lexical surprisal.
soning, we follow prior work on syntactic or unlex-
icalized surprisal carried out in the context of sym-
bolic parsers, where the probability of a structure
and particular lexical item can be explicitly disen-
tangled (Demberg and Keller,2008;Roark et al.,
2009). But while past work has demonstrated that
unlexicalized surprisal from symbolic parsers
correlates with measures of human processing dif-
ficulty (Demberg and Keller,2008), simple recur-
rent neural networks trained to predict sequences
of part-of-speech tags have been shown to track
processing difficulty even more strongly (Frank,
2009), suggesting that even fairly limited syntac-
tic representations like part-of-speech tags can act
as a reasonable proxy of syntactic structure when
modeling human behavior.
To compute LSTM-based syntactic surprisal,
we train the LM with an auxiliary objective—
estimating the likelihood of the next word’s su-
pertag under the Combinatory Categorial Grammar
(CCG) framework (Steedman,1987)—following
Enguehard et al. (2017). Such supertags can be
viewed as enriched part-of-speech tags that encode
syntactic information about how a particular word
can be combined with its local environment. We
then define syntactic surprisal in terms of the like-
lihood of the next word’s CCG supertag, and pro-
pose a method of estimating that likelihood us-
ing our modified LMs. We validate our formula-
tion of syntactic surprisal by demonstrating that it
captures syntactic processing difficulty in garden
path sentences, while, crucially, not tracking un-
predictability that is due to low frequency lexical
items. Following van Schijndel and Linzen (2021),
we then use the syntactic and lexical surprisal val-
ues derived from those models to predict reading
times for three types of garden path sentences. We
find that adding syntactic surprisal as a separate
predictor does lead to larger estimates of garden
path effects, but those estimates are still an order of
magnitude lower than empirical garden path effects.
Finally, we discuss the implications of these find-
ings for surprisal theory and single-stage models
of syntactic processing.
2 Computing Syntactic Surprisal
Each incoming word can cause an adjustment in
the reader’s beliefs about the syntactic structure
of the sentence; when a syntactic structure that
was assigned a low probability prior to reading
the word now has high probability, the word can
be said to have high syntactic surprisal. We will
operationalize this intuition as the predictability
of next word’s supertag under the Combinatory
Categorial Grammar (CCG) formalism (Steedman,
1987):
$\mathrm{surp}_{\mathrm{syn}} = -\log P(c_n \mid w_1, \ldots, w_{n-1}),$ (1)

where $c_n$ is the CCG supertag of the $n$-th word.
A CCG supertag encodes how a word combines
syntactically with adjacent constituents. For ex-
ample, a token with the tag S\NP combines with
an NP to its left to form an S constituent, and a
token with the tag (S\NP)/NP combines with an
NP to its right to form an S\NP constituent. Since
the sequence of supertags associated with all of the
words of a sentence often allows only one valid
parse, accurately predicting a sentence’s supertags
has been described as “almost parsing” (Bangalore
and Joshi, 1999); consequently, incremental CCG
supertagging can be seen as almost incremental
parsing.
[Footnote 1: Code necessary to reproduce our experiments can be found at https://github.com/SArehalli/SyntacticSurprisal]
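To make the “almost parsing” point concrete, here is a toy sketch (not code from the paper; the greedy shift-reduce strategy is a simplification used only for illustration) of how a sequence of supertags can be reduced to a single sentence category using forward and backward application:

```python
def _strip_parens(cat):
    """Remove one layer of outer parentheses, e.g. "(S\\NP)" -> "S\\NP"."""
    return cat[1:-1] if cat.startswith("(") and cat.endswith(")") else cat

def forward_apply(left, right):
    """Forward application: X/Y combines with a Y on its right to give X."""
    return _strip_parens(left[: -len("/" + right)]) if left.endswith("/" + right) else None

def backward_apply(left, right):
    """Backward application: a Y combines with an X\\Y on its right to give X."""
    return _strip_parens(right[: -len("\\" + left)]) if right.endswith("\\" + left) else None

def combine(tags):
    """Greedy shift-reduce over a supertag sequence; a single "S" means a full parse."""
    stack = []
    for tag in tags:
        stack.append(tag)
        while len(stack) >= 2:
            reduced = forward_apply(stack[-2], stack[-1]) or backward_apply(stack[-2], stack[-1])
            if reduced is None:
                break
            stack[-2:] = [reduced]
    return stack

# Supertags for "The squirrels gathered a few acorns", with the transitive tag for "gathered".
tags = ["NP/N", "N", "(S\\NP)/NP", "NP/N", "N/N", "N"]
print(combine(tags))  # ['S']
```

Swapping in the intransitive tag S\NP for gathered leaves this toy reducer with unused material once a direct object follows, which illustrates how a supertag commits a word to a particular structural analysis.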
We contrast this syntactic surprisal measure with
the standard token surprisal measure, which we
refer to as lexical surprisal:
$\mathrm{surp}_{\mathrm{lex}} = -\log P(w_n \mid w_1, \ldots, w_{n-1}).$ (2)
Note that what we call lexical surprisal captures all
factors that contribute to a token’s predictability,
including syntactic ones.
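In practice, this quantity is read directly off the LM's softmax. A minimal sketch (the vocabulary size and tensor shapes are illustrative assumptions, not the paper's setup):

```python
import torch
import torch.nn.functional as F

def lexical_surprisal(next_word_logits, observed_word_id):
    """Equation (2): -log P(w_n | w_1, ..., w_{n-1}), in nats, taken from the LM's
    output logits at the position immediately preceding w_n."""
    log_probs = F.log_softmax(next_word_logits, dim=-1)
    return -log_probs[observed_word_id].item()

# Toy usage: logits over a 50-word vocabulary (random here, standing in for an LM's output).
logits = torch.randn(50)
print(lexical_surprisal(logits, observed_word_id=7))
```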
In order to compute syntactic and lexical sur-
prisal for a given word, we need models that pre-
dict, given a left context, not only the next token,
as a standard LM does, but also the next token’s su-
pertag. To do this, we train models with both a lan-
guage modeling and CCG supertagging objective,
and estimate the distribution over the next word’s
tag by marginalizing over the distribution over the
next word that is defined by the LM. Formally, for a sequence of words $w_1, \ldots, w_n \in W$ with supertags $c_1, \ldots, c_n \in C$, our model estimates the probability of the next word given all observed words, $p_{w_{n+1}} = P(w_{n+1} \mid w_1, \ldots, w_n)$, and the probability of the most recent word's supertag given all currently observed words, $p_{c_n \mid w_n} = P(c_n \mid w_1, \ldots, w_n)$. We then infer the distribution over the next word's supertag as

$P(c_{n+1} \mid w_1, \ldots, w_n) = \sum_{w'_{n+1} \in W} p_{c_{n+1} \mid w'_{n+1}} \, p_{w'_{n+1}}$ (3)
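A minimal sketch of how such a dual-headed model and the marginalization in Equation (3) could be implemented (layer sizes, the toy vocabulary, and the brute-force enumeration of candidate next words are illustrative assumptions, not the released implementation; a realistic vocabulary would call for batching or pruning the candidates):

```python
import torch
import torch.nn as nn

class DualHeadLSTM(nn.Module):
    """LSTM LM with an auxiliary CCG supertagging head (sizes are illustrative)."""
    def __init__(self, vocab_size, tagset_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.word_head = nn.Linear(hidden_dim, vocab_size)  # next-word logits
        self.tag_head = nn.Linear(hidden_dim, tagset_size)  # supertag logits for the current word

    def forward(self, tokens):
        states, _ = self.lstm(self.embed(tokens))
        return self.word_head(states), self.tag_head(states)

def next_tag_distribution(model, prefix_ids, vocab_size):
    """Equation (3): marginalize over candidate next words w' to get P(c_{n+1} | w_1..w_n)."""
    with torch.no_grad():
        word_logits, _ = model(prefix_ids)
        p_next_word = word_logits[0, -1].softmax(dim=-1)        # p_{w'_{n+1}}, shape (V,)

        # Append each candidate word and read off the supertag distribution of that word.
        candidates = torch.arange(vocab_size).view(-1, 1)        # (V, 1)
        extended = torch.cat([prefix_ids.expand(vocab_size, -1), candidates], dim=1)
        _, tag_logits = model(extended)
        p_tag_given_word = tag_logits[:, -1].softmax(dim=-1)     # p_{c_{n+1} | w'_{n+1}}, (V, T)

        return p_next_word @ p_tag_given_word                    # shape (T,)

# Toy usage with a tiny vocabulary; syntactic surprisal (Equation 1) for the tag that
# actually occurs is then -log of the corresponding entry of this distribution.
model = DualHeadLSTM(vocab_size=50, tagset_size=10)
prefix = torch.randint(0, 50, (1, 6))
p_tags = next_tag_distribution(model, prefix, vocab_size=50)
print(p_tags.sum())  # ~1.0
```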
If we knew the supertag of the next word $c_{n+1}$, we could simply compute the surprisal of that supertag, $-\log P(c_{n+1} \mid w_1, \ldots, w_n)$. By contrast with lexical surprisal, however—where there is no uncertainty about the identity of $w_{n+1}$ once that word has been read—a word's supertag is often am-
biguous during incremental processing. Consider
the verb gathered in the following sentences, for
example:
(3) The squirrels gathered near the tree.
(4) The squirrels gathered a few acorns.
In (3), gathered would eventually be assigned the
supertag S\NP, indicating that gathered is used in
its intransitive frame—a number of squirrels as-
sembled together as a group—and takes no direct
object. In (4), on the other hand, the appropri-
ate supertag would be (S\NP)/NP, which indicates
that in this sentence gathered is used in a transi-
tive frame and takes the noun phrase a few acorns
as a direct object. When processing this sentence