
corpora. Based on the core assumption of surprisal
theory—that processing difficulty on a word, when
all lexical factors are kept constant, stands in a
constant proportion to the word’s surprisal, regard-
less of its syntactic context—they estimated a con-
version factor between surprisal and reading times
from non-garden path sentences. Applying this con-
version factor to the critical words in garden path
sentences, van Schijndel and Linzen found that
surprisal theory, when paired with the surprisals
estimated by their models, severely underestimated
the magnitude of the garden path effect for three
garden path constructions, consistent with similar
underestimates of other syntactically-modulated
effects (Wilcox et al., 2021). Moreover, the pre-
dicted reading times did not correctly capture dif-
ferences across the different garden path
constructions, suggesting that no single conversion
factor between surprisal and reading times could
predict the magnitude of the garden path effect in
all three constructions.
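The conversion-factor logic described above can be illustrated with a minimal sketch. All numbers below are invented for illustration; they are not van Schijndel and Linzen's data or pipeline. The factor is estimated as the slope of a linear fit of reading times on surprisal over non-garden-path material, then applied to the surprisal difference at the critical word:

```python
import numpy as np

# Hypothetical per-word measurements from non-garden-path sentences:
# LM surprisal (bits) and observed reading times (ms).
surprisals = np.array([2.1, 4.8, 3.5, 7.2, 5.9, 1.4])
reading_times = np.array([210.0, 265.0, 240.0, 310.0, 285.0, 195.0])

# Estimate the conversion factor as the slope of a linear fit:
# RT = intercept + slope * surprisal.
slope, intercept = np.polyfit(surprisals, reading_times, deg=1)

# Predicted garden path effect: the surprisal difference between the
# critical word in the ambiguous sentence and its unambiguous control
# (hypothetical values), scaled by the conversion factor.
surprisal_gp, surprisal_control = 12.3, 6.1
predicted_effect_ms = slope * (surprisal_gp - surprisal_control)
```

Under surprisal theory, this single slope should predict the garden path effect in every construction; the underestimation result is that the empirically observed effect is much larger than `predicted_effect_ms` computed this way.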
The underestimation documented by van Schi-
jndel and Linzen can be interpreted in one of two
ways: Either (1) surprisal theory cannot, on its own,
account for garden path effects; or (2) predictability
estimates derived from LSTM LMs fail to capture
some aspect of human prediction that is crucial to
explaining the processing of garden path sentences.
This work investigates the latter possibility. We
ask whether the gap between the magnitude of
garden path effects in humans and the magnitude
that surprisal theory predicts from LMs is due to
a mismatch between how humans and LMs weigh
two contributors to word-level surprisal: syntactic
and lexical predictability. We hypothesize that the
LM next-word prediction objective does not suf-
ficiently emphasize the importance that syntactic
structure carries for human readers, who may be
more actively concerned with interpreting the sen-
tence. In this scenario, since garden paths are the
product of unpredictable syntactic structure—as
opposed to an unpredictable lexical item—using
an LM predictability estimate for the next word could
lead to underestimation of garden path effects.
We test the hypothesis that the gap between
model and human effects can be bridged by teas-
ing apart the overall predictability of a word from
the surprisal associated with the syntactic structure
implied by the word (see Figure 1) and weighting
the two factors independently, possibly assigning
a higher weight to syntactic surprisal.

Figure 1: A depiction of the relationship between syntactic and lexical surprisal. Some word tokens, such as "are" in the context of "owls are more flexible", are highly predictable in all respects. Others are unpredictable due to the syntactic structures they imply ("trying" in "girls trying to save up"), and are expected to be assigned both high syntactic and high lexical surprisal. Tokens such as "microbes" in the context "the newfound microbes were", on the other hand, appear in a predictable syntactic environment but are unpredictable due to their low lexical frequency; such words should be assigned low syntactic surprisal but high lexical surprisal. Since words that appear in unpredictable syntactic environments are themselves unpredictable, we do not expect to find words with high syntactic surprisal but low lexical surprisal.

In this reasoning, we follow prior work on syntactic or unlexicalized surprisal carried out in the context of symbolic parsers, where the probability of a structure
and of a particular lexical item can be explicitly disen-
tangled (Demberg and Keller, 2008; Roark et al.,
2009). But while past work has demonstrated that
unlexicalized surprisal from symbolic parsers
correlates with measures of human processing dif-
ficulty (Demberg and Keller, 2008), simple recur-
rent neural networks trained to predict sequences
of part-of-speech tags have been shown to track
processing difficulty even more strongly (Frank,
2009), suggesting that even fairly limited syntac-
tic representations like part-of-speech tags can act
as a reasonable proxy of syntactic structure when
modeling human behavior.
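The independent-weighting idea can be sketched as an ordinary least-squares fit in which syntactic and lexical surprisal receive separate coefficients, rather than the single conversion factor of standard surprisal theory. All predictor and reading-time values below are hypothetical:

```python
import numpy as np

# Hypothetical per-word predictors: syntactic surprisal (e.g., from a
# supertag-prediction model) and lexical (word-level) surprisal from an
# LM, with observed reading times in ms. Values are illustrative only.
syn = np.array([0.5, 3.2, 0.4, 2.8, 1.0, 4.1])
lex = np.array([2.0, 6.5, 8.1, 7.0, 3.2, 9.0])
rts = np.array([205.0, 290.0, 250.0, 280.0, 220.0, 330.0])

# Fit RT = b0 + b_syn * syn + b_lex * lex, allowing the two surprisal
# components to carry independent weights. The hypothesis predicts
# b_syn > b_lex: syntactic unpredictability costs more per bit.
X = np.column_stack([np.ones_like(syn), syn, lex])
coefs, *_ = np.linalg.lstsq(X, rts, rcond=None)
b0, b_syn, b_lex = coefs
```

If a single conversion factor sufficed, the fit would be no better than one constrained to `b_syn == b_lex`; a reliably larger syntactic coefficient would instead support the weighting hypothesis.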
To compute LSTM-based syntactic surprisal,
we train the LM with an auxiliary objective—
estimating the likelihood of the next word’s su-
pertag under the Combinatory Categorial Grammar
(CCG) framework (Steedman, 1987)—following
Enguehard et al. (2017). Such supertags can be
viewed as enriched part-of-speech tags that encode
syntactic information about how a particular word
can be combined with its local environment. We