
corpora. Based on the core assumption of surprisal
theory—that processing difficulty on a word, when
all lexical factors are kept constant, stands in a
constant proportion to the word’s surprisal, regard-
less of its syntactic context—they estimated a con-
version factor between surprisal and reading times
from non-garden path sentences. Applying this con-
version factor to the critical words in garden path
sentences, van Schijndel and Linzen found that
surprisal theory, when paired with the surprisals
estimated by their models, severely underestimated
the magnitude of the garden path effect for three
garden path constructions, consistent with similar
underestimates of other syntactically-modulated
effects (Wilcox et al., 2021). Moreover, the pre-
dicted reading times did not correctly capture dif-
ferences across the different garden path
constructions, suggesting that no single conversion
factor between surprisal and reading times could
predict the magnitude of the garden path effect in
all three constructions.
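The conversion-factor logic described above can be illustrated with a minimal sketch. All numbers below are invented for illustration; they are not van Schijndel and Linzen's data or pipeline. The factor is estimated as the slope of a linear fit of reading times on surprisal over non-garden-path material, then applied to the surprisal difference at the critical word:

```python
import numpy as np

# Hypothetical per-word measurements from non-garden-path sentences:
# LM surprisal (bits) and observed reading times (ms).
surprisals = np.array([2.1, 4.8, 3.5, 7.2, 5.9, 1.4])
reading_times = np.array([210.0, 265.0, 240.0, 310.0, 285.0, 195.0])

# Estimate the conversion factor as the slope of a linear fit:
# RT = intercept + slope * surprisal.
slope, intercept = np.polyfit(surprisals, reading_times, deg=1)

# Predicted garden path effect: the surprisal difference between the
# critical word in the ambiguous sentence and its unambiguous control
# (hypothetical values), scaled by the conversion factor.
surprisal_gp, surprisal_control = 12.3, 6.1
predicted_effect_ms = slope * (surprisal_gp - surprisal_control)
```

Under surprisal theory, this single slope should predict the garden path effect in every construction; the underestimation result is that the empirically observed effect is much larger than `predicted_effect_ms` computed this way.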
The underestimation documented by van Schi-
jndel and Linzen can be interpreted in one of two
ways: Either (1) surprisal theory cannot, on its own,
account for garden path effects; or (2) predictability
estimates derived from LSTM LMs fail to capture
some aspect of human prediction that is crucial to
explaining the processing of garden path sentences.
This work investigates the latter possibility. We
ask whether the gap between the magnitude of
garden path effects in humans and the magnitude
that surprisal theory predicts from LMs is due to
a mismatch between how humans and LMs weigh
two contributors to word-level surprisal: syntactic
and lexical predictability. We hypothesize that the
LM next-word prediction objective does not suf-
ficiently emphasize the importance that syntactic
structure carries for human readers, who may be
more actively concerned with interpreting the sen-
tence. In this scenario, since garden paths are the
product of unpredictable syntactic structure—as
opposed to an unpredictable lexical item—using
an LM predictability estimate for the next word could
lead to underestimation of garden path effects.
We test the hypothesis that the gap between
model and human effects can be bridged by teas-
ing apart the overall predictability of a word from
the surprisal associated with the syntactic structure
implied by the word (see Figure 1) and weighting
the two factors independently, possibly assigning
a higher weight to syntactic surprisal.

Figure 1: A depiction of the relationship between syntactic and lexical surprisal. Some word tokens, such as "are" in the context of "owls are more flexible", are highly predictable in all respects. Others are unpredictable due to the syntactic structures they imply ("trying" in "girls trying to save up"), and are expected to be assigned both high syntactic and high lexical surprisal. Tokens such as "microbes" in the context "the newfound microbes were", on the other hand, appear in a predictable syntactic environment but are unpredictable due to their low lexical frequency; such words should be assigned low syntactic surprisal but high lexical surprisal. Since words that appear in unpredictable syntactic environments are themselves unpredictable, we do not expect to find words with high syntactic surprisal but low lexical surprisal.

In this reasoning, we follow prior work on syntactic or unlexicalized surprisal carried out in the context of symbolic parsers, where the probability of a structure
and of a particular lexical item can be explicitly disen-
tangled (Demberg and Keller, 2008; Roark et al.,
2009). But while past work has demonstrated that
unlexicalized surprisal from symbolic parsers
correlates with measures of human processing dif-
ficulty (Demberg and Keller, 2008), simple recur-
rent neural networks trained to predict sequences
of part-of-speech tags have been shown to track
processing difficulty even more strongly (Frank,
2009), suggesting that even fairly limited syntac-
tic representations like part-of-speech tags can act
as a reasonable proxy of syntactic structure when
modeling human behavior.
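The independent-weighting idea can be sketched as an ordinary least-squares fit in which syntactic and lexical surprisal receive separate coefficients, rather than the single conversion factor of standard surprisal theory. All predictor and reading-time values below are hypothetical:

```python
import numpy as np

# Hypothetical per-word predictors: syntactic surprisal (e.g., from a
# supertag-prediction model) and lexical (word-level) surprisal from an
# LM, with observed reading times in ms. Values are illustrative only.
syn = np.array([0.5, 3.2, 0.4, 2.8, 1.0, 4.1])
lex = np.array([2.0, 6.5, 8.1, 7.0, 3.2, 9.0])
rts = np.array([205.0, 290.0, 250.0, 280.0, 220.0, 330.0])

# Fit RT = b0 + b_syn * syn + b_lex * lex, allowing the two surprisal
# components to carry independent weights. The hypothesis predicts
# b_syn > b_lex: syntactic unpredictability costs more per bit.
X = np.column_stack([np.ones_like(syn), syn, lex])
coefs, *_ = np.linalg.lstsq(X, rts, rcond=None)
b0, b_syn, b_lex = coefs
```

If a single conversion factor sufficed, the fit would be no better than one constrained to `b_syn == b_lex`; a reliably larger syntactic coefficient would instead support the weighting hypothesis.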
To compute LSTM-based syntactic surprisal,
we train the LM with an auxiliary objective—
estimating the likelihood of the next word’s su-
pertag under the Combinatory Categorial Grammar
(CCG) framework (Steedman, 1987)—following
Enguehard et al. (2017). Such supertags can be
viewed as enriched part-of-speech tags that encode
syntactic information about how a particular word
can be combined with its local environment. We