adaptive surprisal considerably outperforms
non-adaptive surprisal at predicting human
reading times. They use a pre-trained LSTM
LM and, after estimating surprisal for a test
sentence, change the LM’s parameters based
on the sentence’s cross-entropy loss. After
that, the revised LM weights are used to pre-
dict the next test sentence. In our work, we
estimated surprisal scores for each test sentence using the neural-complexity toolkit, adapting our base (non-adaptive) LSTM LM to the single preceding context sentence.
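
A minimal sketch of this adaptation-and-scoring step, assuming a PyTorch LSTM LM whose forward pass returns (logits, hidden); the function name and tensor handling are illustrative, the SGD learning rate of 2 follows the tuning described in Section 4, and since we adapt to only the single preceding sentence per item, the sketch assumes the base weights are restored between items:

import math
import torch
import torch.nn.functional as F

def adaptive_surprisal(model, context_ids, test_ids, lr=2.0):
    """Adapt the LM on one context sentence, then score the test sentence.

    context_ids, test_ids: 1-D LongTensors of token ids; model is an
    LSTM LM whose forward pass returns (logits, hidden).
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    # One gradient step on the context sentence's cross-entropy loss.
    model.train()
    logits, _ = model(context_ids[:-1].unsqueeze(0))
    loss = F.cross_entropy(logits.squeeze(0), context_ids[1:])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Surprisal of the test sentence under the adapted weights, in bits.
    model.eval()
    with torch.no_grad():
        logits, _ = model(test_ids[:-1].unsqueeze(0))
        log_probs = F.log_softmax(logits.squeeze(0), dim=-1)
        token_bits = -log_probs[torch.arange(len(test_ids) - 1),
                                test_ids[1:]] / math.log(2)
    return token_bits.sum().item()
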
3.1.2 Dependency Locality Theory
Shorter dependencies are typically easier to process than longer ones, according to Dependency Locality Theory (Gibson, 2000), which has been demonstrated to be effective at predicting the comprehension difficulty of a sequence (Temperley, 2007; Futrell et al., 2015; Liu et al., 2017; cf. Demberg and Keller, 2008). Following Temperley (2008) and Rajkumar et al. (2016), we calculated sentence-level dependency length by summing the head-dependent distances (measured as the number of intervening words) in the dependency trees of reference and variant sentences.
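
As an illustration, this measure reduces to a short computation over the head indices of a parsed sentence (a sketch; heads is a hypothetical 0-based head-index list with -1 marking the root, and distance counts intervening words, so adjacent pairs contribute 0):

def sentence_dependency_length(heads):
    """Sum of head-dependent distances over a dependency tree."""
    return sum(abs(i - h) - 1 for i, h in enumerate(heads) if h != -1)

# "the big dog barked": the->dog, big->dog, dog->barked, barked = root.
# Only the->dog spans an intervening word, so the total length is 1.
print(sentence_dependency_length([2, 2, 3, -1]))  # 1
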
3.1.3 Information Status
Languages generally prefer to mention given referents, from earlier in the discourse, before introducing new ones (Clark and Haviland, 1977; Chafe, 1976; Kaiser and Trueswell, 2004). We assigned a Given tag to the subject and object constituents in a sentence if any content word within them was mentioned in the preceding sentence or if the head of the phrase was a pronoun. All other phrases were tagged as New. For each sentence, the IS score was computed as follows: a) Given-New order = +1; b) New-Given order = -1; c) Given-Given and New-New order = 0. For illustration, see Appendix B, which shows how givenness would be coded after a context sentence.
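
A minimal sketch of this scoring rule (the Given/New tags themselves come from the content-word-overlap and pronoun heuristics described above; the function and argument names are illustrative):

def is_score(first_tag, second_tag):
    """IS score from the tags of the two constituents in surface order.

    +1 for Given-New, -1 for New-Given, 0 when both tags match.
    """
    if first_tag == second_tag:
        return 0
    return 1 if first_tag == 'Given' else -1

print(is_score('Given', 'New'))  # +1
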
4 Experiments and Results
We tested the hypothesis that surprisal enhanced
with inter-sentential discourse information (adap-
tive LSTM surprisal) predicts constituent ordering in Hindi over and above baseline cognitive controls, including information status, dependency length, lexical repetition, and non-adaptive surprisal. For our adaptation experiments, we used an adaptive learning rate of 2, as it minimized the perplexity of the validation data set (see Table 5 in Appendix C).
The Pearson's correlation coefficients between the different predictors are displayed in Figure 2 in Appendix D. The adaptive LSTM surprisal has a high
correlation with all other surprisal features and a
low correlation with dependency length and infor-
mation status score. We report the results of regression and prediction experiments on specific verbs of interest, using 10-fold cross-validation (i.e., a model trained on nine folds was used to generate predictions for the remaining fold). A prediction
experiment using feature ablation helped ascertain
the impact of syntactic priming independent of lex-
ical repetition effects. We conducted a fine-grained
verb-specific analysis of priming patterns on con-
junct verbs and Levin’s syntactic-semantic classes,
followed by a targeted human evaluation of Levin’s
verb classes.
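
As a rough illustration of the prediction setup, a scikit-learn-style sketch, assuming a binary reference-vs-variant label and a logistic-regression classifier (the feature layout and classifier choice here are illustrative assumptions, not a description of our exact pipeline):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def tenfold_accuracy(X, y):
    """10-fold CV accuracy: each fold is predicted by a model
    trained on the other nine folds.

    X: one row per sentence with predictors such as adaptive surprisal,
    non-adaptive surprisal, dependency length, and IS score;
    y: 1 for the corpus reference order, 0 for the synthesized variant.
    """
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    preds = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv)
    return np.mean(preds == y)
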
4.1 Verb-Specific Priming
Individual verb biases are well known to influence structural choices during language production (Ferreira and Schotter, 2013; Thothathiri et al., 2017; Yi et al., 2019), and priming effects are also contingent on specific verbs (Gries, 2005). Therefore, we grouped Hindi verbs based on Levin's syntactico-semantic classes using the heuristics proposed by Begum and Sharma (2017). Then we analyzed the
efficacy of adaptive surprisal at classifying refer-
ence and variant instances of Levin’s verb classes
(still training the classifier on the full training parti-
tion for each fold). Our results (Table 1, top block)
indicate that the GIVE verb class was susceptible
to priming, with adaptive surprisal producing a
significant improvement of 0.12% in classification
accuracy (p = 0.01 using McNemar’s two-tailed
test) over the baseline model. The regression coef-
ficients pertaining to Levin’s GIVE verb classes are
presented in Table 6 in Appendix E. Other Levin
verb frames did not show syntactic priming.
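
For reference, the significance comparison can be reproduced along these lines (a sketch using statsmodels; baseline_correct and adaptive_correct are hypothetical boolean vectors marking which held-out items each model classified correctly):

import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_pvalue(baseline_correct, adaptive_correct):
    """Two-tailed McNemar's test on paired per-item correctness."""
    b = np.asarray(baseline_correct, dtype=bool)
    a = np.asarray(adaptive_correct, dtype=bool)
    # 2x2 agreement table: rows = baseline right/wrong, cols = adaptive.
    table = [[np.sum(b & a), np.sum(b & ~a)],
             [np.sum(~b & a), np.sum(~b & ~a)]]
    return mcnemar(table, exact=False, correction=True).pvalue
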
Our results align with previous work in the prim-
ing literature that shows GIVE to be especially sus-
ceptible to priming, thus providing cross-linguistic
support for verb-based priming effects (Pickering and Branigan, 1998; Gries, 2005; Bock, 1986). The
GIVE verb class in our data set includes different