Dual Mechanism Priming Effects in Hindi Word Order
Sidharth Ranjan
IIT Delhi
sidharth.ranjan03@gmail.com
Marten van Schijndel
Cornell University
mv443@cornell.edu
Sumeet Agarwal
IIT Delhi
sumeet@iitd.ac.in
Rajakrishnan Rajkumar
IISER Bhopal
rajak@iiserb.ac.in
Abstract

Word order choices during sentence production can be primed by preceding sentences. In this work, we test the DUAL MECHANISM hypothesis that priming is driven by multiple different sources. Using a Hindi corpus of text productions, we model lexical priming with an n-gram cache model and we capture more abstract syntactic priming with an adaptive neural language model. We permute the preverbal constituents of corpus sentences, and then use a logistic regression model to predict which sentences actually occurred in the corpus against artificially generated meaning-equivalent variants. Our results indicate that lexical priming and lexically independent syntactic priming affect complementary sets of verb classes. By showing that different priming influences are separable from one another, our results support the hypothesis that multiple different cognitive mechanisms underlie priming.
1 Introduction

Gries (2005) defines syntactic priming as the tendency of speakers "to repeat syntactic structures they have just encountered (produced or comprehended) before". Starting with Bock (1986), a long line of experimental and corpus-based work has provided evidence for this phenomenon in the context of language production (see Reitter et al., 2011, for a thorough review). More recently, comprehension studies have also attested priming effects in a wide variety of languages (Arai et al., 2007; Tooley and Traxler, 2010), where prior experience of a syntactic structure alleviates the comprehension difficulty associated with subsequent similar syntactic structures during reading. The experimental record also demonstrates that lexical repetition affects syntactic priming (Reitter et al., 2011, and references therein). According to the DUAL MECHANISM ACCOUNT proposed by Tooley and Traxler (2010), lexically independent syntactic priming effects are caused by an implicit learning mechanism (Bock and Griffin, 2000; Chang et al., 2006), whereas lexically dependent priming effects are caused by a more short-term mechanism, such as residual activation (Pickering and Branigan, 1998).

In the present work, we test this hypothesis of a dual mechanism of priming by analyzing whether different kinds of intersentential priming can account for the word order of different constructions in Hindi. Our main contribution is that we deploy precisely defined quantitative cognitive factors in our statistical models along with minimally paired alternative productions, whereas most previous experimental and corpus studies on priming employ only one or the other.
Hindi has a flexible word order, though SOV is the canonical order (Kachru, 2006). To investigate constituent ordering preferences, we generate meaning-equivalent grammatical variants of Hindi sentences by linearizing preverbal constituents of projective dependency trees from the Hindi-Urdu Treebank corpus (HUTB; Bhatt et al., 2009) of written text. We validated the assumptions underlying this method using crowd-sourced human judgments and compared the performance of our machine learning model with the choices made by human subjects. Pioneering studies of Hindi word order have demonstrated a wide variety of factors that influence order preferences, such as information status (Butt and King, 1996; Kidwai, 2000), prosody (Patil et al., 2008), and semantics (Perera and Srivastava, 2016; Mohanan and Mohanan, 1994). We incorporated measures of these baseline influences into a logistic regression model to distinguish the original reference sentences from our generated variants.
We model lexical priming with an n-gram cache model and we capture more abstract syntactic priming with an adaptive neural language model. Gries (2005) showed that syntactic priming effects are strongly contingent on verb class. Accordingly, we analyze model behavior on sentences involving the following verb classes: Levin's (1993) syntactic-semantic verb classes, verbs involved in double object constructions, and conjunct verbs involving noun-verb complex predicates. To foreshadow our results, information-theoretic surprisal computed using our two different models predicts word order in complementary linguistic contexts over the baseline predictors. Moreover, for the task of choosing reference vs. variant sentences, the model's predicted choices matched the agreement between human subjects for all of Levin's verb classes. By showing that different priming influences are separable from one another, our results support the dual mechanism hypothesis that multiple different cognitive mechanisms underlie priming.
2 Data

Our data set consists of 1996 reference sentences containing well-defined subject and object constituents corresponding to the projective dependency trees in the HUTB corpus (Bhatt et al., 2009). The sentences in the HUTB corpus belong to the newswire domain and contain written text in a naturally occurring context, i.e., every sentence in a news article was situated in the context of the preceding sentences. For each reference sentence in our data set, we created counterfactual grammatical variants expressing the same truth-conditional meaning¹ by permuting the preverbal constituents whose heads were linked to the root node in the dependency tree.² Inspired by grammar rules proposed in the NLG literature (Rajkumar and White, 2014), ungrammatical variants were automatically filtered out by detecting dependency relation sequences not attested in the original HUTB corpus. After filtering, we had 72,833 variant sentences for our classification task.

¹ A limitation of this definition: it does not capture the fact that, in contrast to marked orders, which necessitate context for a full interpretation, SOV canonical orders are neutral with respect to the preceding discourse (Gambhir, 1981).
² Appendix A explains our variant generation procedure in more detail.
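As a rough illustration of this procedure (not the exact pipeline, which is detailed in Appendix A), the sketch below permutes preverbal constituent chunks and applies the dependency-relation-sequence filter; the function name, data structures, and inputs are all hypothetical.

```python
# Minimal sketch of variant generation: permute preverbal constituents whose
# heads attach to the root verb, keep the verb group in place, and drop any
# ordering whose relation sequence is unattested in the reference corpus.
from itertools import permutations

def generate_variants(preverbal, verb_group, attested_relation_seqs):
    """preverbal: list of (constituent_tokens, dependency_relation) chunks in
    their original order; verb_group: list of sentence-final verbal tokens;
    attested_relation_seqs: set of relation tuples seen in the HUTB corpus."""
    variants = []
    for order in permutations(preverbal):
        rel_seq = tuple(rel for _, rel in order)
        if rel_seq not in attested_relation_seqs:       # grammaticality filter
            continue
        tokens = [tok for chunk, _ in order for tok in chunk] + verb_group
        variants.append(tokens)                         # includes the original
    return variants                                     # order, removable later
```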
3 Classification Task

In order to mitigate the data imbalance between the two groups (1996 references vs. 72,833 variants), we follow Joachims (2002) in formulating our task as a pairwise ranking problem.

\[
w \cdot \phi(\text{reference}) > w \cdot \phi(\text{variant}) \tag{1}
\]

\[
w \cdot \big(\phi(\text{reference}) - \phi(\text{variant})\big) > 0 \tag{2}
\]

The goal of the basic binary classifier is shown in Equation 1: the model learns a feature weight vector $w$ such that the dot product of $w$ with the variant feature vector $\phi(\text{variant})$ is less than the dot product of $w$ with the reference feature vector $\phi(\text{reference})$. The same goal can be written as Equation 2, which requires that the dot product of $w$ with the difference between the two feature vectors is positive. This transformation alleviates issues from having dramatically unbalanced class distributions.
We first arranged the references and variants into ordered pairs (e.g., a reference with two variants would be paired as (reference, variant1) and (variant2, reference)), and then subtracted the feature vectors of the first member of each pair from the feature vectors of its second member. We then assigned binary labels to each pair, with reference-variant pairs coded as "1" and variant-reference pairs coded as "0", thus re-balancing our previously severely imbalanced classification task. Additionally, the feature values of sentences with varying lengths get centered by this technique. Refer to Rajkumar et al. (2016) and Ranjan et al. (2022b) for a more detailed illustration.
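The pairing step can be sketched as follows; the array shapes and the alternation scheme used to balance the labels are illustrative assumptions, not the exact implementation.

```python
# Minimal sketch of the pairwise transformation: each ordered pair contributes
# one feature-difference vector (second member minus first member) and a label
# recording whether the pair was reference-first (1) or variant-first (0).
import numpy as np

def make_pairs(ref_feats, variant_feats_list):
    """ref_feats: 1-D feature array for the reference sentence;
    variant_feats_list: list of 1-D arrays, one per generated variant."""
    X, y = [], []
    for i, var in enumerate(variant_feats_list):
        if i % 2 == 0:
            first, second, label = ref_feats, var, 1   # (reference, variant)
        else:
            first, second, label = var, ref_feats, 0   # (variant, reference)
        X.append(second - first)                       # second minus first
        y.append(label)
    return np.vstack(X), np.array(y)
```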
Using features extracted from the transformed dataset, we trained a logistic regression model to predict each reference sentence (see Equation 3). All the experiments were done with the Generalized Linear Model (GLM) package in R. Here, choice is the binary dependent variable discussed above (1: reference preference; 0: variant preference).

\[
\text{choice} \sim \delta\,\text{dependency length} + \delta\,\text{trigram surp} + \delta\,\text{pcfg surp} + \delta\,\text{IS score} + \delta\,\text{lexical repetition surp} + \delta\,\text{lstm surp} + \delta\,\text{adaptive lstm surp} \tag{3}
\]
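For concreteness, the following is a minimal sketch of fitting the choice model in Equation 3. The paper fits it with the GLM package in R; this scikit-learn version is only an analogous illustration, and the feature names are shorthand for the δ-difference predictors above.

```python
# Illustrative analogue only: the paper's model is fit with glm() in R.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["dep_length", "trigram_surp", "pcfg_surp", "is_score",
            "lex_rep_surp", "lstm_surp", "adaptive_lstm_surp"]

def fit_choice_model(X_diff, y):
    """X_diff: (n_pairs, 7) array of pairwise feature differences, one column
    per predictor in Equation 3; y: 0/1 labels from the pairing step."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_diff, y)
    # map each fitted coefficient back to its predictor name
    return dict(zip(FEATURES, clf.coef_[0]))
```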
3.1 Cognitive Theories and Measures

3.1.1 Surprisal Theory

According to Surprisal Theory (Hale, 2001; Levy, 2008), comprehenders build probabilistic interpretations of phrases based on patterns they have already seen in sentence structures. Mathematically, the surprisal of the $k$-th word, $w_k$, is defined as the negative log probability of $w_k$ given the preceding context:

\[
S_k = -\log P(w_k \mid w_{1 \ldots k-1}) \tag{4}
\]

These probabilities, which indicate the information load (or predictability) of $w_k$, can be calculated over word sequences or syntactic configurations. The theory is supported by a large body of empirical evidence from behavioural as well as broad-coverage corpus data, comprising both comprehension (Demberg and Keller, 2008; Boston et al., 2008; Roark et al., 2009; Ranjan et al., 2022b; Staub, 2015; Agrawal et al., 2017) and production modalities (Demberg et al., 2012; Dammalapati et al., 2021, 2019; Ranjan et al., 2019, 2022a; Jain et al., 2018).

Using this surprisal framework, we estimate several types of surprisal scores for each test sentence in our dataset, described below, which serve as independent variables in our experiments. The word-level surprisals of all the words in each sentence were summed to obtain sentence-level surprisal measures.
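As a concrete illustration of Equation 4 and of this sentence-level aggregation, the sketch below sums per-word surprisals under an arbitrary conditional language model; `cond_prob` is a placeholder for any of the models listed next, and the log base only rescales the measure.

```python
# Minimal sketch: per-word surprisal is -log P(w_k | w_1..k-1), summed over
# the sentence to give a sentence-level surprisal score.
import math

def sentence_surprisal(tokens, cond_prob):
    """tokens: list of words; cond_prob(word, prefix) -> P(word | prefix),
    supplied by any language model (n-gram, PCFG, or LSTM)."""
    total = 0.0
    for k, w in enumerate(tokens):
        p = cond_prob(w, tokens[:k])
        total += -math.log2(max(p, 1e-12))   # surprisal of w_k, in bits
    return total
```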
1. Trigram surprisal: We calculated the local predictability of each word in a sentence using a 3-gram language model (LM) trained on 1 million sentences of mixed genre from the EMILLE Hindi corpus (Baker et al., 2002) using the SRILM toolkit (Stolcke, 2002) with Good-Turing discounting.
2. PCFG surprisal: We estimated the syntactic probability of each word in the sentence using the Berkeley latent-variable PCFG parser³ (Petrov et al., 2006). We created 12,000 phrase structure trees by converting HUTB dependency trees into constituency trees using the approach described in Yadav et al. (2017). Subsequently, we used them to train the Berkeley PCFG parser. The sentence-level log-likelihood of each test sentence was estimated by training a PCFG language model on four folds of the phrase structure trees and then testing on the fifth, held-out fold.

³ 5-fold CV parser training and testing F1-score metrics were 90.82% and 84.95%, respectively.
3. Lexical repetition surprisal: Following the method proposed by Kuhn and De Mori (1990), we estimated the cache-based surprisal of each word in a sentence using the SRILM toolkit by interpolating a 3-gram LM with a unigram cache LM based on a history of words (|H| = 100) drawn from the preceding sentence, with the default interpolation weight parameter (µ = 0.05; see Equations 5 and 6). The basic idea is to keep track of word tokens that appeared recently and to amplify their likelihood of occurrence in the trigram word sequence. In other words, subsequent sentences are more likely to reuse words that have recently appeared in the text (Kuhn and De Mori, 1990; Clarkson and Robinson, 1997). In this way, we account for the lexical priming effect in sentence processing (see the cache-interpolation sketch after this list).

\[
P(w_k \mid w_{1..k-1}) = \mu\, P_{\text{cache}}(w_k \mid w_{1..k-1}) + (1-\mu)\, P_{\text{trigram}}(w_k \mid w_{k-2}, w_{k-1}) \tag{5}
\]

\[
P_{\text{cache}}(w_k \mid w_{1..k-1}) = \frac{\text{count}_H(w_k)}{|H|} \tag{6}
\]
4. LSTM surprisal: The probability of each word in the sentence was estimated conditioned on the entire sentence prefix using a long short-term memory language model (LSTM; Hochreiter and Schmidhuber, 1997) trained on 1 million sentences of the EMILLE Hindi corpus. We used the implementation provided in the neural-complexity toolkit⁴ (van Schijndel and Linzen, 2018) with default hyperparameter settings to estimate surprisal using an unbounded neural context.

⁴ https://github.com/vansky/neural-complexity

5. Adaptive LSTM surprisal: Following the method proposed by van Schijndel and Linzen (2018), we calculated the discourse-enhanced surprisal of each word in the sentence. The cited authors presented a simple way to continuously adapt a neural LM and found that adaptive surprisal considerably outperforms non-adaptive surprisal at predicting human reading times. They use a pre-trained LSTM LM and, after estimating surprisal for a test sentence, update the LM's parameters based on that sentence's cross-entropy loss. The revised LM weights are then used to predict the next test sentence (see the adaptation sketch after this list). In our work, we estimated the surprisal scores for each test sentence using the neural-complexity toolkit by adapting our base (non-adaptive) LSTM LM to one preceding context sentence.
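To make the cache interpolation concrete, the sketch below implements Equations 5 and 6 directly in Python rather than through SRILM; `trigram_prob` stands in for the 3-gram LM and is assumed to be supplied externally.

```python
# Minimal sketch of cache-based surprisal: a unigram cache over the preceding
# sentence (Eq. 6) is interpolated with a trigram LM (Eq. 5) before taking
# the negative log probability of each word.
import math
from collections import Counter

def cache_interpolated_surprisal(sentence, prev_sentence, trigram_prob,
                                 mu=0.05, window=100):
    """sentence, prev_sentence: lists of tokens; trigram_prob(w, context) is
    assumed to return P_trigram(w | two preceding words)."""
    history = prev_sentence[-window:]                   # cache window H
    cache = Counter(history)
    total = 0.0
    for k, w in enumerate(sentence):
        p_cache = cache[w] / len(history) if history else 0.0   # Equation 6
        p_tri = trigram_prob(w, sentence[max(0, k - 2):k])
        p = mu * p_cache + (1 - mu) * p_tri                     # Equation 5
        total += -math.log2(max(p, 1e-12))              # word surprisal, bits
    return total
```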
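The adaptation loop of van Schijndel and Linzen (2018) can likewise be sketched as follows. This is not the neural-complexity implementation: the toy model, layer sizes, and optimizer are placeholders, but the score-then-update order matches the description above (for their setup, the paper reports an adaptive learning rate of 2).

```python
# Minimal sketch of discourse adaptation: score each sentence under the
# current weights, then take one gradient step on that sentence before
# scoring the next one.
import math
import torch
import torch.nn as nn

class TinyLSTMLM(nn.Module):
    """Toy word-level LSTM language model (hypothetical sizes)."""
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, ids):                      # ids: (batch, seq)
        h, _ = self.lstm(self.emb(ids))
        return self.out(h)                       # logits: (batch, seq, vocab)

def adaptive_surprisal(model, optimizer, token_ids):
    """token_ids: list of 1-D LongTensors, one per sentence, in discourse order."""
    scores = []
    for ids in token_ids:
        inp, tgt = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
        # 1) surprisal of this sentence under the current weights
        model.eval()
        with torch.no_grad():
            logp = torch.log_softmax(model(inp), dim=-1)
            s = -logp.gather(-1, tgt.unsqueeze(-1)).sum().item()
        scores.append(s / math.log(2))           # convert nats to bits
        # 2) adapt: one step on this sentence's cross-entropy loss
        model.train()
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inp).squeeze(0), tgt.squeeze(0))
        loss.backward()
        optimizer.step()
    return scores
```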
3.1.2 Dependency Locality Theory

According to Dependency Locality Theory (Gibson, 2000), shorter dependencies are typically easier to process than longer ones, and the theory has been shown to be effective at predicting the comprehension difficulty of a sequence (Temperley, 2007; Futrell et al., 2015; Liu et al., 2017; cf. Demberg and Keller, 2008). Following Temperley (2008) and Rajkumar et al. (2016), we calculated sentence-level dependency length by summing the head-dependent distances (measured as the number of intervening words) in the dependency trees of the reference and variant sentences.
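A minimal sketch of this measure, assuming a CoNLL-style head array (1-indexed heads, 0 for the root); the exact distance convention here is an illustrative assumption.

```python
# Minimal sketch of sentence-level dependency length: for every word with a
# head in the tree, count the words intervening between head and dependent,
# and sum over the sentence.
def dependency_length(heads):
    """heads[i] is the 1-indexed head position of word i+1 (0 for the root)."""
    total = 0
    for dep_pos, head_pos in enumerate(heads, start=1):
        if head_pos != 0:                          # skip the root attachment
            total += abs(head_pos - dep_pos) - 1   # intervening words
    return total

# Example: heads = [3, 3, 0] (words 1 and 2 depend on the verb at position 3)
# gives dependency_length(heads) == 1.
```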
3.1.3 Information Status

Languages generally prefer to mention given referents, from earlier in the discourse, before introducing new ones (Clark and Haviland, 1977; Chafe, 1976; Kaiser and Trueswell, 2004). We assigned a Given tag to the subject and object constituents in a sentence if any content word within them was mentioned in the preceding sentence or if the head of the phrase was a pronoun. All other phrases were tagged as New. For each sentence, the IS score was computed as follows: (a) Given-New order = +1; (b) New-Given order = -1; (c) Given-Given and New-New orders = 0. For illustration, see Appendix B, which shows how givenness would be coded after a context sentence.
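A minimal sketch of the scoring rule, assuming the Given/New tags have already been assigned as described; the function name and signature are hypothetical.

```python
# Minimal sketch of the information-status (IS) score: +1 for Given before New,
# -1 for New before Given, 0 otherwise.
def is_score(subject_tag, object_tag, subject_first=True):
    """Tags are 'Given' or 'New'; subject_first says which constituent
    precedes the other in the chosen linearization."""
    first, second = ((subject_tag, object_tag) if subject_first
                     else (object_tag, subject_tag))
    if first == "Given" and second == "New":
        return +1
    if first == "New" and second == "Given":
        return -1
    return 0
```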
4 Experiments and Results

We tested the hypothesis that surprisal enhanced with inter-sentential discourse information (adaptive LSTM surprisal) predicts constituent ordering in Hindi over other baseline cognitive controls, including information status, dependency length, lexical repetition, and non-adaptive surprisal. For our adaptation experiments, we used an adaptive learning rate of 2, as it minimized the perplexity of the validation data set (see Table 5 in Appendix C). The Pearson's correlation coefficients between the different predictors are displayed in Figure 2 in Appendix D. The adaptive LSTM surprisal has a high correlation with all other surprisal features and a low correlation with dependency length and information status score. For specific verbs of interest, we report the results of the regression and prediction experiments (using 10-fold cross-validation, i.e., a model trained on 9 folds was used to generate predictions on the remaining fold). A prediction experiment using feature ablation helped ascertain the impact of syntactic priming independent of lexical repetition effects. We conducted a fine-grained verb-specific analysis of priming patterns on conjunct verbs and Levin's syntactic-semantic classes, followed by a targeted human evaluation of Levin's verb classes.
4.1 Verb-Specific Priming

Individual verb biases are well known to influence structural choices during language production (Ferreira and Schotter, 2013; Thothathiri et al., 2017; Yi et al., 2019), and priming effects are also contingent on specific verbs (Gries, 2005). Therefore, we grouped Hindi verbs based on Levin's syntactico-semantic classes using the heuristics proposed by Begum and Sharma (2017). We then analyzed the efficacy of adaptive surprisal at classifying reference and variant instances of Levin's verb classes (still training the classifier on the full training partition for each fold). Our results (Table 1, top block) indicate that the GIVE verb class was susceptible to priming, with adaptive surprisal producing a significant improvement of 0.12% in classification accuracy (p = 0.01 using McNemar's two-tailed test) over the baseline model. The regression coefficients pertaining to Levin's GIVE verb classes are presented in Table 6 in Appendix E. Other Levin verb frames did not show syntactic priming.
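The significance comparison can be sketched as follows; it assumes per-item correctness vectors for the baseline and full models and uses the McNemar implementation from statsmodels (the contingency counts here are placeholders, not the paper's numbers).

```python
# Minimal sketch of comparing two classifiers on the same test items with
# McNemar's test over their disagreement counts.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_models(correct_baseline, correct_full):
    """Both arguments are boolean arrays of per-item correctness."""
    both = np.sum(correct_baseline & correct_full)
    only_base = np.sum(correct_baseline & ~correct_full)
    only_full = np.sum(~correct_baseline & correct_full)
    neither = np.sum(~correct_baseline & ~correct_full)
    table = [[both, only_base],
             [only_full, neither]]
    # chi-square variant with continuity correction; two-sided by construction
    return mcnemar(table, exact=False, correction=True)
```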
Our results align with previous work in the priming literature that shows GIVE to be especially susceptible to priming, thus providing cross-linguistic support for verb-based priming effects (Pickering and Branigan, 1998; Gries, 2005; Bock, 1986). The GIVE verb class in our data set includes different