Dual Mechanism Priming Effects in Hindi Word Order
Sidharth Ranjan
IIT Delhi
sidharth.ranjan03@gmail.com
Marten van Schijndel
Cornell University
mv443@cornell.edu
Sumeet Agarwal
IIT Delhi
sumeet@iitd.ac.in
Rajakrishnan Rajkumar
IISER Bhopal
rajak@iiserb.ac.in
Abstract

Word order choices during sentence production can be primed by preceding sentences. In this work, we test the DUAL MECHANISM hypothesis that priming is driven by multiple different sources. Using a Hindi corpus of text productions, we model lexical priming with an n-gram cache model and we capture more abstract syntactic priming with an adaptive neural language model. We permute the preverbal constituents of corpus sentences, and then use a logistic regression model to predict which sentences actually occurred in the corpus against artificially generated meaning-equivalent variants. Our results indicate that lexical priming and lexically independent syntactic priming affect complementary sets of verb classes. By showing that different priming influences are separable from one another, our results support the hypothesis that multiple different cognitive mechanisms underlie priming.
1 Introduction

Gries (2005) defines syntactic priming as the tendency of speakers "to repeat syntactic structures they have just encountered (produced or comprehended) before". Starting with Bock (1986), a long line of experimental and corpus-based work has provided evidence for this phenomenon in the context of language production (see Reitter et al., 2011, for a thorough review). More recently, comprehension studies have also attested priming effects in a wide variety of languages (Arai et al., 2007; Tooley and Traxler, 2010), where prior experience of a syntactic structure alleviates the comprehension difficulty associated with subsequent similar syntactic structures during reading. The experimental record also demonstrates that lexical repetition affects syntactic priming (Reitter et al., 2011, and references therein). According to the DUAL MECHANISM ACCOUNT proposed by Tooley and Traxler (2010), lexically independent syntactic priming effects are caused by an implicit learning mechanism (Bock and Griffin, 2000; Chang et al., 2006), whereas lexically dependent priming effects are caused by a more short-term mechanism, such as residual activation (Pickering and Branigan, 1998).

In the present work, we test this hypothesis of a dual mechanism of priming by analyzing whether different kinds of intersentential priming can account for the word order of different constructions in Hindi. Our main contribution is that we deploy precisely defined quantitative cognitive factors in our statistical models along with minimally paired alternative productions, whereas most previous experimental and corpus studies on priming employ only one or the other.
Hindi has a flexible word order, though SOV is the canonical order (Kachru, 2006). To investigate constituent ordering preferences, we generate meaning-equivalent grammatical variants of Hindi sentences by linearizing preverbal constituents of projective dependency trees from the Hindi-Urdu Treebank corpus (HUTB; Bhatt et al., 2009) of written text. We validated the assumptions underlying this method using crowd-sourced human judgments and compared the performance of our machine learning model with the choices made by human subjects. Pioneering studies of Hindi word order have demonstrated a wide variety of factors that influence order preferences, such as information status (Butt and King, 1996; Kidwai, 2000), prosody (Patil et al., 2008), and semantics (Perera and Srivastava, 2016; Mohanan and Mohanan, 1994). We incorporated measures of these baseline influences into a logistic regression model to distinguish the original reference sentences from our generated variants.
We model lexical priming with an n-gram cache model and we capture more abstract syntactic priming with an adaptive neural language model. Gries (2005) showed that syntactic priming effects are strongly contingent on verb class. Accordingly, we analyze model behavior on sentences involving the following verb classes: Levin's (1993) syntactic-semantic verb classes, verbs involved in double object constructions, and conjunct verbs involving noun-verb complex predicates. To foreshadow our results, information-theoretic surprisal computed using our two different models predicts word order in complementary linguistic contexts over the baseline predictors. Moreover, for the task of choosing reference vs. variant sentences, the model's predicted choices matched the agreement between human subjects for all of Levin's verb classes. By showing that different priming influences are separable from one another, our results support the dual mechanism hypothesis that multiple different cognitive mechanisms underlie priming.
2 Data

Our data set consists of 1996 reference sentences containing well-defined subject and object constituents corresponding to the projective dependency trees in the HUTB corpus (Bhatt et al., 2009). The sentences in the HUTB corpus belong to the newswire domain and contain written text in a naturally occurring context, i.e., every sentence in a news article was situated in the context of the preceding sentences. For each reference sentence in our data set, we created counterfactual grammatical variants expressing the same truth-conditional meaning¹ by permuting the preverbal constituents whose heads were linked to the root node in the dependency tree.² Inspired by grammar rules proposed in the NLG literature (Rajkumar and White, 2014), ungrammatical variants were automatically filtered out by detecting dependency relation sequences not attested in the original HUTB corpus. After filtering, we had 72,833 variant sentences for our classification task.

¹ A limitation of this definition: it does not capture the fact that, in contrast to marked orders, which necessitate context for a full interpretation, SOV canonical orders are neutral with respect to the preceding discourse (Gambhir, 1981).
² Appendix A explains our variant generation procedure in more detail.
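As a rough illustration of this procedure (not the exact pipeline, which is detailed in Appendix A), the sketch below permutes preverbal constituent chunks and applies the dependency-relation-sequence filter; the function name, data structures, and inputs are all hypothetical.

```python
# Minimal sketch of variant generation: permute preverbal constituents whose
# heads attach to the root verb, keep the verb group in place, and drop any
# ordering whose relation sequence is unattested in the reference corpus.
from itertools import permutations

def generate_variants(preverbal, verb_group, attested_relation_seqs):
    """preverbal: list of (constituent_tokens, dependency_relation) chunks in
    their original order; verb_group: list of sentence-final verbal tokens;
    attested_relation_seqs: set of relation tuples seen in the HUTB corpus."""
    variants = []
    for order in permutations(preverbal):
        rel_seq = tuple(rel for _, rel in order)
        if rel_seq not in attested_relation_seqs:       # grammaticality filter
            continue
        tokens = [tok for chunk, _ in order for tok in chunk] + verb_group
        variants.append(tokens)                         # includes the original
    return variants                                     # order, removable later
```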
3 Classification Task

In order to mitigate the data imbalance between the two groups (1996 references vs. 72,833 variants), we follow Joachims (2002) in formulating our task as a pairwise ranking problem.

\[
w \cdot \phi(\text{reference}) > w \cdot \phi(\text{variant}) \tag{1}
\]

\[
w \cdot \big(\phi(\text{reference}) - \phi(\text{variant})\big) > 0 \tag{2}
\]

The goal of the basic binary classifier is shown in Equation 1: the model learns a feature weight vector $w$ such that the dot product of $w$ with the variant feature vector $\phi(\text{variant})$ is less than the dot product of $w$ with the reference feature vector $\phi(\text{reference})$. The same goal can be written as Equation 2, which requires that the dot product of $w$ with the difference between the two feature vectors is positive. This transformation alleviates issues from having dramatically unbalanced class distributions.
We first arranged the references and variants into ordered pairs (e.g., a reference with two variants would be paired as (reference, variant1) and (variant2, reference)), and then subtracted the feature vectors of the first member of each pair from the feature vectors of its second member. We then assigned binary labels to each pair, with reference-variant pairs coded as "1" and variant-reference pairs coded as "0", thus re-balancing our previously severely imbalanced classification task. Additionally, the feature values of sentences with varying lengths get centered by this technique. Refer to Rajkumar et al. (2016) and Ranjan et al. (2022b) for a more detailed illustration.
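The pairing step can be sketched as follows; the array shapes and the alternation scheme used to balance the labels are illustrative assumptions, not the exact implementation.

```python
# Minimal sketch of the pairwise transformation: each ordered pair contributes
# one feature-difference vector (second member minus first member) and a label
# recording whether the pair was reference-first (1) or variant-first (0).
import numpy as np

def make_pairs(ref_feats, variant_feats_list):
    """ref_feats: 1-D feature array for the reference sentence;
    variant_feats_list: list of 1-D arrays, one per generated variant."""
    X, y = [], []
    for i, var in enumerate(variant_feats_list):
        if i % 2 == 0:
            first, second, label = ref_feats, var, 1   # (reference, variant)
        else:
            first, second, label = var, ref_feats, 0   # (variant, reference)
        X.append(second - first)                       # second minus first
        y.append(label)
    return np.vstack(X), np.array(y)
```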
Using features extracted from the transformed dataset, we trained a logistic regression model to predict each reference sentence (see Equation 3). All the experiments were done with the Generalized Linear Model (GLM) package in R. Here, choice is the binary dependent variable discussed above (1: reference preference; 0: variant preference).

\[
\text{choice} \sim \delta\,\text{dependency length} + \delta\,\text{trigram surp} + \delta\,\text{pcfg surp} + \delta\,\text{IS score} + \delta\,\text{lexical repetition surp} + \delta\,\text{lstm surp} + \delta\,\text{adaptive lstm surp} \tag{3}
\]
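For concreteness, the following is a minimal sketch of fitting the choice model in Equation 3. The paper fits it with the GLM package in R; this scikit-learn version is only an analogous illustration, and the feature names are shorthand for the δ-difference predictors above.

```python
# Illustrative analogue only: the paper's model is fit with glm() in R.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["dep_length", "trigram_surp", "pcfg_surp", "is_score",
            "lex_rep_surp", "lstm_surp", "adaptive_lstm_surp"]

def fit_choice_model(X_diff, y):
    """X_diff: (n_pairs, 7) array of pairwise feature differences, one column
    per predictor in Equation 3; y: 0/1 labels from the pairing step."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_diff, y)
    # map each fitted coefficient back to its predictor name
    return dict(zip(FEATURES, clf.coef_[0]))
```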
3.1 Cognitive Theories and Measures

3.1.1 Surprisal Theory

According to Surprisal Theory (Hale, 2001; Levy, 2008), comprehenders build probabilistic interpretations of phrases based on patterns they have already seen in sentence structures. Mathematically, the surprisal of the $k$-th word, $w_k$, is defined as the negative log probability of $w_k$ given the preceding context:

\[
S_k = -\log P(w_k \mid w_{1 \ldots k-1}) \tag{4}
\]

These probabilities, which indicate the information load (or predictability) of $w_k$, can be calculated over word sequences or syntactic configurations. The theory is supported by a large body of empirical evidence from behavioural as well as broad-coverage corpus data, comprising both comprehension (Demberg and Keller, 2008; Boston et al., 2008; Roark et al., 2009; Ranjan et al., 2022b; Staub, 2015; Agrawal et al., 2017) and production modalities (Demberg et al., 2012; Dammalapati et al., 2021, 2019; Ranjan et al., 2019, 2022a; Jain et al., 2018).

Using this surprisal framework, we estimate several types of surprisal scores for each test sentence in our dataset, described below, which serve as independent variables in our experiments. The word-level surprisals of all the words in each sentence were summed to obtain sentence-level surprisal measures.
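As a concrete illustration of Equation 4 and of this sentence-level aggregation, the sketch below sums per-word surprisals under an arbitrary conditional language model; `cond_prob` is a placeholder for any of the models listed next, and the log base only rescales the measure.

```python
# Minimal sketch: per-word surprisal is -log P(w_k | w_1..k-1), summed over
# the sentence to give a sentence-level surprisal score.
import math

def sentence_surprisal(tokens, cond_prob):
    """tokens: list of words; cond_prob(word, prefix) -> P(word | prefix),
    supplied by any language model (n-gram, PCFG, or LSTM)."""
    total = 0.0
    for k, w in enumerate(tokens):
        p = cond_prob(w, tokens[:k])
        total += -math.log2(max(p, 1e-12))   # surprisal of w_k, in bits
    return total
```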
1. Trigram surprisal: We calculated the local predictability of each word in a sentence using a 3-gram language model (LM) trained on 1 million sentences of mixed genre from the EMILLE Hindi corpus (Baker et al., 2002) using the SRILM toolkit (Stolcke, 2002) with Good-Turing discounting.
2. PCFG surprisal: We estimated the syntactic probability of each word in the sentence using the Berkeley latent-variable PCFG parser³ (Petrov et al., 2006). We created 12,000 phrase structure trees by converting HUTB dependency trees into constituency trees using the approach described in Yadav et al. (2017). Subsequently, we used them to train the Berkeley PCFG parser. The sentence-level log-likelihood of each test sentence was estimated by training a PCFG language model on four folds of the phrase structure trees and then testing on the fifth, held-out fold.

³ 5-fold CV parser training and testing F1-score metrics were 90.82% and 84.95%, respectively.
3. Lexical repetition surprisal: Following the method proposed by Kuhn and De Mori (1990), we estimated the cache-based surprisal of each word in a sentence using the SRILM toolkit by interpolating a 3-gram LM with a unigram cache LM based on a history of words (|H| = 100) drawn from the preceding sentence, with the default interpolation weight parameter (µ = 0.05; see Equations 5 and 6). The basic idea is to keep track of word tokens that appeared recently and to amplify their likelihood of occurrence in the trigram word sequence. In other words, subsequent sentences are more likely to reuse words that have recently appeared in the text (Kuhn and De Mori, 1990; Clarkson and Robinson, 1997). In this way, we account for the lexical priming effect in sentence processing (see the cache-interpolation sketch after this list).

\[
P(w_k \mid w_{1..k-1}) = \mu\, P_{\text{cache}}(w_k \mid w_{1..k-1}) + (1-\mu)\, P_{\text{trigram}}(w_k \mid w_{k-2}, w_{k-1}) \tag{5}
\]

\[
P_{\text{cache}}(w_k \mid w_{1..k-1}) = \frac{\text{count}_H(w_k)}{|H|} \tag{6}
\]
4. LSTM surprisal: The probability of each word in the sentence was estimated conditioned on the entire sentence prefix using a long short-term memory language model (LSTM; Hochreiter and Schmidhuber, 1997) trained on 1 million sentences of the EMILLE Hindi corpus. We used the implementation provided in the neural-complexity toolkit⁴ (van Schijndel and Linzen, 2018) with default hyperparameter settings to estimate surprisal using an unbounded neural context.

⁴ https://github.com/vansky/neural-complexity

5. Adaptive LSTM surprisal: Following the method proposed by van Schijndel and Linzen (2018), we calculated the discourse-enhanced surprisal of each word in the sentence. The cited authors presented a simple way to continuously adapt a neural LM and found that adaptive surprisal considerably outperforms non-adaptive surprisal at predicting human reading times. They use a pre-trained LSTM LM and, after estimating surprisal for a test sentence, update the LM's parameters based on that sentence's cross-entropy loss. The revised LM weights are then used to predict the next test sentence (see the adaptation sketch after this list). In our work, we estimated the surprisal scores for each test sentence using the neural-complexity toolkit by adapting our base (non-adaptive) LSTM LM to one preceding context sentence.
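To make the cache interpolation concrete, the sketch below implements Equations 5 and 6 directly in Python rather than through SRILM; `trigram_prob` stands in for the 3-gram LM and is assumed to be supplied externally.

```python
# Minimal sketch of cache-based surprisal: a unigram cache over the preceding
# sentence (Eq. 6) is interpolated with a trigram LM (Eq. 5) before taking
# the negative log probability of each word.
import math
from collections import Counter

def cache_interpolated_surprisal(sentence, prev_sentence, trigram_prob,
                                 mu=0.05, window=100):
    """sentence, prev_sentence: lists of tokens; trigram_prob(w, context) is
    assumed to return P_trigram(w | two preceding words)."""
    history = prev_sentence[-window:]                   # cache window H
    cache = Counter(history)
    total = 0.0
    for k, w in enumerate(sentence):
        p_cache = cache[w] / len(history) if history else 0.0   # Equation 6
        p_tri = trigram_prob(w, sentence[max(0, k - 2):k])
        p = mu * p_cache + (1 - mu) * p_tri                     # Equation 5
        total += -math.log2(max(p, 1e-12))              # word surprisal, bits
    return total
```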
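The adaptation loop of van Schijndel and Linzen (2018) can likewise be sketched as follows. This is not the neural-complexity implementation: the toy model, layer sizes, and optimizer are placeholders, but the score-then-update order matches the description above (for their setup, the paper reports an adaptive learning rate of 2).

```python
# Minimal sketch of discourse adaptation: score each sentence under the
# current weights, then take one gradient step on that sentence before
# scoring the next one.
import math
import torch
import torch.nn as nn

class TinyLSTMLM(nn.Module):
    """Toy word-level LSTM language model (hypothetical sizes)."""
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, ids):                      # ids: (batch, seq)
        h, _ = self.lstm(self.emb(ids))
        return self.out(h)                       # logits: (batch, seq, vocab)

def adaptive_surprisal(model, optimizer, token_ids):
    """token_ids: list of 1-D LongTensors, one per sentence, in discourse order."""
    scores = []
    for ids in token_ids:
        inp, tgt = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
        # 1) surprisal of this sentence under the current weights
        model.eval()
        with torch.no_grad():
            logp = torch.log_softmax(model(inp), dim=-1)
            s = -logp.gather(-1, tgt.unsqueeze(-1)).sum().item()
        scores.append(s / math.log(2))           # convert nats to bits
        # 2) adapt: one step on this sentence's cross-entropy loss
        model.train()
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inp).squeeze(0), tgt.squeeze(0))
        loss.backward()
        optimizer.step()
    return scores
```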
3.1.2 Dependency Locality Theory

According to Dependency Locality Theory (Gibson, 2000), shorter dependencies are typically easier to process than longer ones, and the theory has been shown to be effective at predicting the comprehension difficulty of a sequence (Temperley, 2007; Futrell et al., 2015; Liu et al., 2017; cf. Demberg and Keller, 2008). Following Temperley (2008) and Rajkumar et al. (2016), we calculated sentence-level dependency length by summing the head-dependent distances (measured as the number of intervening words) in the dependency trees of the reference and variant sentences.
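A minimal sketch of this measure, assuming a CoNLL-style head array (1-indexed heads, 0 for the root); the exact distance convention here is an illustrative assumption.

```python
# Minimal sketch of sentence-level dependency length: for every word with a
# head in the tree, count the words intervening between head and dependent,
# and sum over the sentence.
def dependency_length(heads):
    """heads[i] is the 1-indexed head position of word i+1 (0 for the root)."""
    total = 0
    for dep_pos, head_pos in enumerate(heads, start=1):
        if head_pos != 0:                          # skip the root attachment
            total += abs(head_pos - dep_pos) - 1   # intervening words
    return total

# Example: heads = [3, 3, 0] (words 1 and 2 depend on the verb at position 3)
# gives dependency_length(heads) == 1.
```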
3.1.3 Information Status

Languages generally prefer to mention given referents, from earlier in the discourse, before introducing new ones (Clark and Haviland, 1977; Chafe, 1976; Kaiser and Trueswell, 2004). We assigned a Given tag to the subject and object constituents in a sentence if any content word within them was mentioned in the preceding sentence or if the head of the phrase was a pronoun. All other phrases were tagged as New. For each sentence, the IS score was computed as follows: (a) Given-New order = +1; (b) New-Given order = -1; (c) Given-Given and New-New orders = 0. For illustration, see Appendix B, which shows how givenness would be coded after a context sentence.
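A minimal sketch of the scoring rule, assuming the Given/New tags have already been assigned as described; the function name and signature are hypothetical.

```python
# Minimal sketch of the information-status (IS) score: +1 for Given before New,
# -1 for New before Given, 0 otherwise.
def is_score(subject_tag, object_tag, subject_first=True):
    """Tags are 'Given' or 'New'; subject_first says which constituent
    precedes the other in the chosen linearization."""
    first, second = ((subject_tag, object_tag) if subject_first
                     else (object_tag, subject_tag))
    if first == "Given" and second == "New":
        return +1
    if first == "New" and second == "Given":
        return -1
    return 0
```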
4 Experiments and Results

We tested the hypothesis that surprisal enhanced with inter-sentential discourse information (adaptive LSTM surprisal) predicts constituent ordering in Hindi over other baseline cognitive controls, including information status, dependency length, lexical repetition, and non-adaptive surprisal. For our adaptation experiments, we used an adaptive learning rate of 2, as it minimized the perplexity of the validation data set (see Table 5 in Appendix C). The Pearson's correlation coefficients between the different predictors are displayed in Figure 2 in Appendix D. The adaptive LSTM surprisal has a high correlation with all other surprisal features and a low correlation with dependency length and information status score. For specific verbs of interest, we report the results of the regression and prediction experiments (using 10-fold cross-validation, i.e., a model trained on 9 folds was used to generate predictions on the remaining fold). A prediction experiment using feature ablation helped ascertain the impact of syntactic priming independent of lexical repetition effects. We conducted a fine-grained verb-specific analysis of priming patterns on conjunct verbs and Levin's syntactic-semantic classes, followed by a targeted human evaluation of Levin's verb classes.
4.1 Verb-Specific Priming

Individual verb biases are well known to influence structural choices during language production (Ferreira and Schotter, 2013; Thothathiri et al., 2017; Yi et al., 2019), and priming effects are also contingent on specific verbs (Gries, 2005). Therefore, we grouped Hindi verbs based on Levin's syntactico-semantic classes using the heuristics proposed by Begum and Sharma (2017). We then analyzed the efficacy of adaptive surprisal at classifying reference and variant instances of Levin's verb classes (still training the classifier on the full training partition for each fold). Our results (Table 1, top block) indicate that the GIVE verb class was susceptible to priming, with adaptive surprisal producing a significant improvement of 0.12% in classification accuracy (p = 0.01 using McNemar's two-tailed test) over the baseline model. The regression coefficients pertaining to Levin's GIVE verb classes are presented in Table 6 in Appendix E. Other Levin verb frames did not show syntactic priming.
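The significance comparison can be sketched as follows; it assumes per-item correctness vectors for the baseline and full models and uses the McNemar implementation from statsmodels (the contingency counts here are placeholders, not the paper's numbers).

```python
# Minimal sketch of comparing two classifiers on the same test items with
# McNemar's test over their disagreement counts.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_models(correct_baseline, correct_full):
    """Both arguments are boolean arrays of per-item correctness."""
    both = np.sum(correct_baseline & correct_full)
    only_base = np.sum(correct_baseline & ~correct_full)
    only_full = np.sum(~correct_baseline & correct_full)
    neither = np.sum(~correct_baseline & ~correct_full)
    table = [[both, only_base],
             [only_full, neither]]
    # chi-square variant with continuity correction; two-sided by construction
    return mcnemar(table, exact=False, correction=True)
```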
Our results align with previous work in the priming literature that shows GIVE to be especially susceptible to priming, thus providing cross-linguistic support for verb-based priming effects (Pickering and Branigan, 1998; Gries, 2005; Bock, 1986). The GIVE verb class in our data set includes different