Every word counts: A multilingual analysis of individual human alignment with model attention

Stephanie Brandl
Department of Computer Science
University of Copenhagen
brandl@di.ku.dk

Nora Hollenstein
Center for Language Technology
University of Copenhagen
nora.hollenstein@hum.ku.dk

Abstract

Human fixation patterns have been shown to correlate strongly with Transformer-based attention. Those correlation analyses are usually carried out without taking into account individual differences between participants and are mostly done on monolingual datasets, making it difficult to generalise findings. In this paper, we analyse eye-tracking data from speakers of 13 different languages reading both in their native language (L1) and in English as language learners (L2). We find considerable differences between languages but also that individual reading behaviour such as skipping rate, total reading time and vocabulary knowledge (LexTALE) influence the alignment between humans and models to an extent that should be considered in future studies.

1 Introduction

Recent research has shown that relative importance metrics in neural language models correlate strongly with human attention, i.e., fixation durations extracted from eye-tracking recordings during reading (Morger et al., 2022; Eberle et al., 2022; Bensemann et al., 2022; Hollenstein and Beinborn, 2021; Sood et al., 2020). This approach serves as an interpretability tool and helps to quantify the cognitive plausibility of language models. However, what drives these correlations in terms of differences between individual readers has not been investigated.

In this short paper, we approach this question by analysing (i) differences in correlation between machine attention and human relative fixation duration across languages, (ii) differences within the same language across datasets, text domains and native speakers of different languages, (iii) differences between native speakers (L1) and second language learners (L2), (iv) the influence of syntactic properties such as part-of-speech tags, and (v) the influence of individual differences in demographics, i.e., age, vocabulary knowledge, and depth of processing.

Taking into account individual and subgroup differences in future research will encourage single-subject and cross-subject evaluation scenarios, which will not only improve the generalization capabilities of ML models but also allow for adaptable and personalized technologies, including applications in language learning, reading development or assistive communication technology. Additionally, understanding computational language models from the perspectives of different user groups can lead to increased fairness and transparency in NLP applications.

Contributions

We quantify the individual differences in human alignment with Transformer-based attention in a correlation study where we compare relative fixation duration from native speakers of 13 different languages on the MECO corpus (Siegelman et al., 2022; Kuperman et al., 2022) to first layer attention extracted from mBERT (Devlin et al., 2019), XLM-R (Conneau et al., 2020) and mT5 (Xue et al., 2021), pre-trained multilingual language models. We carry out this correlation analysis on the participants’ respective native languages (L1) and data from an English experiment (L2) of the same participants. We analyse how the correlation values are influenced by processing depth (the thoroughness of reading, quantified through the readers’ skipping behaviour), part-of-speech (POS) tags, and vocabulary knowledge in the form of LexTALE scores. Finally, we compare correlations to data from the GECO corpus, which contains English (L1 and L2) and Dutch (L1) eye-tracking data (Cop et al., 2017).

The results show that (i) the correlation varies greatly across languages, (ii) L1 reading data correlates less with neural attention than L2 data, (iii) generally, in-depth reading leads to higher correlation than shallow processing. Our code is available at github.com/stephaniebrandl/eyetracking-subgroups.

arXiv:2210.04963v1 [cs.CL] 5 Oct 2022

2 Related Work

Multilingual eye-tracking

Brysbaert (2019) found differences in words-per-minute rates during reading across different languages and proficiency levels. That eye-tracking data contains language-specific information is also concluded by Berzak et al. (2017), who showed that eye-tracking features can be used to determine a reader’s native language based on English text.

Individual differences

The neglect of individual differences is a well-known issue in cognitive science, which leads to theories that support a misleading picture of an idealised human cognition that is largely invariant across individuals (Levinson, 2012). Kidd et al. (2018) pointed out that the extent to which human sentence processing is affected by individual differences is most likely underestimated since psycholinguistic experiments almost exclusively focus on a homogeneous subsample of the human population (Henrich et al., 2010).

Along the same lines, when using cognitive signals in NLP, most often the data is aggregated across all participants (Hollenstein et al., 2020; Klerke and Plank, 2019). While there is some evidence showing that this leads to more robust results regarding model performance, it also disregards differences between subgroups of readers.

Eye-tracking prediction and correlation in NLP

State-of-the-art word embeddings are highly correlated with eye-tracking metrics (Hollenstein et al., 2019; Salicchi et al., 2021). Hollenstein et al. (2021) showed that multilingual models can predict a range of eye-tracking features across different languages. This implies that Transformer language models are able to extract cognitive processing information from human signals in a supervised way. Moreover, relative importance metrics in neural language models correlate strongly with human attention, i.e., fixation durations extracted from eye-tracking recordings during reading (Morger et al., 2022; Eberle et al., 2022; Bensemann et al., 2022; Hollenstein and Beinborn, 2021; Sood et al., 2020).

3 Method

We analyse the Spearman correlation coefficients between first layer attention in a multilingual language model and relative fixation durations extracted from a large multilingual eye-tracking corpus, including 13 languages (Siegelman et al., 2022; Kuperman et al., 2022) as described below. Total fixation time (TRT) per word is divided by the sum over all TRTs in the respective sentence to compute relative fixation duration for individual participants, similar to Hollenstein and Beinborn (2021).
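
The two quantities in this paragraph can be illustrated with a minimal sketch. It assumes per-word TRT values in milliseconds for one participant and one sentence; the numbers, the function names and the choice to correlate a single sentence are illustrative assumptions, not taken from the released code. Spearman correlation is computed here as the Pearson correlation of average ranks, using only the Python standard library.

```python
from math import sqrt


def relative_fixation(trt):
    """Divide each word's total fixation time by the sentence total."""
    total = sum(trt)
    if total == 0:
        # Sentence was never fixated; return zeros rather than divide by zero.
        return [0.0 for _ in trt]
    return [t / total for t in trt]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / sqrt(var_x * var_y)


# Made-up values for one participant and one five-word sentence (assumption).
trt_ms = [210, 0, 340, 120, 330]
model_attention = [0.22, 0.05, 0.30, 0.15, 0.28]

human = relative_fixation(trt_ms)
rho = spearman(human, model_attention)
```

Because Spearman correlation depends only on ranks, dividing by the sentence total does not change the coefficient within a single sentence; the normalisation matters once words from sentences of different lengths and reading times are pooled.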

We extract first layer attention for each word from mBERT¹, XLM-R² and mT5³, all three of which are multilingual pre-trained language models. We then average across heads. We also test gradient-based saliency and attention flow, which show similar correlations but come at substantially higher computational cost. This is in line with findings in Morger et al. (2022).
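
The head-averaging step could be sketched as follows. The text does not spell out how token-level attention is turned into one score per word, so everything beyond the averaging across heads is an assumption made for illustration: a token's score is taken to be the attention it receives averaged over all query positions, subword scores are summed per word, and special tokens are dropped. The attention values are made up; with Hugging Face Transformers, a comparable array would typically come from the first entry of the attentions returned when a model is called with `output_attentions=True`.

```python
def head_averaged_attention(attn):
    """Average a [heads][query][key] attention array over heads."""
    n_heads = len(attn)
    n_tok = len(attn[0])
    return [
        [sum(attn[h][q][k] for h in range(n_heads)) / n_heads
         for k in range(n_tok)]
        for q in range(n_tok)
    ]


def token_importance(avg_attn):
    """Attention each token receives, averaged over query positions (assumption)."""
    n_tok = len(avg_attn)
    return [sum(avg_attn[q][k] for q in range(n_tok)) / n_tok
            for k in range(n_tok)]


def word_importance(token_scores, word_ids):
    """Sum subword scores per word (assumption); None marks special tokens."""
    n_words = max(w for w in word_ids if w is not None) + 1
    scores = [0.0] * n_words
    for score, w in zip(token_scores, word_ids):
        if w is not None:
            scores[w] += score
    total = sum(scores)
    # Renormalise so the word scores sum to one, like relative fixation duration.
    return [s / total for s in scores] if total > 0 else scores


# Hypothetical 2-head attention over 4 tokens: [CLS], "read", "##ing", "fast".
attn = [
    [[0.4, 0.3, 0.2, 0.1], [0.1, 0.5, 0.3, 0.1],
     [0.1, 0.3, 0.5, 0.1], [0.2, 0.2, 0.2, 0.4]],
    [[0.2, 0.3, 0.4, 0.1], [0.3, 0.3, 0.3, 0.1],
     [0.1, 0.5, 0.3, 0.1], [0.2, 0.4, 0.2, 0.2]],
]
word_ids = [None, 0, 0, 1]  # token-to-word map, None for the special token

avg = head_averaged_attention(attn)
tokens = token_importance(avg)
words = word_importance(tokens, word_ids)
```

The resulting `words` list has one entry per word and sums to one, so it can be correlated directly with a participant's relative fixation durations for the same sentence.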

Eye-tracking Data

The L1 part of the MECO corpus contains data from native speakers reading 12 short encyclopedic-style texts (89-120 sentences) in their own languages⁴ (parallel texts and similar texts of the same topics in all languages), while the L2 part contains data from the same participants of different native languages reading 12 English texts (91 sentences, also encyclopedic-style). For each part, the complete texts were shown on multiple lines on a single screen and the participants read naturally without any time limit. Furthermore, language-specific LexTALE tests were carried out for several languages in the L1 experiments, and the English version was carried out for all participants in the L2 experiment. LexTALE is a fast and efficient test of vocabulary knowledge for medium to highly proficient speakers (Lemhöfer and Broersma, 2012).

For comparison, we also run the experiments on the GECO corpus (Cop et al., 2017), which contains eye-tracking data from English and Dutch native speakers reading an entire novel in their native language (L1, 4921/4285 sentences, respectively), as well as a part where the Dutch speakers read English text (L2, 4521 sentences). The text was presented on the screen in paragraphs for natural unpaced reading.

¹ https://huggingface.co/bert-base-multilingual-cased
² https://huggingface.co/xlm-roberta-base
³ https://huggingface.co/google/mt5-base
⁴ The languages in MECO L1 include: Dutch (nl), English (en), Estonian (et), Finnish (fi), German (de), Greek (el), Hebrew (he), Italian (it), Korean (ko), Norwegian (no), Russian (ru), Spanish (es) and Turkish (tr).
