Every word counts: A multilingual analysis of individual human
alignment with model attention
Stephanie Brandl
Department of Computer Science
University of Copenhagen
brandl@di.ku.dk
Nora Hollenstein
Center for Language Technology
University of Copenhagen
nora.hollenstein@hum.ku.dk
Abstract
Human fixation patterns have been shown to correlate strongly with Transformer-based attention. Those correlation analyses are usually carried out without taking into account individual differences between participants and are mostly done on monolingual datasets, making it difficult to generalise findings. In this paper, we analyse eye-tracking data from speakers of 13 different languages reading both in their native language (L1) and in English as language learners (L2). We find considerable differences between languages, but also that individual reading behaviour, such as skipping rate and total reading time, as well as vocabulary knowledge (LexTALE), influences the alignment between humans and models to an extent that should be considered in future studies.
1 Introduction
Recent research has shown that relative importance metrics in neural language models correlate strongly with human attention, i.e., fixation durations extracted from eye-tracking recordings during reading (Morger et al., 2022; Eberle et al., 2022; Bensemann et al., 2022; Hollenstein and Beinborn, 2021; Sood et al., 2020). This approach serves as an interpretability tool and helps to quantify the cognitive plausibility of language models. However, what drives these correlations in terms of differences between individual readers has not been investigated.
In this short paper, we approach this by analysing (i) differences in correlation between machine attention and human relative fixation duration across languages, (ii) differences within the same language across datasets, text domains and native speakers of different languages, (iii) differences between native speakers (L1) and second language learners (L2), (iv) the influence of syntactic properties such as part-of-speech tags, and (v) the influence of individual differences in demographics, i.e., age, vocabulary knowledge and depth of processing.
Taking into account individual and subgroup differences in future research will encourage single-subject and cross-subject evaluation scenarios, which will not only improve the generalization capabilities of ML models but also allow for adaptable and personalized technologies, including applications in language learning, reading development or assistive communication technology. Additionally, understanding computational language models from the perspectives of different user groups can lead to increased fairness and transparency in NLP applications.
Contributions
We quantify the individual differences in human alignment with Transformer-based attention in a correlation study where we compare relative fixation duration from native speakers of 13 different languages on the MECO corpus (Siegelman et al., 2022; Kuperman et al., 2022) to first layer attention extracted from mBERT (Devlin et al., 2019), XLM-R (Conneau et al., 2020) and mT5 (Xue et al., 2021), three pre-trained multilingual language models. We carry out this correlation analysis on the participants' respective native languages (L1) and on data from an English experiment (L2) with the same participants. We analyse the influence of processing depth (i.e., the thoroughness of reading quantified through the readers' skipping behaviour), part-of-speech (POS) tags, and vocabulary knowledge in the form of LexTALE scores on the correlation values. Finally, we compare the correlations to data from the GECO corpus, which contains English (L1 and L2) and Dutch (L1) eye-tracking data (Cop et al., 2017).
The results show that (i) the correlation varies greatly across languages, (ii) L1 reading data correlates less with neural attention than L2 data, and (iii) generally, in-depth reading leads to higher correlation than shallow processing. Our code is available at github.com/stephaniebrandl/eyetracking-subgroups.
2 Related Work
Multilingual eye-tracking
Brysbaert (2019) found differences in words-per-minute rates during reading across different languages and proficiency levels. That eye-tracking data contains language-specific information is also concluded by Berzak et al. (2017), who showed that eye-tracking features can be used to determine a reader's native language based on English text.
Individual differences
The neglect of individual differences is a well-known issue in cognitive science, which leads to theories that support a misleading picture of an idealised human cognition that is largely invariant across individuals (Levinson, 2012). Kidd et al. (2018) pointed out that the extent to which human sentence processing is affected by individual differences is most likely underestimated, since psycholinguistic experiments almost exclusively focus on a homogeneous subsample of the human population (Henrich et al., 2010).
Along the same lines, when using cognitive signals in NLP, most often the data is aggregated across all participants (Hollenstein et al., 2020; Klerke and Plank, 2019). While there is some evidence showing that this leads to more robust results regarding model performance, it also disregards differences between subgroups of readers.
Eye-tracking prediction and correlation in NLP
State-of-the-art word embeddings are highly correlated with eye-tracking metrics (Hollenstein et al., 2019; Salicchi et al., 2021). Hollenstein et al. (2021) showed that multilingual models can predict a range of eye-tracking features across different languages. This implies that Transformer language models are able to extract cognitive processing information from human signals in a supervised way. Moreover, relative importance metrics in neural language models correlate strongly with human attention, i.e., fixation durations extracted from eye-tracking recordings during reading (Morger et al., 2022; Eberle et al., 2022; Bensemann et al., 2022; Hollenstein and Beinborn, 2021; Sood et al., 2020).
3 Method
We analyse the Spearman correlation coefficients between first layer attention in a multilingual language model and relative fixation durations extracted from a large multilingual eye-tracking corpus including 13 languages (Siegelman et al., 2022; Kuperman et al., 2022), as described below. Total reading time (TRT) per word is divided by the sum over all TRTs in the respective sentence to compute the relative fixation duration for individual participants, similar to Hollenstein and Beinborn (2021).
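As a minimal sketch of this normalisation and the per-sentence correlation, assuming per-word total reading times and per-word model importance scores are already available as plain arrays (variable and function names are illustrative only, not taken from the released code):

```python
# Minimal sketch: relative fixation duration and Spearman correlation.
# Assumes trt_per_word (one total reading time per word of a sentence, for one
# participant) and model_scores (one importance value per word) are given.
import numpy as np
from scipy.stats import spearmanr

def relative_fixation(trt_per_word):
    """Normalise per-word total reading time by the sentence sum."""
    trt = np.asarray(trt_per_word, dtype=float)
    return trt / trt.sum()

def sentence_correlation(trt_per_word, model_scores):
    """Spearman correlation between human and model word importance."""
    rho, _ = spearmanr(relative_fixation(trt_per_word), model_scores)
    return rho

# Example: one five-word sentence for one participant (values invented).
print(sentence_correlation([210, 0, 540, 180, 330], [0.12, 0.08, 0.31, 0.19, 0.30]))
```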
We extract first layer attention for each word from mBERT¹, XLM-R² and mT5³, three multilingual pre-trained language models. We then average across heads. We also test gradient-based saliency and attention flow, which show similar correlations but come at a substantially higher computational cost. This is in line with findings in Morger et al. (2022).
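A corresponding sketch of the attention side, using one of the Hugging Face checkpoints listed in the footnotes (mBERT); the choice of aggregating the attention each token receives, and of summing subword scores back to word level, are illustrative assumptions rather than steps spelled out in the paper:

```python
# Sketch: first-layer attention per word from mBERT, averaged over heads.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased", output_attentions=True)
model.eval()

def first_layer_word_attention(words):
    """Return one attention score per word, normalised over the sentence."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # attentions[0]: (batch, heads, seq, seq); take layer 0, average over heads
    att = out.attentions[0][0].mean(dim=0)        # (seq, seq)
    # attention received by each token, averaged over query positions
    token_scores = att.mean(dim=0).numpy()        # (seq,)
    # sum subword-token scores back to word level (assumption: sum aggregation)
    word_scores = np.zeros(len(words))
    for tok_idx, word_idx in enumerate(enc.word_ids(0)):
        if word_idx is not None:                  # skip [CLS]/[SEP]
            word_scores[word_idx] += token_scores[tok_idx]
    return word_scores / word_scores.sum()        # comparable to relative fixation

print(first_layer_word_attention(["Every", "word", "counts", "."]))
```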
Eye-tracking Data
The L1 part of the MECO corpus contains data from native speakers reading 12 short encyclopedic-style texts (89-120 sentences) in their own languages⁴ (parallel texts and similar texts on the same topics in all languages), while the L2 part contains data from the same participants of different native languages reading 12 English texts (91 sentences, also encyclopedic-style). For each part, the complete texts were shown on multiple lines on a single screen and the participants read naturally without any time limit. Furthermore, language-specific LexTALE tests have been carried out for several languages in the L1 experiments, and the English version for all participants in the L2 experiment. LexTALE is a fast and efficient test of vocabulary knowledge for medium to highly proficient speakers (Lemhöfer and Broersma, 2012).
For comparison, we also run the experiments on the GECO corpus (Cop et al., 2017), which contains eye-tracking data from English and Dutch native speakers reading an entire novel in their native language (L1, 4921/4285 sentences, respectively), as well as a part where the Dutch speakers read English text (L2, 4521 sentences). The text was presented on the screen in paragraphs for natural unpaced reading.
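To relate the resulting correlations to individual differences such as vocabulary knowledge, one straightforward analysis is to average correlations per participant and group them by participant-level variables such as LexTALE score. A hedged sketch of that grouping follows; the table layout and column names are invented for illustration and do not reflect the released data schema:

```python
# Sketch: relate per-participant correlations to LexTALE scores.
# Assumes one row per (participant, sentence) correlation plus a
# participant-level LexTALE score; all values below are made up.
import pandas as pd

df = pd.DataFrame({
    "participant":  ["p1", "p1", "p2", "p2", "p3", "p3"],
    "spearman_rho": [0.41, 0.35, 0.22, 0.30, 0.55, 0.48],
    "lextale":      [81.2, 81.2, 60.0, 60.0, 92.5, 92.5],
})

# Mean correlation per participant, then bin by vocabulary knowledge.
per_participant = df.groupby("participant").agg(
    mean_rho=("spearman_rho", "mean"),
    lextale=("lextale", "first"),
)
per_participant["lextale_band"] = pd.cut(
    per_participant["lextale"], bins=[0, 70, 85, 100],
    labels=["low", "mid", "high"],
)
print(per_participant.groupby("lextale_band", observed=True)["mean_rho"].mean())
```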
¹ https://huggingface.co/bert-base-multilingual-cased
² https://huggingface.co/xlm-roberta-base
³ https://huggingface.co/google/mt5-base
⁴ The languages in MECO L1 include: Dutch (nl), English (en), Estonian (et), Finnish (fi), German (de), Greek (el), Hebrew (he), Italian (it), Korean (ko), Norwegian (no), Russian (ru), Spanish (es) and Turkish (tr).