CEFR-Based Sentence Difficulty Annotation and Assessment

Yuki Arase†, Satoru Uchida‡, and Tomoyuki Kajiwara§
†Graduate School of Information Science and Technology, Osaka University, Japan
‡Faculty of Languages and Cultures, Kyushu University, Japan
§Graduate School of Science and Engineering, Ehime University, Japan
arase@ist.osaka-u.ac.jp, uchida@flc.kyushu-u.ac.jp, kajiwara@cs.ehime-u.ac.jp
Abstract

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with levels based on the Common European Framework of Reference for Languages, assigned by English-education professionals. In addition, we propose a sentence-level assessment model to handle unbalanced level distribution, because the most basic and highly proficient sentences are naturally scarce. In the experiments in this study, our method achieved a macro-F1 score of 84.5% in the level assessment, thus outperforming strong baselines employed in readability assessment.
1 Introduction

Controllable text simplification, first proposed by Scarton and Specia (2018), is the automatic rewriting of sentences to make them comprehensible to a target audience with a specific proficiency level. Among its primary applications are providing reading assistance to language learners and helping teachers adjust the difficulty level of their teaching materials (Petersen and Ostendorf, 2007; Pellow and Eskenazi, 2014; Paetzold, 2016). Fine-grained control of output levels to match the linguistic ability of the readership is crucial for these educational applications.
While readability assessment has been actively studied (e.g., Vajjala Balakrishna, 2015; Meng et al., 2020; Deutsch et al., 2020), linking readability to language ability is difficult. Readability scores, such as the Flesch–Kincaid grade level (Kincaid et al., 1975), are intended for native speakers, not for language learners, to whom very different considerations apply. Pilán et al. (2014) and Ozasa et al. (2007) revealed that readability metrics designed for L1 do not apply to L2 learners. Furthermore, readability is defined on documents rather than sentences, whereas sentence-level text simplification requires judgements at the sentence level.
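For reference, the Flesch–Kincaid grade level is computed purely from surface counts; the standard formula from Kincaid et al. (1975), reproduced here for context (it is not restated in this paper), is

\[ \mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59. \]

Because the score depends only on sentence length and syllable counts, it cannot reflect which vocabulary and constructions a learner at a given CEFR level actually commands, which is one reason such L1-oriented metrics transfer poorly to L2 learners.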
The lack of a corpus annotated by sentence difficulty level hinders the advancement of controllable text simplification. Previous studies (Scarton and Specia, 2018; Nishihara et al., 2019; Agrawal et al., 2021) necessarily used corpora annotated for readability rather than difficulty; furthermore, they assumed that all sentences in a document had the same readability (i.e., the document level in Newsela (Xu et al., 2015)).
To solve these problems, we created a large-scale English corpus annotated by sentence difficulty levels based on the Common European Framework of Reference for Languages (CEFR),¹ the most widely used international standard describing learners' language ability. Our CEFR-based Sentence Profile (CEFR-SP) corpus adapts CEFR to sentence levels. A sentence is categorised as a certain level if a person with the corresponding CEFR level can readily understand it. CEFR-SP provides CEFR levels for 17k sentences annotated by professionals with rich experience teaching English in higher education.

¹ https://www.coe.int/en/web/common-european-framework-reference-languages
A major challenge in sentence-level assessment is the unbalanced distribution of levels: sentences at the basic (A1) and highly proficient (C2) levels are naturally scarce. To handle this, we propose a sentence-level assessment model with a macro-F1 score of 84.5%. We designed a metric-based classification method with a simple inductive bias that avoids overfitting to majority classes (Vinyals et al., 2016; Snell et al., 2017). Our method generates embeddings representing each CEFR level and estimates a sentence's level based on its cosine similarity to these embeddings. Empirical results confirm that our method effectively copes with unbalanced label distribution and outperforms the strong baselines employed in readability assessments.
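As a rough illustration of this metric-based idea (a minimal sketch under our own assumptions; the encoder name, mean pooling, and temperature are illustrative and not the authors' released configuration), a sentence embedding is compared against one learnable prototype vector per CEFR level, and the most cosine-similar prototype gives the predicted level:

```python
# Illustrative sketch of metric-based CEFR-level classification (not the
# authors' released code): one trainable prototype embedding per level,
# prediction by cosine similarity between sentence and prototype vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

class PrototypeLevelClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-cased", temperature=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One trainable prototype vector per CEFR level.
        self.prototypes = nn.Parameter(torch.randn(len(LEVELS), hidden))
        self.temperature = temperature

    def forward(self, input_ids, attention_mask):
        # Mean-pool token representations into a sentence embedding.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        mask = attention_mask.unsqueeze(-1).float()
        sent = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        # Cosine similarity between the sentence and every level prototype.
        sims = F.cosine_similarity(sent.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)
        return sims / self.temperature  # similarity logits, usable with cross-entropy

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = PrototypeLevelClassifier()

batch = tokenizer(["The cat sat on the mat."], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(LEVELS[logits.argmax(-1).item()])  # predicted CEFR level (untrained here)
```

Macro-F1, the metric reported above, averages per-level F1 scores so that the scarce A1 and C2 levels weigh as much as the frequent B levels; in a sketch like this it could be computed with sklearn.metrics.f1_score(gold, pred, average="macro").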
This study makes two main contributions. First, we present the largest corpus to date of sentences annotated according to established language ability indicators. Second, we propose a sentence-level assessment model to handle unbalanced label distribution. CEFR-SP and the sentence-level assessment code are available² for future research at https://github.com/yukiar/CEFR-SP.

² The licenses of the data sources are detailed in the Ethics Statement section.
2 Related Work

Related studies have assessed text levels at different granularities (document and sentence) and with different level definitions (readability/complexity and CEFR).
2.1 Document-based Readability

Previous studies have assessed readability and created corpora with document readability annotations. WeeBit (Vajjala and Meurers, 2012), the OneStopEnglish corpus (Vajjala and Lučić, 2018), and Newsela provide manually written documents for various readability levels. Working with these annotated corpora, previous studies have used various linguistic and psycholinguistic features to develop models for assessing document-based readability (Heilman et al., 2007; Kate et al., 2010; Vajjala and Meurers, 2012; Xia et al., 2016; Vajjala and Lučić, 2018). Neural network-based approaches have proven to be better than feature-based models (Azpiazu and Pera, 2019; Meng et al., 2020; Imperial, 2021; Martinc et al., 2021). In particular, Deutsch et al. (2020) showed that pretrained language models outperform feature-based approaches, and that combining them with linguistic features plays no role in performance gains.
2.2 Sentence-based Readability

Previous studies annotated sentences' complexities based on crowd workers' subjective perceptions. Stajner et al. (2017) used a 5-point scale to rate the complexity of sentences written by humans or generated by text simplification models. Brunato et al. (2018) used a 7-point scale for sentences extracted from the news sections of treebanks (McDonald et al., 2013). However, as Section 3.4 confirms, relating complexity to language ability descriptions is challenging. Naderi et al. (2019) annotated German sentence complexity based on language learners' subjective judgements. In contrast, the CEFR level of a sentence should be judged objectively based on an understanding of language learners' skills. Hence, we presume that a sentence's CEFR level can be judged only by language education professionals based on their teaching experience. For sentence-based readability assessments, previous studies regarded all sentences in a document as having the same readability (Collins-Thompson and Callan, 2004; Dell'Orletta et al., 2011; Vajjala and Meurers, 2014; Ambati et al., 2016; Howcroft and Demberg, 2017). As we show in Section 3.4, this assumption hardly holds.
The simplicity of a sentence is one of the primary aspects evaluated in text simplification, and it is commonly judged by humans. A few corpora have been annotated with sentence simplicity for the automatic quality estimation of text simplification (Štajner et al., 2016; Alva-Manchego et al., 2021). Nakamachi et al. (2020) applied a pretrained language model to estimate sentence simplicity and used it as a reward for a reinforcement learning-based text simplification model. Sentence simplicity is distinct from CEFR levels, which are based on established language ability descriptions.
2.3 CEFR-based Text Levels

Attempts have been made to establish criteria for CEFR-level assessments. For example, the English Profile (Salamoura and Saville, 2010) and CEFR-J (Ishii and Tono, 2018) projects relate English vocabulary and grammar to CEFR levels based on learner-written and textbook corpora. Tools such as Text Inspector³ and CVLA (Uchida and Negishi, 2018) endeavour to measure the level of English reading passages automatically. Xia et al. (2016) collected reading passages from Cambridge English Exams and predicted their CEFR levels using features proposed to assess readability. Rama and Vajjala (2021) demonstrated that Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) consistently achieved high accuracy for multilingual CEFR-level classification.

³ https://textinspector.com/
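Such passage-level CEFR classification is typically framed as standard fine-tuning of a pretrained encoder with a softmax head over the six levels. The following minimal sketch illustrates that conventional setup; the model name and example sentence are ours and are not necessarily the configuration used by Rama and Vajjala (2021):

```python
# Sketch of the conventional CEFR classification baseline: fine-tune BERT with
# a 6-way softmax head, one class per CEFR level (illustrative configuration).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=len(LEVELS)
)

batch = tokenizer(["She has lived abroad for years."], return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**batch).logits          # shape: (batch_size, 6)
print(LEVELS[logits.argmax(-1).item()])     # predicted CEFR level (head untrained here)
```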
Although these micro-level (i.e., vocabulary and grammar) and macro-level (i.e., passage-level) approaches have proven useful, few attempts have been made to assign CEFR levels at the sentence level, despite its importance in learning and teaching. Pilán et al. (2014) conducted a sentence-level assessment for Swedish based on CEFR; however, they regarded document-based levels as sentence levels. Furthermore, their level assessment was as coarse as predicting whether a sentence is above B1 or not.
3 CEFR-SP Corpus

This section describes the design of the annotation procedure and discusses sentence-level profiles. CEFR describes language ability on a 6-point scale: A1 indicates the proficiency of beginners; A2, B1, B2, C1, and C2 indicate mastery of a language at the basic (A), independent (B), and proficient (C) levels. Because CEFR is skill-based and each level is defined by 'can-do' descriptors indicating what learners can do,⁴ CEFR levels for sentences cannot be defined directly.

⁴ https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=090000168045bb52
Therefore, we used a bottom-up approach, assigning CEFR levels to sentences based on the 'can-do' descriptors of reading skills, under the definition that a sentence is, for example, at A1 level if it can be readily understood by A1-level learners. We hypothesise that with sufficient teaching experience and CEFR knowledge, it is possible to objectively determine at which level a learner can understand each sentence. We therefore carefully selected annotators with sufficient expertise through pilot and trial sessions.
3.1 Annotation Procedure

Pilot Study

A pilot study was conducted to verify the hypothesis that sufficient teaching experience and CEFR knowledge allow an objective evaluation of sentence levels. We recruited participants with three levels of expertise to label 228 sample sentences: an English-language education specialist with 12 years of teaching experience in higher education, a graduate student majoring in English education who is familiar with CEFR, and a group of three graduate students with various majors (natural language processing and ornithology) and no prior knowledge of CEFR or English-teaching experience. The results showed that the second expert had a high agreement rate with the senior expert (Pearson correlation coefficient 0.74), whereas the members of the third group agreed less often with the senior expert (Pearson correlation coefficients: 0.45, 0.50, and 0.59). These results confirm that annotators with considerable experience and knowledge agree on the judgement of the CEFR levels of sentences.
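To make the agreement measure concrete, the following sketch computes the Pearson correlation between two annotators after mapping CEFR labels to an ordinal 1-6 scale, as the conversion noted later in this section indicates; the label sequences below are invented for illustration:

```python
# Illustrative sketch: inter-annotator agreement as Pearson correlation over
# CEFR labels mapped to an ordinal 1-6 scale (A1=1, ..., C2=6).
from scipy.stats import pearsonr

LEVEL_TO_INT = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

expert = ["A2", "B1", "B1", "C1", "A1", "B2"]       # senior expert's labels (invented)
candidate = ["A2", "B2", "B1", "C1", "A2", "B2"]    # candidate annotator's labels (invented)

x = [LEVEL_TO_INT[label] for label in expert]
y = [LEVEL_TO_INT[label] for label in candidate]

r, _ = pearsonr(x, y)                                        # agreement with the expert
mean_diff = sum(abs(a - b) for a, b in zip(x, y)) / len(x)   # average level-assignment difference
print(f"Pearson r = {r:.2f}, mean level difference = {mean_diff:.2f}")
```

The same two quantities, correlation with the expert and average level-assignment difference, are the criteria used below to select annotators for the formal annotation.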
Annotation Guidelines

The annotators were familiarised with the annotation guidelines before beginning their work. The guidelines described the scales and 'can-do' descriptions of CEFR reading skills, with example sentences of each level that had been assessed by the expert. Importantly, the guidelines required the annotators to judge each sentence's level based on their English-teaching experience. Annotators were allowed to consult a dictionary to establish word levels but were instructed not to determine a sentence's level solely based on the levels of the words it contained.
Annotator Selection

For formal annotation, we recruited eight annotators with diverse English-teaching experience. We then conducted a trial session in which the annotators were asked to label 100 samples extracted from the target corpora of the formal annotation. These samples were labelled by the senior expert in the pilot study as references. Pearson correlation coefficients against the expert ranged from 0.59 to 0.77, roughly correlating with the participants' English-teaching experience in terms of duration (years of teaching) and role (private tutor or teacher in higher education). We finally selected the two annotators with high agreement rates (Pearson correlation coefficients: 0.75 and 0.73) and small average level-assignment differences (0.11 and 0.22) compared to the expert.⁵ The annotation guidelines were finalised to provide example sentences with corresponding CEFR levels on which multiple annotators had agreed in the pilot and trial sessions.

⁵ CEFR levels were converted into a 6-point scale.
3.2 Sentence Selection

Sentences were drawn from Newsela-Auto, Wiki-Auto, and the Sentence Corpus of Remedial English (SCoRE). Newsela-Auto and Wiki-Auto, created by Jiang et al. (2020), are specifically used for text simplification.⁶ SCoRE (Chujo et al., 2015) was created for computer-assisted English learning, particularly for second-language learners with lower-level proficiency. The sentences in SCoRE were carefully written by native English speakers with an understanding of the educational goals of each proficiency level; they include A-level sentences, which are scarce in text simplification corpora.

⁶ With the plan of expanding CEFR-SP to a parallel corpus in the future, we included parallel sentences. Note that our data-split policy (Section 5.1) ensures that highly similar sentences do NOT appear in both the training and validation/test sets.