
label distribution and outperforms the strong baselines employed in readability assessments.
This study makes two main contributions. First, we present the largest corpus to date of sentences annotated according to established language ability indicators. Second, we propose a sentence-level assessment model that handles unbalanced label distributions. CEFR-SP and the sentence-level assessment code are available for future research at https://github.com/yukiar/CEFR-SP; the licenses of the data sources are detailed in the Ethics Statement section.
2 Related Work
Related studies have assessed text levels at different granularities (document and sentence) and with different level definitions (readability/complexity and CEFR).
2.1 Document-based Readability
Previous studies have assessed readability and created corpora with document readability annotations. WeeBit (Vajjala and Meurers, 2012), the OneStopEnglish corpus (Vajjala and Lučić, 2018), and Newsela provide manually written documents for various readability levels. Working with these annotated corpora, previous studies have used various linguistic and psycholinguistic features to develop models for assessing document-based readability (Heilman et al., 2007; Kate et al., 2010; Vajjala and Meurers, 2012; Xia et al., 2016; Vajjala and Lučić, 2018). Neural network-based approaches have proven better than feature-based models (Azpiazu and Pera, 2019; Meng et al., 2020; Imperial, 2021; Martinc et al., 2021). In particular, Deutsch et al. (2020) showed that pretrained language models outperform feature-based approaches and that adding linguistic features to them yields no performance gains.
2.2 Sentence-based Readability
Previous studies annotated sentence complexity based on crowd workers' subjective perceptions. Štajner et al. (2017) used a 5-point scale to rate the complexity of sentences written by humans or generated by text simplification models. Brunato et al. (2018) used a 7-point scale for sentences extracted from the news sections of treebanks (McDonald et al., 2013). However, as Section 3.4 confirms, relating complexity to language ability descriptions is challenging. Naderi et al. (2019) annotated German sentence complexity based on language learners' subjective judgements.
In contrast, the CEFR level of a sentence should be judged objectively, based on an understanding of language learners' skills. Hence, we presume that a sentence's CEFR level can be judged only by language education professionals based on their teaching experience. For sentence-based readability assessments, previous studies regarded all sentences in a document as having the same readability (Collins-Thompson and Callan, 2004; Dell'Orletta et al., 2011; Vajjala and Meurers, 2014; Ambati et al., 2016; Howcroft and Demberg, 2017). As we show in Section 3.4, this assumption hardly holds.
The simplicity of a sentence is one of the primary aspects of text simplification evaluation, which is commonly judged by humans. A few corpora are annotated with sentence simplicity for automatic quality estimation of text simplification (Štajner et al., 2016; Alva-Manchego et al., 2021). Nakamachi et al. (2020) applied a pretrained language model to estimate sentence simplicity and used it as the reward for a reinforcement learning-based text simplification model. Sentence simplicity is distinct from CEFR levels, which are grounded in established language ability descriptions.
2.3 CEFR-based Text Levels
Attempts have been made to establish criteria for CEFR-level assessments. For example, the English Profile (Salamoura and Saville, 2010) and CEFR-J (Ishii and Tono, 2018) projects relate English vocabulary and grammar to CEFR levels based on learner-written and textbook corpora. Tools such as Text Inspector (https://textinspector.com/) and CVLA (Uchida and Negishi, 2018) endeavour to measure the level of English reading passages automatically. Xia et al. (2016) collected reading passages from Cambridge English Exams and predicted their CEFR levels using features proposed to assess readability. Rama and Vajjala (2021) demonstrated that Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) consistently achieved high accuracy for multilingual CEFR-level classification.
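To make this kind of classification setup concrete, the following is a minimal sketch of fine-tuning a pretrained BERT model for six-way CEFR-level sentence classification with the Hugging Face Transformers library. The model name, hyperparameters, and toy data are illustrative assumptions, not the configuration used by Rama and Vajjala (2021).

```python
# Hedged sketch: fine-tune BERT for 6-way CEFR-level classification
# (A1..C2). Model choice and hyperparameters are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LEVELS)
)

# Toy (sentence, level) pairs; a real corpus such as CEFR-SP supplies
# expert-annotated labels instead.
train = [("I like dogs.", "A1"),
         ("The committee deferred its verdict pending further review.", "C1")]

enc = tokenizer([s for s, _ in train], padding=True, truncation=True,
                return_tensors="pt")
labels = torch.tensor([LEVELS.index(lv) for _, lv in train])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    out = model(**enc, labels=labels)  # cross-entropy over the 6 levels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Predict the level of an unseen sentence.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("She runs fast.", return_tensors="pt")).logits
print(LEVELS[logits.argmax(dim=-1).item()])
```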
Although these micro- (i.e., vocabulary and grammar) and macro-level (i.e., passage-level) approaches have proven useful, few attempts have been made to assign CEFR levels at the sentence level, despite its importance in learning and teaching. Pilán et al. (2014) conducted a sentence-level assessment for Swedish based on CEFR; however,