The rapid digitization of educational resources opens up opportunities to adopt artificial intelligence
(AI) to automate the process of MCQ construction. A substantial number of questions already exist
in a digital format, thus providing the required data as a first step toward building AI systems. The
automation of MCQ construction could support both teachers and learners. Teachers could benefit from increased efficiency in creating questions, alleviating their already high workload.
experience could improve due to increased practice opportunities based on automatically generated
exercises, and if these systems are sufficiently accurate, they could power personalized learning [41].
A crucial step in MCQ creation is the generation of distractors [39]. Distractors are incorrect options that are related to the answer to some degree. The quality of an MCQ heavily depends on the quality of its distractors [12]. If the distractors do not sufficiently challenge learners, picking the correct answer becomes easy, ultimately degrading the discriminative power of the question. The automatic suggestion of distractors is the focus of this paper.
Several works have already proposed distractor generation techniques for automatic MCQ creation,
mostly based on selecting distractors according to their similarity to the correct answer. In general,
two approaches are used to measure the similarity between distractors and an answer: graph-based and
corpus-based methods. Graph-based approaches use the semantic distance between concepts in the
graph as a similarity measure. In language-learning applications, WordNet [46, 54] is typically used to generate distractors, while for factoid questions domain-specific ontologies are used [51, 16, 34, 2]. In corpus-based methods, the similarity between distractors and answers has been defined as having a similar frequency count [11], belonging to the same POS class [20], having a high co-occurrence likelihood [25], having similar phonetic and morphological features [54], and being nearby in embedding spaces [31, 22, 26]. Other works, such as [39, 35, 36, 38], use machine learning models to generate distractors based on a combination of the previous features and other types of information, such as tf-idf scores.
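The embedding-based similarity criterion mentioned above can be illustrated with a minimal sketch that ranks candidate distractors by cosine similarity to the answer's vector. The vocabulary and the three-dimensional vectors below are toy values made up for illustration; a real system would use pretrained embeddings (e.g., word2vec or fastText):

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_distractors(answer_vec, candidates, k=2):
    # Rank candidate distractors by embedding similarity to the answer.
    scored = sorted(candidates.items(),
                    key=lambda kv: cosine(answer_vec, kv[1]),
                    reverse=True)
    return [word for word, _ in scored[:k]]

# Toy 3-d "embeddings" for illustration only.
emb = {
    "lion":  [0.9, 0.8, 0.1],
    "tiger": [0.85, 0.75, 0.15],
    "piano": [0.1, 0.2, 0.9],
}
candidates = {w: v for w, v in emb.items() if w != "lion"}
print(rank_distractors(emb["lion"], candidates))
# → ['tiger', 'piano']
```

Answer-only similarity of this kind is what most corpus-based methods share; they differ mainly in how the vectors (or other features) are obtained.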
While the current state-of-the-art in MCQ creation is promising, we see a number of limitations.
First of all, existing models are often domain specific. Indeed, the proposed techniques are tailored
to the application and distractor types. In language learning, such as vocabulary, grammar, or tense-usage exercises, similarity based on basic syntactic and statistical information (frequency, POS, etc.) typically works well. In other domains, such as science, health, history, or geography, distractors should be selected based on a deeper understanding of context and semantics, which current methods fail to capture.
The second limitation, language dependency, especially applies to factoids. Models should be language-agnostic because facts do not change across languages. Moreover, building a new model for each language would be a daunting task, as it would require sufficient training data for each language.
In this work, we study how the automatic retrieval of distractors can facilitate the efficient construction
of MCQs. We use a large, high-quality dataset of (question, answer, distractor) triples that are diverse in
terms of language, domain, and question type. Our dataset was made available by a commercial organization active in the field of e-assessment (see Section 3.2) and is therefore representative of the educational domain: it contains 62k MCQs, none of them identical, yet only 92k distinct answers and distractors. With an average of 2.4 distractors per question, this implies substantial reuse of distractors across questions. This motivates our premise to retrieve and reuse
distractors for new questions. We make use of the latest data-driven Natural Language Processing
(NLP) techniques to retrieve candidate distractors. We propose context-aware multilingual models
that are based on deep neural network models that select distractors by taking into account the context
of the question. They are also able to handle a variety of distractors in terms of length and type. We compare our proposed models to a competitive feature-based baseline built with classical machine learning methods trained on several handcrafted features.
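The contrast with answer-only similarity can be sketched as follows: a context-aware retriever scores each candidate against a representation of the whole question plus answer, so the question's context influences which distractors surface. The hand-made "embeddings" below are a deliberately crude stand-in, not the authors' Transformer models; they only illustrate the scoring interface:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hand-made toy "sentence embeddings"; the proposed models would compute
# these with multilingual Transformer encoders instead.
EMB = {
    "which planet is known as the red planet? mars": [0.9, 0.1, 0.2],
    "venus": [0.8, 0.2, 0.1],
    "beethoven": [0.1, 0.9, 0.3],
}

def rank_by_context(question, answer, candidates):
    # Context-aware retrieval: candidates are scored against an embedding
    # of the full question plus answer, not against the answer in isolation.
    ctx = EMB[f"{question} {answer}"]
    return sorted(candidates, key=lambda c: cosine(ctx, EMB[c]), reverse=True)

print(rank_by_context("which planet is known as the red planet?", "mars",
                      ["beethoven", "venus"]))
# → ['venus', 'beethoven']
```

Because the context vector and candidate vectors live in a shared (here, toy) space, the same scoring works regardless of the distractor's length, type, or language.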
The methods are evaluated for distractor quality using automated metrics and a real-world user test with teachers. Both evaluations indicate that the proposed context-aware methods outperform the feature-based baseline. Our contributions can be
summarized as follows:
• We built three multilingual Transformer-based distractor retrieval models that suggest distractors to teachers for multiple subjects in different languages. The first model (Section 3.4.3) requires similar distractors to have similar semantic representations, while the second (Sec-