
discrimination. Based on the findings obtained, six exercises
were classified as poor exercises that could be improved. At a
sizable public institution, a CS2 course was
taught using OpenDSA as the main eTextbook [1]. A module
in OpenDSA represents one topic or portion of a typical lec-
ture, such as one sorting algorithm, and is regarded as the most
elementary functional unit for OpenDSA materials [2]. There
is a range of various exercises in each module. One of these
exercises requires the student to manipulate a data structure
to demonstrate how an algorithm affects it. These are known
as "Proficiency Exercises" (PEs). PEs were first developed
and utilized in the TRAKLA2 system [9]. The
other type of exercise is the Simple question, which includes
various types of questions such as true/false, multiple-
choice, and short-answer questions. OpenDSA utilized
the exercise framework from Khan Academy (KA) [10] to
store and present Simple questions.
II. RELATED WORK
In [11], the responses of 372 students registered in
a first-year undergraduate course were used to evaluate
the quality of 100 instructor-written MCQs from an
undergraduate midterm and final exam. To
compute item difficulty, discrimination, and chance properties,
the authors applied classical test theory and IRT analysis models.
They discovered that the two-parameter logistic (2PL) model
consistently fit the data best. According to their analyses,
higher education institutions need to ensure that MCQs are
evaluated before student grading decisions are made. In an
introductory programming course, IRT was applied to assess
students’ coding ability [12]. They developed a 1PL Rasch
model using the coding scores of the students. Their findings
revealed that students with prior knowledge performed sta-
tistically much better than students with no prior knowledge.
To analyze the questions on the midterm exam of
an introductory computer science course, the authors of [13]
utilized IRT. The purpose of that study was to examine the questions'
item characteristic curves in order to enhance the assessment
for future semesters. In [14], the authors applied IRT for problem
selection and recommendation in an ITS; to automatically select
problems, they created a model combining
collaborative filtering and IRT.
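As background for the IRT models discussed above, the following is a minimal sketch of the two-parameter logistic (2PL) item response function. The item parameter values used here are made-up illustrations, not estimates from any of the cited studies.

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL item response function: probability that a student with
    ability theta answers an item correctly, given the item's
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Illustrative (made-up) item parameters: for a student of average
# ability (theta = 0), a harder, more discriminating item yields a
# much lower probability of a correct response.
easy_item = p_correct_2pl(theta=0.0, a=1.0, b=-1.0)
hard_item = p_correct_2pl(theta=0.0, a=2.0, b=1.5)
```

Setting a = 1 for every item reduces the 2PL model to the 1PL Rasch model used in [12].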
III. EXPERIMENTAL ANALYSIS
Students generate many interactions while working with the
eTextbook; every student interaction is recorded as a log entry, and
all student logs are stored in the OpenDSA system. OpenDSA
records different types of interactions, divided into two categories.
The first consists of interactions with the eTextbook itself,
such as loading a page, reloading a page, clicking on a link to
go somewhere else, or viewing slideshows. The second consists of
interactions with the eTextbook exercises, such as attempting to
answer an exercise, submitting an answer, or requesting a hint.
This study focused on the second type; a more detailed description
of interactions and exercise types appears in [2]. The number of
questions in each exercise varies. In this work, we analyzed data
from students enrolled in the CS2 course during the fall of 2020.
About 303,800 logs represent the interactions of students with
the eTextbook. Each log contains the name and description of
the action, the time of the interaction, and the module on which
the student performed it. For students' interactions with the
exercises, we analyzed about 200,000 logs. Every log records the
time at which a student interacted with a question, the total
seconds the student spent interacting with the question, the total
number of hints the student requested, the total number of
attempts the student made, and the type of request (attempt or
hint). In [15], different measures were applied to identify the
difficult topics in a CS3 course: the correct attempt ratio (r),
the difficulty level (dl), students' hint usage (hr), and the
incorrect answer ratio (it). We computed these measures for
every exercise and discuss them in the following subsections.
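The per-exercise counts that the measures above rely on can be aggregated from the exercise-interaction logs. The sketch below uses a hypothetical, simplified log schema; the real OpenDSA log format differs, but each record carries the fields described above (exercise, student, request type, and correctness).

```python
from collections import defaultdict

# Hypothetical simplified log records, one per exercise interaction:
# (exercise, student, request_type, is_correct). The real OpenDSA
# schema is richer (timestamps, module names, total seconds, etc.).
logs = [
    ("quicksort-pe", "s1", "attempt", True),
    ("quicksort-pe", "s1", "hint",    None),
    ("quicksort-pe", "s2", "attempt", False),
    ("quicksort-pe", "s2", "attempt", True),
]

def aggregate(logs):
    """Per-exercise counts of attempts, correct attempts, and hints."""
    stats = defaultdict(lambda: {"attempts": 0, "correct": 0, "hints": 0})
    for exercise, _student, request_type, is_correct in logs:
        if request_type == "hint":
            stats[exercise]["hints"] += 1
        elif request_type == "attempt":
            stats[exercise]["attempts"] += 1
            if is_correct:
                stats[exercise]["correct"] += 1
    return dict(stats)
```

From these counts, the correct attempt ratio, hint usage, and incorrect answer ratio for each exercise follow directly.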
A. Analysis of the ratios of right answers
We aim to assign every exercise a value for its "relative
difficulty". Our goal is to find which exercises average-ability
students find comparatively difficult and, from this, to
learn which topics are the hardest for students; this may
lead us to refocus instructional efforts. In
OpenDSA, students can answer an exercise as many times
as they want until they get it correct, which results in most
students receiving almost full marks on their exercises [15].
A vulnerability typical of online courseware that many
students exploit is that some exercises can be
"gamed" [16]; in OpenDSA, this means students repeatedly
reload the current page until they get a question instance
that is easy to solve. For these reasons, we did not count
the number of students who completed an exercise
correctly; instead, we employed other definitions of difficulty.
To measure exercise difficulty, we looked at the ratio of
correct attempts to total attempts in OpenDSA exercises, such that
the correct attempt ratio for difficult exercises should be lower.
We utilized the fraction r to evaluate student performance:
r = (# of correct attempts) / (# of total attempts)    (1)
We calculate the difficulty level (dl) for each exercise, such
that

dl = 1 − (∑_{i=1}^{N} r(i)) / N    (2)
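Equations (1) and (2) can be sketched directly in code; the per-student attempt counts below are illustrative values, not data from the study.

```python
def correct_ratio(correct, total):
    """Eq. (1): r = # of correct attempts / # of total attempts
    for one student on one exercise."""
    return correct / total

def difficulty_level(per_student_r):
    """Eq. (2): dl = 1 - (sum of each student's r) / N,
    where N is the number of students."""
    n = len(per_student_r)
    return 1.0 - sum(per_student_r) / n

# Illustrative data: three students' (correct, total) attempt counts
# on a single exercise. A lower average r yields a higher dl.
rs = [correct_ratio(c, t) for c, t in [(2, 4), (1, 2), (3, 4)]]
dl = difficulty_level(rs)
```

Because dl averages the per-student ratios rather than counting who eventually answered correctly, it is not inflated by the unlimited-retry behavior described above.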
Here N denotes the number of students and r(i) the correct
attempt ratio of student i. In [15], the same measure
was used to identify the difficult topics in a CS3 course. In [17],
a similar measure was utilized to rate the difficulty of exercises:
the authors used "the number of attempts it takes a
student to figure out the right answer once making their initial
mistake" as a metric of how difficult a logic exercise is. To
determine exercise difficulty for an ITS, a history of attempts