Course-Prerequisite Networks for Analyzing and Understanding Academic Curricula
Pavlos Stavrinides1 and Konstantin Zuev1,
1Department of Computing and Mathematical Sciences,
California Institute of Technology, Pasadena, CA 91125, USA
Understanding a complex system of relationships between courses is of great importance for the
university’s educational mission. This paper is dedicated to the study of course-prerequisite net-
works (CPNs), where nodes represent courses and directed links represent the formal prerequisite
relationships between them. The main goal of CPNs is to model interactions between courses, rep-
resent the flow of knowledge in academic curricula, and serve as a key tool for visualizing, analyzing,
and optimizing complex curricula. First, we consider several classical centrality measures, discuss
their meaning in the context of CPNs, and use them for the identification of important courses.
Next, we describe the hierarchical structure of a CPN using the topological stratification of the
network. Finally, we perform an interdependence analysis, which allows us to quantify the strength
of knowledge flow between university divisions and helps to identify the most intradependent, in-
fluential, and interdisciplinary areas of study. We discuss how course-prerequisite networks can be
used by students, faculty, and administrators for detecting important courses, improving existing
and creating new courses, navigating complex curricula, allocating teaching resources, increasing
interdisciplinary interactions between departments, revamping curricula, and enhancing the overall
students’ learning experience. The proposed methodology can be used for the analysis of any CPN,
and it is illustrated with a network of courses taught at the California Institute of Technology. The
network data analyzed in this paper is publicly available in the GitHub repository.
I. INTRODUCTION
An academic curriculum is a complex system of courses
and interactions between them that lies at the heart of
a university and underlies its educational mission. Un-
derstanding a university curriculum as a whole is an im-
portant prerequisite for providing students with a high
quality education and meaningful learning experiences.
Moreover, designing an appropriate curriculum is of great
importance not only from an academic point of view, but
also for organizational and financial management.
A full list of courses together with their descriptions
given in the university catalog allows one, at least in principle,
to know everything about the curriculum and answer
any question about it. However, it is hard to comprehend
this raw data, extract actionable knowledge, and make
data-driven decisions.
A network, where nodes represent courses and links
represent certain relationships between them, is a nat-
ural model for conceptualizing, representing, and ana-
lyzing a curriculum. For example, links can represent
the temporal relationships between courses based on how
many students move from one course to another throughout
their studies [1]. These temporal networks can be
used for forecasting course enrollments, predicting stu-
dent performance [2], and estimating the relative contri-
bution of courses [3]. Alternatively, links between courses
can reflect the influence that some subjects have on oth-
ers based on the expert knowledge of the professors. Such
influence network models can be used for the curriculum
design and recommendations [4].
Corresponding author, email: kostia@caltech.edu
This paper focuses on the study of course-prerequisite
networks (CPNs), where nodes represent courses and di-
rected links represent the formal prerequisite require-
ments between them listed in the university catalog. Un-
like temporal and influence networks, CPNs are objec-
tively defined and, when in a steady state, don’t change
substantially from year to year. Over the last decade,
CPNs have attracted a lot of interest from researchers
due to their key role in the understanding of the complex
structure of academic curricula. For example, Slim et al.
used CPNs for detecting crucial courses that have a high
impact on students' progress and graduation rates [5],
Aldrich discussed applications of CPNs to advising and
curriculum reform [6], and Molontay et al. introduced a
data-driven probabilistic approach for studying the dis-
tribution of graduation time based on the CPN topol-
ogy [7].
In this paper, we propose a general network-science-
based framework for analysis of CPNs and illustrate it
with a CPN based on the courses offered at the Califor-
nia Institute of Technology in the 2021-2022 academic
year. We show that a CPN is an indispensable tool for
visualizing, understanding, and optimizing an academic
curriculum. It can be used not only for identification of
important courses, but also for improving existing and
creating new courses. We discuss how students can use
a CPN to navigate their complex curriculum, and how a
CPN can help faculty and administrators to meaningfully
allocate teaching resources, increase interactions between
divisions and departments, revamp the curriculum, and
enhance the overall students’ learning experience.
arXiv:2210.01269v3 [physics.soc-ph] 28 Apr 2023
The proposed framework is based on network science [8–11],
which is an interdisciplinary field that
emerged at the intersection of graph theory, computational
statistics, computer science, and statistical
physics. The basic idea of network science is to use a
network as a simplified representation of a complex system
that captures the pattern of connections between the system's
components and represents its structural skeleton.
Networks have been used to represent a variety of so-
cial, technological, information, and biological systems
consisting of many interconnected, interacting compo-
nents. Modeling complex systems with networks has
proved to be useful for understanding systems as intri-
cate and diverse as the Internet, the world wide web, food
webs, power grids, protein interactions, interwoven social
groups, and even the human brain.
The rest of the paper is organized as follows. In Sec-
tion II, we define an abstract CPN and basic related no-
tions and describe the Caltech CPN and its giant con-
nected component. Section III is dedicated to the identi-
fication of important courses and different measures for
importance quantification. In Section IV, we construct
the topological stratification of a CPN and discuss how
the emergent hierarchical structure on the CPN can be
used for finding hidden prerequisites and creating com-
prehensive schedules for different areas of study. In Sec-
tion V, we perform an interdependence analysis of a CPN
that provides a bird’s eye view of the whole curriculum
and allows us to quantify the strength of the flow of knowledge
from one university division or area of study to another
and identify the most intradependent, influential, and in-
terdisciplinary divisions and areas of study. Finally, Sec-
tion VI concludes with a brief summary and specific rec-
ommendations on how students, faculty, and administra-
tors can use the results of the CPN analysis for efficient
navigation and optimal enhancement of the curriculum.
II. NETWORK REPRESENTATION OF
UNIVERSITY COURSES
The main object of study in this paper is a course-
prerequisite network (CPN), which is a directed network
that describes interactions between university courses. In
a CPN, nodes represent different courses and directed
links between nodes represent the course-prerequisite relationships
between the corresponding courses. A course
X is called a prerequisite for course Y if taking X is required
before taking Y. Usually prerequisites cover material
that is necessary for understanding more advanced
courses. For example, a calculus course is often listed as
a prerequisite for a course on differential equations. If X
is a prerequisite for Y, then, in the CPN, this is represented
by a directed link from node X to node Y. In this
case, Y is called a postrequisite of X.
It is convenient to mathematically represent a CPN by
its adjacency matrix. If a CPN has $n$ nodes labeled by
$1, \ldots, n$, then its adjacency matrix $A$ is the $n \times n$ matrix
with elements $A_{ij}$, $i, j = 1, \ldots, n$, defined as follows:
$$
A_{ij} =
\begin{cases}
1, & \text{if there is a link from } i \text{ to } j, \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
$$
The adjacency matrices of CPNs are sparse (most of the $A_{ij}$
are equal to zero), since a typical course has a small number
of prerequisites and serves as a prerequisite to a small
number of courses.
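To put the sparsity claim in numbers, the following minimal sketch computes the fraction of nonzero off-diagonal entries of an adjacency matrix. It assumes a CPN has no self-links, so the maximum number of links is n(n-1); the function name `link_density` is a hypothetical helper, and the example values are the node and link counts of the largest connected component of the Caltech CPN reported later in this section.

```python
def link_density(n_nodes, n_links):
    """Fraction of possible directed links (no self-links) that are present."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    return n_links / (n_nodes * (n_nodes - 1))

# Node and link counts of the largest connected component of the Caltech CPN.
n, m = 436, 747
density = link_density(n, m)
print(f"density = {density:.4%}")  # well below one percent of possible links
```

A density this small is why edge lists or adjacency lists are usually a more economical storage format for a CPN than a full matrix.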
As an example, consider a toy curriculum consisting of
six courses: A, B, C, X, Y, and Z. The course-prerequisite
relationships are summarized in Table I.

Course   Prerequisites
A        none
B        none
C        none
X        A, B
Y        B, C
Z        none
TABLE I. Example of a curriculum consisting of six courses.

Course X has two prerequisites, A and B, course Y has two prerequisites,
B and C, and all other courses have no prerequisites.
If a course does not have any prerequisites, like
courses A, B, C, and Z, then it can be taken at any time. If
a course has neither prerequisites nor postrequisites,
like course Z, then it is represented by an isolated
node, i.e. a node without incoming and outgoing links.
Figure 1 shows the CPN induced by the toy curriculum.

FIG. 1. The course-prerequisite network (CPN) for the toy
curriculum with six courses defined in Table I.
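The adjacency matrix of this toy CPN is
$$
A =
\begin{array}{c|cccccc}
 & \mathrm{A} & \mathrm{B} & \mathrm{C} & \mathrm{X} & \mathrm{Y} & \mathrm{Z} \\ \hline
\mathrm{A} & 0 & 0 & 0 & 1 & 0 & 0 \\
\mathrm{B} & 0 & 0 & 0 & 1 & 1 & 0 \\
\mathrm{C} & 0 & 0 & 0 & 0 & 1 & 0 \\
\mathrm{X} & 0 & 0 & 0 & 0 & 0 & 0 \\
\mathrm{Y} & 0 & 0 & 0 & 0 & 0 & 0 \\
\mathrm{Z} & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
\tag{2}
$$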
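The construction of the adjacency matrix from a prerequisite table can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the dictionary `prerequisites` encodes Table I, and the helper name `adjacency_matrix` is an assumption introduced here.

```python
# Toy curriculum from Table I: course -> list of its prerequisites.
prerequisites = {
    "A": [], "B": [], "C": [],
    "X": ["A", "B"],
    "Y": ["B", "C"],
    "Z": [],
}

def adjacency_matrix(prereqs):
    """Return (labels, A) where A[i][j] = 1 iff course i is a prerequisite of course j."""
    labels = list(prereqs)                      # node order: A, B, C, X, Y, Z
    index = {course: i for i, course in enumerate(labels)}
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for course, reqs in prereqs.items():
        for req in reqs:
            A[index[req]][index[course]] = 1    # directed link req -> course
    return labels, A

labels, A = adjacency_matrix(prerequisites)
for label, row in zip(labels, A):
    print(label, *row)
```

The rows of the printed matrix correspond to prerequisites and the columns to postrequisites, matching the convention of Eq. (1).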
A course-prerequisite network represents the flow of
knowledge between different courses in a university cur-
riculum. The main goal of this paper is to show how
CPNs can be used for visualization of complex curric-
ula, drawing important observations and insights about
the courses, and helping students to navigate and faculty
and administrators to optimize their curricula.
The methods described here can be applied to the anal-
ysis of any CPN. We will illustrate them with a real CPN
based on the courses that were offered at the California
Institute of Technology (Caltech) in the 2021-2022 aca-
demic year. The Caltech CPN, consisting of both under-
graduate and graduate courses, is shown in Fig. 2. All
network visualizations in this paper are done in Gephi,
an open-source and free network visualization software
package [12]. For visual clarity, the network visualization
in Fig. 2 omits the node labels (course names). A larger
and more detailed visualization of the Caltech CPN is
shown in Fig. 15 in the Appendix. The network data is
publicly available in the GitHub repository [13].
FIG. 2. The 2021-2022 Caltech CPN. Nodes in gray are iso-
lated, i.e. have no prerequisites and do not serve as prereq-
uisites. Nodes in color other than gray represent connected
components. The network has 771 nodes and 772 links.
Any university curriculum contains courses that are
completely independent of each other: they are not pre-
requisites for each other, not prerequisites for another
course, there is not a course which is a common pre-
requisite for them, etc. This independence between two
courses manifests itself in the CPN by the absence of a
path between the nodes representing the courses. Inde-
pendent courses belong to different connected components
of the CPN. Technically, a (weakly) connected compo-
nent of a CPN is a subset of nodes such that for any two
nodes in the subset there exists at least one path through
the network connecting the nodes, where paths are al-
lowed to go in both ways along any link (the directions
of the links are ignored). Each isolated node constitutes
a trivial connected component. Isolated nodes usually
represent seminars, projects, outreach, and special topics
courses. For example, the toy CPN in Fig. 1 has two
connected components: one consisting of nodes {A, B, C, X,
Y} and one consisting of the isolated node Z.
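Weakly connected components can be found with a breadth-first search that ignores link directions. The sketch below applies this standard technique to the toy CPN; the function name `weakly_connected_components` and the variable names are assumptions made for illustration.

```python
from collections import defaultdict, deque

def weakly_connected_components(nodes, links):
    """Group nodes into components, treating every directed link as undirected."""
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)          # ignore the direction of the link
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:                 # breadth-first search from `start`
            u = queue.popleft()
            component.append(u)
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(sorted(component))
    return components

toy_nodes = ["A", "B", "C", "X", "Y", "Z"]
toy_links = [("A", "X"), ("B", "X"), ("B", "Y"), ("C", "Y")]  # prerequisite -> course
components = weakly_connected_components(toy_nodes, toy_links)
lcc = max(components, key=len)       # largest connected component
print(components, lcc)
```

Isolated nodes such as Z come out as single-node components, so the same routine also identifies them.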
Real-world CPNs have several small connected compo-
nents and one “giant” connected component, called the
largest connected component (LCC), which contains the
largest fraction of nodes, almost all links, and consti-
tutes the most interesting and nontrivial part of a CPN.
The LCC of a CPN, denoted by G, is the main part of
the network, which represents its complex structure and
function.
In the Caltech CPN, in addition to isolated nodes,
there are 10 connected components represented by different
colors in Fig. 2 and, in more detail, in Fig. 15. All
but one are very small, with sizes not exceeding six nodes.
The largest connected component $\mathcal{G}$ (shown in pink) contains
$n = 436$ nodes (57% of all nodes) and $m = 747$ links
(97% of all links). In what follows, we focus our analysis
on the largest connected component $\mathcal{G}$ of the Caltech CPN,
which is shown in Fig. 16 in the Appendix.
III. CENTRALITY MEASURES
One of the most interesting and intriguing questions
about a university curriculum is the following: “Which
are the most important courses in the curriculum?”. In
other words, “Which are the most important nodes in
the CPN?”. Knowing the most important courses, the
courses that form the “backbone” of the curriculum,
could help to a) better allocate university resources to
provide students with better experiences in these courses
and b) inform students about these courses, so that they
can pay special attention to them.
There are different ways to define the “importance” of
a node in a CPN. Here, we will consider three centrality
measures widely used in network science to quantify
node importance: degree, PageRank centrality, and
betweenness centrality.
A. Degree Distributions
The total degree of a node is the total number of links
connected to it. In directed networks like CPNs, nodes
have two kinds of degree: an in-degree, the number of incoming
links, and an out-degree, the number of outgoing
links. The in-degree $k_{\mathrm{in}}(i)$ of a node $i$ is the number of
prerequisites course $i$ has, and the out-degree $k_{\mathrm{out}}(i)$ of $i$
is the number of courses for which $i$ is a prerequisite. In
terms of the adjacency matrix $A$, the in- and out-degrees
of node $i$ are given by
$$
k_{\mathrm{in}}(i) = \sum_{j=1}^{n} A_{ji}
\quad \text{and} \quad
k_{\mathrm{out}}(i) = \sum_{j=1}^{n} A_{ij}.
\tag{3}
$$
The total degree of node $i$ is then
$$
k(i) = k_{\mathrm{in}}(i) + k_{\mathrm{out}}(i) = \sum_{j=1}^{n} \left( A_{ij} + A_{ji} \right).
\tag{4}
$$
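As an illustration, the column and row sums that define the in- and out-degrees can be computed directly from the adjacency matrix of the toy CPN. The helper name `degrees` is an assumption introduced for this sketch.

```python
# Adjacency matrix of the toy CPN, node order A, B, C, X, Y, Z.
A = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def degrees(A):
    """Return (k_in, k_out, k) as lists indexed by node."""
    n = len(A)
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]   # column sums
    k_out = [sum(A[i][j] for j in range(n)) for i in range(n)]  # row sums
    k = [a + b for a, b in zip(k_in, k_out)]                    # total degree
    return k_in, k_out, k

k_in, k_out, k = degrees(A)
print(k_in, k_out, k)
```

For the toy CPN, courses X and Y have in-degree 2, course B has the largest out-degree, and the isolated course Z has total degree 0.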
The in-degree of a node measures how specialized the
corresponding course is: the larger $k_{\mathrm{in}}(i)$ is, the more
prerequisites course $i$ has, and the more specialized it is. The
out-degree, on the other hand, measures how fundamental
a course is: the larger $k_{\mathrm{out}}(i)$ is, the more courses
have $i$ as a prerequisite, and the more fundamental course $i$
is. We expect that in real-world CPNs, the in- and out-degrees
of nodes are negatively correlated. The absolute
value of the Pearson correlation coefficient between in-
and out-degrees of nodes,
$$
\rho_{\mathrm{in,out}} =
\frac{\sum_{i=1}^{n} \left( k_{\mathrm{in}}(i) - \bar{k}_{\mathrm{in}} \right) \left( k_{\mathrm{out}}(i) - \bar{k}_{\mathrm{out}} \right)}
{\sqrt{\sum_{i=1}^{n} \left( k_{\mathrm{in}}(i) - \bar{k}_{\mathrm{in}} \right)^{2}} \, \sqrt{\sum_{i=1}^{n} \left( k_{\mathrm{out}}(i) - \bar{k}_{\mathrm{out}} \right)^{2}}},
\tag{5}
$$
can be used to measure the structural difference between
fundamental and specialized courses.
To elaborate more on this, consider two extreme cases
shown in Fig. 3.
FIG. 3. Two extreme cases. Curriculum (a) (blue): all courses
are equivalent. Curriculum (b) (green): there is a substantial
structural difference between fundamental courses (bottom
row) and specialized courses (top row).
In curriculum (a), essentially all courses are structurally
equivalent (except for the first and the last one);
there are no fundamental and specialized courses. The
sequences of in- and out-degrees are $k_{\mathrm{in}} = (0, 1, \ldots, 1)$
and $k_{\mathrm{out}} = (1, \ldots, 1, 0)$, and the correlation coefficient
between them is
$$
\rho^{(a)}_{\mathrm{in,out}} = -\frac{1}{n-1} \approx 0, \quad \text{for large } n.
\tag{6}
$$
In curriculum (b), however, there are fundamental
courses (bottom row) and specialized courses (top row).
These two types of courses are structurally very different:
all fundamental courses are prerequisites for
all specialized courses. The sequences of in- and out-degrees
are $k_{\mathrm{in}} = (0, \ldots, 0, n/2, \ldots, n/2)$ and $k_{\mathrm{out}} =
(n/2, \ldots, n/2, 0, \ldots, 0)$, and the correlation coefficient
between them is
$$
\rho^{(b)}_{\mathrm{in,out}} = -1, \quad \text{for any } n.
\tag{7}
$$
In real-world CPNs, the correlation coefficient $\rho_{\mathrm{in,out}} \in
(-1, 0)$, and its magnitude $|\rho_{\mathrm{in,out}}|$ quantifies the
structural “gap” between fundamental and specialized
courses: the larger $|\rho_{\mathrm{in,out}}|$ is, the more significant the
split of the curriculum into fundamental and specialized
courses.
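The two limiting values can be checked numerically. The sketch below implements the Pearson coefficient of Eq. (5) and evaluates it on the degree sequences of the two extreme curricula; the function names `pearson`, `chain_degrees`, and `two_layer_degrees` are hypothetical, and n = 10 is an arbitrary illustrative size.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def chain_degrees(n):
    """Curriculum (a): a chain of n courses, each a prerequisite for the next."""
    return [0] + [1] * (n - 1), [1] * (n - 1) + [0]

def two_layer_degrees(n):
    """Curriculum (b): n/2 fundamental courses, each a prerequisite for all n/2 specialized ones."""
    half = n // 2
    k_in = [0] * half + [half] * half
    k_out = [half] * half + [0] * half
    return k_in, k_out

n = 10
rho_a = pearson(*chain_degrees(n))       # expected to equal -1/(n-1)
rho_b = pearson(*two_layer_degrees(n))   # expected to equal -1
print(rho_a, rho_b)
```

Increasing n moves the first value toward zero while the second stays at -1, in line with the two limiting cases.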
Figure 4 shows the histograms of in-, out-, and total
degrees in the LCC $\mathcal{G}$ of the Caltech CPN. All degree
distributions are right-skewed and have long right tails.
[Figure 4: three histograms of the number of courses versus in-degree $k_{\mathrm{in}}$ (top panel), out-degree $k_{\mathrm{out}}$ (middle panel), and total degree $k$ (bottom panel).]
FIG. 4. The in-, out-, and total degree distributions in the
largest connected component $\mathcal{G}$ of the Caltech CPN.
A network is called scale-free, a term coined in the
seminal paper [14], if its degree distribution follows a
power-law,
$$
P(k) \propto k^{-\gamma}, \quad \text{for } k \geq k_{\min},
\tag{8}
$$
where $P(k)$ is the probability that a node chosen uniformly
at random has degree $k$, $k_{\min}$ is the lower cut-off
for the scaling region, and $\gamma$ is the power-law exponent.
Many real-world networks are approximately scale-free,
with the power-law exponent typically in the range
$2 < \gamma < 3$ [8]. For example, both the in- and out-degrees
of the World Wide Web approximately follow power-law
distributions with $\gamma_{\mathrm{in}} = 2.1$ and $\gamma_{\mathrm{out}} = 2.7$ [15]. For
the LCC $\mathcal{G}$ of the Caltech CPN, the hypothesis test for
discerning and quantifying power-law behavior in empirical
data developed in [16] accepts the hypothesis that the
total degree distribution (bottom panel in Fig. 4) approximately
follows a power-law, but rejects that hypothesis
for the in- and out-degrees. The maximum-likelihood fitting
method developed in [16] estimates the power-law
exponent $\gamma$ and the lower cut-off $k_{\min}$ as follows:
$$
\gamma = 2.48 \quad \text{and} \quad k_{\min} = 3.
\tag{9}
$$
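To give a feel for how such an exponent is obtained, the sketch below uses a widely used closed-form approximation to the maximum-likelihood estimate of the exponent for discrete data with a given lower cut-off. Whether this coincides with the exact procedure of [16] is an assumption; the sketch also does not estimate the cut-off or run the goodness-of-fit test. The function name `powerlaw_exponent_mle` is hypothetical and the degree list is made-up illustrative data, not the Caltech degrees.

```python
import math

def powerlaw_exponent_mle(degrees, k_min):
    """Approximate MLE of gamma for discrete data: 1 + n / sum(ln(k / (k_min - 1/2)))."""
    tail = [k for k in degrees if k >= k_min]   # only the scaling region is used
    if not tail:
        raise ValueError("no observations at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum

# Hypothetical total degrees, for illustration only.
sample_degrees = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 30]
gamma_hat = powerlaw_exponent_mle(sample_degrees, k_min=3)
print(f"estimated exponent: {gamma_hat:.2f}")
```

In practice the cut-off is itself chosen by a data-driven criterion, and an estimate from so few points would carry a large uncertainty.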
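Most courses in $\mathcal{G}$ (74%) have one or two prerequisites
(top panel in Fig. 4) and many (63%) are “dead ends,”
i.e. they do not serve as prerequisites for other courses
(middle panel in Fig. 4). The average in-, out-, and total
degrees are (note that always $\bar{k}_{\mathrm{in}} = \bar{k}_{\mathrm{out}} = \bar{k}/2$):
$$
\begin{aligned}
\bar{k}_{\mathrm{in}} &= \frac{1}{n} \sum_{i=1}^{n} k_{\mathrm{in}}(i) = 1.70, \\
\bar{k}_{\mathrm{out}} &= \frac{1}{n} \sum_{i=1}^{n} k_{\mathrm{out}}(i) = 1.70, \\
\bar{k} &= \frac{1}{n} \sum_{i=1}^{n} k(i) = \bar{k}_{\mathrm{in}} + \bar{k}_{\mathrm{out}} = 3.40.
\end{aligned}
\tag{10}
$$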
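The identity between the average in- and out-degrees follows from the fact that every link has exactly one source and one target. A short derivation, using the adjacency matrix $A$ of Eq. (1) and writing $m$ for the number of links in the network:

```latex
\sum_{i=1}^{n} k_{\mathrm{in}}(i)
  = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ji}
  = m
  = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}
  = \sum_{i=1}^{n} k_{\mathrm{out}}(i)
\quad \Longrightarrow \quad
\bar{k}_{\mathrm{in}} = \bar{k}_{\mathrm{out}} = \frac{m}{n},
\qquad
\bar{k} = \frac{2m}{n}.
```

Both double sums count each link exactly once, which is why the equality holds for any directed network and not only for CPNs.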
The Pearson correlation coefficient between in- and
out-degrees of nodes is
$$
\rho_{\mathrm{in,out}} = -0.13.
\tag{11}
$$
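It is instructive to see how exactly the in- and out-degrees
contribute to the total degrees of nodes. Let
$\mathcal{G}_d \subset \mathcal{G}$ be the subset of nodes with total degree $d$ and
$n_d$ be the number of nodes in $\mathcal{G}_d$,
$$
\mathcal{G}_d = \{ i : k(i) = d \}, \quad n_d = |\mathcal{G}_d|.
\tag{12}
$$
Then for any node in $\mathcal{G}_d$ the sum of its in- and out-degrees
is exactly $d$,
$$
k_{\mathrm{in}}(i) + k_{\mathrm{out}}(i) = d, \quad i \in \mathcal{G}_d.
\tag{13}
$$
Let $\bar{k}^{(d)}_{\mathrm{in}}$ and $\bar{k}^{(d)}_{\mathrm{out}}$ be the average in- and out-degrees of
nodes in $\mathcal{G}_d$,
$$
\bar{k}^{(d)}_{\mathrm{in}} = \frac{1}{n_d} \sum_{i \in \mathcal{G}_d} k_{\mathrm{in}}(i)
\quad \text{and} \quad
\bar{k}^{(d)}_{\mathrm{out}} = \frac{1}{n_d} \sum_{i \in \mathcal{G}_d} k_{\mathrm{out}}(i).
\tag{14}
$$
Then, by averaging (13) over all nodes $i \in \mathcal{G}_d$, we have
$$
\bar{k}^{(d)}_{\mathrm{in}} + \bar{k}^{(d)}_{\mathrm{out}} = d,
\tag{15}
$$
for any total degree $d$. Figure 5 shows the decomposition
of the total degree $d$ into the in- and out- components
$\bar{k}^{(d)}_{\mathrm{in}}$ and $\bar{k}^{(d)}_{\mathrm{out}}$.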
FIG. 5. Decomposition $d = \bar{k}^{(d)}_{\mathrm{in}} + \bar{k}^{(d)}_{\mathrm{out}}$ of the total degree $d$
into the in- and out- components $\bar{k}^{(d)}_{\mathrm{in}}$ and $\bar{k}^{(d)}_{\mathrm{out}}$.
In all subsets $\mathcal{G}_d \subset \mathcal{G}$ for $d > 5$, $\bar{k}^{(d)}_{\mathrm{out}}$ strongly dominates
$\bar{k}^{(d)}_{\mathrm{in}}$ and contributes the most to the total degree $d$. The
in-degree component can be viewed as a “noise” added
to the out-degree component.
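The decomposition of the total degree into average in- and out-components can be sketched as follows. The example uses the degree sequences of the six-course toy CPN rather than the Caltech data, and the function name `degree_decomposition` is an assumption introduced for illustration.

```python
from collections import defaultdict

def degree_decomposition(k_in, k_out):
    """Map each total degree d to (average in-degree, average out-degree) over nodes with k = d."""
    groups = defaultdict(list)
    for a, b in zip(k_in, k_out):
        groups[a + b].append((a, b))            # group nodes by total degree d
    result = {}
    for d, members in sorted(groups.items()):
        n_d = len(members)
        mean_in = sum(a for a, _ in members) / n_d
        mean_out = sum(b for _, b in members) / n_d
        result[d] = (mean_in, mean_out)
    return result

# In- and out-degrees of the toy CPN, node order A, B, C, X, Y, Z.
k_in = [0, 0, 0, 2, 2, 0]
k_out = [1, 2, 1, 0, 0, 0]
decomposition = degree_decomposition(k_in, k_out)
for d, (mean_in, mean_out) in decomposition.items():
    print(d, mean_in, mean_out)
```

By construction the two averages in every group sum to d, which is the identity that a stacked bar chart of this decomposition visualizes.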
Tables II, III, and IV list the top courses with respect
to the in-, out-, and total degrees.
In-Degree Top 9
Rank  Course    Title                    In-Deg
1     CMS 139   Algorithm Analysis       7
2     Ge 270    Continental Tectonics    7
3     ACM 106   Computational Math       5
4     ChE 111   Sustainability           5
5     Ch 21     Physical Chem            5
6     Ch 25     Biophysical Chem         5
7     CMS 144   Networks                 5
8     ME 50     Modelling in Mech. Eng.  5
9     Ph 6      Physics Lab              5
TABLE II. The top 9 courses with the largest in-degree. The
cut-off was set to in-degree 5.
As expected, the top in-degree nodes in Table II in-
clude some of the most specialized courses offered at Cal-
tech. These courses are often at the graduate level and
require a considerable amount of previous coursework in
order to be taken. The top out-degree nodes in Table III
correspond to some of the most fundamental courses in