PREDICTING SCHOOL TRANSITION RATES IN AUSTRIA WITH
CLASSIFICATION TREES
ANNETTE MÖLLER
Faculty of Business Administration and Economics, Bielefeld University, Germany
ANN CATHRICE GEORGE
Federal Institute for Quality Assurance of the Austrian School System (IQS), Austria
JÜRGEN GROSS
Institute for Mathematics and Applied Informatics, University of Hildesheim, Germany
Abstract. Methods based on machine learning are becoming increasingly popular in many areas
because they allow models to be fitted in a highly data-driven fashion and often show comparable
or even better performance than classical methods. However, in the educational sciences
the application of machine learning is still quite uncommon. This work
investigates the benefit of using classification trees for analyzing data from educational sciences.
An application to data on school transition rates in Austria highlights several aspects of
interest in the context of educational sciences: (i) the trees select variables for predicting school
transition rates in a data-driven fashion, and these variables accord well with existing confirmatory
theories from educational sciences, (ii) trees can be employed to perform variable selection
for regression models, (iii) the classification performance of trees is comparable to that of binary
regression models. These results indicate that trees and possibly other machine learning methods
may also be helpful for exploring high-dimensional educational data sets, especially where no
confirmatory theories have been developed yet.
1. Introduction
Machine learning methods are becoming more and more popular in many applications and often
perform competitively with traditional models from applied statistics, such as regression
models. Regression and classification trees have many appealing advantages. In contrast to
traditional regression models, they can deal with very large numbers of predictors and do not
require any assumptions regarding the distribution or the relationship between predictors and response.
Furthermore, a tree model is highly interpretable due to its hierarchical nature.
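The data-driven choice of predictors and the readable, rule-like structure of a tree can be illustrated with a single split. The following is a minimal sketch only: it assumes a CART-style Gini impurity criterion (the text above does not state which tree algorithm is used), and the function names, variable names and toy data are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, names):
    """Search every predictor and threshold for the split with the
    lowest weighted Gini impurity of the two child nodes."""
    best = None  # (weighted impurity, variable name, threshold)
    n = len(rows)
    for j, name in enumerate(names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between neighbouring observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t)
    return best

# Toy data (invented): math grade (1 = best) and number of books at home.
names = ["math_grade", "books_at_home"]
rows = [(1, 20), (1, 200), (2, 150), (2, 30), (3, 180), (4, 25), (3, 40), (5, 10)]
labels = ["AHS", "AHS", "AHS", "AHS", "other", "other", "other", "other"]

score, variable, threshold = best_split(rows, labels, names)
print(f"split on {variable} <= {threshold} (weighted Gini {score:.3f})")
```

In this toy example the search picks the math grade because it separates the two classes perfectly, whereas the number of books does not. No distributional assumption enters the procedure, and the result reads as a simple rule.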
Although the use of machine learning is still rare in educational sciences, it has recently become
evident that applying these methods in combination with traditional statistical methods is
helpful in many ways (Lezhnina and Kismihók, 2021). A few approaches utilizing machine
learning are described in the following. In Sinharay (2016), trees, random forests and boosting are
employed to predict different variables of educational interest, such as item difficulty,
E-mail addresses: annette.moeller@uni-bielefeld.de, anncathrice.george@iqs.gv.at,
juergen.gross@uni-hildesheim.de.
Key words and phrases. Regression and Classification Trees; school transition; variable selection and
importance; multilevel structure of data; large-scale assessment.
This is an original manuscript of an article published by Taylor & Francis in International Journal of Research
& Method on 17 Oct 2022, available at: http://www.tandfonline.com/doi/full/10.1080/1743727X.2022.2128744.
arXiv:2210.11580v1 [stat.AP] 20 Oct 2022
high school dropouts, and the scoring of electronic essays. The author notes that these methods
perform slightly better than traditional regression models. Gao and
Rogers (2011) used regression trees to predict and interpret item difficulties in a language
assessment survey and concluded that the tree structure can be used to enhance the interpretation
of the items. Salles et al. (2020) attempted to analyze data from computer-based
assessments, which pose challenges to the researcher due to their high dimensionality.
In order to continue and extend the research conducted so far in educational sciences, the
following issues (i) - (iii) will be addressed in the subsequent analysis:
(i) Are the predictors selected by the tree in accordance with existing theories from educational
sciences, or can they even complement them (Section 4.1)?
(ii) Is it possible to use the variable choices of a tree as a data-driven variable selection method
for building regression models to predict an educational response variable? Can such a tree-based
variable selection support and complement the choice of variables based on
educational approaches (Section 4.2)?
(iii) Is the performance of a tree comparable to the performance of traditional generalized linear
(mixed) models when applied to educational data (Section 4.3)?
To this end, classification trees are employed to predict school transition rates in Austria
based on data from the educational standards test in mathematics for fourth graders (BIFIE,
2019).
If the variable selection of trees indeed leads to reasonable interpretations, future research
may consider applying trees or other machine learning methods to data and research
questions for which no educational theories exist so far. This possibility is further outlined in
Section 5.
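Question (ii) can be pictured as a two-step procedure: grow a tree, then keep only those predictors that appear in at least one split as candidates for a regression model. The sketch below is an illustration under stated assumptions, not the procedure of Section 4.2: it assumes CART-style Gini splits, a fixed maximum depth, and invented toy data, and the names `grow` and `select_variables` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows, labels, names, depth, used):
    """Recursively split the data and record each splitting variable in `used`."""
    if depth == 0 or gini(labels) == 0.0:
        return  # depth limit reached or node already pure
    n = len(rows)
    best = None  # (weighted impurity, column index, left indices, right indices)
    for j in range(len(names)):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in range(n) if rows[i][j] <= t]
            right = [i for i in range(n) if rows[i][j] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / n
            if best is None or score < best[0]:
                best = (score, j, left, right)
    if best is None or best[0] >= gini(labels):
        return  # no candidate split improves purity
    _, j, left, right = best
    used.add(names[j])
    for idx in (left, right):
        grow([rows[i] for i in idx], [labels[i] for i in idx], names, depth - 1, used)

def select_variables(rows, labels, names, max_depth=3):
    """Return the predictors used in at least one split, in original order."""
    used = set()
    grow(rows, labels, names, max_depth, used)
    return [name for name in names if name in used]

# Toy data (invented): math grade (1 = best), parental aspiration, pure noise.
names = ["math_grade", "parent_aspiration", "noise"]
rows = [(1, 1, 0), (1, 3, 1), (2, 2, 0), (2, 1, 1),
        (3, 3, 0), (3, 1, 1), (4, 3, 1), (4, 1, 0)]
labels = ["AHS", "AHS", "AHS", "AHS", "AHS", "other", "AHS", "other"]

selected = select_variables(rows, labels, names)
print(selected)  # these names would define the columns of the regression model
```

The noise column never improves purity and is therefore dropped, while the two informative predictors are retained. In practice the selected variables would then be passed to a generalized linear (mixed) model and compared with a theory-based choice of variables.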
2. Background information on data and investigated educational topic
2.1. Educational research topic and existing educational approaches. The Organisation
for Economic Co-operation and Development (OECD, 2010) has shown that unemployment rates
decrease as the level of education rises. Understanding the factors related to educational
aspiration is of great interest to educators in order to explain and predict the choices students
make during their educational paths. Students in Austria make the first choice regarding their
educational path after grade four: for the upcoming school transition they can choose between
a higher academic track, called “Allgemeinbildende Höhere Schule” (AHS), which (after graduation)
allows them to enroll at a university, and lower academic tracks. Insights from educational
science show that not only the students’ competencies are relevant for that choice but
also other social and personal factors. The present study follows the (existing) approach of
Gil-Flores et al. (2011), who model students’ aspiration as a function of a small number of
variables they identified in a comprehensive literature review. These variables are gender,
educational status of parents, educational attainment, socio-economic status (measured by the
number of books in the household), and parental involvement in school (measured by how well
the students’ workplace at home is equipped).
2.2. Background on data and variables. Information about students’ educational aspiration,
i.e. their plans regarding which school to attend after grade four, is given in the data of the Austrian
educational standard tests for fourth graders. The Austrian educational standard tests are
employed to monitor students’ competencies and, based on the results, to enhance the school
system. This study is based on data of the Austrian educational standard test in mathematics
for fourth graders in 2018 (BIFIE, 2019). The Austrian standards testing is mandatory,
resulting in a survey of 73,780 students in 4,925 classes and 2,961 schools. The present study
uses a representative sample of 8,520 students in 637 classes and 430 schools.
The data mainly includes students’ competencies in mathematics, which can be categorized
into four content sub-domains (i.e. numbers, operations, measures, geometry) and four cognitive
sub-domains (i.e. model building, calculating, communicating, problem solving), with
each of them measured in the test (for measuring students’ competencies in all eight domains
see also Groß et al., 2016). The results are given on continuous scales with a mean value of
500.

Table 1. Overview of important variables in the data set. The first column contains
the variable names, the second column states the level on which a variable is
measured (in brackets the level of the corresponding aggregated variable), the
third column provides a short explanation of each variable.

| Variable | Data level | Description |
|---|---|---|
| aspiration-education | student | Parents’ aspiration regarding the highest education their children will achieve (higher values indicate higher education) |
| math-grade | student | Grade students achieve in mathematics (higher values indicate poorer grades) |
| points-calculating | student | Points in the educational standards test in the cognitive sub-domain calculating (higher values for higher competencies) |
| points-communicating | student | Points in the educational standards test in the cognitive sub-domain communicating about mathematical facts (higher values for higher competencies) |
| private-tutoring (private-tutoring-aggCL) | student (class) | Number of hours for private tutoring in class (categorized, 7 levels, higher values indicate more hours) |
| after-school (after-school-aggSL) | student (school) | Visiting after-school supervision (higher values indicate more time spent in supervision) |
| social-status (social-status-aggSL) | student (school) | A measure indicating the social status of the student (higher values indicate higher social status) |
| federal-state | school | Federal state of school location (9 states in Austria) |
| town-size | school | Size of the town where the school is located (higher values indicate larger towns) |
| school-size | school | Total number of students attending the school |
| urban | school | Degree of urbanization where the school is located (higher values indicate a smaller degree of urbanization) |
Besides the students’ competencies, additional background information on students, teachers,
parents and schools is collected via context questionnaires (BIFIE, 2018). The questionnaires
yield multifaceted information about the students’ family and school environment, as well as
personal and motivational factors. As mentioned above, the variable to be predicted in this
study is the school the students will attend in the following year.
The approximately 700 context variables obtained from the complete questionnaires will be employed
in the subsequent analysis to predict school transition. Table 1 presents a selection of context
variables that will be found to be relevant in the subsequent model building process, see,
e.g., Section 4.1.
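Table 1 lists student-level variables together with counterparts aggregated to the class level (suffix aggCL) or the school level (suffix aggSL). How such an aggregated variable can be derived from student-level records may be sketched as follows. This is an illustration only: the aggregation function is not stated in this section, so the group mean used here is an assumption, and the identifiers and values are invented.

```python
from collections import defaultdict
from statistics import mean

def aggregate(records, variable, level):
    """Mean of `variable` within each unit of `level` (e.g. class or school).
    Using the mean is an assumption; the source does not name the function."""
    groups = defaultdict(list)
    for record in records:
        groups[record[level]].append(record[variable])
    return {unit: mean(values) for unit, values in groups.items()}

# Invented student-level records; identifiers and values are placeholders.
students = [
    {"school": "S1", "class": "S1-a", "social_status": 1.0},
    {"school": "S1", "class": "S1-a", "social_status": 3.0},
    {"school": "S1", "class": "S1-b", "social_status": 5.0},
    {"school": "S2", "class": "S2-a", "social_status": 2.0},
    {"school": "S2", "class": "S2-a", "social_status": 6.0},
]

# School-level aggregate, attached to every student of the same school.
school_means = aggregate(students, "social_status", "school")
for record in students:
    record["social_status_aggSL"] = school_means[record["school"]]

print(school_means)
```

Each student then carries both the individual value and the value shared by all students of the same school, which reflects the multilevel structure of the data (students within classes within schools).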