AN EXPERIMENTAL STUDY OF DIMENSION REDUCTION METHODS ON MACHINE LEARNING ALGORITHMS WITH APPLICATIONS TO PSYCHOMETRICS
ARXIV PREPRINT
Sean H. Merritt
Department of Economics
Claremont Graduate University
150 E 10th St, Claremont, CA, 91711
sean.merritt@cgu.edu
Alexander P. Christensen
Department of Psychology and Human Development
Vanderbilt University
Nashville, TN, 37203
alexander.christensen@vanderbilt.edu
March 23, 2023
ABSTRACT
Developing interpretable machine learning models has become an increasingly important issue. One
way in which data scientists have been able to develop interpretable models has been to use dimension
reduction techniques. In this paper, we examine several dimension reduction techniques including
two recent approaches developed in the network psychometrics literature called exploratory graph
analysis (EGA) and unique variable analysis (UVA). We compared EGA and UVA with two other
dimension reduction techniques common in the machine learning literature (principal component
analysis and independent component analysis) as well as no reduction in the variables. We show that
EGA and UVA perform as well as the other reduction techniques or no reduction. Consistent with
previous literature, we show that dimension reduction can decrease, increase, or provide the same
accuracy as no reduction of variables. Our results tentatively suggest that dimension reduction tends to lead to better performance when used for classification tasks.
Keywords dimension reduction · exploratory graph analysis · PCA · ICA · machine learning · interpretability
The final version of this paper is published at Advances in Artificial Intelligence and Machine Learning
1 Introduction
Machine learning has proliferated across science and impacted domains such as biology, chemistry, economics,
neuroscience, physics, and psychology. In nearly all scientific domains, new technology has allowed more data to be collected, leading to high-dimensional data. With increasingly complex data, the number of parameters in machine learning algorithms increases exponentially, leading to issues with interpretability. Solutions to this issue require careful feature engineering, feature selection, regularization, or some combination of these. In this paper, we focus on feature engineering by way of dimension reduction.
The goal of dimension reduction within machine learning is to reduce the number of variables to a refined set that retains as much of the explainable variance in the full set as possible, which in turn maximizes prediction. The standard
method in machine learning has been to apply Principal Component Analysis (PCA). PCA attempts to find linear combinations of the variables (components) that are uncorrelated (or orthogonal) and together explain the majority of the variance across all variables in the dataset. The utility of PCA in machine learning contexts is clear: variables are embedded in a reduced dimension space that maximizes their distinct variance from other dimensions. Given the congruence between the goals of dimension reduction within machine learning and the function of PCA, it is not surprising that the method has become the go-to choice for machine learning researchers.
Should PCA be the de facto dimension reduction method? Previous work examining the effects of different dimension reduction techniques within machine learning algorithms is sparse. Reddy and colleagues [1] tested PCA and linear discriminant analysis (LDA) against no dimension reduction on cardiotocography data. They found that PCA performed better than no reduction when the number of features was high. Similar work has found that PCA tends to perform as well as or better than no reduction [2, 3]. These studies, however, have been limited to examining classification tasks only and very specific applications (e.g., cardiotocography, internet of things, bot detection). Whether PCA should be routinely applied to data before using machine learning algorithms is an open question that we aim to address.
Another commonly used dimension reduction technique is independent component analysis (ICA). ICA is similar to PCA in that it linearly separates variables into dimensions, but these dimensions are statistically independent rather than merely uncorrelated. This is the major difference between their goals: PCA seeks to maximize explained variance in each dimension such that dimensions are uncorrelated, whereas ICA seeks to identify underlying dimensions that are statistically independent (maximizing variance explained is not an objective). Similar to PCA, there is a strong congruence between the goals of dimension reduction within machine learning and ICA. With statistically independent dimensions, the data are separated into completely unique dimensions. This property ensures that the predicted variance of an outcome is explained uniquely by each dimension. One advantage ICA has over PCA is that it can work well with non-Gaussian data and therefore does not require variables to be normalized. ICA is commonly used in face recognition [4] as well as in neuroscience to identify distinct connectivity patterns between regions of the brain [5, 6].
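Analogously, a minimal ICA sketch (again our illustration, with hypothetical data and an arbitrary number of components) extracts statistically independent components that can replace the original features as model inputs:

```python
# A minimal sketch of ICA-based reduction; the data and number of components are arbitrary.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.laplace(size=(500, 30))      # non-Gaussian placeholder data, which suits ICA

ica = FastICA(n_components=5, random_state=0)
S = ica.fit_transform(X)             # estimated statistically independent components
print(S.shape)                       # (500, 5); S can replace X as the model inputs
```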
PCA and ICA are perhaps the two most commonly used dimension reduction methods in machine learning. Despite
their common usage, few studies have systematically evaluated whether one should be preferred when it comes to
classification or regression tasks. Similarly, few studies, to our knowledge, have examined the extent to which dimension
reduction improves prediction accuracy relative to no data reduction at all. Beyond PCA and ICA, there are other
dimension reduction methods that offer different advantages that could potentially be useful in machine learning
frameworks. Supervised methods, such as sufficient dimension reduction techniques [7], are common in the literature, but for the purposes of this paper we focus on unsupervised methods from the network psychometrics literature in psychology.
Exploratory graph analysis (EGA) and unique variable analysis (UVA) are methods that have recently emerged in the field of network psychometrics [8]. These techniques build on graph theory and social network analysis to identify dimensions in multivariate data. EGA is often compared to PCA in simulations that mirror common psychological data structures [9, 10, 11]. UVA, in contrast, arose out of a need to identify whether variables are redundant with one another (e.g., multicollinearity, local dependence) and could be reduced to single, unique variables [12]. Given the goal of dimension reduction in machine learning, these two approaches seem potentially useful for reducing high-dimensional data and identifying unique, non-redundant sources of variance (respectively).
In the present study, we compare PCA, ICA, EGA, UVA, and no reduction on 14 different data sets, seven classification
tasks and seven regression tasks. The main aims of this paper are to (1) introduce two alternative dimension reduction
methods to the machine learning literature, (2) compare these and the other dimension reduction methods against each
other as well as no reduction on a variety of data types and tasks, and (3) examine features of data that lead to dimension reduction improving machine learning algorithms' prediction over no reduction. The paper is outlined as
follows: section two defines and formalizes EGA and UVA, section three explains the data and procedures in detail,
section four reports the results, and section five provides our concluding remarks.
2 Psychometric Dimension Reduction
2.1 Exploratory Graph Analysis
Exploratory graph analysis (EGA) begins by representing the relationships among variables with a Gaussian graphical model (GGM), with the graph $G = \{v_i, e_{ij}\}$, where node $v_i$ represents the $i$th variable and the edge $e_{ij}$ is the partial correlation between variables $v_i$ and $v_j$. Estimating a GGM in psychology is often done using the EBICglasso [13, 14, 15], which applies the graphical least absolute shrinkage and selection operator (GLASSO) [16, 17] to the inverse covariance matrix and uses the extended Bayesian information criterion (EBIC) [18] to select the model.
To define the GLASSO regularization method, first assume $\mathbf{y}$ follows a multivariate normal distribution:
$$\mathbf{y} \sim N(\mathbf{0}, \boldsymbol{\Sigma}), \qquad (1)$$

where $\boldsymbol{\Sigma}$ is the population variance-covariance matrix. Let $\mathbf{K}$ denote the inverse covariance matrix:

$$\mathbf{K} = \boldsymbol{\Sigma}^{-1}. \qquad (2)$$

$\mathbf{K}$ can be standardized to produce a partial correlation matrix with each element representing the partial correlation between $y_i$ and $y_j$ conditioned on all other variables ($y_i, y_j \mid \mathbf{y}_{-(i,j)}$) [19]:

$$\mathrm{Cor}(y_i, y_j \mid \mathbf{y}_{-(i,j)}) = -\frac{\kappa_{ij}}{\sqrt{\kappa_{ii}\,\kappa_{jj}}}, \qquad (3)$$

where $\kappa_{ij}$ represents the $i$th and $j$th element of $\mathbf{K}$. The GLASSO regularization method aims to estimate the inverse covariance matrix $\mathbf{K}$ by maximizing the penalized log-likelihood, which is defined as [16]:

$$\log \det(\mathbf{K}) - \mathrm{trace}(\mathbf{S}\mathbf{K}) - \lambda \sum_{<i,j>} |\kappa_{ij}|, \qquad (4)$$

where $\mathbf{S}$ represents the sample variance-covariance matrix. The $\lambda$ parameter represents the penalty on the log-likelihood such that larger values (larger penalty) result in a sparser (fewer non-zero values) inverse covariance matrix. Conversely, smaller values (smaller penalty) result in a denser (fewer zero values) inverse covariance matrix. A GLASSO network is represented as a partial correlation matrix using Eq. 3.
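As a rough illustration of Eqs. 2-3 (a sketch under our own assumptions, not the EBICglasso implementation used in the network psychometrics literature), one can estimate a sparse inverse covariance matrix with scikit-learn's GraphicalLasso at a single, arbitrarily chosen penalty and standardize it into a partial correlation network:

```python
# Sketch: estimate the inverse covariance matrix K with the GLASSO at one
# arbitrarily chosen penalty, then standardize it into partial correlations (Eq. 3).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                     # placeholder data (n observations x p variables)

K = GraphicalLasso(alpha=0.05).fit(X).precision_   # estimated inverse covariance matrix
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)                 # Eq. 3: -kappa_ij / sqrt(kappa_ii * kappa_jj)
np.fill_diagonal(partial_corr, 0.0)                # the network has no self-loops
```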
Multiple values of $\lambda$ are commonly used, and model selection techniques such as cross-validation [16] are applied to determine the best fitting model. In the psychometric literature, a more common approach has been to apply the extended Bayesian information criterion (EBIC) [18] to select the $\lambda$ parameter and best fitting model. The EBIC is defined as:

$$\mathrm{EBIC} = -2L + E\log(N) + 4\gamma E\log(P), \qquad (5)$$

where $L$ denotes the log-likelihood, $N$ the number of observations, $E$ the number of non-zero elements in $\mathbf{K}$ (edges), and $P$ the number of variables (nodes). Several $\lambda$ values (e.g., 100) are selected from an exponential set of values between 0 and 1. The default setting of this range is defined by a minimum-maximum ratio typically set to 0.01 [14]. The $\gamma$ parameter of the EBIC controls how much simpler models (i.e., fewer non-zero edges) are preferred to more complex models (i.e., fewer zero edges). The default setting for this parameter is typically 0.50 [15].
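A minimal sketch of this selection strategy (our approximation, not the authors' code) fits the GLASSO over an exponential grid of penalties and keeps the solution with the lowest EBIC; the grid size, minimum-maximum ratio, and $\gamma$ follow the defaults described above, and the log-likelihood is computed only up to an additive constant.

```python
# Sketch: EBIC-based selection of the GLASSO penalty (Eq. 5), assuming X is an
# n x p NumPy array of (roughly) standardized data. Grid defaults mirror the text.
import numpy as np
from sklearn.covariance import GraphicalLasso

def ebic_glasso(X, gamma=0.50, n_lambdas=100, min_max_ratio=0.01):
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam_max = np.abs(S - np.diag(np.diag(S))).max()          # largest off-diagonal covariance
    lambdas = np.exp(np.linspace(np.log(min_max_ratio * lam_max), np.log(lam_max), n_lambdas))
    best_ebic, best_K = np.inf, None
    for lam in lambdas:
        try:
            K = GraphicalLasso(alpha=lam, max_iter=200).fit(X).precision_
        except FloatingPointError:
            continue                                         # skip penalties that fail to converge
        L = (n / 2.0) * (np.linalg.slogdet(K)[1] - np.trace(S @ K))  # log-likelihood (up to a constant)
        E = (np.count_nonzero(K) - p) / 2.0                  # non-zero off-diagonal elements (edges)
        ebic = -2.0 * L + E * np.log(n) + 4.0 * gamma * E * np.log(p)
        if ebic < best_ebic:
            best_ebic, best_K = ebic, K
    return best_K                                            # precision matrix with the lowest EBIC
```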
After estimating the GGM via the EBICglasso method, EGA estimates the number of dimensions in the network using a community detection algorithm. There are many different community detection algorithms, with some of the more commonly applied being the Walktrap [20] and Louvain [9, 11, 21, 22, 23] algorithms. The Walktrap algorithm uses random walks to obtain a transition matrix that specifies how likely one node would be to "step" to another node. Ward's hierarchical clustering algorithm [24] is then applied to this transition matrix, and modularity [25] is used to decide the appropriate "cut" or number of clusters that should remain.
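A rough approximation of this dimension step (a sketch under our own assumptions, not the EGAnet implementation) treats the partial correlation matrix as a weighted network and applies igraph's Walktrap; absolute edge weights are used here because igraph's Walktrap expects non-negative weights.

```python
# Sketch: Walktrap community detection on a partial correlation network using python-igraph.
# Absolute weights are used because igraph's Walktrap expects non-negative edge weights.
import numpy as np
import igraph as ig

def ega_like_dimensions(partial_corr):
    W = np.abs(np.asarray(partial_corr, dtype=float))
    np.fill_diagonal(W, 0.0)
    g = ig.Graph.Weighted_Adjacency(W.tolist(), mode="undirected", attr="weight")
    dendrogram = g.community_walktrap(weights="weight", steps=4)   # random walks of length 4
    clusters = dendrogram.as_clustering()                          # cut chosen by maximum modularity
    return clusters.membership                                     # community (dimension) per variable
```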
Modularity is also used as the primary objective function of the Louvain algorithm. Because of its importance for these two algorithms, we define modularity ($Q$) [26]:

$$d_i = \sum_{j=1}^{p} w_{ij}, \qquad (6)$$

$$D = \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} w_{ij}, \qquad (7)$$

$$Q = \frac{1}{2D}\sum_{i=1}^{p}\sum_{j=1}^{p}\left[ w_{ij} - \frac{d_i d_j}{2D} \right]\delta(c_i, c_j), \qquad (8)$$

where $w_{ij}$ is the weight (partial correlation) between node $i$ and node $j$ in the network, $p$ is the number of nodes in the network, $d_i$ is the degree or sum of the edge weights connected to node $i$, $D$ is the total sum of all the edge weights in the network, and $\delta(c_i, c_j)$ equals 1 when nodes $i$ and $j$ belong to the same community and 0 otherwise.
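For completeness, a small NumPy sketch of Eqs. 6-8 (our illustration) computes $Q$ for a given community assignment, assuming a symmetric matrix of non-negative edge weights:

```python
# Sketch of Eqs. 6-8: modularity Q for a symmetric, non-negative weight matrix W
# and a vector of community labels (one label per node).
import numpy as np

def modularity(W, communities):
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                                  # Eq. 6: weighted degree of each node
    D = W.sum() / 2.0                                  # Eq. 7: total edge weight
    same = np.equal.outer(communities, communities)    # delta(c_i, c_j)
    return ((W - np.outer(d, d) / (2 * D)) * same).sum() / (2 * D)  # Eq. 8

# Two clean three-node communities yield the maximum Q of 0.5 for this structure.
W = np.zeros((6, 6))
W[:3, :3] = W[3:, 3:] = 0.5
np.fill_diagonal(W, 0.0)
print(modularity(W, np.array([0, 0, 0, 1, 1, 1])))
```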