method in machine learning has been to apply Principal Component Analysis (PCA). PCA finds linear combinations of variables that are uncorrelated (orthogonal) with one another and that together explain the majority of the variance across all variables in the dataset. The utility of PCA in machine learning contexts is clear: variables are embedded in a reduced-dimension space in which each dimension captures variance that is distinct from the other dimensions. Given the congruence
between the goals of dimension reduction within machine learning and the function of PCA, it’s not surprising that the
method has become the go-to choice for machine learning researchers.
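For concreteness, a minimal sketch of PCA as a preprocessing step is shown below. The use of scikit-learn, the standardization step, and the 95% explained-variance threshold are illustrative assumptions rather than details of any particular study.

```python
# Minimal sketch of PCA as a dimension reduction step (illustrative choices throughout).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))             # placeholder feature matrix

X_std = StandardScaler().fit_transform(X)  # put variables on comparable scales
pca = PCA(n_components=0.95)               # keep enough components to explain ~95% of the variance
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```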
Should PCA be the de facto dimension reduction method? Previous work examining the effects of different dimension
reduction techniques within machine learning algorithms is sparse. Reddy and colleagues [1] tested PCA and linear discriminant analysis (LDA) against no dimension reduction on cardiotocography data. They found that PCA performed better than no reduction when the number of features was high. Similar work has found that PCA tends to perform as well as or better than no reduction [2, 3]. These studies, however, have been limited to classification tasks and to very specific applications (e.g., cardiotocography, internet of things, bot detection). Whether PCA should be
routinely applied to data before using machine learning algorithms is an open question that we aim to address.
Another commonly used dimension reduction technique is independent component analysis (ICA). ICA is similar to PCA in that it linearly separates variables into dimensions, but those dimensions are statistically independent rather than merely uncorrelated. This is the major difference between their goals: PCA seeks to maximize the variance explained by each dimension such that dimensions are uncorrelated, whereas ICA seeks to identify underlying dimensions that are statistically independent (maximizing explained variance is not an objective). Like PCA, ICA is strongly congruent with the goals of dimension reduction in machine learning. With statistically independent dimensions, the data are separated into completely unique dimensions, which ensures that the predicted variance of an outcome is explained uniquely by each dimension. One advantage ICA has over PCA is that it works well with non-Gaussian data and therefore does not require variables to be normally distributed. ICA is commonly used in face recognition [4] as well as in neuroscience to identify distinct connectivity patterns between regions of the brain [5, 6].
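As with PCA, a minimal sketch may help fix ideas; the mixing setup and the use of scikit-learn's FastICA below are illustrative assumptions.

```python
# Minimal sketch of ICA recovering statistically independent components
# from linearly mixed, non-Gaussian sources (illustrative setup).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(500, 3))   # non-Gaussian sources
A = rng.normal(size=(3, 3))      # mixing matrix
X = S @ A.T                      # observed (mixed) variables

ica = FastICA(n_components=3, random_state=0)
X_independent = ica.fit_transform(X)   # estimated independent components
print(X_independent.shape)
```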
PCA and ICA are perhaps the two most commonly used dimension reduction methods in machine learning. Despite
their common usage, few studies have systematically evaluated whether one should be preferred when it comes to
classification or regression tasks. Similarly, few studies, to our knowledge, have examined the extent to which dimension
reduction improves prediction accuracy relative to no data reduction at all. Beyond PCA and ICA, there are other
dimension reduction methods that offer different advantages that could potentially be useful in machine learning
frameworks. Supervised methods, such as sufficient dimension reduction techniques [7], are common in the literature, but for the purposes of this paper we focus on unsupervised methods from the network psychometrics literature in
psychology.
Exploratory graph analysis (EGA) and unique variable analysis (UVA) are methods that have recently emerged in the field of network psychometrics [8]. These techniques build on graph theory and social network analysis to identify dimensions in multivariate data. EGA is often compared to PCA in simulations that mirror common psychological data structures [9, 10, 11]. UVA, in contrast, arose from a need to identify whether variables are redundant with one another (e.g., multicollinear, locally dependent) and could be reduced to single, unique variables [12]. Given the goal of dimension reduction in machine learning, these two approaches seem potentially useful for reducing high-dimensional data (EGA) and for identifying unique, non-redundant sources of variance (UVA).
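To give a flavor of the redundancy logic behind UVA, the sketch below flags variable pairs with high weighted topological overlap. It is a simplification, not the reference implementation: UVA operates on a regularized partial correlation network, whereas this sketch uses absolute zero-order correlations, and the 0.25 cutoff is an illustrative assumption.

```python
# Rough sketch of UVA-style redundancy detection via weighted topological
# overlap (wTO). Simplifications: absolute zero-order correlations stand in
# for a regularized partial correlation network, and the cutoff is illustrative.
import numpy as np
import pandas as pd

def redundant_pairs(data: pd.DataFrame, cutoff: float = 0.25):
    W = data.corr().abs().to_numpy()
    np.fill_diagonal(W, 0.0)           # no self-connections in the network
    strength = W.sum(axis=1)           # node strength k_i = sum_k w_ik
    p = W.shape[0]
    flagged = []
    for i in range(p):
        for j in range(i + 1, p):
            shared = W[i] @ W[j]       # sum over third variables k of w_ik * w_jk
            wto = (shared + W[i, j]) / (min(strength[i], strength[j]) + 1 - W[i, j])
            if wto >= cutoff:
                flagged.append((data.columns[i], data.columns[j], round(wto, 3)))
    return flagged                     # candidate redundancies to combine or drop
```

Flagged pairs would then be combined or trimmed so that each remaining variable contributes a unique source of variance before modeling.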
In the present study, we compare PCA, ICA, EGA, UVA, and no reduction on 14 different data sets: seven classification tasks and seven regression tasks. The main aims of this paper are to (1) introduce two alternative dimension reduction methods to the machine learning literature, (2) compare these and the other dimension reduction methods against each other, as well as against no reduction, on a variety of data types and tasks, and (3) examine the features of data that lead dimension reduction to improve machine learning algorithms' prediction over no reduction. The paper is outlined as follows: section two defines and formalizes EGA and UVA, section three explains the data and procedures in detail, section four reports the results, and section five provides our concluding remarks.
2 Psychometric Dimension Reduction
2.1 Exploratory Graph Analysis
Exploratory graph analysis (EGA) begins by representing the relationships among variables with a Gaussian graphical model (GGM) with graph $G = \{v_i, e_{ij}\}$, where node $v_i$ represents the $i$th variable and edge $e_{ij}$ is the partial correlation between variables $v_i$ and $v_j$. Estimating a GGM in psychology is often done using the EBICglasso [13, 14, 15], which applies the graphical least absolute shrinkage and selection operator (GLASSO) [16, 17] to the inverse covariance matrix and uses the extended Bayesian information criterion (EBIC) [18] to select the model.
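A conceptual sketch of this pipeline is given below. It is not the EGAnet implementation: cross-validated graphical lasso stands in for EBIC-based model selection, and the Louvain algorithm stands in for the community detection step (typically Walktrap) that EGA applies to the estimated network to identify dimensions.

```python
# Conceptual EGA-like pipeline: sparse GGM -> partial correlation network ->
# communities as dimensions. Stand-ins: GraphicalLassoCV instead of EBICglasso,
# Louvain instead of Walktrap.
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler

def ega_like_dimensions(X: np.ndarray, seed: int = 0):
    X_std = StandardScaler().fit_transform(X)
    precision = GraphicalLassoCV().fit(X_std).precision_

    # Partial correlation between variables i and j: -p_ij / sqrt(p_ii * p_jj)
    d = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(d, d)
    np.fill_diagonal(partial_corr, 0.0)

    # Communities in the (absolute) partial correlation network act as dimensions.
    G = nx.from_numpy_array(np.abs(partial_corr))
    return louvain_communities(G, weight="weight", seed=seed)
```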
To define the GLASSO regularization method, first assume that $\mathbf{y}$ follows a multivariate normal distribution: