AN EXPERIMENTAL STUDY OF DIMENSION REDUCTION METHODS ON MACHINE LEARNING ALGORITHMS WITH APPLICATIONS TO PSYCHOMETRICS
ARXIV PREPRINT
Sean H. Merritt
Department of Economics
Claremont Graduate University
150 E 10th St, Claremont, CA, 91711
sean.merritt@cgu.edu
Alexander P. Christensen
Department of Psychology and Human Development
Vanderbilt University
Nashville, TN, 37203
alexander.christensen@vanderbilt.edu
March 23, 2023
ABSTRACT
Developing interpretable machine learning models has become an increasingly important issue. One
way in which data scientists have been able to develop interpretable models has been to use dimension
reduction techniques. In this paper, we examine several dimension reduction techniques including
two recent approaches developed in the network psychometrics literature called exploratory graph
analysis (EGA) and unique variable analysis (UVA). We compared EGA and UVA with two other
dimension reduction techniques common in the machine learning literature (principal component
analysis and independent component analysis) as well as no reduction in the variables. We show that
EGA and UVA perform as well as the other reduction techniques or no reduction. Consistent with
previous literature, we show that dimension reduction can decrease, increase, or provide the same
accuracy as no reduction of variables. Our results tentatively suggest that dimension reduction tends to lead to better performance when used for classification tasks.
Keywords dimension reduction · exploratory graph analysis · PCA · ICA · machine learning · interpretability
The final version of this paper is published at Advances in Artificial Intelligence and Machine Learning
1 Introduction
Machine learning has proliferated across science and impacted domains such as biology, chemistry, economics,
neuroscience, physics, and psychology. In nearly all scientific domains, new technology has allowed more data to be collected, leading to high-dimensional data. With increasingly complex data, the number of parameters in machine learning algorithms increases exponentially, leading to issues with interpretability. Solutions to this issue require careful feature engineering, feature selection, regularization, or some combination of these. In this paper, we focus on feature engineering by way of dimension reduction.
The goal of dimension reduction within machine learning is to reduce the number of variables to a refined set that retains as much of the explainable variance in the full set as possible, which in turn maximizes prediction. The standard
method in machine learning has been to apply Principal Component Analysis (PCA). PCA attempts to find linear combinations of the variables (components) that are uncorrelated (or orthogonal) and together explain the majority of the variance across all variables in the dataset. The utility of PCA in machine learning contexts is clear: variables are embedded in a reduced dimension space that maximizes their distinct variance from other dimensions. Given the congruence between the goals of dimension reduction within machine learning and the function of PCA, it is not surprising that the method has become the go-to choice for machine learning researchers.
Should PCA be the de facto dimension reduction method? Previous work examining the effects of different dimension reduction techniques within machine learning algorithms is sparse. Reddy and colleagues [1] tested PCA and linear discriminant analysis (LDA) against no dimension reduction on cardiotocography data. They found that PCA performed better than no reduction when the number of features was high. Similar work has found that PCA tends to perform as well as or better than no reduction [2, 3]. These studies, however, have been limited to examining classification tasks only and very specific applications (e.g., cardiotocography, internet of things, bot detection). Whether PCA should be routinely applied to data before using machine learning algorithms is an open question that we aim to address.
Another commonly used dimension reduction technique is independent component analysis (ICA). ICA is similar to PCA in that it linearly separates variables into dimensions, but these dimensions are statistically independent rather than merely uncorrelated. This is the major difference between their goals: PCA seeks to maximize explained variance in each dimension such that dimensions are uncorrelated, whereas ICA seeks to identify underlying dimensions that are statistically independent (maximizing variance explained is not an objective). Similar to PCA, there is a strong congruence between the goals of dimension reduction within machine learning and ICA. With statistically independent dimensions, the data are separated into completely unique dimensions. This property ensures that the predicted variance of an outcome is explained uniquely by each dimension. One advantage ICA has over PCA is that it can work well with non-Gaussian data and therefore does not require variables to be normalized. ICA is commonly used in face recognition [4] as well as in neuroscience to identify distinct connectivity patterns between regions of the brain [5, 6].
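Analogously, a minimal ICA sketch (again our illustration, with hypothetical data and an arbitrary number of components) extracts statistically independent components that can replace the original features as model inputs:

```python
# A minimal sketch of ICA-based reduction; the data and number of components are arbitrary.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.laplace(size=(500, 30))      # non-Gaussian placeholder data, which suits ICA

ica = FastICA(n_components=5, random_state=0)
S = ica.fit_transform(X)             # estimated statistically independent components
print(S.shape)                       # (500, 5); S can replace X as the model inputs
```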
PCA and ICA are perhaps the two most commonly used dimension reduction methods in machine learning. Despite
their common usage, few studies have systematically evaluated whether one should be preferred when it comes to
classification or regression tasks. Similarly, few studies, to our knowledge, have examined the extent to which dimension
reduction improves prediction accuracy relative to no data reduction at all. Beyond PCA and ICA, there are other
dimension reduction methods that offer different advantages that could potentially be useful in machine learning
frameworks. Supervised methods, such as sufficient dimension reduction techniques [7], are common in the literature, but for the purposes of this paper we focus on unsupervised methods from the network psychometrics literature in psychology.
Exploratory graph analysis (EGA) and unique variable analysis (UVA) are methods that have recently emerged in the field of network psychometrics [8]. These techniques build on graph theory and social network analysis to identify dimensions in multivariate data. EGA is often compared to PCA in simulations that mirror common psychological data structures [9, 10, 11]. UVA, in contrast, arose out of a need to identify whether variables are redundant with one another (e.g., multicollinearity, local dependence) and could be reduced to single, unique variables [12]. Given the goal of dimension reduction in machine learning, these two approaches seem potentially useful for reducing high-dimensional data and identifying unique, non-redundant sources of variance (respectively).
In the present study, we compare PCA, ICA, EGA, UVA, and no reduction on 14 different data sets, seven classification
tasks and seven regression tasks. The main aims of this paper are to (1) introduce two alternative dimension reduction
methods to the machine learning literature, (2) compare these and the other dimension reduction methods against each
other as well as no reduction on a variety of data types and tasks, and (3) examine features of data that lead to dimension reduction improving machine learning algorithms' prediction over no reduction. The paper is outlined as
follows: section two defines and formalizes EGA and UVA, section three explains the data and procedures in detail,
section four reports the results, and section five provides our concluding remarks.
2 Psychometric Dimension Reduction
2.1 Exploratory Graph Analysis
Exploratory graph analysis (EGA) begins by representing the relationships among variables with a Gaussian graphical model (GGM), with the graph $G = \{v_i, e_{ij}\}$, where node $v_i$ represents the $i$th variable and the edge $e_{ij}$ is the partial correlation between variables $v_i$ and $v_j$. Estimating a GGM in psychology is often done using the EBICglasso [13, 14, 15], which applies the graphical least absolute shrinkage and selection operator (GLASSO) [16, 17] to the inverse covariance matrix and uses the extended Bayesian information criterion (EBIC) [18] to select the model.
To define the GLASSO regularization method, first assume $\mathbf{y}$ follows a multivariate normal distribution:
$$\mathbf{y} \sim N(\mathbf{0}, \boldsymbol{\Sigma}), \qquad (1)$$

where $\boldsymbol{\Sigma}$ is the population variance-covariance matrix. Let $\mathbf{K}$ denote the inverse covariance matrix:

$$\mathbf{K} = \boldsymbol{\Sigma}^{-1}. \qquad (2)$$

$\mathbf{K}$ can be standardized to produce a partial correlation matrix with each element representing the partial correlation between $y_i$ and $y_j$ conditioned on all other variables ($y_i, y_j \mid \mathbf{y}_{-(i,j)}$) [19]:

$$\mathrm{Cor}(y_i, y_j \mid \mathbf{y}_{-(i,j)}) = -\frac{\kappa_{ij}}{\sqrt{\kappa_{ii}\,\kappa_{jj}}}, \qquad (3)$$

where $\kappa_{ij}$ represents the $i$th and $j$th element of $\mathbf{K}$. The GLASSO regularization method aims to estimate the inverse covariance matrix $\mathbf{K}$ by maximizing the penalized log-likelihood, which is defined as [16]:

$$\log \det(\mathbf{K}) - \mathrm{trace}(\mathbf{S}\mathbf{K}) - \lambda \sum_{<i,j>} |\kappa_{ij}|, \qquad (4)$$

where $\mathbf{S}$ represents the sample variance-covariance matrix. The $\lambda$ parameter represents the penalty on the log-likelihood such that larger values (larger penalty) result in a sparser (fewer non-zero values) inverse covariance matrix. Conversely, smaller values (smaller penalty) result in a denser (fewer zero values) inverse covariance matrix. A GLASSO network is represented as a partial correlation matrix using Eq. 3.
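As a rough illustration of Eqs. 2-3 (a sketch under our own assumptions, not the EBICglasso implementation used in the network psychometrics literature), one can estimate a sparse inverse covariance matrix with scikit-learn's GraphicalLasso at a single, arbitrarily chosen penalty and standardize it into a partial correlation network:

```python
# Sketch: estimate the inverse covariance matrix K with the GLASSO at one
# arbitrarily chosen penalty, then standardize it into partial correlations (Eq. 3).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                     # placeholder data (n observations x p variables)

K = GraphicalLasso(alpha=0.05).fit(X).precision_   # estimated inverse covariance matrix
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)                 # Eq. 3: -kappa_ij / sqrt(kappa_ii * kappa_jj)
np.fill_diagonal(partial_corr, 0.0)                # the network has no self-loops
```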
Multiple values of $\lambda$ are commonly used, and model selection techniques such as cross-validation [16] are applied to determine the best fitting model. In the psychometric literature, a more common approach has been to apply the extended Bayesian information criterion (EBIC) [18] to select the $\lambda$ parameter and best fitting model. The EBIC is defined as:

$$\mathrm{EBIC} = -2L + E\log(N) + 4\gamma E\log(P), \qquad (5)$$

where $L$ denotes the log-likelihood, $N$ the number of observations, $E$ the number of non-zero elements in $\mathbf{K}$ (edges), and $P$ the number of variables (nodes). Several $\lambda$ values (e.g., 100) are selected from an exponential set of values between 0 and 1. The default setting of this range is defined by a minimum-maximum ratio typically set to 0.01 [14]. The $\gamma$ parameter of the EBIC controls how much simpler models (i.e., fewer non-zero edges) are preferred to more complex models (i.e., fewer zero edges). The default setting for this parameter is typically 0.50 [15].
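A minimal sketch of this selection strategy (our approximation, not the authors' code) fits the GLASSO over an exponential grid of penalties and keeps the solution with the lowest EBIC; the grid size, minimum-maximum ratio, and $\gamma$ follow the defaults described above, and the log-likelihood is computed only up to an additive constant.

```python
# Sketch: EBIC-based selection of the GLASSO penalty (Eq. 5), assuming X is an
# n x p NumPy array of (roughly) standardized data. Grid defaults mirror the text.
import numpy as np
from sklearn.covariance import GraphicalLasso

def ebic_glasso(X, gamma=0.50, n_lambdas=100, min_max_ratio=0.01):
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam_max = np.abs(S - np.diag(np.diag(S))).max()          # largest off-diagonal covariance
    lambdas = np.exp(np.linspace(np.log(min_max_ratio * lam_max), np.log(lam_max), n_lambdas))
    best_ebic, best_K = np.inf, None
    for lam in lambdas:
        try:
            K = GraphicalLasso(alpha=lam, max_iter=200).fit(X).precision_
        except FloatingPointError:
            continue                                         # skip penalties that fail to converge
        L = (n / 2.0) * (np.linalg.slogdet(K)[1] - np.trace(S @ K))  # log-likelihood (up to a constant)
        E = (np.count_nonzero(K) - p) / 2.0                  # non-zero off-diagonal elements (edges)
        ebic = -2.0 * L + E * np.log(n) + 4.0 * gamma * E * np.log(p)
        if ebic < best_ebic:
            best_ebic, best_K = ebic, K
    return best_K                                            # precision matrix with the lowest EBIC
```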
After estimating the GGM via the EBICglasso method, EGA estimates the number of dimensions in the network using a community detection algorithm. There are many different community detection algorithms, with some of the more commonly applied being the Walktrap [20] and Louvain [9, 11, 21, 22, 23] algorithms. The Walktrap algorithm uses random walks to obtain a transition matrix that specifies how likely one node would be to "step" to another node. Ward's hierarchical clustering algorithm [24] is then applied to this transition matrix, and modularity [25] is used to decide the appropriate "cut" or number of clusters that should remain.
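A rough approximation of this dimension step (a sketch under our own assumptions, not the EGAnet implementation) treats the partial correlation matrix as a weighted network and applies igraph's Walktrap; absolute edge weights are used here because igraph's Walktrap expects non-negative weights.

```python
# Sketch: Walktrap community detection on a partial correlation network using python-igraph.
# Absolute weights are used because igraph's Walktrap expects non-negative edge weights.
import numpy as np
import igraph as ig

def ega_like_dimensions(partial_corr):
    W = np.abs(np.asarray(partial_corr, dtype=float))
    np.fill_diagonal(W, 0.0)
    g = ig.Graph.Weighted_Adjacency(W.tolist(), mode="undirected", attr="weight")
    dendrogram = g.community_walktrap(weights="weight", steps=4)   # random walks of length 4
    clusters = dendrogram.as_clustering()                          # cut chosen by maximum modularity
    return clusters.membership                                     # community (dimension) per variable
```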
Modularity is also used as the primary objective function of the Louvain algorithm. Because of its importance for these two algorithms, we define modularity ($Q$) [26]:

$$d_i = \sum_{j=1}^{p} w_{ij}, \qquad (6)$$

$$D = \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} w_{ij}, \qquad (7)$$

$$Q = \frac{1}{2D}\sum_{i=1}^{p}\sum_{j=1}^{p}\left[ w_{ij} - \frac{d_i d_j}{2D} \right]\delta(c_i, c_j), \qquad (8)$$

where $w_{ij}$ is the weight (partial correlation) between node $i$ and node $j$ in the network, $p$ is the number of nodes in the network, $d_i$ is the degree or sum of the edge weights connected to node $i$, $D$ is the total sum of all the edge weights in the network, and $\delta(c_i, c_j)$ equals 1 when nodes $i$ and $j$ belong to the same community and 0 otherwise.
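For completeness, a small NumPy sketch of Eqs. 6-8 (our illustration) computes $Q$ for a given community assignment, assuming a symmetric matrix of non-negative edge weights:

```python
# Sketch of Eqs. 6-8: modularity Q for a symmetric, non-negative weight matrix W
# and a vector of community labels (one label per node).
import numpy as np

def modularity(W, communities):
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                                  # Eq. 6: weighted degree of each node
    D = W.sum() / 2.0                                  # Eq. 7: total edge weight
    same = np.equal.outer(communities, communities)    # delta(c_i, c_j)
    return ((W - np.outer(d, d) / (2 * D)) * same).sum() / (2 * D)  # Eq. 8

# Two clean three-node communities yield the maximum Q of 0.5 for this structure.
W = np.zeros((6, 6))
W[:3, :3] = W[3:, 3:] = 0.5
np.fill_diagonal(W, 0.0)
print(modularity(W, np.array([0, 0, 0, 1, 1, 1])))
```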