Stacked Penalized Logistic Regression for
Selecting Views in Multi-View Learning
Wouter van Loon1, Marjolein Fokkema1, Frank de Vos1,2, Marisa
Koini3, Reinhold Schmidt3, and Mark de Rooij1,2
1Department of Methodology and Statistics, Leiden University
2Leiden Institute for Brain and Cognition
3Division of Neurogeriatrics, Department of Neurology, Medical
University of Graz
June 7, 2024
Abstract
Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.
Keywords: missing data, imputation, multi-view learning, stacked generalization, feature selection
Accepted for publication in Information Fusion at https://doi.org/10.1016/j.inffus.2024.102524.
©2024. This manuscript version is made available under the CC-BY 4.0 license
http://creativecommons.org/licenses/by/4.0/
arXiv:2210.14484v4 [stat.ML] 20 Jun 2024
1 Introduction
Multi-view data refers to any data set where the features have been divided into distinct feature sets [1, 2, 3]¹. Such data sets are particularly common in the biomedical domain where these feature sets, commonly called views, often correspond to different data sources or modalities [4, 5, 6, 7]. Classification models of disease using information from multiple views generally lead to better performance than models using only a single view [8, 9, 10, 11, 12, 13]. Traditionally, information from different views is often combined using simple feature concatenation, where the features corresponding to different views are simply aggregated into a single feature matrix, so that traditional machine learning methods can be deployed [4]. More recently, dedicated multi-view machine learning techniques have been developed, which are specifically designed to handle the multi-view structure of the data [2, 4]. One such multi-view learning technique is stacked penalized logistic regression (StaPLR) [14]. In addition to improving classification performance, StaPLR can automatically select the views that are most relevant for prediction [14, 15, 16]. This ability to select the most relevant views is particularly important in the biomedical sciences [4], where selecting, for example, a subset of brain scan types [16] could drastically reduce costs in future measurements and prevent patients from undergoing unnecessary medical procedures. Furthermore, models which select views rather than individual features tend to be more interpretable [16].
In practice, not all views may be observed for all subjects. When confronted
with missing views, typical approaches are to remove any subjects with at least one
missing value from the data set (called list-wise deletion or complete case analysis
(CCA)), or to replace missing values by some substituted value, a process known
as imputation. In biomedical studies, a single view may consist of thousands or
even millions of features. With the traditional approach of feature concatenation,
in the presence of missing views, CCA leads to a massive loss of information, while
imputation may be computationally infeasible. In this article we propose a new
method for dealing with missing views, based on the StaPLR algorithm. We show
how this method requires much less computation by imputing missing values in a
dimension-reduced space, rather than in the original feature space. We compare
our proposed imputation method with imputation methods applied in the original
feature space.
¹Depending on the research area, multi-view data is sometimes called multi-block, multi-set, multi-group, or multi-table data [3].
2 Methods
Missing values are often divided into three categories: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) [17, 18]. Values are said to be MCAR if the causes of the missingness are unrelated to both missing and observed data [18]. Examples include random machine failure, or missingness introduced by analyzing a random sub-sample of the data. If the missingness is not completely random but depends only on observed data, the missing values are said to be MAR [18]. If the missingness instead depends on unobserved factors, the missing values are said to be MNAR [18]. Here, we will focus on MCAR missing values.
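As an illustration, the following minimal Python sketch (all names and the 20% missingness rate are illustrative choices, not taken from our experiments) generates MCAR missing values by masking each entry with a fixed probability that does not depend on the data:

import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.normal(size=(100, 5))  # a complete 100 x 5 feature matrix

# MCAR: every entry is masked with probability 0.2, independently of both
# the observed and the unobserved values
mcar_mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mcar_mask] = np.nan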
The simplest way of dealing with MCAR missing values is to discard observations with at least one missing value through complete case analysis. However, this approach is potentially very wasteful since a single missing value causes an entire observation to be removed from the data. CCA may therefore remove many more observed values from the data than the number of values initially missing, and drastically reduce the sample size, leading to increased variance and therefore less accurate predictions.
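A short sketch (again with illustrative numbers) makes this wastefulness concrete: with 5 features and 20% MCAR missingness, a row survives complete case analysis only with probability 0.8⁵ ≈ 0.33, so roughly two thirds of the observations are discarded:

import numpy as np

rng = np.random.default_rng(seed=1)
X_missing = rng.normal(size=(100, 5))
X_missing[rng.random(X_missing.shape) < 0.2] = np.nan  # 20% MCAR missingness

# complete case analysis: retain only the rows without any missing value
complete_rows = ~np.isnan(X_missing).any(axis=1)
X_cca = X_missing[complete_rows]
print(f"{X_cca.shape[0]} of {X_missing.shape[0]} rows retained")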
To prevent wasting observed data, missing values can be imputed. The simplest form of imputation is to replace each missing value with a constant. A very common choice is the unconditional mean of the feature, a procedure known as (unconditional) mean imputation (MI). If one is primarily interested in prediction, MI has some favorable properties: Its computational cost is extremely small, and it has been shown that MI is universally consistent for prediction even for MAR data, as long as the learning algorithm used is also universally consistent [19]. Here consistent means that, given an infinite amount of training data, the prediction function achieves the error rate of the best possible prediction function (i.e., the Bayes rate), while universal means that the procedure is consistent for all possible data distributions [19]. However, MI is often criticized because it is known to distort the data distribution by attenuating existing correlations between the features, underestimating the variance, and causing bias in almost any estimate other than the mean [18].
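For concreteness, unconditional mean imputation can be written in a few lines of Python; this is a minimal sketch rather than the implementation used in our experiments:

import numpy as np

def mean_impute(X):
    # replace every missing value by the observed (unconditional) column mean
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)  # per-feature means, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

The same operation is provided, for example, by scikit-learn's SimpleImputer with strategy="mean".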
Many more sophisticated imputation methods have been developed. The literature on the imputation of missing values is vast, and we do not aim to give a complete overview here. However, most of the popular imputation methods can be grouped in a number of categories. The first such category consists of cold deck imputation methods, which impute missing values using observed values from a different data set [20]. However, this requires suitable additional data to be available, which is often not the case. By contrast, hot deck-style imputation methods [21] are more generally applicable. For each observation with missing values, these imputation methods find one or several complete observations in the data which are most similar to the observation with missing values [21]. The observed values of these cases, or some function thereof, are then used to impute the missing values of the incomplete case. The most popular example is imputation based on the k-nearest neighbors (kNN) algorithm [22]. A different category of imputation methods is that of regression-based imputation. This includes the state-of-the-art multiple imputation through chained equations (MICE) [23]. Another category is based on matrix factorization, which includes Adaptive-Impute [24], and various other methods [25] based on, for example, principal component analysis (PCA) [26] or multiple factor analysis (MFA) [27]. More recently, tree-based imputation methods such as missForest [28] have become popular. Finally, there are deep learning imputation methods which are generally based on auto-encoders, such as multiple imputation with denoising autoencoders (MIDAS) [29] or missing data importance-weighted autoencoder (MIWAE) [30], and/or based on generative adversarial networks, such as generative adversarial imputation nets (GAIN) [31] or graph imputation neural networks (GINN) [32]. Some of the most sophisticated imputation methods may combine ideas from several of the aforementioned categories. Predictive mean matching (PMM) [18], for example, uses regression-based imputation to find cases in the data which are most similar in terms of their predicted values. It is worth noting that it is generally preferable to generate not one, but multiple imputed data sets, so that correct variance estimates can be obtained [18]; this is known as multiple imputation [18].
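To give one concrete example from the hot deck category, the sketch below applies kNN imputation via scikit-learn's KNNImputer, which averages each missing entry over the k most similar observations under a NaN-aware Euclidean distance (the data and the choice k = 5 are illustrative):

import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(seed=1)
X = rng.normal(size=(100, 5))
X[rng.random(X.shape) < 0.2] = np.nan  # 20% MCAR missingness

# impute each missing value from the 5 nearest neighbors of its row
imputer = KNNImputer(n_neighbors=5)
X_imputed = imputer.fit_transform(X)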
We can also categorize the existing imputation methods depending on whether they perform unconditional or conditional imputation. We define an unconditional imputation method as any method in which the imputation of a missing value is based solely on other observations of the same feature, that is, the imputation takes place within a single column of the feature matrix. The aforementioned mean imputation is a classic example of an unconditional imputation method. By contrast, a conditional imputation method is any method in which the imputation of a missing value is based, in part or completely, on observations of other features, that is, the imputation uses different columns of the feature matrix. Most sophisticated imputation methods, such as Bayesian multiple imputation and PMM, are conditional imputation methods. The distinction between unconditional and conditional imputation methods is of particular interest for feature selection. Unconditional imputation methods, such as mean imputation, use only the univariate distributions for imputation, so that the imputed feature remains in some sense ‘pure’ and free from contamination from other features. However, as mentioned earlier, mean imputation is known to distort the data distribution by attenuating existing correlations between the features [18].
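The contrast can be made explicit in code. In the sketch below (with illustrative data, not the simulation design of this paper), mean imputation attenuates the correlation between two correlated features, whereas a chained-equations method such as scikit-learn's IterativeImputer, which regresses each feature on the others, largely preserves it:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(seed=1)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.3 * rng.normal(size=200)])  # two correlated features
X[rng.random(X.shape) < 0.2] = np.nan

# unconditional: each column is imputed using only its own observed values
X_uncond = SimpleImputer(strategy="mean").fit_transform(X)

# conditional: each feature is imputed from the other column(s)
X_cond = IterativeImputer(random_state=1).fit_transform(X)

print(np.corrcoef(X_uncond.T)[0, 1])  # attenuated correlation
print(np.corrcoef(X_cond.T)[0, 1])    # correlation largely preserved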
By contrast, some (but not all) conditional imputation methods preserve the
correlations between features [18]. However, in this case the imputed values depend
on other features in the data. In the event that a selected feature has a large
number of imputed values, this may lead to difficulties in interpretation, since a
large proportion of the selected feature is derived from other features. Nevertheless,
a recent study on the effect of imputation methods on feature selection suggests
sophisticated conditional imputation methods generally lead to better results than
unconditional imputation methods [33]. Because it is not possible to both perform the imputation independently of other features and preserve existing correlations, one has to choose between the two.
It should be noted that other methods for handling missing data exist which
do not explicitly impute missing values. These methods incorporate the missing
data handling directly into the model fitting procedure and include likelihood-
based methods such as full information maximum likelihood (FIML) [34, 35] for
parametric regression models, and missingness incorporated in attributes (MIA)
[36] for decision trees. However, these methods are less broadly applicable than
imputation methods [18, 19, 35] and we do not consider them here.
2.1 From Missing Features to Missing Views
In multi-view data, it is likely that missingness will occur at the view level, rather than at the feature level [37, 38]. Missing views may occur at random and/or by design [37, 38]. In a study where one of the views corresponds to features derived from a magnetic resonance imaging (MRI) scan, factors like the MRI scanner experiencing machine failure, a mistake in the scanning protocol by the researcher administering the scan, or a subject simply not making it to their appointment in time due to heavy traffic, would lead to all features of this view being simultaneously missing. Likewise, if one of the views corresponds to features derived from a sample of blood or cerebrospinal fluid (CSF), a lost or contaminated sample would lead to all derived features being simultaneously missing. Note that in these cases, although the missingness occurs at the view level, the underlying mechanism is still MCAR. Another common example of MCAR data occurs in the case of planned missingness, where the missing values are part of the study design. For example, it may be considered too expensive to administer an MRI scan to all study participants, so instead an MRI scan is administered only to a random sub-sample of the participants. Again the underlying mechanism is MCAR, but all features corresponding to the MRI scan will be missing simultaneously for the unmeasured sub-sample. Throughout the rest of this article we will assume that (1) for each observation, a view is either completely missing or completely observed, and (2) the missingness is completely at random (MCAR).
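Under these two assumptions, missingness can be represented with a single indicator per view and observation. The following sketch (view dimensionalities and the 50% sampling fraction are illustrative) mimics planned missingness by masking an entire view for a random sub-sample:

import numpy as np

rng = np.random.default_rng(seed=1)
n = 100
# three views of different dimensionalities for the same n observations
views = [rng.normal(size=(n, p)) for p in (10, 50, 200)]

# planned missingness: the third view (e.g., MRI features) is measured only
# for a random half of the subjects, so all of its features are missing
# simultaneously for the unmeasured sub-sample
unmeasured = rng.choice(n, size=n // 2, replace=False)
views[2][unmeasured, :] = np.nan

X = np.hstack(views)  # feature concatenation of the three views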
Conceptually, one could impute a missing view by first applying feature concatenation, and then simply applying a chosen imputation method on the concatenated feature set. However, in practice this may be impossible. For example, if