A Locally Adaptive Shrinkage Approach to False
Selection Rate Control in High-Dimensional
Classification
Bowen Gang1, Yuantao Shi2, and Wenguang Sun3
Abstract
The uncertainty quantification and error control of classifiers are crucial in many
high-consequence decision-making scenarios. We propose a selective classification
framework that provides an “indecision” option for any observations that cannot be
classified with confidence. The false selection rate (FSR), defined as the expected
fraction of erroneous classifications among all definitive classifications, provides a
useful error rate notion that trades off a fraction of indecisions for fewer classification
errors. We develop a new class of locally adaptive shrinkage and selection (LASS)
rules for FSR control in the context of high-dimensional linear discriminant analysis
(LDA). LASS is easy-to-analyze and has robust performance across sparse and dense
regimes. Theoretical guarantees on FSR control are established without the strong assumptions on sparsity required by existing theories in high-dimensional LDA. The empirical performance of LASS is investigated using both simulated and real data.
Key words and phrases: Classification with confidence, False discovery rate, Linear discriminant analysis, Risk control, Shrinkage estimation.
1Department of Statistics and Data Science, Fudan University.
2Department of Statistics, University of Chicago.
3Center for Data Science, Zhejiang University.
arXiv:2210.04268v1 [stat.ME] 9 Oct 2022
1 Introduction
Linear discriminant analysis (LDA) has been widely used in classification problems. We
focus on the basic setup, which assumes that the observations are p-dimensional vector-
valued features that are drawn with equal probability from one of the two multivariate
normal distributions:
N(µ1, Σ) (class 1) and N(µ2, Σ) (class 2).   (1.1)
Let W ∈ R^p be a new observation. Denote µ = (µ1 + µ2)/2 and d = µ1 − µ2. The procedure
that achieves the minimal misclassification risk is Fisher’s linear discriminant rule:
δF = I{(W − µ)ᵀΣ⁻¹d ≥ 0} + 2 · I{(W − µ)ᵀΣ⁻¹d < 0},   (1.2)
which assigns W to class c if δF = c, c = 1, 2. When µ1, µ2 and Σ are unknown, the
common practice is to construct a data-driven LDA rule by obtaining suitable estimates of
the unknown quantities in (1.2). In the high-dimensional setting, naive sample estimates
become highly unstable, and a plethora of regularized LDA rules have been proposed and
shown to achieve substantial improvements in prediction accuracy (Friedman, 1989; Tibshirani et al., 2003; Witten and Tibshirani, 2009; Cai and Liu, 2011; Shao et al., 2011; Mai et al., 2012; Cai and Zhang, 2019, among others). However, it remains unknown how to assess the uncertainties and control the decision errors in high-dimensional LDA. This article proposes a selective classification approach to controlling the false selection rate (FSR).
We develop a new class of data-driven LDA rules based on locally adaptive shrinkage and
selection (LASS), and illustrate how LASS can be deployed in decision-making scenarios
to control the FSR at a user-specified level.
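As a concrete reference point, the oracle rule (1.2), with all parameters known, can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the function name is ours:

```python
import numpy as np

def fisher_lda(W, mu1, mu2, Sigma):
    """Oracle Fisher rule with known parameters mu1, mu2, Sigma.

    W is an (m, p) array of new observations; returns 1 or 2 per row."""
    mu = (mu1 + mu2) / 2.0
    d = mu1 - mu2
    # Discriminant scores (W - mu)^T Sigma^{-1} d, one per observation.
    scores = (W - mu) @ np.linalg.solve(Sigma, d)
    # Class 1 when W is closer to mu1 in Mahalanobis distance (score >= 0).
    return np.where(scores >= 0, 1, 2)
```

Using `np.linalg.solve` rather than forming Σ⁻¹ explicitly is the standard numerically stable choice.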
1.1 Selective classification and false selection rate
Uncertainty quantification and error control are crucial in many sensitive decision-making
scenarios. The decision errors, which can be very expensive to correct, are often unavoidable
due to the intrinsic ambiguity of a classification problem. Consider the ideal setting where
the multivariate normal parameters µ1, µ2 and Σ are known. Then among all classification rules, the LDA rule (1.2) achieves the minimum classification risk 1 − Φ(√(dᵀΣ⁻¹d)/2), where Φ(·) is the cumulative distribution function (CDF) of a standard normal variable. However, this minimum risk can still be unacceptably high when the signal-to-noise ratio dᵀΣ⁻¹d is low. The issue is exacerbated in practice, particularly in high-dimensional settings, where we must employ “plug-in” rules learned from limited training data.
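For intuition, the oracle risk 1 − Φ(√(dᵀΣ⁻¹d)/2) is easy to evaluate numerically. A minimal sketch (names ours), writing the standard-normal CDF via the error function:

```python
import math
import numpy as np

def oracle_risk(d, Sigma):
    """Minimum misclassification risk 1 - Phi(sqrt(d^T Sigma^{-1} d) / 2)."""
    delta = math.sqrt(d @ np.linalg.solve(Sigma, d))  # Mahalanobis separation
    # Standard-normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2.
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - Phi(delta / 2.0)
```

For example, a separation of dᵀΣ⁻¹d = 4 already leaves a risk of about 16%, and the risk approaches 1/2 (coin-flipping) as the separation vanishes.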
In contrast with conventional classification algorithms, which are forced to make classifications on all new observations, a useful strategy for uncertainty quantification involves providing an indecision option (also referred to as abstention or the reject option) for any observations which cannot be classified with confidence. The observations with indecisions
can then be separately evaluated. This strategy is attractive in practice when the cost of
handling indecisions is less than that of fixing a classification error. To see how it aligns
with the social and policy objectives, consider a high-consequence classification scenario
where one needs to assess the likelihood of a defendant becoming a recidivist. Obviously
the social cost of incorrectly classifying a low-risk individual as a recidivist is much higher
than that of an indecision: it is worth waiting and collecting additional contextual knowledge of the convicted individual to mitigate the ambiguity. Likewise, in medical screening, a misclassification can result in either missed medical care or unnecessary treatments, both of which can be much more expensive than turning the patient over for a more
careful examination/evaluation.
Suppose we observe labeled training data Dtrain. The goal is to predict the classes for m new observations Dtest = {Wj : 1 ≤ j ≤ m}. This article considers a selective classification framework that only makes definitive decisions on a selected subset of Dtest, while
the remaining subjects will receive indecisions (i.e. be rejected for further investigation).
The reject/indecision option, which is much less expensive to handle, is considered as a
wasted opportunity rather than a severe error. We propose to control the false selection
rate (FSR), which is the expected fraction of erroneous classifications among all definitive
classifications. Selective classification with FSR control provides an effective approach to
uncertainty quantification and error control. We demonstrate that with the reject/indecision option, the FSR can be controlled at a user-specified level. When the signal-to-noise
ratio is low, the degree of ambiguity in the classification task can be, in a sense, captured
by the fraction of indecisions in Dtest. Hence, a more powerful data-driven rule, subject to
the FSR constraint, translates into a smaller fraction of indecisions, which means that less wasted effort is needed to perform separate evaluations.
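The empirical counterpart of the FSR, the proportion of errors among definitive classifications, could be computed as follows. This is a sketch with our own conventions (0 encodes an indecision; 1 and 2 encode definitive class assignments), not the paper's notation:

```python
import numpy as np

def false_selection_proportion(decisions, truth):
    """decisions[i] in {0, 1, 2}: 0 = indecision, 1/2 = definitive class.
    truth[i] in {1, 2}: the true class label.
    Returns the fraction of errors among definitive classifications."""
    decisions = np.asarray(decisions)
    truth = np.asarray(truth)
    definitive = decisions != 0
    if definitive.sum() == 0:
        return 0.0  # convention: no definitive calls means no false selections
    return float(np.mean(decisions[definitive] != truth[definitive]))
```

The FSR is then the expectation of this proportion over the randomness of the data.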
1.2 FSR control via locally adaptive shrinkage and selection
The task of controlling the FSR in high-dimensional LDA is challenging; we start by discussing several limitations of existing works.
First, the methodology and theory of many high-dimensional LDA rules (e.g. Cai and
Liu, 2011; Shao et al., 2011; Mai et al., 2012; Cai and Zhang, 2019) critically depend on
strong sparsity assumptions, which may not hold in practice. The sparsity assumption is
counter-intuitive from the perspective of classification error control. Consider the simple case where all non-zero coordinates in d = µ1 − µ2 take the same value. Then a larger ℓ0 norm of d (i.e., the non-sparse setting) effectively implies that the two classes are better separated, and hence, the control of classification risk should become easier. However,
many state-of-the-art LDA rules lack theoretical justifications and often do poorly in the
supposedly easier non-sparse setting (Section 5). Second, the analysis of the error rate
of a classifier often requires a precise quantification of the quality of its outputs, which is
in general intractable due to the complexity of contemporary LDA rules. Finally, most
learning algorithms are driven by the need to improve prediction performance instead of
avoiding high-consequence decision errors. It is unclear how to tailor existing algorithms
to trade off a fraction of indecisions for fewer classification errors, and further, how to
calibrate suitable data-driven thresholds to control the FSR at a user-specified level.
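The sparsity argument above can be made concrete: with Σ = I and k non-zero coordinates of d all equal to a, the signal-to-noise ratio is dᵀΣ⁻¹d = k·a², so the oracle risk 1 − Φ(a√k/2) falls as the ℓ0 norm grows. A small numerical check (the helper name is ours):

```python
import math

def oracle_risk_equal_coords(k, a):
    # With Sigma = I and k non-zero coordinates of d all equal to a,
    # d^T Sigma^{-1} d = k * a**2, so the oracle risk is 1 - Phi(a*sqrt(k)/2).
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - Phi(a * math.sqrt(k) / 2.0)

# A denser d (larger k) means better separation and a lower oracle risk:
risks = [oracle_risk_equal_coords(k, 0.5) for k in (1, 10, 100)]
```

This is exactly why methods tuned to the sparse regime can struggle in the supposedly easier dense regime.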
This article develops a class of FSR rules via a locally adaptive shrinkage and selection
(LASS) algorithm. LASS consists of three steps: first estimating a score according to the
LDA rule (1.2), second ordering all individuals according to the estimated scores, and finally
choosing upper and lower data-adaptive thresholds to select individuals into the two classes,
with the unselected ones assigned to the indecision group. We prove theories to establish
the asymptotic validity of LASS for FSR control under mild conditions. A key innovation
in our method is the construction of intuitive and easy-to-analyze shrinkage factors that are
capable of reducing uncertainties without strong assumptions on sparsity. LASS provides a
principled and theoretically solid LDA rule that has comparable performance with state-of-
the-art classification rules (e.g. Cai and Liu, 2011; Shao et al., 2011; Cai and Zhang, 2019)
in the sparse setting and substantially better performance under the non-sparse setting.
The theoretical adaptiveness of LASS to the unknown sparsity and its robust numerical
performance across sparse and dense settings are attractive for practitioners – particularly
in real world applications we only “bet on sparsity”; this working assumption (of sparsity)
can distort the hardness of the problem, and hence, lead to wrong choices of method.
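The second and third steps above amount to thresholding the ordered scores from both ends. A schematic sketch follows; the data-adaptive choice of the thresholds is the substance of LASS and is developed in later sections, so here t_low and t_high are simply taken as given, and all names are ours:

```python
import numpy as np

def select_and_classify(scores, t_low, t_high):
    """Schematic selection step: scores estimate the discriminant
    (W - mu)^T Sigma^{-1} d; observations with extreme scores are
    selected into a class, the rest receive an indecision (coded 0)."""
    scores = np.asarray(scores)
    out = np.zeros(len(scores), dtype=int)  # default: indecision
    out[scores >= t_high] = 1  # confidently class 1 (large positive score)
    out[scores <= t_low] = 2   # confidently class 2 (large negative score)
    return out
```

For instance, with thresholds (−1, 1), scores of −3, 0.1 and 3 yield class 2, an indecision, and class 1, respectively; widening the gap between the thresholds trades more indecisions for fewer definitive errors.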
1.3 Our contributions
Our work makes several contributions. First, selective classification via FSR control pro-
vides a useful approach to risk-sensitive decision-making scenarios, where classification
errors have high impacts on one’s social, economic or health status. Second, we develop
a novel shrinkage rule for estimating the linear discriminant score, which is effective for
reducing the uncertainties in high dimensions. The estimator is intuitive, assumption-lean,
easy-to-analyze and enjoys strong theoretical properties. Third, we derive data-adaptive
decision boundaries based on the shrunken LDA rule to select and classify the observations.