A Locally Adaptive Shrinkage Approach to False
Selection Rate Control in High-Dimensional
Classification
Bowen Gang1, Yuantao Shi2, and Wenguang Sun3
Abstract
The uncertainty quantification and error control of classifiers are crucial in many
high-consequence decision-making scenarios. We propose a selective classification
framework that provides an “indecision” option for any observations that cannot be
classified with confidence. The false selection rate (FSR), defined as the expected
fraction of erroneous classifications among all definitive classifications, provides a
useful error rate notion that trades off a fraction of indecisions for fewer classification
errors. We develop a new class of locally adaptive shrinkage and selection (LASS)
rules for FSR control in the context of high-dimensional linear discriminant analysis
(LDA). LASS is easy-to-analyze and has robust performance across sparse and dense
regimes. Theoretical guarantees on FSR control are established without the strong assumptions on sparsity required by existing theories in high-dimensional LDA. The empirical performance of LASS is investigated using both simulated and real data.
Key words and phrases: Classification with confidence, False discovery rate, Linear discriminant analysis, Risk control, Shrinkage estimation.
1Department of Statistics and Data Science, Fudan University.
2Department of Statistics, University of Chicago.
3Center for Data Science, Zhejiang University.
arXiv:2210.04268v1 [stat.ME] 9 Oct 2022
1 Introduction
Linear discriminant analysis (LDA) has been widely used in classification problems. We
focus on the basic setup, which assumes that the observations are p-dimensional vector-
valued features that are drawn with equal probability from one of the two multivariate
normal distributions:
N(µ1, Σ) (class 1) and N(µ2, Σ) (class 2).   (1.1)
Let W ∈ R^p be a new observation. Denote µ = (µ1 + µ2)/2 and d = µ1 − µ2. The procedure
that achieves the minimal misclassification risk is Fisher’s linear discriminant rule:
δF = I{(W − µ)ᵀΣ⁻¹d ≥ 0} + 2 · I{(W − µ)ᵀΣ⁻¹d < 0},   (1.2)
which assigns W to class c if δF = c, c = 1, 2. When µ1, µ2 and Σ are unknown, the
common practice is to construct a data-driven LDA rule by obtaining suitable estimates of
the unknown quantities in (1.2). In the high-dimensional setting, naive sample estimates
become highly unstable, and a plethora of regularized LDA rules have been proposed and
shown to achieve substantial improvements in prediction accuracy (Friedman, 1989; Tibshirani et al., 2003; Witten and Tibshirani, 2009; Cai and Liu, 2011; Shao et al., 2011; Mai et al., 2012; Cai and Zhang, 2019, among others). However, it remains unknown how to assess the uncertainties and control the decision errors in high-dimensional LDA. This article proposes a selective classification approach to controlling the false selection rate (FSR).
We develop a new class of data-driven LDA rules based on locally adaptive shrinkage and
selection (LASS), and illustrate how LASS can be deployed in decision-making scenarios
to control the FSR at a user-specified level.
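As a concrete reference point, the oracle rule (1.2), with all parameters known, can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the function name is ours:

```python
import numpy as np

def fisher_lda(W, mu1, mu2, Sigma):
    """Oracle Fisher rule with known parameters mu1, mu2, Sigma.

    W is an (m, p) array of new observations; returns 1 or 2 per row."""
    mu = (mu1 + mu2) / 2.0
    d = mu1 - mu2
    # Discriminant scores (W - mu)^T Sigma^{-1} d, one per observation.
    scores = (W - mu) @ np.linalg.solve(Sigma, d)
    # Class 1 when W is closer to mu1 in Mahalanobis distance (score >= 0).
    return np.where(scores >= 0, 1, 2)
```

Using `np.linalg.solve` rather than forming Σ⁻¹ explicitly is the standard numerically stable choice.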
1.1 Selective classification and false selection rate
Uncertainty quantification and error control are crucial in many sensitive decision-making
scenarios. The decision errors, which can be very expensive to correct, are often unavoidable
due to the intrinsic ambiguity of a classification problem. Consider the ideal setting where
the multivariate normal parameters µ1, µ2 and Σ are known. Then among all classification rules, the LDA rule (1.2) achieves the minimum classification risk 1 − Φ(√(dᵀΣ⁻¹d)/2), where Φ(·) is the cumulative distribution function (CDF) of a standard normal variable. However, this minimum risk can still be unacceptably high when the signal-to-noise ratio dᵀΣ⁻¹d is low. The issue is exacerbated in practice, particularly in high-dimensional settings, where we must employ “plug-in” rules learned from limited training data.
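For intuition, the oracle risk 1 − Φ(√(dᵀΣ⁻¹d)/2) is easy to evaluate numerically. A minimal sketch (names ours), writing the standard-normal CDF via the error function:

```python
import math
import numpy as np

def oracle_risk(d, Sigma):
    """Minimum misclassification risk 1 - Phi(sqrt(d^T Sigma^{-1} d) / 2)."""
    delta = math.sqrt(d @ np.linalg.solve(Sigma, d))  # Mahalanobis separation
    # Standard-normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2.
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - Phi(delta / 2.0)
```

For example, a separation of dᵀΣ⁻¹d = 4 already leaves a risk of about 16%, and the risk approaches 1/2 (coin-flipping) as the separation vanishes.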
In contrast with conventional classification algorithms, which are forced to make classifications on all new observations, a useful strategy for uncertainty quantification involves providing an indecision option (also referred to as abstention or the reject option) for any observations which cannot be classified with confidence. The observations with indecisions
can then be separately evaluated. This strategy is attractive in practice when the cost of
handling indecisions is less than that of fixing a classification error. To see how it aligns
with the social and policy objectives, consider a high-consequence classification scenario
where one needs to assess the likelihood of a defendant becoming a recidivist. Obviously
the social cost of incorrectly classifying a low-risk individual as a recidivist is much higher
than that of an indecision: it is worth waiting and collecting additional contextual knowledge of the convicted individual to mitigate the ambiguity. Likewise, in medical screening, a misclassification can result in either missed medical care or unnecessary treatments, both of which can be much more expensive than turning the patient over for a more
careful examination/evaluation.
Suppose we observe labeled training data Dtrain. The goal is to predict the classes for m new observations Dtest = {Wj : 1 ≤ j ≤ m}. This article considers a selective classification framework that only makes definitive decisions on a selected subset of Dtest, while
the remaining subjects will receive indecisions (i.e. be rejected for further investigation).
The reject/indecision option, which is much less expensive to handle, is considered as a
wasted opportunity rather than a severe error. We propose to control the false selection
rate (FSR), which is the expected fraction of erroneous classifications among all definitive
classifications. Selective classification with FSR control provides an effective approach to
uncertainty quantification and error control. We demonstrate that with the reject/indecision option, the FSR can be controlled at a user-specified level. When the signal-to-noise
ratio is low, the degree of ambiguity in the classification task can be, in a sense, captured
by the fraction of indecisions in Dtest. Hence, a more powerful data-driven rule, subject to
the FSR constraint, translates into a smaller fraction of indecisions, which means that less wasted effort is needed to perform separate evaluations.
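The empirical counterpart of the FSR, the proportion of errors among definitive classifications, could be computed as follows. This is a sketch with our own conventions (0 encodes an indecision; 1 and 2 encode definitive class assignments), not the paper's notation:

```python
import numpy as np

def false_selection_proportion(decisions, truth):
    """decisions[i] in {0, 1, 2}: 0 = indecision, 1/2 = definitive class.
    truth[i] in {1, 2}: the true class label.
    Returns the fraction of errors among definitive classifications."""
    decisions = np.asarray(decisions)
    truth = np.asarray(truth)
    definitive = decisions != 0
    if definitive.sum() == 0:
        return 0.0  # convention: no definitive calls means no false selections
    return float(np.mean(decisions[definitive] != truth[definitive]))
```

The FSR is then the expectation of this proportion over the randomness of the data.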
1.2 FSR control via locally adaptive shrinkage and selection
The task of controlling the FSR in high-dimensional LDA is challenging; we start by discussing several limitations of existing works.
First, the methodology and theory of many high-dimensional LDA rules (e.g. Cai and
Liu, 2011; Shao et al., 2011; Mai et al., 2012; Cai and Zhang, 2019) critically depend on
strong sparsity assumptions, which may not hold in practice. The sparsity assumption is
counter-intuitive from the perspective of classification error control. Consider the simple case where all non-zero coordinates in d = µ1 − µ2 take the same value. Then a larger ℓ0 norm of d (i.e., the non-sparse setting) effectively implies that the two classes are better separated, and hence, the control of classification risk should become easier. However,
many state-of-the-art LDA rules lack theoretical justifications and often do poorly in the
supposedly easier non-sparse setting (Section 5). Second, the analysis of the error rate
of a classifier often requires a precise quantification of the quality of its outputs, which is
in general intractable due to the complexity of contemporary LDA rules. Finally, most
learning algorithms are driven by the need to improve prediction performance instead of
avoiding high-consequence decision errors. It is unclear how to tailor existing algorithms
to trade off a fraction of indecisions for fewer classification errors, and further, how to
calibrate suitable data-driven thresholds to control the FSR at a user-specified level.
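The sparsity argument above can be made concrete: with Σ = I and k non-zero coordinates of d all equal to a, the signal-to-noise ratio is dᵀΣ⁻¹d = k·a², so the oracle risk 1 − Φ(a√k/2) falls as the ℓ0 norm grows. A small numerical check (the helper name is ours):

```python
import math

def oracle_risk_equal_coords(k, a):
    # With Sigma = I and k non-zero coordinates of d all equal to a,
    # d^T Sigma^{-1} d = k * a**2, so the oracle risk is 1 - Phi(a*sqrt(k)/2).
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - Phi(a * math.sqrt(k) / 2.0)

# A denser d (larger k) means better separation and a lower oracle risk:
risks = [oracle_risk_equal_coords(k, 0.5) for k in (1, 10, 100)]
```

This is exactly why methods tuned to the sparse regime can struggle in the supposedly easier dense regime.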
This article develops a class of FSR rules via a locally adaptive shrinkage and selection
(LASS) algorithm. LASS consists of three steps: first estimating a score according to the
LDA rule (1.2), second ordering all individuals according to the estimated scores, and finally
choosing upper and lower data-adaptive thresholds to select individuals into the two classes,
with the unselected ones assigned to the indecision group. We prove theories to establish
the asymptotic validity of LASS for FSR control under mild conditions. A key innovation
in our method is the construction of intuitive and easy-to-analyze shrinkage factors that are
capable of reducing uncertainties without strong assumptions on sparsity. LASS provides a
principled and theoretically solid LDA rule that has comparable performance with state-of-
the-art classification rules (e.g. Cai and Liu, 2011; Shao et al., 2011; Cai and Zhang, 2019)
in the sparse setting and substantially better performance under the non-sparse setting.
The theoretical adaptiveness of LASS to the unknown sparsity and its robust numerical
performance across sparse and dense settings are attractive for practitioners – particularly
in real world applications we only “bet on sparsity”; this working assumption (of sparsity)
can distort the hardness of the problem, and hence, lead to wrong choices of method.
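The second and third steps above amount to thresholding the ordered scores from both ends. A schematic sketch follows; the data-adaptive choice of the thresholds is the substance of LASS and is developed in later sections, so here t_low and t_high are simply taken as given, and all names are ours:

```python
import numpy as np

def select_and_classify(scores, t_low, t_high):
    """Schematic selection step: scores estimate the discriminant
    (W - mu)^T Sigma^{-1} d; observations with extreme scores are
    selected into a class, the rest receive an indecision (coded 0)."""
    scores = np.asarray(scores)
    out = np.zeros(len(scores), dtype=int)  # default: indecision
    out[scores >= t_high] = 1  # confidently class 1 (large positive score)
    out[scores <= t_low] = 2   # confidently class 2 (large negative score)
    return out
```

For instance, with thresholds (−1, 1), scores of −3, 0.1 and 3 yield class 2, an indecision, and class 1, respectively; widening the gap between the thresholds trades more indecisions for fewer definitive errors.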
1.3 Our contributions
Our work makes several contributions. First, selective classification via FSR control pro-
vides a useful approach to risk-sensitive decision-making scenarios, where classification
errors have high impacts on one’s social, economic or health status. Second, we develop
a novel shrinkage rule for estimating the linear discriminant score, which is effective for
reducing the uncertainties in high dimensions. The estimator is intuitive, assumption-lean,
easy-to-analyze and enjoys strong theoretical properties. Third, we derive data-adaptive
decision boundaries based on the shrunken LDA rule to select and classify the observations.