Imbalanced Classification in Medical Imaging
via Regrouping
Le Peng1, Yash Travadi2, Rui Zhang3, Ying Cui4, Ju Sun1
1Computer Science & Engineering, University of Minnesota, Twin Cities
2School of Statistics, University of Minnesota, Twin Cities
3Department of Surgery, University of Minnesota, Twin Cities
4Industrial and Systems Engineering, University of Minnesota, Twin Cities
{peng0347,trava029,zhan1386,yingcui,jusun}@umn.edu
Abstract
We propose performing imbalanced classification by regrouping majority classes
into small classes so that we turn the problem into balanced multiclass classification.
This new idea is dramatically different from popular loss reweighting and
class resampling methods. Our preliminary result on imbalanced medical image
classification shows that this natural idea can substantially boost the classification
performance as measured by average precision (approximately
area-under-the-precision-recall-curve, or AUPRC), which is more appropriate for evaluating
imbalanced classification than other metrics such as balanced accuracy.
The shaky foundation of machine learning (ML) for medical imaging
Modern data classification is founded on accuracy (ACC) maximization, or equivalently error minimization:
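$$
\min_{f \in \mathcal{H}} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \, \mathbb{1}\{\mathbf{y} \neq f(x)\} = \sum_i \mathbb{P}(i = y) \, \mathbb{E}_{x \mid y = i} \, \mathbb{1}\{\mathbf{y} \neq f(x)\}, \tag{1}
$$
where $\mathcal{H}$ is the hypothesis class, $\mathcal{D}$ is the data distribution, and $\mathbf{y}$ is the one-hot-encoded vector of $y$.
But accuracy can be misleading if the data are imbalanced across classes, e.g., $\mathcal{D}(y)$ not uniform.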
This intrinsic class imbalance is prevalent in medical imaging classification (MIC), e.g., in binary
cases, the negative rate is expected to far exceed the positive rate in most diseases, and in multiclass
cases, the prevalence rates of different diseases are generally disparate. A slightly refined notion is
balanced-accuracy (BA), which leads to the balanced-error (BE) minimization:
$$
\min_{f \in \mathcal{H}} \; \frac{1}{|\{y\}|} \sum_i \mathbb{E}_{x \mid y = i} \, \mathbb{1}\{\mathbf{y} \neq f(x)\}. \tag{2}
$$
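As a brief worked illustration of how objectives (1) and (2) differ, consider a hypothetical binary problem with positive rate $\mathbb{P}(y = 1) = 0.01$ and the constant classifier $f_0$ that always predicts the negative class. The numbers are assumptions chosen for illustration only, and $\mathrm{err}$ and $\mathrm{BE}$ are shorthand introduced here for the objective values in (1) and (2):

$$
\mathrm{err}(f_0) = \underbrace{0.99 \cdot 0}_{\text{negatives, all correct}} + \underbrace{0.01 \cdot 1}_{\text{positives, all missed}} = 0.01,
\qquad
\mathrm{BE}(f_0) = \tfrac{1}{2}\,(0 + 1) = 0.5 .
$$

So $f_0$ reaches 99% accuracy while detecting no positives at all, whereas its balanced error of 0.5 equals that of random guessing.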
Figure 1: An example confusion table
for binary classification, and the various
associated performance metrics.
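To make the metrics associated with a binary confusion table concrete, here is a minimal Python sketch. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not the values shown in Fig. 1. The sketch assumes all denominators are nonzero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four cells of a binary confusion table.

    Assumes every denominator below is nonzero.
    """
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        # For two classes, balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
metrics = binary_metrics(tp=8, fp=72, fn=2, tn=918)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy and balanced accuracy both look high (about 0.93 and 0.86), while precision is only 0.10: nine out of ten positive predictions are false alarms.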
From a quick example in Fig. 1, it is clear that BA can
be more responsive to recall performance than ACC, but
it still fails to capture low precision, which, together with
recall, probably matters the most for medical diagnosis.
Hence, for MIC, reporting precision-recall and the associated
F1 score seems most appropriate [1, 2]. But
precision/recall and the F1 score depend on the decision
threshold, for which our typical choice following the balanced
scenarios is provably suboptimal with class imbalance [3, 4].
Hence, the area under the precision-recall curve (AUPRC),
which is often approximated by average precision (AP),
seems a more sensible choice, as it only
depends on the ordering of the prediction scores and not
Preprint. Under review.
arXiv:2210.12234v2 [cs.CV] 26 Oct 2022
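The contrast between threshold-dependent metrics and AP can be checked with a small Python sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the helper names, labels, and scores are hypothetical, labels are binary with at least one positive, and scores have no ties. AP is computed here as the mean of the precision values at the ranks of the positive examples.

```python
def average_precision(labels, scores):
    """AP as the mean of precision@rank over the ranks of the positives.

    Assumes binary labels (1 = positive), at least one positive, no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` examples
    return total / hits


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 score after thresholding the scores at a fixed value."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical labels and scores, for illustration only.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
squashed = [s ** 3 for s in scores]  # monotone transform: ordering unchanged

ap_raw = average_precision(labels, scores)
ap_squashed = average_precision(labels, squashed)
f1_raw = f1_at_threshold(labels, scores)
f1_squashed = f1_at_threshold(labels, squashed)
print(ap_raw, ap_squashed, f1_raw, f1_squashed)
```

Cubing the scores preserves their ordering, so AP is identical for both score vectors, while the F1 score at the fixed threshold 0.5 drops from 0.8 to 0.5 because one positive falls below the threshold.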