Can Calibration Improve Sample Prioritization?
Ganesh Tata
University of Alberta
gtata@ualberta.ca
Gautham Krishna Gudur
Global AI Accelerator, Ericsson
gautham.krishna.gudur@ericsson.com
Gopinath Chennupati
Amazon Alexa
cgnath.dr@gmail.com
Mohammad Emtiyaz Khan
RIKEN Center for AI Project
emtiyaz.khan@riken.jp
Abstract
Calibration can reduce overconfident predictions of deep neural networks, but can
calibration also accelerate training? In this paper, we show that it can when used
to prioritize some examples for performing subset selection. We study the effect of
popular calibration techniques in selecting better subsets of samples during training
(also called sample prioritization) and observe that calibration can improve the
quality of subsets, reduce the number of examples per epoch (by at least 70%),
and thereby speed up the overall training process. We further study the effect
of using calibrated pre-trained models coupled with calibration during training to
guide sample prioritization, which again seems to improve the quality of samples
selected.
1 Introduction
Calibration is a widely used technique in machine learning to reduce overconfidence in predictions. Modern deep neural networks are known to be overconfident classifiers or predictors, and calibrated networks provide trustworthy and reliable confidence estimates [1]. Hence, finding new calibration techniques and improving them has been an active area of research [1, 6, 10, 11].
In this paper, we ask if calibration aids in accelerating training by using sample prioritization, i.e.,
we select training samples based on calibrated predictions to better steer the training performance.
We explore different calibration techniques and focus on selecting a subset with the most informative
samples during each epoch. We observe that calibration performed during training plays a crucial
role in choosing the most informative subsets, which in turn accelerates neural network training.
We then investigate the effect of a well-calibrated external pre-trained model with larger capacity on the sample selection process during training.
Our contributions are as follows: We provide an in-depth study analyzing the effect of various calibration techniques on sample prioritization during training. We also consider pre-trained calibrated target models and observe their effect on sample prioritization along with calibration during training. We benchmark our findings on the widely used CIFAR-10 and CIFAR-100 datasets and observe improved quality of the chosen subsets across different subset sizes, which enables faster deep neural network training.
Both authors contributed equally to this work.
Has it Trained Yet? Workshop at the Conference on Neural Information Processing Systems (NeurIPS 2022).
2 Background
2.1 Problem Statement
We formulate the problem in the paper as follows. A calibration technique $C$ is performed during training at each epoch, and a sample prioritization function $a$ is then used to select the most informative samples for training each subsequent epoch. We measure model calibration with the Expected Calibration Error (ECE) [10], the expected absolute difference between the model's accuracy and its confidence across confidence bins.
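As a concrete illustration, here is a minimal NumPy sketch of the binned ECE computation (the equal-width binning and the bin count of 15 are common conventions, not details specified in this section):

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Binned ECE: the bin-weighted average of |accuracy - confidence|."""
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            bin_acc = (predictions[in_bin] == labels[in_bin]).mean()
            bin_conf = confidences[in_bin].mean()
            ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece
```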
The paper discusses how a calibration technique $C$, when coupled with a sample prioritization function $a$, affects the performance (accuracy and calibration error (ECE)) of the model. In addition, we also observe if this phenomenon can aid in faster and more efficient training. We hypothesize a closer relationship between calibration and sample prioritization during training, wherein the calibrated model probabilities at each epoch are used by a sample prioritization criterion to select the most informative samples for training each subsequent epoch.
2.2 Calibration
Calibration is a technique that curbs overconfident predictions in deep neural networks, wherein the predicted (softmax) probabilities reflect true probabilities of correctness (better confidence estimates) [1]. In this paper, we consider various prominently used calibration techniques which are performed during training.
Label Smoothing implicitly calibrates a model by discouraging overconfident prediction probabilities during training [9]. The one-hot encoded ground truth labels ($y_k$) are smoothened using a parameter $\alpha$, that is, $y_k^{\mathrm{LS}} = y_k(1 - \alpha) + \alpha/K$, where $K$ is the number of classes. These smoothened targets $y_k^{\mathrm{LS}}$ and predicted outputs $p_k$ are then used to minimize the cross-entropy loss.
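As a minimal PyTorch sketch of this objective (the function and variable names are ours, not from the paper):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, alpha=0.1):
    """Cross-entropy against smoothened targets y_LS = y*(1 - alpha) + alpha/K."""
    K = logits.size(-1)
    one_hot = F.one_hot(targets, num_classes=K).float()
    y_ls = one_hot * (1.0 - alpha) + alpha / K
    log_probs = F.log_softmax(logits, dim=-1)
    # Cross-entropy with soft targets: -sum_k y_LS_k * log p_k, averaged over the batch.
    return -(y_ls * log_probs).sum(dim=-1).mean()
```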
Mixup is a data augmentation method [14] which is shown to output well-calibrated predictive scores [13], and is again performed during training:

$\bar{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \bar{y} = \lambda y_i + (1 - \lambda) y_j$

where $x_i$ and $x_j$ are two randomly sampled input data points, and $y_i$ and $y_j$ are their respective one-hot encoded labels. Here, $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ with $\lambda \in [0, 1]$.
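A minimal PyTorch sketch of Mixup applied to a training batch (pairing each sample with a shuffled copy of the same batch is a common implementation choice; the names here are ours):

```python
import numpy as np
import torch

def mixup_batch(x, y_one_hot, alpha=0.2):
    """Return mixed inputs x_bar and mixed one-hot targets y_bar."""
    lam = np.random.beta(alpha, alpha)   # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))     # random pairing within the batch
    x_bar = lam * x + (1.0 - lam) * x[perm]
    y_bar = lam * y_one_hot + (1.0 - lam) * y_one_hot[perm]
    return x_bar, y_bar
```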
Focal Loss is an alternative loss function to cross-entropy which yields calibrated probabilities by minimizing a regularized KL divergence between the predicted and target distributions [8]:

$\mathcal{L}_{\mathrm{Focal}} = -(1 - p)^{\gamma} \log p$

where $p$ is the probability assigned by the model to the ground-truth correct class, and $\gamma$ is a hyperparameter. Compared with cross-entropy, Focal Loss has an added modulating factor $(1 - p)^{\gamma}$ that encourages correctly predicted samples to receive less extreme (lower) probabilities. This allows the predicted distribution to have higher entropy, thereby helping avoid overconfident predictions.
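A minimal PyTorch sketch of this loss (the function name is ours; $\gamma$ is a tunable hyperparameter, shown here with an arbitrary default):

```python
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=3.0):
    """L_Focal = -(1 - p)^gamma * log p, where p is the true-class probability."""
    log_p = F.log_softmax(logits, dim=-1)
    log_p_true = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of ground truth
    p_true = log_p_true.exp()
    return (-(1.0 - p_true) ** gamma * log_p_true).mean()
```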
2.3 Sample Prioritization
Sample prioritization is the process of selecting important samples during different stages of training
to accelerate the training process of a deep neural network without compromising on performance. In
this paper, we perform sample prioritization during training using Max Entropy, which is a de facto
uncertainty sampling technique to select the most efficient samples at each epoch.
Max Entropy selects the most informative samples (top-$k$) that maximize the predictive entropy [12]:

$\mathbb{H}[y \mid x, \mathcal{D}_{\mathrm{train}}] := -\sum_{c} p(y = c \mid x, \mathcal{D}_{\mathrm{train}}) \log p(y = c \mid x, \mathcal{D}_{\mathrm{train}})$
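A minimal PyTorch sketch of the selection step: given (ideally calibrated) softmax probabilities for the training pool at the current epoch, rank samples by predictive entropy and keep the top-$k$ (the names and the small epsilon for numerical stability are ours):

```python
import torch

def select_top_k_by_entropy(probs, k):
    """probs: (N, C) softmax probabilities; returns indices of the k most uncertain samples."""
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)  # predictive entropy per sample
    return torch.topk(entropy, k).indices
```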
2.4 Pre-trained Calibrated Target models
Pre-trained models have been widely used in literature to obtain comprehensive sample representations
before training a downstream task [
5
]. We use a pre-trained calibrated model with larger capacity
2