2 Background
2.1 Problem Statement
We formulate the problem in the paper as follows. A calibration technique $C$ is performed during training at each epoch, and a sample prioritization function $a$ is then used to select the most informative samples for training each subsequent epoch. We measure model calibration with Expected Calibration Error (ECE) [10], the bin-weighted average of the absolute difference between the model's accuracy and its average confidence.
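Concretely, following [10], predictions are grouped into $M$ equal-width confidence bins $B_m$ and $\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)|$. A minimal NumPy sketch of this estimator (the function name and the 15-bin default are illustrative choices, not taken from the paper):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=15):
        """Binned ECE: bin-weighted average of |accuracy - confidence|.

        confidences: (N,) max softmax probability per sample
        correct:     (N,) 1.0 if the prediction was correct, else 0.0
        """
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                acc = correct[in_bin].mean()        # accuracy within the bin
                conf = confidences[in_bin].mean()   # mean confidence within the bin
                ece += in_bin.mean() * abs(acc - conf)
        return ece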
The paper discusses how a calibration technique $C$, when coupled with a sample prioritization function $a$, affects the performance (accuracy and calibration error (ECE)) of the model. In addition, we examine whether this coupling can aid faster and more efficient training. We hypothesize a close relationship between calibration and sample prioritization during training, wherein the calibrated model probabilities at each epoch are used by a sample prioritization criterion to select the most informative samples for training each subsequent epoch.
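This coupling can be summarized by the following schematic loop; a minimal sketch in which calibrated_epoch stands for $C$, prioritize for $a$, and k for the per-epoch sample budget (all names are hypothetical placeholders, not the paper's interface):

    def train_with_prioritization(model, pool, calibrated_epoch, prioritize,
                                  k, num_epochs):
        """Couple calibration C with sample prioritization a across epochs."""
        subset = pool                                # the first epoch sees all samples
        for _ in range(num_epochs):
            model = calibrated_epoch(model, subset)  # C: one epoch of calibrated training
            probs = model.predict_proba(pool)        # calibrated softmax probabilities
            subset = prioritize(pool, probs, k)      # a: pick the k most informative samples
        return model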
2.2 Calibration
Calibration is a technique that curbs overconfident predictions in deep neural networks, so that the predicted (softmax) probabilities reflect true probabilities of correctness, i.e., better confidence estimates [1]. In this paper, we consider several prominently used calibration techniques that are performed during training.
Label Smoothing implicitly calibrates a model by discouraging overconfident prediction probabilities during training [9]. The one-hot encoded ground-truth labels $y_k$ are smoothed using a parameter $\alpha$, that is, $y_k^{LS} = y_k(1 - \alpha) + \alpha/K$, where $K$ is the number of classes. These smoothed targets $y_k^{LS}$ and the predicted outputs $p_k$ are then used to minimize the cross-entropy loss.
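As an illustration, a minimal NumPy sketch of the smoothing step and the resulting loss (function names are ours); for $K = 10$ and $\alpha = 0.1$, the correct class receives target $0.91$ and every other class $0.01$:

    import numpy as np

    def smooth_labels(y_onehot, alpha):
        """y_LS = y * (1 - alpha) + alpha / K for one-hot targets y."""
        K = y_onehot.shape[-1]                       # number of classes
        return y_onehot * (1.0 - alpha) + alpha / K

    def cross_entropy(p, targets, eps=1e-12):
        """Cross-entropy of predicted probabilities p against (smoothed) targets."""
        return -(targets * np.log(p + eps)).sum(axis=-1).mean()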
Mixup is a data augmentation method [14] which has been shown to yield well-calibrated predictive scores [13], and is likewise performed during training:

$\bar{x} = \lambda x_i + (1 - \lambda) x_j$
$\bar{y} = \lambda y_i + (1 - \lambda) y_j$

where $x_i$ and $x_j$ are two randomly sampled input data points, and $y_i$ and $y_j$ are their respective one-hot encoded labels. Here, $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ with $\lambda \in [0, 1]$.
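A minimal NumPy sketch; pairing samples by mixing the batch with a shuffled copy of itself is a common implementation choice assumed here, not prescribed by the paper:

    import numpy as np

    def mixup_batch(x, y_onehot, alpha, rng=None):
        """Return a convex combination of a batch with a shuffled copy of itself."""
        rng = rng or np.random.default_rng()
        lam = rng.beta(alpha, alpha)                 # lambda ~ Beta(alpha, alpha), in [0, 1]
        perm = rng.permutation(len(x))               # random pairing (x_i, x_j)
        x_mix = lam * x + (1.0 - lam) * x[perm]
        y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
        return x_mix, y_mix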
Focal Loss is an alternative loss function to cross-entropy which yields calibrated probabilities by minimizing a regularized KL divergence between the predicted and target distributions [8]:

$\mathcal{L}_{\mathrm{Focal}} = -(1 - p)^{\gamma} \log p$

where $p$ is the probability assigned by the model to the ground-truth correct class, and $\gamma$ is a hyperparameter. Compared with cross-entropy, Focal Loss has the added modulating factor $(1 - p)^{\gamma}$, which down-weights samples that are already predicted correctly with high probability, so the model is not pushed to drive correct-class probabilities toward 1. The predicted distribution therefore has higher entropy, which helps avoid overconfident predictions.
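A minimal NumPy sketch of the loss ($\gamma = 2$ is a commonly used value, assumed here); note that as $p \to 1$ the factor $(1 - p)^{\gamma}$ vanishes, so confidently correct samples contribute little to the loss:

    import numpy as np

    def focal_loss(p_true, gamma=2.0, eps=1e-12):
        """L_Focal = -(1 - p)^gamma * log(p), p = prob. of the ground-truth class."""
        p = np.clip(p_true, eps, 1.0)
        return ((1.0 - p) ** gamma * -np.log(p)).mean()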
2.3 Sample Prioritization
Sample prioritization is the process of selecting important samples during different stages of training to accelerate the training of a deep neural network without compromising performance. In this paper, we perform sample prioritization during training using Max Entropy, a de facto uncertainty sampling technique, to select the most informative samples at each epoch.
Max Entropy selects the most informative (top-$k$) samples, those that maximize the predictive entropy [12]:

$\mathbb{H}[y \mid x, \mathcal{D}_{\mathrm{train}}] := -\sum_{c} p(y = c \mid x, \mathcal{D}_{\mathrm{train}}) \log p(y = c \mid x, \mathcal{D}_{\mathrm{train}})$
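A minimal NumPy sketch of the top-$k$ selection (names are ours):

    import numpy as np

    def max_entropy_topk(probs, k):
        """Indices of the k samples with the highest predictive entropy.

        probs: (N, C) per-sample class probabilities p(y = c | x, D_train)
        """
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        return np.argsort(entropy)[-k:]              # top-k most uncertain samples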
2.4 Pre-trained Calibrated Target models
Pre-trained models have been widely used in the literature to obtain comprehensive sample representations before training on a downstream task [5]. We use a pre-trained calibrated model with larger capacity