FINE-TUNE YOUR CLASSIFIER: FINDING CORRELATIONS WITH TEMPERATURE
Benjamin Chamand1∗, Olivier Risser-Maroix2∗†, Camille Kurtz2, Philippe Joly1, Nicolas Loménie2
1 IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France
2 LIPADE, Université de Paris, France
benjamin.chamand@irit.fr, orissermaroix@gmail.com
∗ Equal contribution. † Financed by Smiths Detection.
ABSTRACT
Temperature is a widely used hyperparameter in various tasks involving neural networks, such as classification or metric learning, and its choice can have a direct impact on model performance. Most existing works select its value using hyperparameter optimization methods, which require several runs to find the optimal value. We propose to analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on its representations, from which we can build a heuristic giving a default value for the temperature. We study the correlation between these extracted statistics and the observed optimal temperatures. This preliminary study, on more than a hundred combinations of datasets and feature extractors, highlights promising results towards the construction of a general heuristic for temperature.
Index Terms— temperature, hyperparameter, heuristic, softmax, cross-entropy
1. INTRODUCTION
The performance of a machine learning algorithm applied to a computer vision task is highly dependent on the choice of its hyperparameters. Among these, the temperature is a scaling factor often used in neural networks in connection with the softmax layer, the latter being usually followed by a cross-entropy (CE) like loss function. Intuitively, the temperature (in allusion to statistical mechanics) is introduced to control the level of uniformity of the output distribution. Since most deep classification models involve both a softmax layer and a CE-like loss function for their training, determining an optimal temperature for a particular task can have a broad impact.
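To make the role of this parameter concrete, the following minimal PyTorch sketch (not code from the paper; the logits and temperature values are illustrative placeholders) shows one common convention, dividing the logits by a temperature τ before the softmax: τ > 1 flattens the distribution towards uniformity, while τ < 1 sharpens it.

```python
import torch
import torch.nn.functional as F

def softmax_with_temperature(logits: torch.Tensor, tau: float) -> torch.Tensor:
    """Softmax over logits rescaled by a temperature tau > 0.

    Large tau -> closer to uniform; small tau -> sharper peaks.
    """
    return F.softmax(logits / tau, dim=-1)

def ce_with_temperature(logits: torch.Tensor, targets: torch.Tensor, tau: float) -> torch.Tensor:
    """CE-like loss on temperature-scaled logits, as in most deep classifiers."""
    return F.cross_entropy(logits / tau, targets)

logits = torch.tensor([[2.0, 1.0, 0.1]])
for tau in (0.5, 1.0, 5.0):  # illustrative values only
    probs = softmax_with_temperature(logits, tau)
    print(f"tau={tau}: {probs.squeeze().tolist()}")
```

Some works instead multiply the logits by an inverse temperature; the two conventions are equivalent up to reparametrization.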
For example, this parameter is widely considered in various tasks such as knowledge distillation, classification, text generation, self-supervised and metric learning [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Traditionally, in most of these domains and in the underlying applications, the temperature is determined empirically, with a value that can be constant (typically obtained from a grid search) or evolve dynamically over the iterations, in the same vein as the learning rate parameter. Nevertheless, such strategies for determining a good temperature may be suboptimal or computationally too cumbersome. Surprisingly, very few studies propose strategies for determining an optimal temperature. In this paper, we address the following problem: given a classification task, find a correlation between an optimal value for the temperature and statistics describing the dataset, such as its complexity, dimensionality, or number of classes.
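As a sketch of what such a correlation study can look like (the statistics, data, and correlation measure below are illustrative placeholders, not the exact set used in the paper), one can describe each (dataset, feature extractor) pair by a few scalar statistics and correlate them with the optimal temperature observed for that pair:

```python
import numpy as np
from scipy.stats import pearsonr

def dataset_statistics(features: np.ndarray, labels: np.ndarray) -> dict:
    """A few scalar descriptors of a dataset of extracted features.

    These specific statistics are illustrative; the paper's set may differ.
    """
    return {
        "n_samples": features.shape[0],
        "dim": features.shape[1],
        "n_classes": int(len(np.unique(labels))),
        "mean_norm": float(np.linalg.norm(features, axis=1).mean()),
    }

rng = np.random.default_rng(0)
stats, optimal_taus = [], []
for _ in range(20):  # hypothetical (dataset, extractor) pairs
    n, d, c = rng.integers(200, 2000), rng.integers(16, 512), rng.integers(5, 100)
    X = rng.normal(size=(n, d))
    y = rng.integers(0, c, size=n)
    stats.append(dataset_statistics(X, y))
    optimal_taus.append(rng.uniform(0.05, 10.0))  # placeholder "observed" optima

for key in stats[0]:
    r, p = pearsonr([s[key] for s in stats], optimal_taus)
    print(f"{key}: r={r:+.2f} (p={p:.3f})")
```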
2. RELATED WORKS
The temperature hyperparameter is typically employed in the softmax layer to control the uniformity of the output distribution. Although the use of a good temperature has shown its impact in many computer vision tasks, existing strategies to set this parameter remain quite limited.
The first way to proceed is to use a constant temperature throughout training. The choice can be made empirically, as in [1, 8, 9]. The temperature can also be treated as a fixed hyperparameter to be optimized via a grid search over a range of possible values, but this implies significant computational requirements and leads to different hyperparameters for each dataset and architecture. A simple heuristic can also fix the parameter, as proposed in Transformers [12] with τ = √d, d being the dimension of the query and key vectors.
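The sketch below contrasts these two options (the candidate grid and evaluation function are hypothetical placeholders; only the √d rule comes from [12]):

```python
import math
import numpy as np

def sqrt_d_temperature(dim: int) -> float:
    """Transformer-style default [12]: tau = sqrt(d) for d-dimensional
    query/key vectors, i.e. dot products are divided by sqrt(d)."""
    return math.sqrt(dim)

def grid_search_temperature(candidates, evaluate):
    """Return the candidate temperature with the best validation score.

    `evaluate(tau)` is a placeholder that should train the model with
    temperature `tau` and return a validation metric -- one full run per
    candidate, which is the cost discussed above.
    """
    scores = {tau: evaluate(tau) for tau in candidates}
    return max(scores, key=scores.get)

candidates = np.logspace(-2, 1, num=10)            # e.g. 0.01 .. 10
fake_eval = lambda tau: -abs(math.log(tau) - 1.0)  # toy objective peaking at tau = e
best = grid_search_temperature(candidates, fake_eval)
print(f"grid-search tau: {best:.3f}; sqrt(d) default for d=512: {sqrt_d_temperature(512):.1f}")
```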
Other strategies rely on a dynamic adjustment of the temperature during the learning iterations. In this case, the temperature can evolve at each epoch using a scheduler [4], in the manner of the learning rate, to refine the network. In [13], the authors also showed that a batch normalization rescaled by √d, with d the number of dimensions of the embeddings, worked slightly better than a simple L2 normalization and can also lead to better embedding vectors. Dynamic adjustment of the temperature can also be achieved by learning it as a standard parameter [14, 15]. This usually requires additional steps, such as clipping or applying an exponential, to avoid negative values. Furthermore, the learned temperature strongly depends on the learning rate hyperparameter.
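A minimal sketch of this last strategy, assuming a linear classifier and an exponential parametrization (one common way to keep the learned temperature positive, in the spirit of [14, 15], not necessarily their exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaledClassifier(nn.Module):
    """Linear classifier whose logits are divided by a learned temperature.

    Storing log(tau) and exponentiating keeps tau > 0 without clipping;
    note that tau is updated by the optimizer like any other parameter,
    hence the strong dependence on the learning rate mentioned above.
    """
    def __init__(self, dim: int, n_classes: int, init_tau: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)
        self.log_tau = nn.Parameter(torch.log(torch.tensor(float(init_tau))))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x) / self.log_tau.exp()

# Illustrative training step on random data.
model = TemperatureScaledClassifier(dim=64, n_classes=10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(f"learned tau after one step: {model.log_tau.exp().item():.3f}")
```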
arXiv:2210.09715v1 [cs.LG] 18 Oct 2022