FINE-TUNE YOUR CLASSIFIER: FINDING CORRELATIONS WITH TEMPERATURE
Benjamin Chamand1∗, Olivier Risser-Maroix2∗†, Camille Kurtz2, Philippe Joly1, Nicolas Loménie2
1 IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France
2 LIPADE, Université de Paris, France
benjamin.chamand@irit.fr, orissermaroix@gmail.com
∗ Equal contribution. † Financed by Smiths Detection.
ABSTRACT
Temperature is a widely used hyperparameter in various tasks involving neural networks, such as classification or metric learning, and its choice can have a direct impact on model performance. Most existing works select its value using hyperparameter optimization methods, which require several runs to find the optimal value. We propose to analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on its representations, from which we can build a heuristic giving a default value for the temperature. We study the correlation between these extracted statistics and the observed optimal temperatures. This preliminary study, on more than a hundred combinations of datasets and feature extractors, highlights promising results towards the construction of a general heuristic for temperature.
Index Terms— temperature, hyperparameter, heuristic, softmax, cross-entropy
1. INTRODUCTION
The performance of a machine learning algorithm applied to a computer vision task is highly dependent on the choice of its hyperparameters. Among these, the temperature is a scaling factor often used in neural networks in connection with the softmax layer, the latter being usually followed by a cross-entropy (CE) like loss function. Intuitively, the temperature (in allusion to statistical mechanics) is introduced to control the level of uniformity of the output distribution. Since most deep classification models involve both a softmax layer and a CE-like loss function for their training, determining an optimal temperature for a particular task can have a broad impact.
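To make the role of this parameter concrete, the following minimal PyTorch sketch (not code from the paper; the logits and temperature values are illustrative placeholders) shows one common convention, dividing the logits by a temperature τ before the softmax: τ > 1 flattens the distribution towards uniformity, while τ < 1 sharpens it.

```python
import torch
import torch.nn.functional as F

def softmax_with_temperature(logits: torch.Tensor, tau: float) -> torch.Tensor:
    """Softmax over logits rescaled by a temperature tau > 0.

    Large tau -> closer to uniform; small tau -> sharper peaks.
    """
    return F.softmax(logits / tau, dim=-1)

def ce_with_temperature(logits: torch.Tensor, targets: torch.Tensor, tau: float) -> torch.Tensor:
    """CE-like loss on temperature-scaled logits, as in most deep classifiers."""
    return F.cross_entropy(logits / tau, targets)

logits = torch.tensor([[2.0, 1.0, 0.1]])
for tau in (0.5, 1.0, 5.0):  # illustrative values only
    probs = softmax_with_temperature(logits, tau)
    print(f"tau={tau}: {probs.squeeze().tolist()}")
```

Some works instead multiply the logits by an inverse temperature; the two conventions are equivalent up to reparametrization.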
For example, this parameter is widely considered in various tasks such as knowledge distillation, classification, text generation, self-supervised and metric learning [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Traditionally, in most of these domains and in the underlying applications, the temperature is determined empirically, with a value that can be constant (typically obtained from a grid search) or evolve dynamically over the iterations, in the same vein as the learning rate parameter. Nevertheless, such strategies for determining a good temperature may be suboptimal or computationally too cumbersome. Surprisingly, very few studies propose strategies for determining an optimal temperature. In this paper, we address the following problem: given a classification task, find a correlation between an optimal value for the temperature and statistics describing the dataset, such as its complexity, dimensionality, or number of classes.
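As a sketch of what such a correlation study can look like (the statistics, data, and correlation measure below are illustrative placeholders, not the exact set used in the paper), one can describe each (dataset, feature extractor) pair by a few scalar statistics and correlate them with the optimal temperature observed for that pair:

```python
import numpy as np
from scipy.stats import pearsonr

def dataset_statistics(features: np.ndarray, labels: np.ndarray) -> dict:
    """A few scalar descriptors of a dataset of extracted features.

    These specific statistics are illustrative; the paper's set may differ.
    """
    return {
        "n_samples": features.shape[0],
        "dim": features.shape[1],
        "n_classes": int(len(np.unique(labels))),
        "mean_norm": float(np.linalg.norm(features, axis=1).mean()),
    }

rng = np.random.default_rng(0)
stats, optimal_taus = [], []
for _ in range(20):  # hypothetical (dataset, extractor) pairs
    n, d, c = rng.integers(200, 2000), rng.integers(16, 512), rng.integers(5, 100)
    X = rng.normal(size=(n, d))
    y = rng.integers(0, c, size=n)
    stats.append(dataset_statistics(X, y))
    optimal_taus.append(rng.uniform(0.05, 10.0))  # placeholder "observed" optima

for key in stats[0]:
    r, p = pearsonr([s[key] for s in stats], optimal_taus)
    print(f"{key}: r={r:+.2f} (p={p:.3f})")
```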
2. RELATED WORKS
The temperature hyperparameter is typically employed in the softmax layer to control the uniformity of the output distribution. Although the use of a good temperature has shown its impact in many computer vision tasks, existing strategies to set this parameter remain quite limited.
The first way to proceed is to use a constant temperature throughout training. The choice can be made empirically, as in [1, 8, 9]. The temperature can also be treated as a fixed hyperparameter to be optimized via a grid search over a range of possible values, but this implies significant computational requirements and leads to different hyperparameters for each dataset and architecture. A simple heuristic can also fix the parameter, as proposed in Transformers [12] with τ = √d, d being the dimension of the query and key vectors.
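The sketch below contrasts these two options (the candidate grid and evaluation function are hypothetical placeholders; only the √d rule comes from [12]):

```python
import math
import numpy as np

def sqrt_d_temperature(dim: int) -> float:
    """Transformer-style default [12]: tau = sqrt(d) for d-dimensional
    query/key vectors, i.e. dot products are divided by sqrt(d)."""
    return math.sqrt(dim)

def grid_search_temperature(candidates, evaluate):
    """Return the candidate temperature with the best validation score.

    `evaluate(tau)` is a placeholder that should train the model with
    temperature `tau` and return a validation metric -- one full run per
    candidate, which is the cost discussed above.
    """
    scores = {tau: evaluate(tau) for tau in candidates}
    return max(scores, key=scores.get)

candidates = np.logspace(-2, 1, num=10)            # e.g. 0.01 .. 10
fake_eval = lambda tau: -abs(math.log(tau) - 1.0)  # toy objective peaking at tau = e
best = grid_search_temperature(candidates, fake_eval)
print(f"grid-search tau: {best:.3f}; sqrt(d) default for d=512: {sqrt_d_temperature(512):.1f}")
```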
Other strategies rely on a dynamic adjustment of the temperature during the learning iterations. In this case, the temperature can evolve at each epoch using a scheduler [4], in the manner of the learning rate, to refine the network. In [13], the authors also showed that a batch normalization rescaled by √d, with d the number of dimensions of the embeddings, worked slightly better than a simple L2 normalization and can also lead to better embedding vectors. Dynamic adjustment of the temperature can also be achieved by learning it as a standard parameter [14, 15]. This usually requires additional steps, such as clipping or applying an exponential, to avoid negative values. Furthermore, the learned temperature strongly depends on the learning rate hyperparameter.
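A minimal sketch of this last strategy, assuming a linear classifier and an exponential parametrization (one common way to keep the learned temperature positive, in the spirit of [14, 15], not necessarily their exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaledClassifier(nn.Module):
    """Linear classifier whose logits are divided by a learned temperature.

    Storing log(tau) and exponentiating keeps tau > 0 without clipping;
    note that tau is updated by the optimizer like any other parameter,
    hence the strong dependence on the learning rate mentioned above.
    """
    def __init__(self, dim: int, n_classes: int, init_tau: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)
        self.log_tau = nn.Parameter(torch.log(torch.tensor(float(init_tau))))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x) / self.log_tau.exp()

# Illustrative training step on random data.
model = TemperatureScaledClassifier(dim=64, n_classes=10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(f"learned tau after one step: {model.log_tau.exp().item():.3f}")
```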
arXiv:2210.09715v1 [cs.LG] 18 Oct 2022