
of unlabeled data can be estimated with TD. However, most
uncertainty-based methods quantify data uncertainty based
on static information (e.g., loss [52] or predicted probabil-
ity [45]) from a fully-trained model “snapshot,” neglecting
the valuable information generated during training. We further argue that TD is more effective at separating uncertain from certain data than static information from a model snapshot captured after training. In §3, we provide both
theoretical and empirical evidence to support our argument
that TD is a valuable tool for quantifying data uncertainty.
Despite its huge potential, TD has not yet been actively explored in the domain of AL. This is because AL assumes a massive unlabeled data pool. Previous studies track TD only for the training data, since it can be recorded easily at every epoch during model optimization. AL, on the other hand, targets a large number of unlabeled data, where tracking the TD of each unlabeled sample requires an impractical amount of computation (e.g., running inference on all the unlabeled samples every training epoch).
Therefore, we propose TiDAL (Training Dynamics for
Active Learning), a novel AL method that efficiently quan-
tifies the uncertainty of unlabeled data by estimating their
TD. We avoid tracking the TD of large-scale unlabeled data
every epoch by predicting the TD of unlabeled samples with
a TD prediction module. The module is trained with the TD
of labeled data, which is readily available during model op-
timization. During the data selection phase, we predict the
TD of unlabeled data with the trained module to quantify
their uncertainties. The module thus provides TD efficiently, without running inference on all the unlabeled samples at every epoch. Experimental results demonstrate that our TiDAL
achieves better or comparable performance to existing AL
methods on both balanced and imbalanced datasets. Ad-
ditional analyses show that our prediction module success-
fully predicts TD, and the predicted TD is useful in estimat-
ing the uncertainties of unlabeled data. Our proposed method is illustrated in Figure 1.
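The bookkeeping behind this idea can be sketched as follows. This is an illustrative assumption, not the paper's exact implementation: the function names are hypothetical, and the accumulated TD is shown as a simple running mean of per-epoch predicted probabilities.

```python
# Hedged sketch of TiDAL's TD bookkeeping for labeled data.
# While training the classifier, record the per-epoch predicted probabilities
# of each labeled sample; the accumulated TD then serves as supervision for a
# TD prediction module, which at selection time estimates TD for unlabeled
# samples without per-epoch inference over the whole pool.

def record_td(td_store, sample_id, probs):
    """Append one epoch's predicted probabilities for a labeled sample."""
    td_store.setdefault(sample_id, []).append(probs)

def accumulated_td(td_store, sample_id):
    """Accumulated TD: mean of the recorded per-epoch probabilities
    (a discrete 'area under the predicted probabilities')."""
    history = td_store[sample_id]
    num_classes = len(history[0])
    num_epochs = len(history)
    return [sum(p[c] for p in history) / num_epochs for c in range(num_classes)]
```

In this sketch, `accumulated_td` plays the role of the target that the TD prediction module learns to regress from labeled data, so that TD for unlabeled samples can be predicted in a single forward pass at selection time.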
Contributions of our study: (1) We bridge the concept
of training dynamics and active learning with the theoretical
and experimental evidence that training dynamics is effec-
tive in estimating data uncertainty. (2) We propose a new
method that efficiently predicts the training dynamics of un-
labeled data to estimate their uncertainty. (3) Our proposed
method achieves better or comparable performance on both
balanced and imbalanced benchmark datasets compared to
existing active learning methods. For reproducibility, we
release the source code1.
2. Preliminaries
To better understand our proposed method, we first summarize key concepts, including uncertainty-based active learning, quantification of uncertainty, and training dynamics.
1https://github.com/hyperconnect/TiDAL
Uncertainty-based active learning. In this work, we fo-
cus on uncertainty-based AL for multi-class classification
problems. We define the predicted probabilities of the given sample $x$ for $C$ classes as
$$p = [p(1 \mid x),\, p(2 \mid x),\, \cdots,\, p(C \mid x)]^\top \in [0, 1]^C, \quad (1)$$
where we denote the true label of $x$ as $y$ and the classifier as $f$. $D$ and $D_u$ denote a labeled dataset and an unlabeled
data pool, respectively. The general cycle of uncertainty-based AL consists of two steps: (1) train the target classifier $f$ on the labeled dataset $D$ and (2) select the top-$k$ uncertain data samples from the unlabeled data pool $D_u$. The selected samples are then given to human annotators to expand the labeled dataset $D$, cycling back to the first step.
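The two-step cycle above can be sketched as follows; `train`, `uncertainty`, and `annotate` are hypothetical stand-ins for the classifier training routine, an acquisition score (e.g., entropy), and the human labeling step.

```python
# Minimal sketch of the uncertainty-based AL cycle described above.

def al_cycle(labeled, unlabeled, k, train, uncertainty, annotate, rounds):
    for _ in range(rounds):
        model = train(labeled)                       # step (1): fit f on D
        # step (2): rank the pool D_u by uncertainty and take the top-k samples
        ranked = sorted(unlabeled, key=lambda x: uncertainty(model, x), reverse=True)
        selected = ranked[:k]
        labeled += [annotate(x) for x in selected]   # human annotators label them
        unlabeled = [x for x in unlabeled if x not in selected]
    return labeled, unlabeled
```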
Quantifying uncertainty. The objective of this study is to establish a connection between the concept of TD and the field of AL. To clearly demonstrate the effectiveness of utilizing TD to quantify data uncertainty, we employ two of the most prevalent and straightforward estimators, entropy [43] and margin [41], to measure data uncertainty in this paper. Entropy $H$ is defined as follows:
$$H(p) = -\sum_{c=1}^{C} p(c \mid x) \log p(c \mid x), \quad (2)$$
where the sample $x$ is from the unlabeled data pool $D_u$. Entropy reflects the level of the model's confidence in the given sample $x$ and grows as the prediction across the classes becomes more uniform (i.e., more uncertain). Margin $M$ measures the difference between the probability of the true label and the maximum over the other classes:
$$M(p) = p(y \mid x) - \max_{c \neq y} p(c \mid x), \quad (3)$$
where $y$ denotes the true label. The smaller the margin, the lower the model's confidence in the sample, so the sample can be considered uncertain. Both entropy and margin are computed with the predicted probabilities $p$ of the fully trained classifier $f$, taking only the snapshot of $f$ into account.
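As a minimal sketch, the two estimators in Eqs. (2) and (3) can be computed directly from a probability vector; this is a pure-Python illustration, not the paper's code.

```python
import math

def entropy(p):
    """Entropy H(p) from Eq. (2); higher means more uncertain.
    Zero-probability classes contribute nothing to the sum."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

def margin(p, y):
    """Margin M(p) from Eq. (3): true-class probability minus the largest
    probability among the other classes; smaller means more uncertain."""
    return p[y] - max(pc for c, pc in enumerate(p) if c != y)
```

For example, a uniform prediction maximizes entropy, while a sharply peaked prediction yields low entropy and a large margin.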
Defining training dynamics. Our TiDAL aims to leverage the TD of unlabeled data to estimate their uncertainties. TD can be defined as any model behavior during optimization, such as the area under the margin between the logit value of the target class and the largest other logit [39], or the variance of the predicted probabilities generated at each epoch [47]. In this work, we define the TD $\bar{p}(t)$ as the area under the predicted probabilities of each data sample $x$ obtained