thus better tackle the imbalanced label problem; it will also be able to tackle di-
verse input (I) distributions, which improves performance on out-of-distribution
data. Based on this novel perspective, we propose to improve upon both of these
qualities simultaneously through a unified framework that allows the confidence
estimator to learn to generalize and perform well on distributions that might be
different from the distributions (of both C and I) seen during training. In order
to achieve this, we incorporate meta-learning into our framework.
Meta-learning, also known as “learning to learn”, allows us to train a model
that can generalize well to different distributions. Specifically, in some meta-
learning works [9,28,13,2,16,46], a virtual testing set is used to mimic the
testing conditions during training: although optimization is performed mainly
on a virtual training set drawn from the training data, performance in the actual
testing scenario improves. In our work, we construct our virtual testing sets such that
they simulate various distributions that are different from the virtual training
set, which will push our model to learn distribution-generalizable knowledge to
perform well on diverse distributions, instead of learning distribution-specific
knowledge that only performs well on the training distribution. In particular, for
our confidence estimator to learn distribution-generalizable knowledge and tackle
diverse distributions of C and I, we intentionally construct virtual training and
testing sets that simulate the different distribution shifts of C and I, and use
them for meta-learning.
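To make this episodic idea concrete, the following minimal sketch (in PyTorch) shows one first-order meta-learning step for a confidence estimator: an inner update on the virtual training set, followed by an evaluation on a virtual testing set that simulates a shifted distribution, whose gradient then updates the estimator. The names conf_net, task_model, virtual_train, and virtual_test, as well as the binary cross-entropy on correctness labels and the first-order update, are illustrative assumptions rather than the exact objective or algorithm of our framework.

import copy
import torch
import torch.nn.functional as F

def meta_step(conf_net, task_model, virtual_train, virtual_test, outer_opt, inner_lr=1e-3):
    # One meta-learning episode for the confidence estimator (illustrative sketch).
    x_tr, y_tr = virtual_train    # virtual training batch
    x_te, y_te = virtual_test     # virtual testing batch (simulated shift of C and I)

    # Correctness labels C: 1 if the fixed task model predicts correctly, else 0.
    with torch.no_grad():
        c_tr = (task_model(x_tr).argmax(dim=1) == y_tr).float()
        c_te = (task_model(x_te).argmax(dim=1) == y_te).float()

    # Inner step: adapt a copy of the estimator on the virtual training set.
    fast_net = copy.deepcopy(conf_net)
    inner_loss = F.binary_cross_entropy_with_logits(fast_net(x_tr).squeeze(-1), c_tr)
    grads = torch.autograd.grad(inner_loss, list(fast_net.parameters()))
    with torch.no_grad():
        for p, g in zip(fast_net.parameters(), grads):
            p -= inner_lr * g

    # Outer step: evaluate the adapted copy on the virtual testing set and apply
    # its gradient to the original estimator (first-order approximation).
    outer_loss = F.binary_cross_entropy_with_logits(fast_net(x_te).squeeze(-1), c_te)
    outer_grads = torch.autograd.grad(outer_loss, list(fast_net.parameters()))
    outer_opt.zero_grad()
    for p, g in zip(conf_net.parameters(), outer_grads):
        p.grad = g.clone()
    outer_opt.step()
    return inner_loss.item(), outer_loss.item()

In practice, such an episode would be repeated over many randomly constructed virtual splits, each simulating a different imbalance of C or shift of I.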
The contributions of our work are summarized as follows. 1) We propose a
novel framework, which incorporates meta-learning to learn a confidence estimator
that produces confidence estimates more reliably. 2) By carefully constructing
virtual training and testing sets that simulate the training and various testing
scenarios, our framework can learn to generalize well to different correctness
label distributions and input distributions. 3) We apply our framework to
state-of-the-art confidence estimation methods [5,47] across various computer
vision tasks, including image classification and monocular depth estimation, and
achieve consistent performance improvements.
2 Related Work
Confidence Estimation. Being an important task that helps determine whether
a deep predictor’s predictions can be trusted, confidence estimation has been
studied extensively across various computer vision tasks [14,11,19,5,34,36,
32,44,4,35,26,43,47]. Early on, Hendrycks and Gimpel [14] proposed Maximum
Class Probability, which uses the classifier's softmax distribution; Gal and
Ghahramani [11] proposed MCDropout from the perspective of uncertainty
estimation; and Jiang et al. [19] proposed Trust Score, which measures the agreement
between the classifier and a modified nearest-neighbor classifier on the testing
set.
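For concreteness, minimal sketches of the first two scores for a PyTorch classifier that returns logits might look as follows; the function names and the choice of mean maximum probability as the MCDropout summary are illustrative assumptions rather than the original formulations.

import torch
import torch.nn.functional as F

@torch.no_grad()
def mcp_confidence(classifier, x):
    # Maximum Class Probability [14]: the largest softmax score per sample.
    return F.softmax(classifier(x), dim=1).max(dim=1).values

@torch.no_grad()
def mc_dropout_confidence(classifier, x, n_samples=20):
    # MCDropout-style confidence [11]: keep only the dropout layers stochastic
    # at test time and average the softmax over several forward passes.
    classifier.eval()
    for m in classifier.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()
    probs = torch.stack([F.softmax(classifier(x), dim=1) for _ in range(n_samples)])
    classifier.eval()
    # The mean maximum probability is one common summary; predictive entropy
    # or variance across passes are alternatives.
    return probs.mean(dim=0).max(dim=1).values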
More recently, the idea of a separate confidence estimator was introduced by
several works [5,47]. Specifically, these works proposed to fix the task model,
and instead conduct confidence estimation via a separate confidence estimator.
Notably, Corbiere et al. [5] proposed a separate confidence estimator called Con-