a dimensionality reduction over both the detection-training (source) and test (target) samples, and then applying a two-sample statistical test over these reduced representations to detect a deviation. This is further discussed in Section 3. In particular, deep models can benefit from the semantic representation created by the model itself, which provides a meaningful dimensionality reduction that is readily available at the last layers of the model. Using the embedding layer (or softmax) along with statistical two-sample tests was recently proposed by [9] and [10], who termed solutions of this structure black-box shift detection (BBSD). Using both the univariate Kolmogorov-Smirnov (KS) test and the maximum mean discrepancy (MMD) method (see details below), [10] achieves impressive detection results when using MNIST and CIFAR-10 as proxies for the distribution $P$. As shown here, the KS test is also very effective over ImageNet when a stronger model is used (ResNet50 vs. ResNet18). However, BBSD methods have the disadvantage of being computationally intensive (and probably inapplicable to real-world datasets) due to the use of two-sample tests between the detection-training set (which can be, and preferably is, as large as possible) and the window $W_k$; a complexity analysis is provided in Section 5.
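To make the BBSD recipe concrete, the following is a minimal sketch of KS-based detection over softmax outputs (an illustration of the general approach, not the exact procedure of [10]); the per-dimension Bonferroni correction and the significance level alpha are our own assumptions here:

```python
import numpy as np
from scipy.stats import ks_2samp

def bbsd_ks_detect(source_softmax: np.ndarray,
                   window_softmax: np.ndarray,
                   alpha: float = 0.05) -> bool:
    """BBSD-style detection: run a univariate KS test per softmax
    dimension and Bonferroni-correct across the C dimensions.
    source_softmax has shape (m, C); window_softmax has shape (k, C).
    Note that the cost grows with m, the detection-training set size."""
    num_classes = source_softmax.shape[1]
    p_values = [
        ks_2samp(source_softmax[:, c], window_softmax[:, c]).pvalue
        for c in range(num_classes)
    ]
    # Declare a shift if any corrected p-value is significant.
    return min(p_values) < alpha / num_classes
```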
We propose a different approach based on selective prediction [11, 12], where a model quantifies its prediction uncertainty and abstains from predicting on uncertain instances. First, we develop a method for selective prediction with guaranteed coverage. This method identifies the best abstaining threshold and coverage bound for a given pretrained classifier $f$, such that the resulting empirical coverage will not violate the bound with high probability (when abstention is determined using the threshold). The guaranteed coverage method is of independent interest, and it is analogous to selective prediction with guaranteed risk [12]. Because the empirical coverage of such a classifier is highly unlikely to violate the bound if the underlying distribution remains the same, a systematic violation indicates a distribution shift.
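To illustrate the underlying mechanism (a sketch of confidence-thresholded selective prediction, not Algorithm 1 itself), abstention can be driven by the maximum softmax score, and the empirical coverage of a sample is then simply the fraction of instances accepted at a threshold $\theta$:

```python
import numpy as np

def empirical_coverage(softmax_scores: np.ndarray, theta: float) -> float:
    """Fraction of instances a confidence-based selective classifier
    accepts: those whose maximum softmax score is at least theta.
    softmax_scores has shape (n, C)."""
    confidence = softmax_scores.max(axis=1)
    return float(np.mean(confidence >= theta))
```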
To be more specific, given a detection-training sample $S_m$, our coverage-based detection algorithm computes a fixed number of tight generalization coverage bounds, which are then used to detect a distribution shift in a window $W_k$ of test data. The proposed detection algorithm is remarkably computationally efficient because, during detection, it operates independently of the size of $S_m$, which is crucial when considering “Google-scale” datasets such as the JFT-3B dataset. In contrast, currently available distribution shift detectors rely on a framework that incurs significantly higher computational costs (this framework is illustrated in Figure 3 in Appendix A). A run-time comparison of these methods and ours is provided in Figure 1.
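A minimal sketch of the resulting online test, under the assumption that thresholds and coverage bounds have already been precomputed offline from $S_m$ (in the spirit of Algorithms 1 and 2; treating the bounds as lower bounds is our illustrative assumption): detection touches only the $k$ window instances, so its cost is $O(k)$, independent of $m$:

```python
import numpy as np

def coverage_based_detect(window_softmax: np.ndarray,
                          thresholds: list[float],
                          coverage_bounds: list[float]) -> bool:
    """Flag a shift if the empirical coverage over the test window W_k
    violates any precomputed coverage bound. thresholds and
    coverage_bounds come from the offline phase on S_m; this online
    phase never revisits S_m."""
    confidence = window_softmax.max(axis=1)
    for theta, bound in zip(thresholds, coverage_bounds):
        if float(np.mean(confidence >= theta)) < bound:
            return True  # systematic bound violation => distribution shift
    return False
```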
In a comprehensive empirical study, we compared our coverage-based detection algorithm with the best-performing baselines, including the KS approach of [10]. Additionally, we investigated the suitability of single-instance detection methods for identifying distribution shifts. For a fair comparison, all methods used the same (publicly available) underlying models: ResNet50, MobileNetV3-S, and ViT-T. To evaluate the effectiveness of our approach, we employed the ImageNet dataset to simulate the source distribution. We then introduced distribution shifts using a range of methods, starting with simple noise and progressing to more sophisticated techniques such as adversarial examples. These experiments show that our coverage-based detection method is, overall, significantly more powerful than the baselines across a wide range of test window sizes.
To summarize, the contributions of this paper are: (1) A theoretically justified algorithm (Algorithm 1) that produces a coverage bound, which is of independent interest and allows for the creation of selective classifiers with guaranteed coverage. (2) A theoretically motivated “windowed” detection algorithm (Algorithm 2), which detects a distribution shift over a window; the proposed algorithm is remarkably efficient compared to state-of-the-art methods (five orders of magnitude faster than the best-performing baseline). (3) A comprehensive empirical study demonstrating significant improvements relative to existing baselines and introducing the use of single-instance methods for detecting distribution shifts.
2 Problem Formulation
We consider the problem of detecting distribution shifts in input streams provided to pretrained deep neural models.
Let $P \triangleq P_X$ denote a probability distribution over an input space $\mathcal{X}$, and assume that a model $f$ has been trained on a set of instances drawn from $P$. Consider a setting where the model $f$ is deployed and, while being used in production, its input distribution might change or even be attacked by an adversary. Our goal is to detect such events to allow for appropriate action, e.g., retraining the model with respect to the revised distribution.
Inspired by [10], we formulate this problem as follows. We are given a pretrained model $f$ and a detection-training set $S_m \sim P^m$. We would then like to train a detection model that can detect a distribution shift; namely, discriminate between windows containing in-distribution (ID) data and windows containing alternative-distribution (AD) data. Thus, given an unlabeled test sample window $W_k \sim Q^k$, where $Q$ is a possibly different distribution, the objective is to determine whether $P \neq Q$. We also ask what the smallest test sample size $k$ is that suffices to determine that $P \neq Q$. Since the detection-training set $S_m$ can typically be quite large, we further ask whether it is possible to devise an effective detection procedure whose time complexity is $o(m)$.
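In code terms, this formulation asks for a detector with an offline phase that may use all of $S_m$ and an online phase whose per-window cost is sublinear in $m$; the interface below is purely illustrative (the names and types are our assumptions, not part of the formulation):

```python
from typing import Protocol
import numpy as np

class ShiftDetector(Protocol):
    def fit(self, detection_training_set: np.ndarray) -> None:
        """Offline phase: may depend on all m samples of S_m."""

    def detect(self, window: np.ndarray) -> bool:
        """Online phase: given a test window W_k, return True iff a
        distribution shift (P != Q) is declared; ideally runs in o(m)."""
```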