problem is especially acute for high-dimensional problems like image classification. Models are
typically trained in a closed-world setting but are inevitably faced with novel input classes when
deployed in the real world. The impact can range from a displeasing customer experience to dire
consequences in the case of safety-critical applications such as autonomous driving [31] or medical
analysis [55].
Although achieving high accuracy against all meaningful distributional shifts is the most desirable
solution, it is particularly challenging. An efficient method to mitigate the consequences of unexpected
inputs is to perform anomaly detection, which allows the system to anticipate its inability to process
unusual inputs and react adequately.
Anomaly detection methods generally rely on one of three types of statistics: features, logits, and
softmax probabilities, with some systems leveraging a mix of these [66]. An anomaly score f(x) is
computed, and detection with threshold τ is then performed based on whether f(x) > τ. The goal of
a detection system is to find an anomaly score that effectively discriminates between in-distribution
and out-of-distribution (OOD) samples. However, a common problem with these systems is that
different distributional shifts affect these statistics unpredictably. Accordingly, detection systems
either achieve good performance only on specific types of distributional shift or require tuning on
OOD samples. In both cases, their practical use is severely limited. Motivated by these issues, recent
work has tackled the challenge of designing detection systems for unseen classes without prior
knowledge of the unseen label set or access to OOD samples [68, 63, 66].
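The thresholding scheme above can be sketched in a few lines. The score function here is a hypothetical maximum-softmax-probability statistic, chosen purely for illustration; any feature-, logit-, or softmax-based f(x) could take its place:

```python
import numpy as np

def anomaly_score(logits: np.ndarray) -> float:
    """Hypothetical score: negative maximum softmax probability.

    Higher scores indicate more anomalous (OOD-like) inputs.
    """
    z = logits - logits.max()  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return -probs.max()

def is_ood(logits: np.ndarray, tau: float) -> bool:
    """Flag the input as OOD when f(x) > tau."""
    return anomaly_score(logits) > tau

# A confident prediction yields a score near -1; a flat one near -1/K.
confident = np.array([8.0, 0.5, 0.2])
uncertain = np.array([1.0, 0.9, 1.1])
print(is_ood(confident, tau=-0.5))  # False
print(is_ood(uncertain, tau=-0.5))  # True
```

The unpredictability problem noted above lives entirely in `anomaly_score`: a statistic that separates one kind of shift may fail on another, which is why τ and f are hard to choose without OOD samples.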
We first investigate the use of the maximum mean discrepancy (MMD) two-sample test [19] in
conjunction with self-supervised contrastive learning to assess whether two sets of samples have been
drawn from the same distribution. Motivated by the strong testing power of this method, we then
introduce a statistic inspired by MMD that leverages contrastive transformations. Based on this
statistic, we propose CADet (Contrastive Anomaly Detection), which detects OOD samples from
single inputs and performs well on both label-based and adversarial detection benchmarks, without
requiring access to any OOD samples to train or tune the method.
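As background, the standard unbiased MMD² estimator between two samples can be sketched with a generic kernel. The Gaussian RBF kernel below is a placeholder assumption for illustration; the kernel slot is where a similarity function learned by contrastive training would be plugged in:

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian RBF kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2_unbiased(x: np.ndarray, y: np.ndarray, kernel=rbf_kernel) -> float:
    """Unbiased estimator of MMD^2 between samples x and y.

    Diagonal (self-similarity) terms are excluded from the
    within-sample averages, so the estimate can be slightly negative
    when the two distributions coincide.
    """
    m, n = len(x), len(y)
    kxx, kyy, kxy = kernel(x, x), kernel(y, y), kernel(x, y)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_unbiased(rng.normal(size=(200, 2)),
                     rng.normal(3.0, 1.0, size=(200, 2)))
print(same < diff)  # the shifted sample yields a much larger MMD^2
```

In a two-sample test the rejection threshold is typically set by a permutation test over the pooled samples rather than fixed a priori.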
Only a few works have addressed these tasks simultaneously. These works either focus on particular
in-distribution data, such as medical imaging for specific diseases [65], or evaluate their performance
on datasets with very distant classes, such as CIFAR10 [32], SVHN [47], and LSUN [73], resulting in
simple benchmarks that do not translate to general real-world applications [33, 51].
Contributions Our main contributions are as follows:
• We use similarity functions learned by self-supervised contrastive learning with MMD to show that the test sets of CIFAR10 and CIFAR10.1 [52] have different distributions.
• We propose a novel improvement to MMD and show it can also be used to confidently detect distributional shifts when given a small number of samples.
• We introduce CADet, a fully self-supervised method for OOD detection inspired by MMD, and show it outperforms current methods on adversarial detection tasks while performing well on label-based OOD detection.
The outline is as follows: in Section 2, we discuss relevant previous work. Section 3 describes the
self-supervised contrastive method based on SimCLRv2 [5] used in this work. Section 4 explores
the application of learned similarity functions in conjunction with MMD to verify whether two
independent sets of samples are drawn from the same distribution. Section 5 presents CADet and
evaluates its empirical performance. Finally, we discuss results and limitations in Section 6.
2 Related work
We propose a self-supervised contrastive method for anomaly detection (both unknown classes and
adversarial attacks) inspired by MMD. Thus, our work intersects with the MMD, label-based OOD
detection, adversarial detection, and self-supervised contrastive learning literature.
The MMD two-sample test has been extensively studied [19, 67, 18, 62, 8, 29], though this is, to the
best of our knowledge, the first time a similarity function trained via contrastive learning has been
used in conjunction with MMD. Liu et al. [35] use MMD with a deep kernel trained on a fraction of
the samples to argue that CIFAR10 and CIFAR10.1 have different test distributions. We build upon
that work by confirming their finding with higher confidence levels while using fewer samples. Dong
et al. [11] explored applications of MMD to OOD detection.