a driving force behind biology. The task also has significant economic potential: anomaly detection methods are used to detect credit card fraud, faults on production lines, and unusual patterns in network communications.
Detecting anomalies is an essentially unsupervised task, as only “normal” data, but no anomalies, are seen during training. While the field has been intensely researched
for decades, the most successful recent approaches use a very simple two-stage
paradigm: (i) each data point is transformed to a representation, often learned in a self-supervised manner; (ii) a density estimation model, often as simple as a k-nearest-neighbor estimator, is fitted to the normal data provided in a training set. To classify a new sample as normal or anomalous, its estimated probability density is computed; low-likelihood samples are denoted as anomalies.
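For concreteness, the following is a minimal sketch of this two-stage pipeline. It assumes stage (i) has already produced feature vectors from some (self-supervised) encoder; the function names and the choice of k are illustrative, not a specific published configuration.

import numpy as np
from sklearn.neighbors import NearestNeighbors


def fit_knn_scorer(train_features: np.ndarray, k: int = 2) -> NearestNeighbors:
    """Stage (ii): fit a k-nearest-neighbor estimator on normal-only features."""
    return NearestNeighbors(n_neighbors=k).fit(train_features)


def anomaly_scores(knn: NearestNeighbors, test_features: np.ndarray) -> np.ndarray:
    """Mean distance to the k nearest normal samples.

    A large distance corresponds to low estimated density, i.e. an anomaly.
    """
    distances, _ = knn.kneighbors(test_features)
    return distances.mean(axis=1)


# Usage with placeholder features (in practice, outputs of an SSRL encoder).
rng = np.random.default_rng(0)
train_feats = rng.standard_normal((1000, 512)).astype(np.float32)
test_feats = rng.standard_normal((16, 512)).astype(np.float32)
scores = anomaly_scores(fit_knn_scorer(train_feats), test_feats)

Thresholding these scores (e.g., at a percentile of the training-set scores) yields the final normal/anomalous decision.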
In this position paper, we first explain that advances in representation learn-
ing are the main explanatory factor for the performance of recent anomaly de-
tection (AD) algorithms. We show that this paradigm essentially “solves” the most commonly reported image anomaly detection benchmark (Sec. 4). While
this is encouraging, we argue that existing self-supervised representations are
unable to solve the next generation of AD tasks (Sec. 5). In particular, we high-
light the following issues: (i) masked autoencoders are much worse for AD than earlier self-supervised representation learning (SSRL) methods; (ii) current approaches perform poorly on datasets with multiple objects per image, complex backgrounds, or fine-grained anomalies; (iii) in some cases, SSRL performs worse than handcrafted representations; (iv) for “tabular” datasets, no representation performed better than the original representation of the data (i.e., the data itself); (v) in the presence of nuisance factors of variation, it is unclear whether SSRL can, in principle, identify the optimal representation for effective AD.
Anomaly detection presents both rich rewards and significant challenges for representation learning. Overcoming these issues will require significant progress, both technical and conceptual. We expect that increasing the in-
volvement of the self-supervised representation learning community in anomaly
detection will mutually benefit both fields.
2 Related Work
Classical AD approaches were typically based on either density estimation [9,20]
or reconstruction [15]. With the advent of deep learning, classical methods were
augmented by deep representations [23,38,19,24]. A prevalent way to learn these
representations was to use self-supervised methods, e.g., autoencoders [30], rotation classification [10,13], and contrastive methods [36,35]. An alternative ap-
proach is to combine pretrained representations with anomaly scoring functions
[25,32,27,28]. The best-performing methods [27,28] combine pretraining on auxiliary datasets with a second fine-tuning stage on the normal samples provided in the training set. It was recently established [27] that, given sufficiently powerful representations (e.g., from ImageNet classification), a simple criterion based on the kNN distance to the normal training data achieves strong performance. We therefore limit the discussion of AD in this paper to this simple technique.
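As a rough illustration of this criterion, the sketch below obtains such features from an off-the-shelf ImageNet-pretrained backbone; the specific backbone (ResNet-18), preprocessing, and head removal are our assumptions, not necessarily the exact setup of [27].

import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained classifier with the classification head removed,
# used purely as a fixed feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing.
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(pil_images):
    """Map a list of PIL images to pretrained feature vectors."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    return backbone(batch)

The resulting features can then be scored exactly as in the kNN sketch of Sec. 1: a test sample's anomaly score is its mean distance to the k nearest normal training features.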