Anomaly Detection Requires Better
Representations
Tal Reiss, Niv Cohen, Eliahu Horwitz, Ron Abutbul, and Yedid Hoshen
School of Computer Science and Engineering
The Hebrew University of Jerusalem, Israel
http://www.vision.huji.ac.il/ssrl_ad/
Abstract. Anomaly detection seeks to identify unusual phenomena, a central task in science and industry. The task is inherently unsupervised, as anomalies are unexpected and unknown during training. Recent advances in self-supervised representation learning have directly driven improvements in anomaly detection. In this position paper, we first explain how self-supervised representations can easily be used to achieve state-of-the-art performance on commonly reported anomaly detection benchmarks. We then argue that tackling the next generation of anomaly detection tasks requires new technical and conceptual improvements in representation learning.
Keywords: Anomaly Detection, Self-Supervised Learning, Representation Learning
1 Introduction
Discovery commences with the awareness of anomaly, i.e., with the recognition that nature has somehow violated the paradigm-induced expectations that govern normal science.
– Kuhn, The Structure of Scientific Revolutions (1970)

I do not know what I may appear to the world, but to myself I seem to have been only like a boy playing on the seashore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.
– Isaac Newton
Anomaly detection, discovering unusual patterns in data, is a core task for human and machine intelligence. The importance of the task stems from the centrality of discovering unique or unusual phenomena in science and industry. For example, the fields of particle physics and cosmology have, to a large extent, been driven by the discovery of new fundamental particles and stellar objects. Similarly, the discovery of new, unknown biological organisms and systems is a driving force behind biology. The task also has significant economic potential: anomaly detection methods are used to detect credit card fraud, faults on production lines, and unusual patterns in network communications.

arXiv:2210.10773v1 [cs.LG] 19 Oct 2022
Detecting anomalies is essentially unsupervised, as only "normal" data, but no anomalies, are seen during training. While the field has been intensely researched for decades, the most successful recent approaches use a very simple two-stage paradigm: (i) each data point is transformed into a representation, often learned in a self-supervised manner; (ii) a density estimation model, often as simple as a k-nearest-neighbor estimator, is fitted to the normal data provided in a training set. To classify a new sample as normal or anomalous, its estimated probability density is computed: low-likelihood samples are denoted as anomalies.
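The two-stage recipe can be sketched in a few lines. The following is an illustrative toy, not the authors' implementation: random 2-D points stand in for learned representations, and the anomaly score is the mean distance to the k nearest normal training features.

```python
import math
import random

def knn_anomaly_score(train_feats, query_feat, k=2):
    """Anomaly score = mean distance to the k nearest normal training features."""
    dists = sorted(math.dist(query_feat, f) for f in train_feats)
    return sum(dists[:k]) / k

# Toy "representations": normal features cluster near the origin;
# an anomaly falls far from the normal cluster.
random.seed(0)
train = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(100)]

normal_score = knn_anomaly_score(train, (0.05, -0.02))
anomaly_score = knn_anomaly_score(train, (3.0, 3.0))
assert anomaly_score > normal_score
```

In practice the training features would come from a learned encoder and the threshold separating normal from anomalous scores would be set on held-out data; the scoring rule itself stays this simple.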
In this position paper, we first explain that advances in representation learning are the main explanatory factor for the performance of recent anomaly detection (AD) algorithms. We show that this paradigm essentially "solves" the most commonly reported image anomaly detection benchmark (Sec. 4). While this is encouraging, we argue that existing self-supervised representations are unable to solve the next generation of AD tasks (Sec. 5). In particular, we highlight the following issues: (i) masked autoencoders are much worse for AD than earlier self-supervised representation learning (SSRL) methods; (ii) current approaches perform poorly on datasets with multiple objects per image, complex backgrounds, or fine-grained anomalies; (iii) in some cases SSRL performs worse than handcrafted representations; (iv) for "tabular" datasets, no representation performed better than the original representation of the data (i.e., the data itself); (v) in the presence of nuisance factors of variation, it is unclear whether SSRL can, in principle, identify the optimal representation for effective AD.
Anomaly detection presents both rich rewards and significant challenges for representation learning. Overcoming these issues will require significant progress, both technical and conceptual. We expect that increasing the involvement of the self-supervised representation learning community in anomaly detection will mutually benefit both fields.
2 Related Work
Classical AD approaches were typically based on either density estimation [9,20] or reconstruction [15]. With the advent of deep learning, classical methods were augmented by deep representations [23,38,19,24]. A prevalent way to learn these representations was to use self-supervised methods, e.g., autoencoders [30], rotation classification [10,13], and contrastive methods [36,35]. An alternative approach is to combine pretrained representations with anomaly scoring functions [25,32,27,28]. The best-performing methods [27,28] combine pretraining on auxiliary datasets with a second finetuning stage on the provided normal samples in the training set. It was recently established [27] that, given sufficiently powerful representations (e.g., ImageNet classification), a simple criterion based on the kNN distance to the normal training data achieves strong performance. We therefore limit the discussion of AD in this paper to this simple technique.
Fig. 1. Normal and Anomalous Representations: The self-supervised representations transform the raw data into a space in which normal and anomalous data can be easily separated using density estimation methods.
3 Anomaly Detection as a Downstream Task for Representation Learning
In this section we describe the computational task, method, and evaluation setting for anomaly detection.
Task definition. We assume access to N random samples, denoted by X_train = {x_1, x_2, ..., x_N}, drawn from the distribution of the normal data, p_norm(x). At test time, the algorithm observes a sample x̃ from the real-world distribution p_real(x), which is a mixture of the normal and anomalous data distributions, p_norm(x) and p_anom(x). The task is to classify the sample x̃ as normal or anomalous.
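A concrete, hypothetical 1-D instance of this setup (our own construction, chosen so p_norm is known in closed form): p_norm is a standard Gaussian, p_anom is a Gaussian shifted far away, and test samples arrive from the mixture p_real. Thresholding the likelihood under p_norm then classifies samples:

```python
import math
import random

random.seed(1)

def log_p_norm(x):
    # Log-density of the (assumed known) normal-data distribution N(0, 1).
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

# p_real is a mixture: mostly normal data, occasionally anomalies from N(6, 1).
def sample_real():
    if random.random() < 0.9:
        return random.gauss(0, 1), "normal"
    return random.gauss(6, 1), "anomalous"

threshold = log_p_norm(3.0)  # flag samples more than 3 sigma unlikely
trials = 1000
correct = 0
for _ in range(trials):
    x, label = sample_real()
    pred = "normal" if log_p_norm(x) > threshold else "anomalous"
    correct += (pred == label)
accuracy = correct / trials
```

In this idealized case the classifier is nearly perfect; the paper's point is precisely that estimating p_norm from samples of real high-dimensional data is where the difficulty lies.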
Representations for anomaly detection. In AD, it is typically assumed that anomalies a ~ p_anom have a low likelihood under the normal data distribution, i.e., that p_norm(a) is small. Under this assumption, the PDF of the normal data, p_norm, acts as an effective anomaly classifier. In practice, however, training an estimator q of p_norm for scoring anomalies is a challenging statistical task. The challenge is greater when: (i) the data are high-dimensional (e.g., images); (ii) p_norm is sparse or irregular; (iii) normal and anomalous data are not separable using simple functions. Representation learning may overcome these issues by transforming the sample x into a representation ϕ(x) of lower dimension, where p_norm is relatively smooth and where normal and anomalous data are more separable. As no anomaly labels are provided, self-supervised representation learning is needed.
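A toy illustration of why the representation matters (our own construction, not an experiment from the paper): normal data lies on a ring in raw 2-D space, with an anomaly near the center. A naive centroid-distance rule in raw space scores the anomaly as more normal than the normal points themselves, while the hand-crafted 1-D representation ϕ(x) = ||x|| makes the two trivially separable:

```python
import math
import random

random.seed(2)

def phi(x):
    # A hand-crafted 1-D representation: distance from the origin.
    return math.hypot(*x)

# Normal data lies on a unit ring; the anomaly sits near the origin.
def ring_point():
    theta = random.uniform(0, 2 * math.pi)
    r = random.gauss(1.0, 0.05)
    return (r * math.cos(theta), r * math.sin(theta))

normals = [ring_point() for _ in range(200)]
anomaly = (0.05, -0.03)

# Raw space: the anomaly is closer to the normal-data centroid than
# a typical normal point is, so centroid distance fails.
mean = tuple(sum(c) / len(normals) for c in zip(*normals))
raw_anom = math.dist(anomaly, mean)
raw_normal = sum(math.dist(p, mean) for p in normals) / len(normals)

# Representation space: deviation from phi = 1 cleanly separates them.
rep_anom = abs(phi(anomaly) - 1.0)
rep_normal = sum(abs(phi(p) - 1.0) for p in normals) / len(normals)

assert raw_anom < raw_normal       # raw space: anomaly looks "more normal"
assert rep_anom > 10 * rep_normal  # representation space: clear separation
```

Here ϕ is hand-crafted from knowledge of the data; the argument of the paper is that for real data such a ϕ must be learned, without anomaly labels.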
A two-stage anomaly detection paradigm. Given a self-supervised representation ϕ, we follow a simple two-stage anomaly detection paradigm: (i) representation encoder: each sample, during training or test, is mapped to a feature vector ϕ(x); (ii) anomaly scoring: the kNN distance between a test sample's features and those of the normal training set serves as its anomaly score.