
II. METHOD
Introduced in Ref. [5], CWoLa is a weakly-supervised technique for anomaly detection which aims to learn a monotonic function of the likelihood ratio between Signal $S$ and Background $B$ processes for a set of features of interest $\vec{x}$, $L_{S/B}(\vec{x}) = p(\vec{x}|S)/p(\vec{x}|B)$, with the help of an additional feature $y$ uncorrelated with $\vec{x}$. The latter variable, often but not necessarily the invariant mass of the event, can be used to define two regions of interest: the signal region $M_1$ and the control (or side-band) region $M_2$, where the signal-to-background ratio is assumed to be higher in $M_1$ than in $M_2$. Being a weakly-supervised algorithm, CWoLa trains a classifier to distinguish between $M_1$ and $M_2$. The obtained output function $s(\vec{x})$ can then be mapped to $L_{M_1/M_2}(\vec{x})$ through the likelihood ratio trick. The orthogonality of $y$ and $\vec{x}$ guarantees that $L_{M_1/M_2}(\vec{x})$ is a monotonic function of $L_{S/B}(\vec{x})$ and thus in principle possesses optimal statistical power.
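As a minimal sketch of this pipeline in Python (using scikit-learn; the file names and network size are placeholders, and balanced training samples are assumed so that the likelihood ratio trick holds exactly):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical arrays holding the features x for events falling in each
# region of y (shapes: [n_events, n_features]).
x_m1 = np.load("features_m1.npy")  # signal region M1
x_m2 = np.load("features_m2.npy")  # side-band region M2

# CWoLa step: train a classifier to separate the two mixed samples.
X = np.concatenate([x_m1, x_m2])
region = np.concatenate([np.ones(len(x_m1)), np.zeros(len(x_m2))])
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, region)

# Likelihood ratio trick: for a calibrated classifier trained with balanced
# classes, s / (1 - s) equals L_{M1/M2}(x), which is monotonic in L_{S/B}(x)
# provided y and x are independent given the process label.
s = clf.predict_proba(X)[:, 1]
lr_m1_m2 = s / (1.0 - s)
```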
Usual applications of CWoLa use the learned optimal classifier $s(\vec{x})$ to select events of interest and assign a certain significance to the difference in selected events between $M_1$ and $M_2$. The difference in the resulting selection efficiencies in $M_1$ and $M_2$ is a smoking gun for the presence of signal in $M_1$ (and also $M_2$). However, this is only true in the limit of infinite statistics. In a realistic setting where the dataset is finite, quantifying the degree to which the difference in efficiencies relates to the presence of signal is non-trivial. One common strategy is to assume that there is no signal in $M_2$ and assess the agreement between the selected events in $M_1$ and a background extrapolation from $M_2$.
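As a toy illustration of this common strategy (hypothetical counts; the uncertainty of the extrapolation itself, which is precisely what makes the finite-statistics case non-trivial, is neglected here):

```python
from scipy.stats import poisson

# Hypothetical event counts before and after a cut on s(x) in each region.
n_m1_total, n_m1_pass = 10_000, 230
n_m2_total, n_m2_pass = 50_000, 1_000

# Assume no signal in M2, so its selection efficiency is the background one,
# and extrapolate the expected background yield passing the cut in M1.
eff_bkg = n_m2_pass / n_m2_total
bkg_m1 = eff_bkg * n_m1_total

# One-sided Poisson p-value for the observed count in M1 under the
# background-only extrapolation.
p_value = poisson.sf(n_m1_pass - 1, bkg_m1)  # P(X >= n_m1_pass)
print(f"expected {bkg_m1:.1f}, observed {n_m1_pass}, p = {p_value:.2g}")
```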
Our method constitutes an alternative to assess how the learned output $s(\vec{x})$ encodes differences between $M_1$ and $M_2$ caused by the presence of a signal. To introduce it, we focus on the density-estimation framing of CWoLa, which clearly defines a background-only or null hypothesis. At its heart, CWoLa is a mixture model where $\vec{x}$ and $y$ are assumed to be conditionally independent given the process label $z \in \{S, B\}$. After defining $M_1$ and $M_2$ using $y$, the trained classifier output is a function $s(\vec{x})$ that inherits the conditional independence with respect to $y$. The statistical model can be explicitly written as
$$ p(s(\vec{x}), y \,|\, \pi) = (1-\pi)\, p(s(\vec{x})|B)\, p(y|B) + \pi\, p(s(\vec{x})|S)\, p(y|S)\,, \qquad (1) $$
where $\pi$ is the signal probability. The background-only hypothesis is explicitly written as $p(s(\vec{x}), y \,|\, \pi = 0)$ and corresponds to the case where the observed data shows independence between $s(\vec{x})$ and $y$. This is the key observation for our strategy. For a given measured dataset of pairs $\{s(\vec{x}_i), y_i\}$, one can assess whether they are statistically independent. If statistical independence is ruled out, the background-only hypothesis is ruled out, provided conditional independence holds. Conversely, if statistical independence cannot be ruled out, one has a clear statement about the inability of CWoLa to discern whether any difference between $M_1$ and $M_2$ originates from the presence of a signal or is due to statistical fluctuations in the data.
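To make the procedure concrete, a minimal sketch of such an independence check on the pairs $\{s(\vec{x}_i), y_i\}$, here using a standard $\chi^2$ test on a binned contingency table (toy data; the MI-based test adopted in this work is introduced below):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical measured pairs: classifier outputs s_i and region labels y_i
# (1 for M1, 0 for M2); replaced by toy data for illustration.
rng = np.random.default_rng(0)
s = rng.uniform(size=5_000)
y = rng.integers(0, 2, size=5_000)

# Contingency table of counts per (s-bin, region).
s_bin = np.digitize(s, np.linspace(0.0, 1.0, 11))
table = np.array([[np.sum((s_bin == b) & (y == r)) for r in (0, 1)]
                  for b in np.unique(s_bin)])

# Chi-squared test of independence: a small p-value rules out the
# background-only hypothesis, provided conditional independence holds.
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2/dof = {chi2:.1f}/{dof}, p = {p_value:.2g}")
```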
Several tests of statistical independence exist for both discrete and continuous distributions, including mutual information [10], Hoeffding's D independence test [11], and distance correlation [12]. For simplicity, in the present work we focus on the use of the estimated mutual information (MI) $I$ of the measured probability distribution. MI encodes the exact property we want to test, as it measures the difference between the joint distribution and the product of its marginals:
$$
\begin{aligned}
I(s, y) &= D_{\mathrm{KL}}\big(p(s, y)\,\|\,p(s)\,p(y)\big) & (2) \\
        &= \int ds\, dy\; p(s, y)\, \log \frac{p(s, y)}{p(s)\,p(y)}\,, & (3)
\end{aligned}
$$
where $D_{\mathrm{KL}}(p\,\|\,q)$ is the Kullback-Leibler divergence between two probability distributions, capturing how much information is lost when approximating the distribution $p$ with the distribution $q$. The MI thus captures how well one can approximate the joint distribution by the product of its marginals, and it is trivial to show that it vanishes for independent variables. Conditional independence can then be expressed as a vanishing MI conditioned on a given process,
$$ I(s, y \,|\, z) = \int ds\, dy\; p(s, y|z)\, \log \frac{p(s, y|z)}{p(s|z)\, p(y|z)} = 0\,. \qquad (4) $$
On the other hand, for the full dataset the possible mixture of the two processes, encoded in $\pi \in [0, 1]$, results in
$$ I(s, y) \geq 0\,, \qquad (5) $$
with the equality achieved when there is only one process or when the two processes have the same probability distributions.
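A minimal plug-in estimate of the MI in Eq. (3) from a binned finite sample might look as follows (a sketch with toy data; the asymptotic result of Ref. [13] used below is replaced here by a generic permutation p-value):

```python
import numpy as np

def mutual_information(s, y, n_bins=10):
    """Plug-in MI estimate (in nats) from a binned histogram of the pairs (s, y)."""
    joint, _, _ = np.histogram2d(s, y, bins=[n_bins, 2])
    p_joint = joint / joint.sum()
    p_s = p_joint.sum(axis=1, keepdims=True)  # marginal p(s)
    p_y = p_joint.sum(axis=0, keepdims=True)  # marginal p(y)
    nz = p_joint > 0                          # skip empty bins (0 log 0 = 0)
    return np.sum(p_joint[nz] * np.log(p_joint[nz] / (p_s @ p_y)[nz]))

# Toy stand-ins for the measured pairs {s(x_i), y_i}.
rng = np.random.default_rng(0)
s = rng.uniform(size=5_000)
y = rng.integers(0, 2, size=5_000)

# Permutation p-value: shuffling y samples the estimator under exact
# independence, a generic alternative to the asymptotic distribution.
i_obs = mutual_information(s, y)
i_null = np.array([mutual_information(s, rng.permutation(y))
                   for _ in range(1_000)])
p_value = np.mean(i_null >= i_obs)
print(f"I = {i_obs:.4f} nats, p = {p_value:.2g}")
```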
A very nice feature of the MI is that it has well-behaved asymptotic properties in the limit of small MI and large sample size [13]. Thus, we can estimate it from the measured sample of $N$ events and obtain the p-value of said