Null Hypothesis Test for Anomaly Detection
Jernej F. Kamenik
Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia and
Faculty of Mathematics and Physics, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia
Manuel Szewc
Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
We extend the use of Classification Without Labels for anomaly detection with a hypothesis test
designed to exclude the background-only hypothesis. By testing for statistical independence of the
two discriminating dataset regions, we are able to exclude the background-only hypothesis without
relying on fixed anomaly score cuts or extrapolations of background estimates between regions. The
method relies on the assumption of conditional independence of anomaly score features and dataset
regions, which can be ensured using existing decorrelation techniques. As a benchmark example, we
consider the LHC Olympics dataset where we show that mutual information represents a suitable test
for statistical independence and our method exhibits excellent and robust performance at different
signal fractions, even in the presence of realistic feature correlations.
I. INTRODUCTION
The combination of increased experimental sensitivity and the lack of a clear leading theoretical guide for how physics beyond
the standard model would manifest in current and future particle physics experiments has spurred the development
of anomaly detection techniques for collider applications; see Ref. [1] for a living review with a continuously
updated list of references. These techniques, which make use of state-of-the-art unsupervised and/or weakly supervised
algorithms, have the advantage of being sensitive to a large variety of signals, at the expense of losing statistical power
in comparison to dedicated searches. However, appropriately quantifying this sensitivity is still an open problem [2],
with differing proposals, see e.g. Ref. [3]. An especially pressing question is how to evaluate the null hypothesis exclusion
sensitivity of an anomaly detection method. The current strategy is to perform cuts using the anomaly score
and extrapolate a background model from a control region. This can be problematic for several reasons. First, the
use of the anomaly score itself to select events is not guaranteed to yield a robust method that disentangles the
underlying processes, see e.g. Ref. [4] for a recent discussion of how ambiguities in the data representation can lead
to different notions of anomalous events which vary in their discriminating power. Second, even if the anomaly score
is an appropriate event selection tool, the use of cuts, which in an unsupervised search cannot be optimized on a
targeted signal model, necessarily introduces a loss in sensitivity by discarding possible signal events. Finally, the use
of a control region potentially introduces additional biases when assuming the absence of signal in the control region
and/or employing interpolation methods such as the fit to a monotonic mass spectrum in a Bump Hunt.
In this work we aim to address some of the shortcomings outlined above. In particular, we propose a null hypothesis
statistical test for anomaly detection which neither relies on fixed anomaly score cuts nor requires background model
extrapolations from control regions. We apply it to a specific anomaly detection technique, Classification Without
Labels (CWoLa), introduced as a quark/gluon tagger in Ref. [5] and as an anomaly detection technique in Refs. [6,7],
and to its extension introduced in Ref. [8], which incorporates simulation-assisted decorrelation of features. We show that by
testing for independence between the set of features used in the anomaly score and those used to define signal
and control regions, we can obtain a p-value which avoids false signal detection and is robust in the presence of slight
correlations between the two sets of features.
The work is structured as follows. In Section II we review CWoLa and introduce the proposed statistical test.
In Section III we apply our method to an LHC Olympics benchmark to demonstrate its power and limitations. We
conclude in Section IV where we also discuss possible future extensions and improvements. All the necessary code to
reproduce our results is available at GitHub [9].
jernej.kamenik@cern.ch
manuel.szewc@ijs.si
arXiv:2210.02226v3 [hep-ph] 15 Mar 2023
II. METHOD
Introduced in Ref. [5], CWoLa is a weakly supervised technique for anomaly detection which aims to learn a
monotonic function of the likelihood ratio between signal $S$ and background $B$ processes for a set of features of
interest $\vec{x}$, $L_{S/B}(\vec{x}) = p(\vec{x}|S)/p(\vec{x}|B)$, with the help of an additional feature $y$ uncorrelated with $\vec{x}$. The latter variable,
often but not necessarily the invariant mass of the event, can be used to define two regions of interest: the signal
region $M_1$ and the control (or side-band) region $M_2$, where the signal-to-background ratio is assumed to be higher in
$M_1$ than in $M_2$. As a weakly supervised algorithm, CWoLa trains a classifier to distinguish between $M_1$ and $M_2$. The
obtained output function $s(\vec{x})$ can then be mapped to $L_{M_1/M_2}(\vec{x})$ through the likelihood ratio trick. The orthogonality
of $y$ and $\vec{x}$ guarantees that $L_{M_1/M_2}(\vec{x})$ is a monotonic function of $L_{S/B}(\vec{x})$ and thus possesses in principle optimal
statistical power.
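The monotonicity argument can be checked numerically. The sketch below is a minimal illustration, assuming each region is a mixture of signal and background with signal fractions `f1` (in $M_1$) and `f2` (in $M_2$), and that the classifier output approximates the probability of an event belonging to $M_1$. The function names and the parameters `f1`, `f2`, `n_m1`, `n_m2` are hypothetical and introduced only for this example.

```python
def likelihood_ratio_from_score(score, n_m1=1.0, n_m2=1.0):
    """Likelihood ratio trick: map a classifier output to L_{M1/M2}.

    Assumption: `score` approximates P(M1 | x) for a classifier trained on
    n_m1 events from M1 and n_m2 events from M2.
    """
    return (score / (1.0 - score)) * (n_m2 / n_m1)


def mixed_likelihood_ratio(l_sb, f1, f2):
    """L_{M1/M2} as a function of L_{S/B} for signal fractions f1 and f2.

    Uses p(x|M_i) = f_i p(x|S) + (1 - f_i) p(x|B), which holds when x is
    independent of y within each process.
    """
    return (f1 * l_sb + (1.0 - f1)) / (f2 * l_sb + (1.0 - f2))


def is_increasing(values):
    """True if the sequence is strictly increasing."""
    return all(a < b for a, b in zip(values, values[1:]))
```

The derivative of `mixed_likelihood_ratio` with respect to `l_sb` is proportional to `f1 - f2`, so the map is strictly increasing whenever $M_1$ is more signal-enriched than $M_2$, and it collapses to a constant when the two fractions coincide.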
Usual applications of CWoLa use the learned optimal classifier $s(\vec{x})$ to select events of interest and assign a certain
significance to the difference in selected events in $M_1$ and $M_2$. The difference in the resulting selection efficiencies
$\epsilon_{M_{1,2}}$ is a smoking gun for the presence of signal in $M_1$ (and also $M_2$). However, this is only true in the limit of infinite
statistics. In a realistic setting where the dataset is finite, quantifying the degree to which the difference in efficiencies
relates to the presence of signal is non-trivial. One common strategy is to assume that there is no signal in $M_2$ and
assess the agreement between the selected events in $M_1$ and a background extrapolation from $M_2$.
Our method constitutes an alternative way to assess how the learned output $s(\vec{x})$ encodes differences between $M_1$ and
$M_2$ caused by the presence of a signal. To introduce it, we focus on the density estimation framing of CWoLa, which
clearly defines a background-only or null hypothesis. At its heart, CWoLa is a mixture model where $\vec{x}$ and $y$ are
assumed to be conditionally independent given the process label $z \in \{S, B\}$. After defining $M_1$ and $M_2$ using $y$, the
trained classifier output is a function $s(\vec{x})$ that inherits the conditional independence with respect to $y$. The statistical
model can be explicitly written as
$$p(s(\vec{x}), y|\pi) = (1-\pi)\, p(s(\vec{x})|B)\, p(y|B) + \pi\, p(s(\vec{x})|S)\, p(y|S), \tag{1}$$
where $\pi$ is the signal probability. The background-only hypothesis is explicitly written as $p(s(\vec{x}), y|\pi = 0)$ and corresponds
to the case where the observed data shows independence between $s(\vec{x})$ and $y$. This is the key observation
for our strategy. For a given measured dataset of pairs $\{s(\vec{x}_i), y_i\}$, one can assess whether they are statistically independent.
If statistical independence is ruled out, the background-only hypothesis is ruled out, provided conditional
independence holds. Conversely, if statistical independence cannot be ruled out, one has a clear statement about the
inability of CWoLa to discern whether any difference between $M_1$ and $M_2$ originates from the presence of a signal
or is due to statistical fluctuations in the data.
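A short worked step, assuming only the conditional independence stated above, makes explicit why independence of $s$ and $y$ singles out the background-only hypothesis. Marginalizing the mixture model of Eq. (1) and subtracting the product of the marginals from the joint distribution gives:

```latex
\begin{align*}
p(s) &= (1-\pi)\,p(s|B) + \pi\,p(s|S), \qquad
p(y) = (1-\pi)\,p(y|B) + \pi\,p(y|S), \\
p(s, y) - p(s)\,p(y) &= \pi(1-\pi)\,\big[p(s|S) - p(s|B)\big]\,\big[p(y|S) - p(y|B)\big].
\end{align*}
```

The right-hand side vanishes identically for $\pi = 0$ (or $\pi = 1$), and also when the two processes share the same distribution in $s$ or in $y$. Otherwise the mixture induces a dependence between $s$ and $y$, which is what the test looks for.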
Several tests of statistical independence exist for both discrete and continuous distributions, including mutual
information [10], Hoeffding’s D independence test [11] and distance correlation [12]. For simplicity, in the present
work we focus on the use of the estimated mutual information (MI) $I$ of the measured probability distribution. MI
encodes the exact property we want to test as it measures the difference between the joint distribution and the
marginals:
\begin{align}
I(s, y) &= D_{\rm KL}\big(p(s, y)\,\|\,p(s)\,p(y)\big) \tag{2} \\
&= \int ds\, dy\; p(s, y) \log \frac{p(s, y)}{p(s)\,p(y)}, \tag{3}
\end{align}
where $D_{\rm KL}(p\,\|\,q)$ is the Kullback-Leibler divergence between two probability distributions, capturing how much information
is lost when approximating the distribution $p$ with the distribution $q$. The MI thus captures how well one
can approximate the joint distribution by the product of its marginals, and it vanishes for
independent variables, since the logarithm in Eq. (3) is then identically zero. Conditional independence can then be expressed as a vanishing MI conditioned on a given
process
$$I(s, y|z) = \int ds\, dy\; p(s, y|z) \log \frac{p(s, y|z)}{p(s|z)\,p(y|z)} = 0. \tag{4}$$
On the other hand, for the full dataset the possible mixture between the two processes encoded in $\pi \in [0,1]$ results in
$$I(s, y) \geq 0, \tag{5}$$
with the equality achieved when there is only one process or the two processes have the same probability distributions.
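The behaviour in Eqs. (4) and (5) can be illustrated on a toy sample. The sketch below is not the procedure of this work: it assumes a binned plug-in estimate of the MI and swaps in a generic permutation test to obtain a p-value. The toy densities, the five-bin discretization of $s$, and all function names are hypothetical choices made only for this example.

```python
import math
import random
from collections import Counter


def mutual_information(s_bins, y_bins):
    """Plug-in estimate of I(s, y) in nats from paired discrete labels."""
    n = len(s_bins)
    joint = Counter(zip(s_bins, y_bins))
    count_s = Counter(s_bins)
    count_y = Counter(y_bins)
    mi = 0.0
    for (s, y), count in joint.items():
        # p(s,y) log[p(s,y) / (p(s) p(y))] with empirical frequencies
        mi += (count / n) * math.log(count * n / (count_s[s] * count_y[y]))
    return mi


def permutation_p_value(s_bins, y_bins, n_perm=200, seed=0):
    """Fraction of label shuffles whose MI is at least the observed one."""
    rng = random.Random(seed)
    observed = mutual_information(s_bins, y_bins)
    shuffled = list(y_bins)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling y destroys any s-y dependence
        if mutual_information(s_bins, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def toy_sample(n, pi, seed=1):
    """Draw (s, y) pairs from a two-process mixture with toy densities."""
    rng = random.Random(seed)
    s_bins, y_bins = [], []
    for _ in range(n):
        is_signal = rng.random() < pi
        # Given the process, s and y are drawn independently of each other.
        s = rng.betavariate(4, 2) if is_signal else rng.betavariate(2, 4)
        in_m1 = rng.random() < (0.9 if is_signal else 0.5)
        s_bins.append(min(int(s * 5), 4))  # five equal-width bins in s
        y_bins.append(1 if in_m1 else 2)   # region label: M1 or M2
    return s_bins, y_bins
```

Calling `toy_sample(2000, 0.0)` gives a background-only sample whose estimated MI is small but not exactly zero because of finite statistics, while `toy_sample(2000, 0.2)` gives a clearly larger MI and a small permutation p-value. The permutation test is used here only because it needs no distributional assumptions.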
A useful feature of the MI is that it has well-behaved asymptotic properties in the limit of small MI and large
sample size [13]. Thus, we can estimate it from the measured sample of $N$ events and obtain the p-value of said