Resonant anomaly detection without background sculpting
Anna Hallin,1,∗ Gregor Kasieczka,2,3,† Tobias Quadfasel,2,‡ David Shih,1,§ and Manuel Sommerhalder2,¶
1NHETC, Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
2Institut für Experimentalphysik, Universität Hamburg, 22761 Hamburg, Germany
3Center for Data and Computing in Natural Sciences (CDCS), 22607 Hamburg, Germany
We introduce a new technique named Latent CATHODE (LaCathode) for performing “enhanced
bump hunts”, a type of resonant anomaly search that combines conventional one-dimensional bump
hunts with a model-agnostic anomaly score in an auxiliary feature space where potential signals could
also be localized. The main advantage of LaCathode over existing methods is that it provides an
anomaly score that is well behaved when evaluating it beyond the signal region, which is essential to
prevent the sculpting of background distributions in the bump hunt. LaCathode accomplishes this
by constructing the anomaly score directly in the latent space learned by a conditional normalizing
flow trained on sideband regions. We demonstrate the superior stability and comparable performance
of LaCathode for enhanced bump hunting in an illustrative toy example as well as on the LHC
Olympics R&D dataset.
I. INTRODUCTION
Despite countless searches for new physics at the LHC, so far no evidence for physics beyond the Standard Model has been found. The vast majority of these searches are model specific, motivated by and optimized for particular scenarios and particle spectra. Recently there has been much interest in the possibility that new physics could be present in the data but we simply have not searched in the right places yet. This has led to a surge of activity in developing new methods for model-agnostic searches at the LHC (see e.g. [1, 2] for recent community overviews of anomaly detection and [3] for a more general overview of machine learning methods to search for new physics).
One promising class of approaches can be referred to as “enhanced bump hunts”, where the idea is to upgrade a standard one-dimensional bump hunt, e.g. in an invariant mass m¹, to a multivariate setting. This is achieved by including an anomaly score R(x) learned from auxiliary features x ∈ ℝ^d where the signal may also be localized, but in an a priori unknown way.
In general, enhanced bump hunts follow these steps:
i) Designate a non-overlapping signal region (SR) and sidebands (SB) in m.
ii) Derive an anomaly score R(x) and select events that pass a threshold value R(x) > R_c.
iii) Fit a suitable background-only function (e.g. a falling spectrum) to the selected events in the SB.
iv) Compare the background-only prediction from step iii) to data in the SR and derive limits or claim discovery.

∗ anna.hallin@rutgers.edu
† gregor.kasieczka@uni-hamburg.de
‡ tobias.quadfasel@uni-hamburg.de
§ shih@physics.rutgers.edu
¶ manuel.sommerhalder@uni-hamburg.de
¹ We will use m for illustration in this text, but any feature in which the signal is resonant and the background is smooth can be used [4].
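The four steps above can be sketched on a hypothetical toy. The sketch assumes an exponentially falling background in m, a narrow Gaussian signal, and a stand-in anomaly score that is independent of m for the background (so the cut cannot sculpt). The window values `SR`, `SB`, and `R_c` are illustrative choices, and the least-squares fit of an exponential to log bin counts is a simple substitute for the likelihood fit a real analysis would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a smoothly falling background in m plus a narrow resonance.
m_bkg = 2.0 + rng.exponential(scale=1.0, size=200_000)
m_sig = rng.normal(loc=3.5, scale=0.05, size=1_000)
# Stand-in anomaly scores R(x): uniform for background (independent of m, so the
# cut cannot sculpt), concentrated at high values for signal.
r_bkg = rng.uniform(0.0, 1.0, size=m_bkg.size)
r_sig = rng.uniform(0.8, 1.0, size=m_sig.size)
m = np.concatenate([m_bkg, m_sig])
r = np.concatenate([r_bkg, r_sig])

# i) Non-overlapping SR and SB windows in m (values are assumptions).
SR = (3.3, 3.7)
SB = (2.9, 4.1)          # the SB is [2.9, 3.3) and (3.7, 4.1]

# ii) Keep events passing R(x) > R_c.
R_c = 0.9
m_sel = m[r > R_c]

# iii) Fit a falling background-only function, exp(a + b*m), to the selected SB counts.
edges = np.linspace(SB[0], SB[1], 25)          # 0.05-wide bins aligned with the SR edges
counts, _ = np.histogram(m_sel, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])
in_sb = (centers < SR[0]) | (centers > SR[1])
b, a = np.polyfit(centers[in_sb], np.log(counts[in_sb]), deg=1)

# iv) Compare the background-only prediction with the observed SR counts.
predicted = np.exp(a + b * centers[~in_sb]).sum()
observed = counts[~in_sb].sum()
significance = (observed - predicted) / np.sqrt(predicted)
```

Because the stand-in score is independent of m in the background, the SB fit extrapolates cleanly into the SR and the excess there is due to the injected signal.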
Methods for enhanced bump hunts include those constructed using autoencoders [5, 6] or based on weak supervision [7, 8]. While weak supervision allows the construction (in an ideal case) of a provably optimal anomaly score (see Section II A for details), correlations between the bump hunt feature and the auxiliary features can spoil these methods. This observation has motivated the development of a number of new techniques that aim to improve the sensitivity and stability of anomaly detection in the presence of correlations [9–13]. In particular, the recently proposed Cathode [12] and Curtains [13] techniques have been demonstrated to achieve close-to-optimal signal sensitivity, even in the presence of correlations between features.
This paper is concerned with another issue that has received less attention but could still spoil the practical application of enhanced bump hunts: background sculpting. The enhanced bump hunting procedure outlined above can only work if the cut introduced in step ii) does not sculpt the background (i.e. introduce artificial bumps in the background-only m spectrum). Alas, state-of-the-art protocols like Cathode and Curtains have no built-in measures to prevent such sculpting. Even worse, the anomaly score of these approaches is only derived for the SR, leading to potentially unpredictable extrapolation behavior elsewhere.
The scope of this paper is to clearly identify this sculpting issue and to provide a viable solution. In Sec. II, we first discuss enhanced bump hunt strategies and then introduce the novel LaCathode approach. Section III uses an analytic toy model to illustrate the problem and shows that correlations between m and the auxiliary features are the root cause of background sculpting. It also shows that LaCathode indeed successfully mitigates this issue. Section IV reiterates these points in the context of the more physically motivated LHC Olympics R&D dataset [14]. Section V concludes this work.
arXiv:2210.14924v2 [hep-ph] 10 Jul 2023
II. METHOD
A. Existing strategies for enhanced bump hunts
According to the Neyman-Pearson lemma [15], the
provably optimal anomaly score for any model-agnostic
search would be:
$$R(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{bg}}(x)} \qquad (1)$$
where p_data(x) and p_bg(x) are the probability densities of the data and the background, respectively. Of course, in practice we never have access to this likelihood ratio, since the probability densities of data and background are in general intractable. At best, one could hope for a large number of samples drawn from the data and true background distributions; then one could approximate R(x) with a classifier trained on these samples. We will refer to this approximation of (1) as the “idealized anomaly detector” throughout.
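A minimal sketch of the idealized anomaly detector, assuming we can sample both densities: a logistic-regression classifier (written directly in NumPy) is trained to separate "data" from "background", and R = s/(1 - s) recovers the likelihood ratio when the two classes have equal size. The Gaussian densities are hypothetical stand-ins, chosen so that the true ratio exp(0.5x - 0.125) is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x_bg = rng.normal(0.0, 1.0, n)     # samples from p_bg (idealized: assumed available)
x_data = rng.normal(0.5, 1.0, n)   # samples from p_data (hypothetical stand-in)

X = np.concatenate([x_data, x_bg])
y = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = data, 0 = background
F = np.stack([np.ones_like(X), X], axis=1)      # features: bias and x

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Logistic regression by full-batch gradient descent on the mean binary cross-entropy.
w = np.zeros(2)
for _ in range(500):
    w -= 0.5 * F.T @ (sigmoid(F @ w) - y) / y.size

def R(x):
    """Likelihood-ratio estimate s / (1 - s); valid as written for equal class sizes."""
    s = sigmoid(w[0] + w[1] * x)
    return s / (1.0 - s)

def R_true(x):
    return np.exp(0.5 * x - 0.125)   # exact N(0.5, 1) / N(0, 1) density ratio
```

For unequal sample sizes the estimate would have to be rescaled by n_bg / n_data. In a realistic search the background sample is exactly what is missing, which is why the strategies below estimate it from sidebands.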
Since it is generally not possible to draw samples from the true p_bg(x) in a realistic anomaly search scenario, we can at best approximate this idealized case either with simulations or in a data-driven way. The focus here will be on the latter strategy.
The challenge then is to obtain a high-quality estimate for p_bg(x) from data, e.g. by interpolating from sidebands (SB) in m into a signal region (SR), and to use weak supervision to obtain an anomaly score R(x). As long as a cut R(x) > R_c does not sculpt the m distribution, one can combine this cut with the 1D bump hunt in m to greatly enhance the significance of the signal over the background.
In the original enhanced bump hunt method, called CWoLa-Hunting [8], R(x) comes from an SR vs. SB classifier. This works as long as the features x and m are statistically independent in the background (i.e. the x features are distributed identically in the SR and the SB for the background). This also ensures that the cut R(x) > R_c will not sculpt the m distribution. Using these properties, the full enhanced bump hunt search strategy using CWoLa-Hunting was successfully demonstrated on toy simulation data [8, 16], and then implemented on actual data by the ATLAS Collaboration in [17].
However, it can be challenging to ensure that x and m are independent in the background. Even a small correlation can degrade or destroy the sensitivity of CWoLa-Hunting to anomalies. This has motivated the development of alternative approaches that are more robust to correlations.
In Anode [9], one learns p_data(x) and p_bg(x) using conditional density estimators trained on the data with m ∈ SR and with m ∈ SB, respectively; the latter is automatically interpolated in m into the SR, which alleviates the problem of correlations between x and m. It was shown in [12] that, in the presence of correlations between x and m, the signal sensitivity of Anode is robust while that of CWoLa-Hunting collapses.
In Cathode [12], one learns p_bg(x) using the SB density estimator just as in Anode. However, instead of the second SR density estimator (which is more difficult to learn, as it must also capture the tiny deviations from the smooth p_bg(x) caused by a small localized signal), one samples from p_bg(x) in the SR and trains a classifier (as in CWoLa-Hunting) between the data and the synthetic background samples. Cathode thereby captures the best of both Anode and CWoLa-Hunting, achieving a signal sensitivity that is nearly optimal and yet robust to correlations between x and m.
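The Cathode recipe can be sketched on a 1D toy with x | m ~ N(κm, 1) in the background. This is not the authors' implementation, and several simplifications are assumptions made for illustration: the conditional normalizing flow is replaced by a Gaussian whose mean is linear in m (fitted on everything outside a hypothetical SR), the synthetic SR background is sampled at the SR data's own m values, and the neural-network classifier is replaced by a binned histogram ratio, which is what an optimal classifier on binned x would learn.

```python
import numpy as np

rng = np.random.default_rng(2)
KAPPA = 0.5              # hypothetical x-m correlation in the background
SR = (4.5, 5.5)          # hypothetical signal region in m

# Background: m uniform on [0, 10], x | m ~ N(KAPPA*m, 1). Signal: inside the SR, x ~ N(6, 0.3).
m_bg = rng.uniform(0.0, 10.0, 400_000)
x_bg = rng.normal(KAPPA * m_bg, 1.0)
m_sig = rng.uniform(SR[0], SR[1], 2_000)
x_sig = rng.normal(6.0, 0.3, 2_000)
in_sr = (m_bg > SR[0]) & (m_bg < SR[1])

# 1) SB "density estimator": Gaussian with mean linear in m, fitted outside the SR.
slope, intercept = np.polyfit(m_bg[~in_sr], x_bg[~in_sr], deg=1)
sigma = np.std(x_bg[~in_sr] - (slope * m_bg[~in_sr] + intercept))

# 2) SR data, and synthetic SR background sampled at the same m values (interpolation in m).
m_sr = np.concatenate([m_bg[in_sr], m_sig])
x_sr = np.concatenate([x_bg[in_sr], x_sig])
x_synth = rng.normal(slope * m_sr + intercept, sigma)

# 3) "Classifier" data vs. synthetic background: binned density ratio (+1 avoids 0/0).
#    Both samples have the same size, so the count ratio estimates p_data / p_bg.
edges = np.linspace(-2.0, 8.0, 41)
n_data, _ = np.histogram(x_sr, bins=edges)
n_synth, _ = np.histogram(x_synth, bins=edges)
R_binned = (n_data + 1) / (n_synth + 1)

def R(x):
    """Look up the binned anomaly score for x (values outside the range use the edge bins)."""
    i = np.clip(np.digitize(x, edges) - 1, 0, len(R_binned) - 1)
    return R_binned[i]
```

Away from the signal, R sits slightly below 1 (about 1 - ε with ε = 2000/42000), which is what p_data/p_bg should be when a fraction ε of the SR data is signal; near x = 6 it is large.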
Finally, the Curtains [13] protocol operates similarly to Cathode, with the main difference that conditional invertible neural networks (cINNs) are used to map background examples from the SB into the SR.
B. The problem of background sculpting
So far, apart from CWoLa-Hunting, most of the effort has been invested in exploring data-driven approaches to learn R(x) as accurately as possible from sidebands, while much less attention has been paid to the issue of background sculpting. However, signal sensitivity is not the only component of a successful new physics search; background estimation is also essential. In the presence of correlations between x and m in the background events, one must also show that R(x), even if ideal, does not sculpt the background m distribution around the signal region, which would prevent background estimation via the 1D bump hunt. See Fig. 1 for an illustration of such correlated input features.
Note that, in any complete enhanced bump hunt strategy, two data-driven background estimations must take place:
1. An interpolation of the learned p_bg(x) from SB to SR in order to construct R(x).
2. After cutting on R(x) > R_c, we proceed with the usual 1D bump hunt: an interpolation in the m distribution from SB to SR (e.g. by fitting a suitable functional form to the data excluding the SR).
[Figure 1 placeholder. Panels: log(counts) vs. m with the Lower Sideband, Signal Region, and Upper Sideband marked; three panels of x1 vs. x0.]
FIG. 1. Illustration of the correlation of (hypothetical) input features x0 and x1 with m in the background. This figure depicts the situation described in the text: the background distributions of the input features x0 and x1 change dramatically from the lower sideband (low m) to the upper sideband (high m), and are thus strongly correlated with m.

This work is concerned with ensuring the robustness of the second estimation. We will demonstrate, using both a simple analytic toy model and examples drawn from the LHC Olympics 2020 R&D dataset [14], that in the presence of correlations between x and m, cutting on the learned R(x) can result in significant sculpting of the m distribution. This can be understood from the fact that R(x) must be a more-or-less smooth function of x, so any correlation of m with x will be inherited by R(x). Furthermore, R(x) was learned using events in the SR, so it has to be extrapolated from the SR to the SB in order to apply the threshold R(x) > R_c everywhere. This extrapolation could lead to unpredictable effects, including sculpting, especially in the presence of strong correlations between x and m.
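The sculpting mechanism can be illustrated with a few lines of NumPy. In this hypothetical setup the background has x | m ~ N(κm, 1) on a flat m window, and the anomaly score is assumed to be monotonic in x, so a cut R(x) > R_c is equivalent to a cut x > x_cut. The background selection efficiency is computed in bins of m: without correlation (κ = 0) it is flat, while with κ = 0.5 it roughly doubles across the window.

```python
import numpy as np

rng = np.random.default_rng(3)

def cut_efficiency_vs_m(kappa, x_cut=2.0, n=1_000_000, n_bins=10):
    """Background efficiency of the cut x > x_cut in bins of m, for x | m ~ N(kappa*m, 1)."""
    m = rng.uniform(3.0, 5.0, n)             # hypothetical m window (SB + SR + SB)
    x = rng.normal(kappa * m, 1.0)
    passed = x > x_cut                       # cut on a score assumed monotonic in x
    edges = np.linspace(3.0, 5.0, n_bins + 1)
    idx = np.digitize(m, edges) - 1
    return np.array([passed[idx == i].mean() for i in range(n_bins)])

eff_uncorr = cut_efficiency_vs_m(kappa=0.0)   # flat: about P(N(0,1) > 2) in every bin
eff_corr = cut_efficiency_vs_m(kappa=0.5)     # rises with m: the cut sculpts the spectrum
```

Multiplying a smoothly falling spectrum by such an m-dependent efficiency changes its shape, so a functional form fitted in the SB after the cut need not describe the background in the SR.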
C. LaCATHODE to the rescue
Having identified the issue that leads to sculpting of the background m distribution, we now present a solution to the problem, which is outlined in Fig. 2 and described in the following. We call our new approach Latent CATHODE, or LaCathode for short, because it is closely derived from the Cathode method.
The solution actually lies at the heart of the Cathode method: the SB density estimator is a conditional normalizing flow, which is an invertible map f from the data space x to a latent space z for every value of m:
$$z = f(x; m) \qquad (2)$$
The background events in the latent space z are supposed to follow a simple prespecified base distribution, which we take to be the unit normal distribution N(μ = 0, σ = 1)^d for concreteness.
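To make Eq. (2) concrete, here is a deliberately minimal conditional "flow": a single affine layer per feature, z = (x - (a + b m)) / s, whose shift depends linearly on m. The class `AffineConditionalFlow` and its methods are hypothetical names for illustration; a realistic conditional normalizing flow stacks many learned, nonlinear invertible layers. The sketch shows the three operations the text relies on: the forward map to z, the inverse map, and the change-of-variables log-density.

```python
import numpy as np


class AffineConditionalFlow:
    """Hypothetical toy conditional flow: z = f(x; m) = (x - (a + b*m)) / s per feature.

    One affine layer whose shift is linear in m and whose scale s is constant;
    a minimal stand-in for a real conditional normalizing flow.
    """

    def fit(self, x, m):
        # Least-squares fit of the per-feature mean a + b*m, then the residual width s.
        design = np.stack([np.ones_like(m), m], axis=1)        # shape (n, 2)
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)      # shape (2, d)
        self.a, self.b = coef[0], coef[1]
        self.s = (x - self._shift(m)).std(axis=0)
        return self

    def _shift(self, m):
        return self.a + np.outer(m, self.b)                    # shape (n, d)

    def forward(self, x, m):
        """z = f(x; m)."""
        return (x - self._shift(m)) / self.s

    def inverse(self, z, m):
        """x = f^{-1}(z; m)."""
        return self._shift(m) + z * self.s

    def log_prob(self, x, m):
        """log p(x | m) = log N(f(x; m); 0, I) + log|det df/dx|, with log|det| = -sum(log s)."""
        z = self.forward(x, m)
        d = z.shape[1]
        log_base = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
        return log_base - np.sum(np.log(self.s))


# Usage on a hypothetical 2D background, fitted on two sidebands around an SR at m ~ 3.5.
rng = np.random.default_rng(4)
m_sb = np.concatenate([rng.uniform(3.0, 3.3, 50_000), rng.uniform(3.7, 4.0, 50_000)])
x_sb = np.stack([rng.normal(0.8 * m_sb, 1.0), rng.normal(-0.3 * m_sb, 0.5)], axis=1)
flow = AffineConditionalFlow().fit(x_sb, m_sb)
z_sb = flow.forward(x_sb, m_sb)   # approximately N(0, 1) per feature, for every m
```

Because the m dependence lives entirely in the map, the same base distribution describes the background at every m, including SR values of m never seen during the fit.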
The idea of LaCathode is to train the classifier between SR data and SR background in the latent space z instead of in the physical feature space x. Since f maps x to the same latent space for every value of m, working in the latent space has the effect of decorrelating the data from m in the background, which should eliminate the problem of sculpting. In other words, since the z space is the same for every m, no extrapolation is needed to evaluate R(z) outside the SR where it was learned.²
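A sketch of why working in z removes the sculpting, under stated assumptions: the background is x | m ~ N(κm, 1), a 1D affine flow is fitted on the SB only and then applied everywhere, and the trained latent-space classifier is stood in for by the score z itself. To the extent that the flow is accurate, the background z is independent of m, so any function of z would behave the same way. The cut on x has an m-dependent efficiency, whereas the cut on z does not.

```python
import numpy as np

rng = np.random.default_rng(5)
KAPPA = 0.5                                   # hypothetical x-m correlation in the background

m = rng.uniform(3.0, 5.0, 1_000_000)
x = rng.normal(KAPPA * m, 1.0)                # background only
sr = (m > 3.8) & (m < 4.2)                    # hypothetical signal region

# 1D affine conditional flow fitted on the SB only: z = f(x; m) = (x - (a + b*m)) / s
b, a = np.polyfit(m[~sr], x[~sr], deg=1)
s = np.std(x[~sr] - (a + b * m[~sr]))
z = (x - (a + b * m)) / s                     # applied everywhere, SR included

edges = np.linspace(3.0, 5.0, 11)
idx = np.digitize(m, edges) - 1               # m-bin index 0..9

def efficiency(passed):
    """Fraction of background events passing the cut in each m bin."""
    return np.array([passed[idx == i].mean() for i in range(len(edges) - 1)])

eff_x = efficiency(x > 2.0)   # cut on a score that is monotonic in x
eff_z = efficiency(z > 2.0)   # cut on a score that is monotonic in z (stand-in for R(z))
```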
Furthermore, since f is invertible and likelihood ratios are invariant under coordinate reparametrizations, the z-space classifier should in principle be as asymptotically optimal as the x-space classifier, i.e.
$$R(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{bg}}(x)} = \frac{p_{\mathrm{data}}(z)}{p_{\mathrm{bg}}(z)} = R(z) \qquad (3)$$
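Eq. (3) follows from the change-of-variables formula. At fixed m, both densities are transformed by the same invertible map f, so the Jacobian factor is common to numerator and denominator and cancels:

```latex
p_{\mathrm{data}}(x) = p_{\mathrm{data}}(z)\,\left|\det \frac{\partial f(x;m)}{\partial x}\right|,
\qquad
p_{\mathrm{bg}}(x) = p_{\mathrm{bg}}(z)\,\left|\det \frac{\partial f(x;m)}{\partial x}\right|
\quad\Longrightarrow\quad
R(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{bg}}(x)}
     = \frac{p_{\mathrm{data}}(z)}{p_{\mathrm{bg}}(z)} = R(z),
\qquad z = f(x;m).
```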
The performance of both the Cathode and LaCathode methods relies on the quality of the trained and interpolated normalizing flow. For Cathode, the flow controls the fidelity of the p_bg(x) estimate, whereas for LaCathode it is responsible for p_data(z). If the background events in data were not mapped to a unit normal distribution, both the learning of the likelihood ratio via the classifier and the decorrelation of the auxiliary features from the resonant one would deteriorate.
We will show with examples in the following sections that LaCathode retains much of the excellent signal sensitivity of Cathode, while avoiding the sculpting of the m distribution in the presence of correlations.³
While LaCathode seems to be the superior enhanced bump hunting method, all is not lost for the original Cathode: it remains a robust and powerful anomaly detection method as long as the correlations are sufficiently small. This is the case for the original feature set of the LHC Olympics 2020 R&D dataset. As discussed in [12], these features have percent-level correlations with m, and we showed there that the Cathode signal sensitivity remains robust to this small correlation (unlike CWoLa-Hunting, which is more fragile to correlations). In this paper we demonstrate that the m distribution is also not sculpted after a cut on R(x).
III. TOY MODEL
We begin by demonstrating the idea of LaCathode with a simple 1+2D toy model. In the first part, we will investigate how correlations between x and m affect the
² A closely related approach would be to use the invertible map f twice to decorrelate x from m in the SB regions: x → x′ = f⁻¹(f(x; m); m₀) for some suitable choice of m₀ ∈ SR. One could then apply the anomaly score to x′ and mitigate the background sculpting issue. We thank B. Nachman for this suggestion. We also observe that a similar map x → x′ is available directly from the Curtains method, without having to pass through the latent space; this could be used to prevent background sculpting in the Curtains method.
³ Another minor advantage of mapping the SR data to the latent space for the classification task is that the values of m for this transformation are directly available, which simplifies things somewhat. In the case of Cathode, the mapping from the latent space samples to the data space via f⁻¹(z; m) requires a separate estimate of the SR m density.
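The map x → x′ of footnote 2 can be sketched with a hypothetical 1D affine flow fitted on the sidebands. Mapping every SB event to a common reference mass m₀ (here `M0 = 4.0`, an arbitrary choice inside a hypothetical SR) removes the shift of x with m, so the lower- and upper-sideband x′ distributions agree even though the raw x distributions do not.

```python
import numpy as np

rng = np.random.default_rng(6)
KAPPA, M0 = 0.5, 4.0       # hypothetical correlation strength and reference mass m0 in the SR

# Background in the two sidebands only (the SR is assumed to be 3.8 < m < 4.2).
m = np.concatenate([rng.uniform(3.0, 3.8, 200_000), rng.uniform(4.2, 5.0, 200_000)])
x = rng.normal(KAPPA * m, 1.0)

# 1D affine conditional flow fitted on the sidebands: f(x; m) = (x - (a + b*m)) / s
b, a = np.polyfit(m, x, deg=1)
s = np.std(x - (a + b * m))

def f(x, m):
    return (x - (a + b * m)) / s

def f_inv(z, m):
    return a + b * m + s * z

x_prime = f_inv(f(x, m), M0)   # x -> x' = f^{-1}(f(x; m); m0)
low, high = m < 3.8, m > 4.2
```

For this affine flow the composition reduces to x′ = x - b(m - m₀), which makes the decorrelation explicit; a nonlinear flow would perform the analogous transport feature by feature.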