Generative models uncertainty estimation
L Anderlini1, C Chimpoesh2, N Kazeev2 and A Shishigina2 on behalf
of the LHCb collaboration
1Universita e INFN, Firenze (IT), via Sansone 1, 50019 Sesto Fiorentino (FI), Italia
2HSE University, 20 Myasnitskaya st., Moscow 101000, Russia
E-mail: nikita.kazeev@cern.ch
Abstract. In recent years fully-parametric fast simulation methods based on generative
models have been proposed for a variety of high-energy physics detectors. By their nature,
the quality of data-driven models degrades in the regions of the phase space where the data
are sparse. Since machine-learning models are hard to analyse from physical principles, the
commonly used testing procedures are data-driven and cannot be reliably applied in such
regions. In our work we propose three methods to estimate the uncertainty of
generative models inside and outside of the training phase space region, along with data-driven
calibration techniques. A test of the proposed methods on the LHCb RICH fast simulation is
also presented.
1. Introduction
In recent years, deep learning has become a common tool in the natural sciences. Generative
models, such as generative adversarial networks (GANs) [1], variational autoencoders (VAEs) [2],
normalising flows [3], and diffusion models [4], can learn to sample from a distribution efficiently.
They are used for the fully-parametric simulation of detectors, in place of the computationally
intensive simulation from first physical principles, usually performed with Geant4 [5, 6, 7, 8].
Neural networks (NNs) are black-box models that do not provide theoretical guarantees on the
uncertainty of their predictions. This makes it difficult to use them in rigorous scientific reasoning.
The uncertainty of machine-learning models is an active area of research, but almost all works deal
with classification and regression tasks, not generative modelling [9]. A recent work [10] shows
how Bayesian normalising flows capture uncertainties.
Our work extends uncertainty estimation research by introducing new methods for estimating
the uncertainty of GANs. Compared with the normalising flows used in [10], GANs are in practice
usually faster in training and inference and more accurate, and are thus more widely used for fast
simulation in high-energy physics. The contributions of this work are summarised as follows:
• We propose methods for estimating the uncertainty of GANs;
• We propose an approach to distil the ensemble into a single model for efficient
uncertainty computation.
2. LHCb RICH fast simulation
In the LHCb experiment, the new fully-parametric simulation of Ring-Imaging Cherenkov
detectors (RICH) is based on training a fully-connected Cramer GAN [11, 12] to approximate
the reconstructed detector response. It is trained using the real data calibration samples [13].
RICH particle identification works as follows. First, the likelihoods for each particle type
hypothesis are computed for each track. Second, the delta log-likelihoods are computed as
the difference between the given hypothesis and the pion hypothesis. The variables are named
RichDLL*, where * can be k (kaon), p (proton), mu (muon), e (electron) and bt (below the
threshold of emitting Cherenkov light).
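For example, the kaon variable is the difference between the kaon and pion log-likelihoods (a minimal restatement of the definition above, where $\mathcal{L}$ denotes the per-track likelihood of a hypothesis):
$$\mathrm{RichDLL}_{k} = \log\mathcal{L}(\mathrm{K}) - \log\mathcal{L}(\pi).$$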
For the generator, the input $x \in X \subset \mathbb{R}^{3+d_{\mathrm{noise}}}$ consists of the kinematic characteristics of the
particles (pseudorapidity $\eta$, momentum $P$, number of tracks) and random noise. The output $y \in Y \subset \mathbb{R}^{5}$
corresponds to the delta log-likelihoods.
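For illustration, a minimal PyTorch sketch of such a fully-connected generator is given below. The layer widths, activation, and noise dimension D_NOISE are placeholders rather than the configuration of the LHCb production model.

import torch
import torch.nn as nn

D_NOISE = 64  # assumed noise dimension, a placeholder

class RichGenerator(nn.Module):
    """Fully-connected generator: (eta, P, nTracks) + noise -> 5 RichDLL values."""
    def __init__(self, d_noise: int = D_NOISE, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + d_noise, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),  # RichDLLk, RichDLLp, RichDLLmu, RichDLLe, RichDLLbt
        )

    def forward(self, kinematics: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        # kinematics: (batch, 3); noise: (batch, d_noise)
        return self.net(torch.cat([kinematics, noise], dim=1))

# Sampling the detector response for a batch of 16 tracks:
gen = RichGenerator()
x_kin = torch.randn(16, 3)                 # stand-in for preprocessed (eta, P, nTracks)
y = gen(x_kin, torch.randn(16, D_NOISE))   # (16, 5) delta log-likelihoods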
3. Uncertainty estimation methods
3.1. MC dropout
Common dropout [14] acts as a regularisation that avoids overfitting when training an NN. In
Monte Carlo dropout (MC dropout) [15], dropout is applied at both training and inference time.
The prediction is no longer deterministic, but depends on which NN nodes are randomly chosen
to be kept; the resulting random predictions can be interpreted as samples from a probability
distribution.
In our work, for the MC dropout experiments, we add a dropout layer after each fully connected
one and train with the same configuration as before. We started with Bernoulli dropout and
then experimented with Gaussian and variational dropout [16]. Finally, we found that a
“structured” dropout modification (a neuron together with a neighbourhood of arbitrary
size k is zeroed with probability p) improves the uncertainty quality.
During inference, for each batch we generate a fixed set of dropout masks as a way to have
a virtual ensemble.
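A minimal sketch of this virtual-ensemble idea is shown below, with plain Bernoulli dropout kept active at inference. The structured dropout variant and the reuse of a fixed set of masks per batch are simplified here to independent random masks; the network itself is a toy stand-in.

import torch
import torch.nn as nn

class DropoutGenerator(nn.Module):
    """Toy generator with a dropout layer after each fully connected layer."""
    def __init__(self, d_in: int = 3, d_noise: int = 64, d_out: int = 5, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in + d_noise, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, d_out),
        )

    def forward(self, x: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, noise], dim=1))

def mc_dropout_samples(model: nn.Module, x, noise, n_masks: int = 10):
    """Keep dropout stochastic at inference and draw one prediction per mask."""
    model.train()  # dropout layers remain active
    with torch.no_grad():
        return torch.stack([model(x, noise) for _ in range(n_masks)])

model = DropoutGenerator()
x, noise = torch.randn(16, 3), torch.randn(16, 64)
samples = mc_dropout_samples(model, x, noise)   # (n_masks, 16, 5)
spread = samples.std(dim=0)                     # per-output spread of the virtual ensemble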
3.2. Adversarial deep ensembles
Ensemble methods are a widely-used heuristic uncertainty estimation method [17]. The core
idea of ensembles is to introduce perturbations to the training procedure that shouldn’t affect
the outputs. Thus, the observed deviation in outputs is considered as uncertainty.
These perturbations can be implemented using randomisation techniques such as bagging and
random initialisation of the NN parameters. Bagging on average uses only 63% of the unique
data points, which leads to a biased performance estimate [17]. The diversity of the ensemble
also tends to zero as the training dataset size increases. While this is a reasonable outcome for
in-domain uncertainty, it renders bagging unsuitable for out-of-domain uncertainty estimation.
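The 63% figure follows from the bootstrap sampling itself: the probability that a given data point appears at least once in a bootstrap sample of size $n$ drawn from $n$ points is
$$1 - \left(1 - \frac{1}{n}\right)^{n} \;\xrightarrow{\;n\to\infty\;}\; 1 - e^{-1} \approx 0.632.$$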
In our method we start with the idea of diversity through NN weights. In addition to
random initialisation, we add a component to the loss function that rewards the models for
being different. For Cramer GAN [12] the loss function is modified as follows:
$$f(y) = \|D(y) - D(y'_g)\|_2 - \|D(y)\|_2, \qquad (1)$$
$$\mathcal{L}_G = f(y_r) - f(y_g) - \alpha\,\|D(y_g) - D(y_{S_g})\|_2, \qquad (2)$$
where $y_r$ are real data, $y_g$ are generated data, $y'_g$ is a second, independent generated sample (as in the
Cramer GAN), and $y_{S_g}$ is a concatenation of the predictions of the ensemble, corresponding to a model
with the averaged probability density; $\alpha \geq 0$ is a hyperparameter. The method is not specific to the
Cramer GAN and can be used with any GAN.
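To make equations (1) and (2) concrete, a schematic sketch of the per-batch generator-loss computation with the diversity term is given below. The discriminator D, the way y_ensemble is assembled from the other ensemble members, and the batching are simplified stand-ins, not the actual training code.

import torch

def cramer_surrogate(D, y, y_g_prime):
    # f(y) = ||D(y) - D(y'_g)||_2 - ||D(y)||_2, equation (1)
    return (D(y) - D(y_g_prime)).norm(dim=1) - D(y).norm(dim=1)

def generator_loss(D, y_real, y_gen, y_gen_prime, y_ensemble, alpha):
    # L_G = f(y_r) - f(y_g) - alpha * ||D(y_g) - D(y_Sg)||_2, equation (2)
    f_r = cramer_surrogate(D, y_real, y_gen_prime)
    f_g = cramer_surrogate(D, y_gen, y_gen_prime)
    diversity = (D(y_gen) - D(y_ensemble)).norm(dim=1)
    return (f_r - f_g - alpha * diversity).mean()

# y_* are (batch, 5) tensors of RichDLL values; D maps (batch, 5) -> (batch, d_critic).
# alpha is annealed towards zero as training progresses, as described below.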
We reduce the influence of the adversarial component as the training progresses. Using
pre-trained discriminators leads to more variety among the models. The overall training
scheme is summarised in Algorithm 1. As α tends to zero, each ensemble member is trained
without additional bias. This provides a principled advantage to adversarial ensembles: instead
of heuristically perturbing the training objective, as is common in other methods [9], we take