Generative models uncertainty estimation
L Anderlini1, C Chimpoesh2, N Kazeev2 and A Shishigina2 on behalf
of the LHCb collaboration
1Universita e INFN, Firenze (IT), via Sansone 1, 50019 Sesto Fiorentino (FI), Italia
2HSE University, 20 Myasnitskaya st., Moscow 101000, Russia
E-mail: nikita.kazeev@cern.ch
Abstract. In recent years fully-parametric fast simulation methods based on generative
models have been proposed for a variety of high-energy physics detectors. By their nature,
the quality of data-driven models degrades in the regions of the phase space where the data
are sparse. Since machine-learning models are hard to analyse from physical principles, the
commonly used testing procedures are data-driven and cannot be reliably applied in such
regions. In our work we propose three methods to estimate the uncertainty of
generative models inside and outside of the training phase space region, along with data-driven
calibration techniques. A test of the proposed methods on the LHCb RICH fast simulation is
also presented.
1. Introduction
In recent years, deep learning has become a common tool in the natural sciences. Generative
models, such as generative adversarial networks (GANs) [1], variational autoencoders (VAEs) [2],
normalising flows [3], and diffusion models [4], can learn to sample from a distribution efficiently.
They are used for the fully-parametric simulation of detectors, in place of the computationally
intensive simulation from first physical principles, usually performed with Geant4 [5, 6, 7, 8].
Neural networks (NNs) are black-box models that do not provide theoretical guarantees on the
uncertainty of their predictions. This makes it difficult to use them in rigorous scientific reasoning.
The uncertainty of machine-learning models is an active area of research, but almost all works deal
with classification and regression tasks, not generative modelling [9]. A recent work [10] shows
how Bayesian normalising flows capture uncertainties.
Our work extends uncertainty estimation research by introducing new methods for estimating
the uncertainty of GANs. Compared with the normalising flows used in [10], GANs are in practice
usually faster in training and inference and more accurate, and are thus more widely used for fast
simulation in high-energy physics. The contributions of this work are summarised as follows:
• We propose methods for estimating the uncertainty of GANs;
• We propose an approach to distil the ensemble into a single model for efficient
uncertainty computation.
2. LHCb RICH fast simulation
In the LHCb experiment, the new fully-parametric simulation of Ring-Imaging Cherenkov
detectors (RICH) is based on training a fully-connected Cramer GAN [11, 12] to approximate
the reconstructed detector response. It is trained using the real data calibration samples [13].
RICH particle identification works as follows. First, the likelihoods for each particle type
hypothesis are computed for each track. Second, the delta log-likelihoods are computed as
the difference between the given hypothesis and the pion hypothesis. The variables are named
RichDLL*, where * can be k (kaon), p (proton), mu (muon), e (electron) and bt (below the
threshold of emitting Cherenkov light).
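For example, the kaon variable is the difference between the kaon and pion log-likelihoods (a minimal restatement of the definition above, where $\mathcal{L}$ denotes the per-track likelihood of a hypothesis):
$$\mathrm{RichDLL}_{k} = \log\mathcal{L}(\mathrm{K}) - \log\mathcal{L}(\pi).$$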
For the generator, the input $x \in X \subset \mathbb{R}^{3+d_{\mathrm{noise}}}$ consists of the kinematic characteristics of the
particles (pseudorapidity $\eta$, momentum $P$, number of tracks) and random noise. The output $y \in Y \subset \mathbb{R}^{5}$
corresponds to the delta log-likelihoods.
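For illustration, a minimal PyTorch sketch of such a fully-connected generator is given below. The layer widths, activation, and noise dimension D_NOISE are placeholders rather than the configuration of the LHCb production model.

import torch
import torch.nn as nn

D_NOISE = 64  # assumed noise dimension, a placeholder

class RichGenerator(nn.Module):
    """Fully-connected generator: (eta, P, nTracks) + noise -> 5 RichDLL values."""
    def __init__(self, d_noise: int = D_NOISE, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + d_noise, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),  # RichDLLk, RichDLLp, RichDLLmu, RichDLLe, RichDLLbt
        )

    def forward(self, kinematics: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        # kinematics: (batch, 3); noise: (batch, d_noise)
        return self.net(torch.cat([kinematics, noise], dim=1))

# Sampling the detector response for a batch of 16 tracks:
gen = RichGenerator()
x_kin = torch.randn(16, 3)                 # stand-in for preprocessed (eta, P, nTracks)
y = gen(x_kin, torch.randn(16, D_NOISE))   # (16, 5) delta log-likelihoods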
3. Uncertainty estimation methods
3.1. MC dropout
Common dropout [14] acts as a regularisation that avoids overfitting when training an NN. In
Monte Carlo dropout (MC dropout) [15], dropout is applied at both training and inference time.
The prediction is no longer deterministic, but depends on which NN nodes are randomly chosen
to be kept; the resulting random predictions can be interpreted as samples from a probability
distribution.
In our work, for the MC dropout experiments, we add a dropout layer after each fully connected
one and train with the same configuration as before. We started with Bernoulli dropout and
then experimented with Gaussian and variational dropout [16]. Finally, we found that a
“structured” dropout modification (a neuron together with a neighbourhood of arbitrary
size k is zeroed with probability p) improves the uncertainty quality.
During inference, for each batch we generate a fixed set of dropout masks as a way to have
a virtual ensemble.
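A minimal sketch of this virtual-ensemble idea is shown below, with plain Bernoulli dropout kept active at inference. The structured dropout variant and the reuse of a fixed set of masks per batch are simplified here to independent random masks; the network itself is a toy stand-in.

import torch
import torch.nn as nn

class DropoutGenerator(nn.Module):
    """Toy generator with a dropout layer after each fully connected layer."""
    def __init__(self, d_in: int = 3, d_noise: int = 64, d_out: int = 5, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in + d_noise, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, d_out),
        )

    def forward(self, x: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, noise], dim=1))

def mc_dropout_samples(model: nn.Module, x, noise, n_masks: int = 10):
    """Keep dropout stochastic at inference and draw one prediction per mask."""
    model.train()  # dropout layers remain active
    with torch.no_grad():
        return torch.stack([model(x, noise) for _ in range(n_masks)])

model = DropoutGenerator()
x, noise = torch.randn(16, 3), torch.randn(16, 64)
samples = mc_dropout_samples(model, x, noise)   # (n_masks, 16, 5)
spread = samples.std(dim=0)                     # per-output spread of the virtual ensemble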
3.2. Adversarial deep ensembles
Ensemble methods are a widely-used heuristic uncertainty estimation method [17]. The core
idea of ensembles is to introduce perturbations to the training procedure that shouldn’t affect
the outputs. Thus, the observed deviation in outputs is considered as uncertainty.
These perturbations can be implemented using randomisation techniques such as bagging and
random initialisation of the NN parameters. Bagging on average uses only 63% of the unique
data points, which leads to a biased performance estimate [17]. The diversity of the ensemble
also tends to zero as the training dataset size increases. While this is a reasonable outcome for
in-domain uncertainty, it renders bagging unsuitable for out-of-domain uncertainty estimation.
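The 63% figure follows from the bootstrap sampling itself: the probability that a given data point appears at least once in a bootstrap sample of size $n$ drawn from $n$ points is
$$1 - \left(1 - \frac{1}{n}\right)^{n} \;\xrightarrow{\;n\to\infty\;}\; 1 - e^{-1} \approx 0.632.$$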
In our method we start with the idea of diversity through NN weights. In addition to
random initialisation, we add a component to the loss function that rewards the models for
being different. For Cramer GAN [12] the loss function is modified as follows:
$$f(y) = \|D(y) - D(y'_g)\|_2 - \|D(y)\|_2, \qquad (1)$$
$$\mathcal{L}_G = f(y_r) - f(y_g) - \alpha\,\|D(y_g) - D(y_{S_g})\|_2, \qquad (2)$$
where $y_r$ are real data, $y_g$ are generated data, $y'_g$ is a second, independent generated sample (as in the
Cramer GAN), and $y_{S_g}$ is a concatenation of the predictions of the ensemble, corresponding to a model
with the averaged probability density; $\alpha \geq 0$ is a hyperparameter. The method is not specific to the
Cramer GAN and can be used with any GAN.
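To make equations (1) and (2) concrete, a schematic sketch of the per-batch generator-loss computation with the diversity term is given below. The discriminator D, the way y_ensemble is assembled from the other ensemble members, and the batching are simplified stand-ins, not the actual training code.

import torch

def cramer_surrogate(D, y, y_g_prime):
    # f(y) = ||D(y) - D(y'_g)||_2 - ||D(y)||_2, equation (1)
    return (D(y) - D(y_g_prime)).norm(dim=1) - D(y).norm(dim=1)

def generator_loss(D, y_real, y_gen, y_gen_prime, y_ensemble, alpha):
    # L_G = f(y_r) - f(y_g) - alpha * ||D(y_g) - D(y_Sg)||_2, equation (2)
    f_r = cramer_surrogate(D, y_real, y_gen_prime)
    f_g = cramer_surrogate(D, y_gen, y_gen_prime)
    diversity = (D(y_gen) - D(y_ensemble)).norm(dim=1)
    return (f_r - f_g - alpha * diversity).mean()

# y_* are (batch, 5) tensors of RichDLL values; D maps (batch, 5) -> (batch, d_critic).
# alpha is annealed towards zero as training progresses, as described below.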
We reduce the influence of the adversarial component as the training progresses. Using
pre-trained discriminators leads to more variety among the models. The overall training
scheme is summarised in Algorithm 1. As α tends to zero, each ensemble member is trained
without additional bias. This provides a principled advantage to adversarial ensembles: instead
of heuristically perturbing the training objective, as is common in other methods [9], we take