FERMILAB-PUB-22-741-SCD
SciPost Physics Submission
New Machine Learning Techniques for Simulation-Based Inference:
InferoStatic Nets, Kernel Score Estimation, and Kernel Likelihood
Ratio Estimation
Kyoungchul Kong1, Konstantin T. Matchev2, Stephen Mrenna3, and Prasanth Shyamsundar4⋆
1Department of Physics and Astronomy, University of Kansas, Lawrence, KS 66045, USA
2Institute for Fundamental Theory, Physics Department, University of Florida,
Gainesville, FL 32611, USA
3Scientific Computing Division, Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
4Fermilab Quantum Institute, Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
⋆ prasanth@fnal.gov
October 4, 2022

arXiv:2210.01680v2 [stat.ML]
Abstract
We propose an intuitive, machine-learning approach to multiparameter inference, dubbed the
InferoStatic Networks (ISN) method, to model the score and likelihood ratio estimators in cases
when the probability density can be sampled but not computed directly. The ISN uses a backend
neural network that models a scalar function called the inferostatic potential ϕ. In addition, we
introduce new strategies, respectively called Kernel Score Estimation (KSE) and Kernel Likelihood
Ratio Estimation (KLRE), to learn the score and the likelihood ratio functions from simulated data.
We illustrate the new techniques with some toy examples and compare to existing approaches in
the literature. We mention en passant some new loss functions that optimally incorporate latent
information from simulations into the training procedure.
Contents
1 Introduction
    1.1 Applications of Estimated Scores and Likelihood Ratios
    1.2 Related Techniques and New Contributions in This Work
2 Methodology: InferoStatic Networks (ISNs)
3 Methodology: Kernel Score Estimation
    3.1 Intuition and Motivation
    3.2 Kernel Score Approximation
    3.3 Kernel Score Estimation using ML
    3.4 Alternative Version of Kernel Score Approximation and Estimation
4 Methodology: Kernel Likelihood Ratio Estimation
5 Experiments and Results
    5.1 Tasks
    5.2 NN Architecture and Training Details
    5.3 Results
6 Conclusions and Outlook
A New Loss Functions to Utilize Latent Information
    A.1 Background
    A.2 New Loss Functions
    A.3 Proofs
B Feed-Forward Nature of the Gradient Network
C Derivation of the Kernel Score Estimation Technique
D Narrowing Down the Choices for KSE
    D.1 Bias of the Kernel Score Approximation
    D.2 Local Variance of the Regression Target
    D.3 Choosing the Kernel and Difference Function
References
1 Introduction
Inference in physical sciences, such as particle physics, relies on comparing detailed predictions from computationally expensive simulations to data. These predictions depend upon input parameters that are the objects of interest in parameter-estimation analyses. Classical inference techniques for parameter measurement include the analysis of histograms of summary statistics, the matrix element method, optimal observables, etc. (see [1, 2] for recent reviews and a guide to the literature). More recently, there has been an explosion of interest in corresponding Machine Learning (ML) techniques for parameter measurement, which rely only on samples generated at different parameter values. The basic appeal of the ML approach is that it can leverage high-dimensional information not captured by summary statistics. An up-to-date compendium of the literature on ML applications in particle physics is maintained at [3].
The ML problem at hand can be described as follows. Let $x = (x_1, \ldots, x_D)$ be a $D$-dimensional random variable (datapoint; collision event in the context of collider physics) whose unit-normalized distribution under a given theory model is $p(x;\theta)$, where $\theta \equiv (\theta_1, \ldots, \theta_d)$ is a $d$-dimensional continuous parameter of the model. A standard problem, also encountered in high energy physics, is to estimate the value of the parameter $\theta$ using sets of $N$ independent datapoints $X \equiv \{x_1, \ldots, x_N\}$, with each set produced using the same value of $\theta = \theta_\mathrm{true}$.¹ The following definitions are often relevant in the context of such estimations:

$$s(x;\theta) \equiv \nabla_\theta \ln p(x;\theta), \tag{1a}$$

$$r(x;\theta_0,\theta_1) \equiv \frac{p(x;\theta_0)}{p(x;\theta_1)}, \tag{1b}$$

$$r_\mathrm{ref}(x;\theta) \equiv \frac{p(x;\theta)}{p_\mathrm{ref}(x)}, \tag{1c}$$

where $\nabla_\theta$ represents the $d$-dimensional gradient with respect to $\theta$ and $p_\mathrm{ref}$ is a reference distribution. The $d$-dimensional function $s$ is referred to as the score function, $r_\mathrm{ref}$ as the "singly parameterized likelihood ratio function" (because it has one $\theta$), and $r$ as the "doubly parameterized likelihood ratio" (because it involves $\theta_0$ and $\theta_1$). In the rest of this paper, the term "likelihood ratio function" refers to the doubly parameterized likelihood ratio $r$, unless otherwise stated.

¹ We will assume that the theory model $p$ satisfies all the conditions for the maximum likelihood estimator of $\theta_\mathrm{true}$ to be asymptotically consistent, for all possible values of $\theta_\mathrm{true}$. Among other things, this ensures that if $p(x;\theta) = p(x;\theta_0)$ almost everywhere, then $\theta = \theta_0$.
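To make these definitions concrete, here is a minimal numpy sketch (our own illustration, not part of the paper) that evaluates $s$, $r$, and $r_\mathrm{ref}$ in closed form for a toy model in which $p(x;\theta)$ is a unit-variance Gaussian with mean $\theta$. In the realistic settings considered below, $p$ is not available in closed form, which is exactly why the estimation techniques of this paper are needed.

```python
import numpy as np

def log_p(x, theta):
    # log p(x; theta) for a one-dimensional unit-variance Gaussian with mean theta
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2.0 * np.pi)

def score(x, theta):
    # Eq. (1a): s(x; theta) = d/dtheta log p(x; theta), which is (x - theta) here
    return x - theta

def likelihood_ratio(x, theta0, theta1):
    # Eq. (1b): doubly parameterized ratio r(x; theta0, theta1)
    return np.exp(log_p(x, theta0) - log_p(x, theta1))

def likelihood_ratio_ref(x, theta, theta_ref=0.0):
    # Eq. (1c): singly parameterized ratio against the reference p_ref = p(.; theta_ref)
    return np.exp(log_p(x, theta) - log_p(x, theta_ref))

x = np.random.default_rng(0).normal(loc=1.0, size=5)  # datapoints from p(x; theta_true = 1)
print(score(x, 0.5))
print(likelihood_ratio(x, 1.0, 0.0))
```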
In many situations, there is no feasible technique to compute $p$ directly, particularly when the dimensionality of $x$ or $\theta$ is large. Nevertheless, there can exist an oracle to produce datapoints distributed according to $p$ for any chosen value of $\theta$. Several approaches have been developed to learn the function $p$ itself using simulated data produced by such an oracle. The learned function $\hat{p}(x;\theta)$ can be used for estimating $\theta_\mathrm{true}$ and for a number of other tasks, such as event generation, unfolding [4], and anomaly detection [5]. However, for high-dimensional data, it is often easier to train a neural network to learn the likelihood ratio rather than the likelihood function itself. This motivates alternative approaches which use the simulated data to estimate the functions $s$, $r$, and $r_\mathrm{ref}$ over a range of $\theta$ using ML techniques [6–11]. These learned $s$, $r$, and $r_\mathrm{ref}$ functions can then be used in the estimation of $\theta_\mathrm{true}$ from experimental data, as well as for other related tasks, as reviewed in Section 1.1, using standard methods like gradient descent. In this paper, we introduce some new ML strategies to learn the score function $s$ and the likelihood ratio function $r$ from simulated data, particularly in those situations where the function $p$ can be sampled but not computed directly.
1.1 Applications of Estimated Scores and Likelihood Ratios
Here we briefly review some applications of the score function $s$ and the likelihood ratio functions $r_\mathrm{ref}$ and $r$ after they are estimated from simulations.
Direct parameter estimation. The unknown value of $\theta$ can be estimated from an experimental dataset $\{x_i\}_{i=1}^{N}$ of $N$ points, sampled independently from $p(x;\theta_\mathrm{true})$, using the parameterized likelihood ratio functions and/or the score function. For example, the maximum likelihood estimator can be written as [6]

$$\hat{\theta}_\mathrm{MLE} = \arg\min_{\theta_0} \left[ -\frac{1}{N} \sum_{i=1}^{N} \ln r_\mathrm{ref}(x_i;\theta_0) \right]. \tag{2}$$
Similarly, Ref. [12] showed how to estimate $\theta$ using the binary cross-entropy loss function, written as

$$\hat{\theta}_\mathrm{BCE} = \arg\min_{\theta_0} \left[ -\frac{1}{N} \sum_{i=1}^{N} \ln \frac{r_\mathrm{ref}(x_i;\theta_0)}{1 + r_\mathrm{ref}(x_i;\theta_0)} \;-\; \frac{1}{N_\mathrm{ref}} \sum_{i=1}^{N_\mathrm{ref}} \ln \frac{1}{1 + r_\mathrm{ref}(x^\mathrm{ref}_i;\theta_0)} \right], \tag{3}$$

which uses the experimental data $\{x_i\}_{i=1}^{N}$ produced under the true unknown $\theta_\mathrm{true}$ and the additional simulated dataset $\{x^\mathrm{ref}_i\}_{i=1}^{N_\mathrm{ref}}$ produced under a reference value $\theta_\mathrm{ref}$.
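A hedged numpy sketch (our own; the exact Gaussian ratio stands in for a trained $\hat{r}_\mathrm{ref}$, and the grid-based minimization is just one convenient optimizer) of how a learned ratio would be plugged into the objectives of (2) and (3):

```python
import numpy as np

def mle_objective(theta0, x_data, rhat_ref):
    # Eq. (2): negative mean log of the singly parameterized likelihood ratio
    return -np.mean(np.log(rhat_ref(x_data, theta0)))

def bce_objective(theta0, x_data, x_ref, rhat_ref):
    # Eq. (3): binary cross-entropy objective built from the same ratio
    r_data = rhat_ref(x_data, theta0)
    r_ref = rhat_ref(x_ref, theta0)
    return (-np.mean(np.log(r_data / (1.0 + r_data)))
            - np.mean(np.log(1.0 / (1.0 + r_ref))))

# Toy setup: unit-variance Gaussians with mean theta, reference at theta_ref = 0,
# for which r_ref(x; theta) = exp(theta * x - theta**2 / 2) exactly.
rhat = lambda x, th: np.exp(th * x - 0.5 * th**2)
rng = np.random.default_rng(1)
x_data = rng.normal(loc=1.0, size=10_000)  # "experimental" data, theta_true = 1
x_ref = rng.normal(loc=0.0, size=10_000)   # simulated reference sample

grid = np.linspace(-2.0, 3.0, 501)
theta_mle = grid[np.argmin([mle_objective(t, x_data, rhat) for t in grid])]
theta_bce = grid[np.argmin([bce_objective(t, x_data, x_ref, rhat) for t in grid])]
print(theta_mle, theta_bce)  # both land near theta_true = 1
```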
Eqs. (2) and (3) provide two ways of using the $r_\mathrm{ref}$ function to estimate $\theta$. Furthermore, if the optimization in (2) and (3) is to be performed using gradient-based techniques, the score function implicitly becomes relevant. The gradient of the objective function in (2) with respect to $\theta_0$ is given by

$$\nabla_{\theta_0} \left[ -\frac{1}{N} \sum_{i=1}^{N} \ln r_\mathrm{ref}(x_i;\theta_0) \right] = -\frac{1}{N} \sum_{i=1}^{N} s(x_i;\theta_0). \tag{4}$$
Likewise, the gradients of the two terms in (3) are given by

$$\nabla_{\theta_0} \ln \frac{r_\mathrm{ref}(x;\theta_0)}{1 + r_\mathrm{ref}(x;\theta_0)} = \frac{s(x;\theta_0)}{1 + r_\mathrm{ref}(x;\theta_0)}, \tag{5a}$$

$$\nabla_{\theta_0} \ln \frac{1}{1 + r_\mathrm{ref}(x;\theta_0)} = -\frac{r_\mathrm{ref}(x;\theta_0)}{1 + r_\mathrm{ref}(x;\theta_0)} \, s(x;\theta_0). \tag{5b}$$

Equations (2)-(5) show how the score function or the singly parameterized likelihood ratio function could be used to perform the maximum likelihood estimation of $\theta$.
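By (4), minimizing the objective in (2) with gradient descent amounts to repeatedly stepping $\theta$ along the mean estimated score. A minimal sketch (our own, assuming a learned score estimator `shat`; the exact Gaussian score is used as a stand-in):

```python
import numpy as np

def mle_by_score_descent(x_data, shat, theta_init=0.0, lr=0.5, n_steps=200):
    # Gradient descent on the objective in Eq. (2): by Eq. (4) its gradient
    # is -(1/N) sum_i s(x_i; theta), so each step adds the mean score.
    theta = theta_init
    for _ in range(n_steps):
        theta = theta + lr * np.mean(shat(x_data, theta))
    return theta

shat = lambda x, th: x - th  # exact score of a unit-variance Gaussian toy model
x_data = np.random.default_rng(2).normal(loc=1.0, size=10_000)
print(mle_by_score_descent(x_data, shat))  # converges near theta_true = 1
```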
In the context of high energy physics, this technique can be used to estimate either theory parameters or nuisance parameters. The former is usually referred to as parameter measurement, while the latter is referred to as parameter tuning [13]. Theory parameter measurement and nuisance parameter tuning often have different requirements and standards on a) uncertainty quantification, b) interpretability of the estimation technique, and c) how validatable the simulation models are for the purposes of the chosen estimation technique. For example, one should opt for highly validatable estimation techniques for theory parameter measurements, in order to be robust against unknown errors in the simulation models (i.e., errors not accounted for by known systematic uncertainties). On the other hand, nuisance parameter tuning methods should ensure that the systematic uncertainties corresponding to the relevant nuisance parameters are not underestimated in final results.
Locally optimal observables. The score function evaluated at $\theta = \theta_0$ is a sufficient statistic, i.e., an optimal variable, for the estimation of a parameter $\theta$ near $\theta_0$ [14]. In this way, the learned score $\hat{s}$ can be used as an optimal analysis variable, provided that one expects the true value to be in the vicinity of $\theta_0$.
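For instance, one can histogram the learned score evaluated at $\theta_0$ as a one-dimensional analysis variable; a brief sketch (our own, with the exact Gaussian score standing in for a trained $\hat{s}$):

```python
import numpy as np

shat = lambda x, th: x - th  # stand-in for a trained score network
theta_0 = 1.0
rng = np.random.default_rng(3)
x_a = rng.normal(loc=theta_0, size=50_000)        # events generated at theta_0
x_b = rng.normal(loc=theta_0 + 0.1, size=50_000)  # events at a nearby theta

# The scalar summary t = shat(x; theta_0) is, to first order in theta - theta_0,
# an optimal variable for separating the two parameter points.
bins = np.linspace(-4.0, 4.0, 41)
h_a, _ = np.histogram(shat(x_a, theta_0), bins=bins, density=True)
h_b, _ = np.histogram(shat(x_b, theta_0), bins=bins, density=True)
```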
Dataset reweighting. Knowledge of the parameterized likelihood ratio functions allows us to reweight events produced under one value of $\theta$, say $\theta_0$, to emulate a dataset produced under a different value, say $\theta_1$ [15, 16]. In this case, the appropriate weighting function will be

$$\mathrm{weight}(x) = \frac{p(x;\theta_1)}{p(x;\theta_0)} = r(x;\theta_1,\theta_0). \tag{6}$$
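A minimal sketch of such reweighting (our own illustration; the exact Gaussian ratio stands in for a learned $\hat{r}$):

```python
import numpy as np

def reweight(x_events, rhat, theta_from, theta_to):
    # Eq. (6): per-event weights r(x; theta_to, theta_from), emulating a
    # theta_to dataset from events generated at theta_from
    return rhat(x_events, theta_to, theta_from)

# Exact ratio of two unit-variance Gaussians with means t1 and t0
rhat = lambda x, t1, t0: np.exp((t1 - t0) * x - 0.5 * (t1**2 - t0**2))
x0 = np.random.default_rng(4).normal(loc=0.0, size=100_000)  # generated at theta = 0
w = reweight(x0, rhat, theta_from=0.0, theta_to=1.0)
print(np.average(x0, weights=w))  # weighted mean approaches theta_to = 1
```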
1.2 Related Techniques and New Contributions in This Work
This work builds on a previous, related body of knowledge [1, 6–12]. To provide some context, Table 1 lists some of the existing simulation-based score, likelihood, and likelihood ratio estimators, categorizing them according to i) which of the three quantities in (1) they estimate (rows) and ii) whether or not they use additional latent information from the simulators (columns). Four new and distinct contributions are presented in this work, listed below.
1. In Section 2, we propose an intuitive approach to model the estimators $\hat{s}(x;\theta)$ and $\hat{r}(x;\theta_0,\theta_1)$ via a backend neural network for a scalar function $\hat{\phi}(x,\theta)$. This approach, dubbed the InferoStatic Networks (ISN) method, offers some advantages over directly modeling $\hat{s}$ and $\hat{r}$ using neural networks.
Table 1: The landscape of the simulation-based score and likelihood ratio estimators described in this paper and the existing approaches in the literature. The ISN approach described in Section 2 can be applied to all these cases.

| Estimator | Only requires observable data from the simulator | Requires additional latent simulation information |
|---|---|---|
| Singly parameterized likelihood | NDE [7], MEM [17] | MadMiner [8] [SCANDAL] |
| Singly parameterized likelihood ratio (to a reference distribution) | MadMiner [6] [CARL], DCTR [12] | MadMiner [ROLR, ALICE, CASCAL, RASCAL, ALICES] |
| Doubly parameterized likelihood ratio | MadMiner [CARL], KLRE [Section 4] | MadMiner [ROLR, ALICE, CASCAL, RASCAL, ALICES], This work [Appendix A] |
| Score | KSE [Section 3] | MadMiner [9] [SALLY, SALLINO] |
2. In Section 3, we introduce a technique, dubbed Kernel Score Estimation (KSE), to train a network to learn the score function $s$ from simulated data.

3. In Section 4, we introduce a technique, dubbed Kernel Likelihood Ratio Estimation (KLRE), to learn the doubly parameterized likelihood ratio function $r$ from simulated data. This technique generalizes the previously known CARL technique for learning $r$ [6].

4. In Appendix A, we provide some new loss functions for incorporating additional latent information from the simulation pipeline into the training of $\hat{r}$.
In Section 5, we illustrate the new techniques with some toy examples and compare to the corresponding approaches already existing in the literature. Section 6 is reserved for our conclusions. Several technical discussions and derivations are collected in the appendices.
2 Methodology: InferoStatic Networks (ISNs)
This work is focused on developing ML techniques to infer the score or likelihood ratio. The standard approach in the literature is to model $\hat{s}(x;\theta)$ and $\hat{r}(x;\theta_0,\theta_1)$ directly as neural networks. However, inspired by the definitions of $s$ and $r$ in (1), we propose to use a neural network to model a scalar function $\hat{\phi}(x,\theta)$, and define $\hat{s}$ and $\hat{r}$ via $\hat{\phi}$ as
$$\hat{s}(x;\theta) \equiv \nabla_\theta \hat{\phi}(x,\theta), \tag{7a}$$

$$\hat{r}(x;\theta_0,\theta_1) \equiv \exp\left[\hat{\phi}(x,\theta_0) - \hat{\phi}(x,\theta_1)\right] = \frac{\exp\hat{\phi}(x,\theta_0)}{\exp\hat{\phi}(x,\theta_1)}. \tag{7b}$$
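As a concrete illustration of (7), here is a minimal PyTorch sketch (our own; the architecture and all names are illustrative choices, not prescriptions from this paper) in which both $\hat{s}$ and $\hat{r}$ are derived from a single backend network for $\hat{\phi}$, with the gradient in (7a) obtained by automatic differentiation:

```python
import torch

class InferoStaticNet(torch.nn.Module):
    """Backend network modeling the scalar inferostatic potential phi_hat(x, theta)."""

    def __init__(self, dim_x, dim_theta, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim_x + dim_theta, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),
        )

    def phi(self, x, theta):
        # phi_hat(x, theta): a single scalar per (x, theta) pair
        return self.net(torch.cat([x, theta], dim=-1)).squeeze(-1)

    def score(self, x, theta):
        # Eq. (7a): s_hat = grad_theta phi_hat, via automatic differentiation;
        # returns one d-dimensional score vector per event in the batch
        theta = theta.detach().requires_grad_(True)
        phi = self.phi(x, theta)
        return torch.autograd.grad(phi.sum(), theta, create_graph=True)[0]

    def likelihood_ratio(self, x, theta0, theta1):
        # Eq. (7b): r_hat = exp[phi_hat(x, theta0) - phi_hat(x, theta1)]
        return torch.exp(self.phi(x, theta0) - self.phi(x, theta1))
```

Because both estimators are tied to the same potential, identities such as $\hat{r}(x;\theta_0,\theta_1)\,\hat{r}(x;\theta_1,\theta_2) = \hat{r}(x;\theta_0,\theta_2)$ hold by construction, which is not guaranteed when $\hat{s}$ and $\hat{r}$ are modeled by independent networks.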
Here ˆ
ϕplays the same role in the definitions of ˆ
sand ˆ
ras ln pdoes in the definitions of sand r
in (1). We dub ˆ
ϕas the “inferostatic potential”, in analogy with the electrostatic potential from
5