FERMILAB-PUB-22-741-SCD
SciPost Physics Submission
New Machine Learning Techniques for Simulation-Based Inference:
InferoStatic Nets, Kernel Score Estimation, and Kernel Likelihood
Ratio Estimation
Kyoungchul Kong1, Konstantin T. Matchev2, Stephen Mrenna3, and Prasanth Shyamsundar4⋆
1Department of Physics and Astronomy, University of Kansas, Lawrence, KS 66045, USA
2Institute for Fundamental Theory, Physics Department, University of Florida,
Gainesville, FL 32611, USA
3Scientific Computing Division, Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
4Fermilab Quantum Institute, Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
⋆ prasanth@fnal.gov
October 4, 2022

arXiv:2210.01680v2 [stat.ML]
Abstract
We propose an intuitive, machine-learning approach to multiparameter inference, dubbed the
InferoStatic Networks (ISN) method, to model the score and likelihood ratio estimators in cases
when the probability density can be sampled but not computed directly. The ISN uses a backend
neural network that models a scalar function called the inferostatic potential ϕ. In addition, we
introduce new strategies, respectively called Kernel Score Estimation (KSE) and Kernel Likelihood
Ratio Estimation (KLRE), to learn the score and the likelihood ratio functions from simulated data.
We illustrate the new techniques with some toy examples and compare to existing approaches in
the literature. We mention en passant some new loss functions that optimally incorporate latent
information from simulations into the training procedure.
Contents
1 Introduction
    1.1 Applications of Estimated Scores and Likelihood Ratios
    1.2 Related Techniques and New Contributions in This Work
2 Methodology: InferoStatic Networks (ISNs)
3 Methodology: Kernel Score Estimation
    3.1 Intuition and Motivation
    3.2 Kernel Score Approximation
    3.3 Kernel Score Estimation using ML
    3.4 Alternative Version of Kernel Score Approximation and Estimation
4 Methodology: Kernel Likelihood Ratio Estimation
5 Experiments and Results
    5.1 Tasks
    5.2 NN Architecture and Training Details
    5.3 Results
6 Conclusions and Outlook
A New Loss Functions to Utilize Latent Information
    A.1 Background
    A.2 New Loss Functions
    A.3 Proofs
B Feed-Forward Nature of the Gradient Network
C Derivation of the Kernel Score Estimation Technique
D Narrowing Down the Choices for KSE
    D.1 Bias of the Kernel Score Approximation
    D.2 Local Variance of the Regression Target
    D.3 Choosing the Kernel and Difference Function
References
1 Introduction
Inference in physical sciences, such as particle physics, relies on comparing detailed predictions from computationally expensive simulations to data. These predictions depend upon input parameters that are the objects of interest in parameter-estimation analyses. Classical inference techniques for parameter measurement include the analysis of histograms of summary statistics, the matrix element method, optimal observables, etc. (see [1, 2] for recent reviews and a guide to the literature). More recently, there has been an explosion of interest in corresponding Machine Learning (ML) techniques for parameter measurement, which rely only on samples generated at different parameter values. The basic appeal of the ML approach is that it can leverage high-dimensional information not captured by summary statistics. An up-to-date compendium of the literature on ML applications in particle physics is maintained at [3].
The ML problem at hand can be described as follows. Let $x = (x_1, \ldots, x_D)$ be a $D$-dimensional random variable (datapoint; collision event in the context of collider physics) whose unit-normalized distribution under a given theory model is $p(x;\theta)$, where $\theta \equiv (\theta_1, \ldots, \theta_d)$ is a $d$-dimensional continuous parameter of the model. A standard problem, also encountered in high energy physics, is to estimate the value of the parameter $\theta$ using sets of $N$ independent datapoints $X \equiv \{x_1, \ldots, x_N\}$, with each set produced using the same value of $\theta = \theta_\mathrm{true}$.¹ The following definitions are often relevant in the context of such estimations:

$$s(x;\theta) \equiv \nabla_\theta \ln p(x;\theta), \tag{1a}$$

$$r(x;\theta_0,\theta_1) \equiv \frac{p(x;\theta_0)}{p(x;\theta_1)}, \tag{1b}$$

$$r_\mathrm{ref}(x;\theta) \equiv \frac{p(x;\theta)}{p_\mathrm{ref}(x)}, \tag{1c}$$

where $\nabla_\theta$ represents the $d$-dimensional gradient with respect to $\theta$ and $p_\mathrm{ref}$ is a reference distribution. The $d$-dimensional function $s$ is referred to as the score function, $r_\mathrm{ref}$ as the "singly parameterized likelihood ratio function" (because it has one $\theta$), and $r$ as the "doubly parameterized likelihood ratio" (because it involves $\theta_0$ and $\theta_1$). In the rest of this paper, the term "likelihood ratio function" refers to the doubly parameterized likelihood ratio $r$, unless otherwise stated.

¹ We will assume that the theory model $p$ satisfies all the conditions for the maximum likelihood estimator of $\theta_\mathrm{true}$ to be asymptotically consistent, for all possible values of $\theta_\mathrm{true}$. Among other things, this ensures that if $p(x;\theta) = p(x;\theta_0)$ almost everywhere, then $\theta = \theta_0$.
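To make these definitions concrete, here is a minimal numpy sketch (our own illustration, not part of the paper) that evaluates $s$, $r$, and $r_\mathrm{ref}$ in closed form for a toy model in which $p(x;\theta)$ is a unit-variance Gaussian with mean $\theta$. In the realistic settings considered below, $p$ is not available in closed form, which is exactly why the estimation techniques of this paper are needed.

```python
import numpy as np

def log_p(x, theta):
    # log p(x; theta) for a one-dimensional unit-variance Gaussian with mean theta
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2.0 * np.pi)

def score(x, theta):
    # Eq. (1a): s(x; theta) = d/dtheta log p(x; theta), which is (x - theta) here
    return x - theta

def likelihood_ratio(x, theta0, theta1):
    # Eq. (1b): doubly parameterized ratio r(x; theta0, theta1)
    return np.exp(log_p(x, theta0) - log_p(x, theta1))

def likelihood_ratio_ref(x, theta, theta_ref=0.0):
    # Eq. (1c): singly parameterized ratio against the reference p_ref = p(.; theta_ref)
    return np.exp(log_p(x, theta) - log_p(x, theta_ref))

x = np.random.default_rng(0).normal(loc=1.0, size=5)  # datapoints from p(x; theta_true = 1)
print(score(x, 0.5))
print(likelihood_ratio(x, 1.0, 0.0))
```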
In many situations, there is no feasible technique to compute $p$ directly, particularly when the dimensionality of $x$ or $\theta$ is large. Nevertheless, there can exist an oracle to produce datapoints distributed according to $p$ for any chosen value of $\theta$. Several approaches have been developed to learn the function $p$ itself using simulated data produced by such an oracle. The learned function $\hat{p}(x;\theta)$ can be used for estimating $\theta_\mathrm{true}$ and for a number of other tasks, such as event generation, unfolding [4], and anomaly detection [5]. However, for high-dimensional data, it is often easier to train a neural network to learn the likelihood ratio rather than the likelihood function itself. This motivates alternative approaches which use the simulated data to estimate the functions $s$, $r$, and $r_\mathrm{ref}$ over a range of $\theta$ using ML techniques [6–11]. These learned $s$, $r$, and $r_\mathrm{ref}$ functions can then be used in the estimation of $\theta_\mathrm{true}$ from experimental data, as well as for other related tasks, as reviewed in Section 1.1, using standard methods like gradient descent. In this paper, we introduce some new ML strategies to learn the score function $s$ and the likelihood ratio function $r$ from simulated data, particularly in those situations where the function $p$ can be sampled but not computed directly.
1.1 Applications of Estimated Scores and Likelihood Ratios
Here we briefly review some applications of the score function $s$ and the likelihood ratio functions $r_\mathrm{ref}$ and $r$ after they are estimated from simulations.
Direct parameter estimation. The unknown value of $\theta$ can be estimated from an experimental dataset $\{x_i\}_{i=1}^{N}$ of $N$ points, sampled independently from $p(x;\theta_\mathrm{true})$, using the parameterized likelihood ratio functions and/or the score function. For example, the maximum likelihood estimator can be written as [6]

$$\hat{\theta}_\mathrm{MLE} = \arg\min_{\theta_0} \left[ -\frac{1}{N} \sum_{i=1}^{N} \ln r_\mathrm{ref}(x_i;\theta_0) \right]. \tag{2}$$
Similarly, Ref. [12] showed how to estimate $\theta$ using the binary cross-entropy loss function, written as

$$\hat{\theta}_\mathrm{BCE} = \arg\min_{\theta_0} \left[ -\frac{1}{N} \sum_{i=1}^{N} \ln \frac{r_\mathrm{ref}(x_i;\theta_0)}{1 + r_\mathrm{ref}(x_i;\theta_0)} \;-\; \frac{1}{N_\mathrm{ref}} \sum_{i=1}^{N_\mathrm{ref}} \ln \frac{1}{1 + r_\mathrm{ref}(x^\mathrm{ref}_i;\theta_0)} \right], \tag{3}$$

which uses the experimental data $\{x_i\}_{i=1}^{N}$ produced under the true unknown $\theta_\mathrm{true}$ and the additional simulated dataset $\{x^\mathrm{ref}_i\}_{i=1}^{N_\mathrm{ref}}$ produced under a reference value $\theta_\mathrm{ref}$.
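A hedged numpy sketch (our own; the exact Gaussian ratio stands in for a trained $\hat{r}_\mathrm{ref}$, and the grid-based minimization is just one convenient optimizer) of how a learned ratio would be plugged into the objectives of (2) and (3):

```python
import numpy as np

def mle_objective(theta0, x_data, rhat_ref):
    # Eq. (2): negative mean log of the singly parameterized likelihood ratio
    return -np.mean(np.log(rhat_ref(x_data, theta0)))

def bce_objective(theta0, x_data, x_ref, rhat_ref):
    # Eq. (3): binary cross-entropy objective built from the same ratio
    r_data = rhat_ref(x_data, theta0)
    r_ref = rhat_ref(x_ref, theta0)
    return (-np.mean(np.log(r_data / (1.0 + r_data)))
            - np.mean(np.log(1.0 / (1.0 + r_ref))))

# Toy setup: unit-variance Gaussians with mean theta, reference at theta_ref = 0,
# for which r_ref(x; theta) = exp(theta * x - theta**2 / 2) exactly.
rhat = lambda x, th: np.exp(th * x - 0.5 * th**2)
rng = np.random.default_rng(1)
x_data = rng.normal(loc=1.0, size=10_000)  # "experimental" data, theta_true = 1
x_ref = rng.normal(loc=0.0, size=10_000)   # simulated reference sample

grid = np.linspace(-2.0, 3.0, 501)
theta_mle = grid[np.argmin([mle_objective(t, x_data, rhat) for t in grid])]
theta_bce = grid[np.argmin([bce_objective(t, x_data, x_ref, rhat) for t in grid])]
print(theta_mle, theta_bce)  # both land near theta_true = 1
```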
Eqs. (2) and (3) provide two ways of using the $r_\mathrm{ref}$ function to estimate $\theta$. Furthermore, if the optimization in (2) and (3) is to be performed using gradient-based techniques, the score function implicitly becomes relevant. The gradient of the objective function in (2) with respect to $\theta_0$ is given by

$$\nabla_{\theta_0} \left[ -\frac{1}{N} \sum_{i=1}^{N} \ln r_\mathrm{ref}(x_i;\theta_0) \right] = -\frac{1}{N} \sum_{i=1}^{N} s(x_i;\theta_0). \tag{4}$$
Likewise, the gradients of the two terms in (3) are given by

$$\nabla_{\theta_0} \ln \frac{r_\mathrm{ref}(x;\theta_0)}{1 + r_\mathrm{ref}(x;\theta_0)} = \frac{s(x;\theta_0)}{1 + r_\mathrm{ref}(x;\theta_0)}, \tag{5a}$$

$$\nabla_{\theta_0} \ln \frac{1}{1 + r_\mathrm{ref}(x;\theta_0)} = -\frac{r_\mathrm{ref}(x;\theta_0)}{1 + r_\mathrm{ref}(x;\theta_0)} \, s(x;\theta_0). \tag{5b}$$

Equations (2)-(5) show how the score function or the singly parameterized likelihood ratio function could be used to perform the maximum likelihood estimation of $\theta$.
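By (4), minimizing the objective in (2) with gradient descent amounts to repeatedly stepping $\theta$ along the mean estimated score. A minimal sketch (our own, assuming a learned score estimator `shat`; the exact Gaussian score is used as a stand-in):

```python
import numpy as np

def mle_by_score_descent(x_data, shat, theta_init=0.0, lr=0.5, n_steps=200):
    # Gradient descent on the objective in Eq. (2): by Eq. (4) its gradient
    # is -(1/N) sum_i s(x_i; theta), so each step adds the mean score.
    theta = theta_init
    for _ in range(n_steps):
        theta = theta + lr * np.mean(shat(x_data, theta))
    return theta

shat = lambda x, th: x - th  # exact score of a unit-variance Gaussian toy model
x_data = np.random.default_rng(2).normal(loc=1.0, size=10_000)
print(mle_by_score_descent(x_data, shat))  # converges near theta_true = 1
```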
In the context of high energy physics, this technique can be used to estimate either theory parameters or nuisance parameters. The former is usually referred to as parameter measurement, while the latter is referred to as parameter tuning [13]. Theory parameter measurement and nuisance parameter tuning often have different requirements and standards on a) uncertainty quantification, b) interpretability of the estimation technique, and c) how validatable the simulation models are for the purposes of the chosen estimation technique. For example, one should opt for highly validatable estimation techniques for theory parameter measurements, in order to be robust against unknown errors in the simulation models (i.e., errors not accounted for by known systematic uncertainties). On the other hand, nuisance parameter tuning methods should ensure that the systematic uncertainties corresponding to the relevant nuisance parameters are not underestimated in final results.
Locally optimal observables. The score function evaluated at $\theta = \theta_0$ is a sufficient statistic, i.e., an optimal variable, for the estimation of a parameter $\theta$ near $\theta_0$ [14]. In this way, the learned score $\hat{s}$ can be used as an optimal analysis variable, provided that one expects the true value to be in the vicinity of $\theta_0$.
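For instance, one can histogram the learned score evaluated at $\theta_0$ as a one-dimensional analysis variable; a brief sketch (our own, with the exact Gaussian score standing in for a trained $\hat{s}$):

```python
import numpy as np

shat = lambda x, th: x - th  # stand-in for a trained score network
theta_0 = 1.0
rng = np.random.default_rng(3)
x_a = rng.normal(loc=theta_0, size=50_000)        # events generated at theta_0
x_b = rng.normal(loc=theta_0 + 0.1, size=50_000)  # events at a nearby theta

# The scalar summary t = shat(x; theta_0) is, to first order in theta - theta_0,
# an optimal variable for separating the two parameter points.
bins = np.linspace(-4.0, 4.0, 41)
h_a, _ = np.histogram(shat(x_a, theta_0), bins=bins, density=True)
h_b, _ = np.histogram(shat(x_b, theta_0), bins=bins, density=True)
```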
Dataset reweighting. Knowledge of the parameterized likelihood ratio functions allows us to reweight events produced under one value of $\theta$, say $\theta_0$, to emulate a dataset produced under a different value, say $\theta_1$ [15, 16]. In this case, the appropriate weighting function will be

$$\mathrm{weight}(x) = \frac{p(x;\theta_1)}{p(x;\theta_0)} = r(x;\theta_1,\theta_0). \tag{6}$$
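A minimal sketch of such reweighting (our own illustration; the exact Gaussian ratio stands in for a learned $\hat{r}$):

```python
import numpy as np

def reweight(x_events, rhat, theta_from, theta_to):
    # Eq. (6): per-event weights r(x; theta_to, theta_from), emulating a
    # theta_to dataset from events generated at theta_from
    return rhat(x_events, theta_to, theta_from)

# Exact ratio of two unit-variance Gaussians with means t1 and t0
rhat = lambda x, t1, t0: np.exp((t1 - t0) * x - 0.5 * (t1**2 - t0**2))
x0 = np.random.default_rng(4).normal(loc=0.0, size=100_000)  # generated at theta = 0
w = reweight(x0, rhat, theta_from=0.0, theta_to=1.0)
print(np.average(x0, weights=w))  # weighted mean approaches theta_to = 1
```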
1.2 Related Techniques and New Contributions in This Work
This work builds on a previous, related body of knowledge [1, 6–12]. To provide some context, Table 1 lists some of the existing simulation-based score, likelihood, and likelihood ratio estimators, categorizing them according to i) which of the three quantities in (1) they estimate (rows) and ii) whether or not they use additional latent information from the simulators (columns). Four new and distinct contributions are presented in this work, listed below.
1. In Section 2, we propose an intuitive approach to model the estimators $\hat{s}(x;\theta)$ and $\hat{r}(x;\theta_0,\theta_1)$ via a backend neural network for a scalar function $\hat{\phi}(x,\theta)$. This approach, dubbed the InferoStatic Networks (ISN) method, offers some advantages over directly modeling $\hat{s}$ and $\hat{r}$ using neural networks.
Table 1: The landscape of the simulation-based score and likelihood ratio estimators described in this paper and the existing approaches in the literature. The ISN approach described in Section 2 can be applied to all these cases.

| Estimator | Only requires observable data from the simulator | Requires additional latent simulation information |
|---|---|---|
| Singly parameterized likelihood | NDE [7], MEM [17] | MadMiner [8] [SCANDAL] |
| Singly parameterized likelihood ratio (to a reference distribution) | MadMiner [6] [CARL], DCTR [12] | MadMiner [ROLR, ALICE, CASCAL, RASCAL, ALICES] |
| Doubly parameterized likelihood ratio | MadMiner [CARL], KLRE [Section 4] | MadMiner [ROLR, ALICE, CASCAL, RASCAL, ALICES], This work [Appendix A] |
| Score | KSE [Section 3] | MadMiner [9] [SALLY, SALLINO] |
2. In Section 3, we introduce a technique, dubbed Kernel Score Estimation (KSE), to train a network to learn the score function $s$ from simulated data.

3. In Section 4, we introduce a technique, dubbed Kernel Likelihood Ratio Estimation (KLRE), to learn the doubly parameterized likelihood ratio function $r$ from simulated data. This technique generalizes the previously known CARL technique for learning $r$ [6].

4. In Appendix A, we provide some new loss functions for incorporating additional latent information from the simulation pipeline into the training of $\hat{r}$.
In Section 5, we illustrate the new techniques with some toy examples and compare to the corresponding approaches already existing in the literature. Section 6 is reserved for our conclusions. Several technical discussions and derivations are collected in the appendices.
2 Methodology: InferoStatic Networks (ISNs)
This work is focused on developing ML techniques to infer the score or likelihood ratio. The standard approach in the literature is to model $\hat{s}(x;\theta)$ and $\hat{r}(x;\theta_0,\theta_1)$ directly as neural networks. However, inspired by the definitions of $s$ and $r$ in (1), we propose to use a neural network to model a scalar function $\hat{\phi}(x,\theta)$, and define $\hat{s}$ and $\hat{r}$ via $\hat{\phi}$ as
$$\hat{s}(x;\theta) \equiv \nabla_\theta \hat{\phi}(x,\theta), \tag{7a}$$

$$\hat{r}(x;\theta_0,\theta_1) \equiv \exp\left[\hat{\phi}(x,\theta_0) - \hat{\phi}(x,\theta_1)\right] = \frac{\exp\hat{\phi}(x,\theta_0)}{\exp\hat{\phi}(x,\theta_1)}. \tag{7b}$$
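As a concrete illustration of (7), here is a minimal PyTorch sketch (our own; the architecture and all names are illustrative choices, not prescriptions from this paper) in which both $\hat{s}$ and $\hat{r}$ are derived from a single backend network for $\hat{\phi}$, with the gradient in (7a) obtained by automatic differentiation:

```python
import torch

class InferoStaticNet(torch.nn.Module):
    """Backend network modeling the scalar inferostatic potential phi_hat(x, theta)."""

    def __init__(self, dim_x, dim_theta, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim_x + dim_theta, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),
        )

    def phi(self, x, theta):
        # phi_hat(x, theta): a single scalar per (x, theta) pair
        return self.net(torch.cat([x, theta], dim=-1)).squeeze(-1)

    def score(self, x, theta):
        # Eq. (7a): s_hat = grad_theta phi_hat, via automatic differentiation;
        # returns one d-dimensional score vector per event in the batch
        theta = theta.detach().requires_grad_(True)
        phi = self.phi(x, theta)
        return torch.autograd.grad(phi.sum(), theta, create_graph=True)[0]

    def likelihood_ratio(self, x, theta0, theta1):
        # Eq. (7b): r_hat = exp[phi_hat(x, theta0) - phi_hat(x, theta1)]
        return torch.exp(self.phi(x, theta0) - self.phi(x, theta1))
```

Because both estimators are tied to the same potential, identities such as $\hat{r}(x;\theta_0,\theta_1)\,\hat{r}(x;\theta_1,\theta_2) = \hat{r}(x;\theta_0,\theta_2)$ hold by construction, which is not guaranteed when $\hat{s}$ and $\hat{r}$ are modeled by independent networks.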
Here ˆ
ϕplays the same role in the definitions of ˆ
sand ˆ
ras ln pdoes in the definitions of sand r
in (1). We dub ˆ
ϕas the “inferostatic potential”, in analogy with the electrostatic potential from
5