Truncated proposals for scalable and hassle-free
simulation-based inference
Michael Deistler
University of Tübingen
michael.deistler@uni-tuebingen.de
Pedro J Gonçalves
University of Tübingen
pedro.goncalves@uni-tuebingen.de
Jakob H Macke
University of Tübingen
Max Planck Institute for Intelligent Systems
jakob.macke@uni-tuebingen.de
Abstract
Simulation-based inference (SBI) solves statistical inverse problems by repeatedly
running a stochastic simulator and inferring posterior distributions from model-
simulations. To improve simulation efficiency, several inference methods take a
sequential approach and iteratively adapt the proposal distributions from which
model simulations are generated. However, many of these sequential methods are
difficult to use in practice, both because the resulting optimisation problems can be
challenging and efficient diagnostic tools are lacking. To overcome these issues,
we present Truncated Sequential Neural Posterior Estimation (TSNPE). TSNPE
performs sequential inference with truncated proposals, sidestepping the optimisation
issues of alternative approaches. In addition, TSNPE makes it possible to efficiently
perform coverage tests that scale to complex models with many parameters.
We demonstrate that TSNPE performs on par with previous methods on established
benchmark tasks. We then apply TSNPE to two challenging problems from neuro-
science and show that TSNPE can successfully obtain the posterior distributions,
whereas previous methods fail. Overall, our results demonstrate that TSNPE is
an efficient, accurate, and robust inference method that can scale to challenging
scientific models.
1 Introduction
Computational models are an important tool to understand physical processes underlying empirically
observed phenomena. These models, often implemented as numerical simulators, incorporate
mechanistic knowledge about the physical process underlying data generation, and thereby provide an
interpretable model of empirical observations. In many cases, several parameters of the simulator have
to be inferred from data, e.g., with Bayesian inference. However, performing Bayesian inference in
these models can be difficult: Running the simulator may be computationally expensive, evaluating the
likelihood-function might be computationally infeasible, and the model might not be differentiable. In
order to overcome these limitations, Approximate Bayesian Computation (ABC) methods [Beaumont
et al., 2002, 2009], synthetic likelihood approaches [Wood, 2010], and neural network-based methods
[e.g., Papamakarios and Murray, 2016, Hermans et al., 2020, Thomas et al., 2022] have been
developed.
Equal contribution
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.04815v2 [stat.ML] 10 Nov 2022
Figure 1: APT vs. TSNPE. Top: Prior (gray) and true posterior (black). APT matches the true posterior within the prior bounds but ‘leaks’ into regions without prior support. TSNPE (ours) matches the true posterior.
A subset of neural network-based methods, known as neural posterior
estimation (NPE) [Papamakarios and Murray, 2016, Lueckmann et al.,
2017, Greenberg et al., 2019], train a neural density estimator on simulated
data such that the density estimator directly approximates the posterior.
Unlike other methods, NPE does not require any further Markov-chain
Monte-Carlo (MCMC) or variational inference (VI) steps. As it provides
an amortized approximation of the posterior, which can be used to quickly
evaluate and sample the approximate posterior for any observation, NPE
allows application in time-critical and high-throughput inference scenarios
[Gonçalves et al., 2020, Radev et al., 2020, Dax et al., 2021], as well as fast
application of diagnostic methods which require posterior samples for many
different observations [Cook et al., 2006, Talts et al., 2018]. In addition,
unlike methods targeting the likelihood (e.g., neural likelihood estimation,
NLE [Papamakarios et al., 2019, Lueckmann et al., 2019]), NPE can learn
summary statistics from data and it can use equivariances in the simulations
to improve the quality of inference [Dax et al., 2021, 2022].
If inference is performed for a particular observation $x_o$, the sampling efficiency of NPE can be improved with sequential training schemes: Instead of drawing parameters from the prior distribution, they are drawn adaptively from a proposal (e.g., a posterior estimate obtained with NPE) in order to optimize the posterior accuracy for the particular $x_o$. These procedures are called Sequential Neural Posterior Estimation (SNPE) [Papamakarios and Murray, 2016, Lueckmann et al., 2017, Greenberg et al., 2019] and have been reported to be more simulation-efficient than training the neural network only on parameters sampled from the prior, across a set of benchmark tasks [Lueckmann et al., 2021].
Despite the potential to improve simulation-efficiency, two limitations have impeded a more
widespread adoption of SNPE by practitioners: First, the sequential scheme of SNPE can be unstable.
SNPE requires a modified loss function compared to NPE, and this modification suffers from issues
that can limit its effectiveness on (or even prevent its application to) complex problems (see Sec. 2).
Second, several commonly used diagnostic tools for SBI [Talts et al., 2018, Miller et al., 2021,
Hermans et al., 2021] rely on performing inference across multiple observations. In SNPE (in contrast
to NPE), this requires generating new simulations and network retraining for each observation, which
often prohibits the use of such diagnostic tools [Lueckmann et al., 2021, Hermans et al., 2021].
Here, we introduce Truncated Sequential Neural Posterior Estimation (TSNPE) to overcome these
limitations. TSNPE follows the SNPE formalism, but uses a proposal which is a truncated version
of the prior: TSNPE draws simulations from the prior, but rejects them before simulation if they
lie outside of the support of the approximate posterior. Thus, the proposal is (within its support)
proportional to the prior, which allows us to train the neural network with maximum-likelihood in
every round and, therefore, sidesteps the instabilities (and hence ‘hassle’) of previous SNPE methods.
Our use of truncated proposals is strongly inspired by Blum and François [2010] and Miller et al.
[2020, 2021], who proposed truncated proposals respectively for regression-adjustment approaches
in ABC and for neural ratio estimation (see Discussion). Unlike methods based on likelihood(-ratio)-
estimation [Miller et al., 2021, Hermans et al., 2021], TSNPE allows direct sampling and density
evaluation of the approximate posterior, and thus permits computing expected coverage of the full
posterior quickly (without MCMC) and at every iteration of the algorithm, allowing us to diagnose
failures of the method even for high-dimensional parameter spaces (we term this ‘simulation-based
coverage calibration’ (SBCC), given its close connection with simulation-based calibration, SBC,
Cook et al. [2006], Talts et al. [2018]).
We show that TSNPE is as efficient as the SNPE method ‘Automatic Posterior Transformation’ (APT,
Greenberg et al. [2019]) on several established benchmark problems (Sec. 4.1). We then demonstrate
that for two challenging neuroscience problems, TSNPE—but not APT—can robustly identify the
posterior distributions (Sec. 4.2).
Figure 2: Truncated Sequential Neural Posterior Estimation (TSNPE). The method starts by sampling from the prior, running the simulator, and training a neural density estimator with maximum-likelihood to approximate the posterior. In subsequent rounds, parameters are sampled from the prior, but rejected if they lie outside of the support of the approximate posterior. With these proposals, the neural density estimator can be trained with maximum-likelihood in all rounds.
2 Background
In Neural Posterior Estimation (NPE), parameters are sampled from the prior $p(\theta)$ and simulated (i.e., $x$ is sampled from $p(x|\theta)$). Then, a neural density estimator $q_\phi(\theta|x)$ (in our case a normalizing flow), with learnable parameters $\phi$, is trained to minimize the loss
$$\min_\phi \mathcal{L} = \min_\phi \; \mathbb{E}_{\theta \sim p(\theta)}\,\mathbb{E}_{x \sim p(x|\theta)}\left[-\log q_\phi(\theta|x)\right],$$
which is minimized if and only if, for a sufficiently expressive density estimator, $q_\phi(\theta|x) = p(\theta|x)$ for all $x \in \mathrm{supp}(p(x))$ [Paige and Wood, 2016, Papamakarios and Murray, 2016]. Throughout this study, we refer to training with this loss function as maximum-likelihood training, although the neural density estimator targets the posterior directly.
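As a sanity check of this loss, consider a conjugate Gaussian toy problem where the minimizer can be computed in closed form. The snippet below is our own illustration (not from the paper): within a linear-Gaussian family $q_\phi(\theta|x) = \mathcal{N}(ax + b, \sigma^2)$, minimizing the expected negative log-likelihood reduces to ordinary least squares, and the fit recovers the analytic posterior $\mathcal{N}(x/2, 1/2)$.

```python
# Toy check of the NPE loss on a conjugate Gaussian model:
#   prior:          theta ~ N(0, 1)
#   simulator:      x | theta ~ N(theta, 1)
#   true posterior: theta | x ~ N(x / 2, 1 / 2)
# Within the linear-Gaussian family q(theta|x) = N(a*x + b, sigma^2),
# minimizing E[-log q(theta|x)] over (a, b, sigma) is least-squares
# regression of theta on x, so the fit should give a ~= 0.5, b ~= 0,
# sigma^2 ~= 0.5.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
theta = rng.normal(0.0, 1.0, n)   # draw parameters from the prior
x = rng.normal(theta, 1.0)        # run the "simulator"

# Closed-form maximum-likelihood fit: least squares of theta on x.
A = np.column_stack([x, np.ones(n)])
(a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
sigma2 = np.mean((theta - (a * x + b)) ** 2)

print(a, b, sigma2)  # close to 0.5, 0.0, 0.5
```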
Sequential Neural Posterior Estimation (SNPE) aims to infer the posterior distribution $p(\theta|x_o)$ for a particular observation $x_o$. SNPE initially performs NPE and, thereby, obtains an initial estimate of the posterior distribution. It then samples parameters from a proposal $\tilde p(\theta)$, which is often chosen to be the previously obtained estimate of the posterior, $\tilde p(\theta) = q_\phi(\theta|x_o)$, and retrains the neural density estimator [Papamakarios and Murray, 2016]. This procedure can be repeated for several rounds.
Importantly, if parameters $\theta$ are sampled from the proposal $\tilde p(\theta)$ rather than from the prior $p(\theta)$, the estimator $q_\phi(\theta|x)$ that minimizes the maximum-likelihood loss function no longer converges to the true posterior. If one used the maximum-likelihood loss on data sampled from $\tilde p(\theta)$, i.e., $\mathcal{L} = \mathbb{E}_{\theta \sim \tilde p(\theta)}\,\mathbb{E}_{x \sim p(x|\theta)}\left[-\log q_\phi(\theta|x)\right]$, then $\mathcal{L}$ would be minimized by
$$q_\phi(\theta|x) \propto p(\theta|x)\,\frac{\tilde p(\theta)}{p(\theta)},$$
which is not the true posterior. Multiple schemes have been developed to overcome this [Papamakarios and Murray, 2016, Lueckmann et al., 2017]. The most recent of these methods, Automatic Posterior Transformation (APT, or SNPE-C, in its atomic version) [Greenberg et al., 2019, Durkan et al., 2020], employs a loss that aims to classify the parameter set that generated a particular data point among other parameter sets (details in Appendix Sec. 6.5).
While APT has been reported to significantly outperform previous methods, several studies have also
described cases in which the approach exhibits performance issues: Both the original APT paper
[Greenberg et al., 2019] and Durkan et al. [2020] reported that APT can show ‘leakage’ of posterior
mass outside of bounded priors. We demonstrate this issue on a simple 1-dimensional simulator with
bounded prior (Fig. 1, Appendix Fig. 7). The posterior estimated by APT is only required to match
the true posterior density within the support of the prior (details in Appendix Sec. 6.5). Thus, after
five rounds of APT, while the approximate posterior matches the true posterior within the bounds of
the prior, a substantial fraction of posterior mass lies in regions with zero prior probability. In simple
models, approximate posterior samples that lie outside of the prior bounds can be efficiently rejected.
However, in models with high numbers of parameters, the rejection rate can become so large that
drawing posterior samples which lie inside of the prior bounds is prohibitive. For example, Glöckler
et al. [2022] reported a rejection rate of more than 99.9999% in a model with 31 parameters, thus
requiring approximately one minute to draw a single posterior sample from within the prior bounds.
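The quoted figure is easy to reproduce with back-of-envelope arithmetic; the throughput number below is our own assumption for illustration, not a figure from Glöckler et al. [2022].

```python
# Back-of-envelope for the leakage example: with an acceptance rate of
# 1e-6 (rejection rate 99.9999%), one in-bounds posterior sample needs
# about a million flow draws on average. At an assumed throughput of
# ~17,000 flow samples + prior-bounds checks per second (hypothetical,
# hardware-dependent), that is roughly one minute per accepted sample.
acceptance_rate = 1e-6
draws_per_accept = 1 / acceptance_rate   # mean of the geometric distribution
throughput = 17_000                      # draws per second (assumed)
seconds = draws_per_accept / throughput
print(seconds)
```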
Algorithm 1: TSNPE
Inputs: prior $p(\theta)$, observation $x_o$, simulations per round $N$, number of rounds $R$, $\epsilon$ that defines the highest-probability region (HPR)
Outputs: approximate posterior $q_\phi$.
Initialize: proposal $\tilde p(\theta) = p(\theta)$, dataset $\mathcal{X} = \{\}$
for $r \in [1, \dots, R]$ do
    for $i \in [1, \dots, N]$ do
        $\theta_i \sim \tilde p(\theta)$
        simulate $x_i \sim p(x|\theta_i)$
        add $(\theta_i, x_i)$ to $\mathcal{X}$
    $\phi = \arg\min_\phi \frac{1}{N} \sum_{(\theta_i, x_i) \in \mathcal{X}} -\log q_\phi(\theta_i|x_i)$
    Compute expected coverage$(\tilde p(\theta), q_\phi)$ ; // see Alg. 2
    $\tilde p(\theta) \propto p(\theta) \cdot 1_{\theta \in \mathrm{HPR}_\epsilon}$ ; // see Alg. 3
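A minimal sketch of this loop on a one-dimensional toy problem: a linear-Gaussian family stands in for the normalizing flow, and rejection sampling implements the truncated proposal. All names and the toy model are ours; the simple family is mildly misspecified under truncation, so the recovered posterior is only approximately the analytic one.

```python
# TSNPE loop sketch on the conjugate Gaussian toy (prior N(0,1),
# simulator x|theta ~ N(theta,1), so theta|x_o ~ N(x_o/2, 1/2)).
# The "density estimator" q(theta|x) = N(a*x + b, s^2) is refit by
# maximum likelihood (= least squares) on data pooled over rounds.
import numpy as np

def norm_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def fit_mle(theta, x):
    # maximum-likelihood fit within the linear-Gaussian family
    A = np.column_stack([x, np.ones(len(x))])
    (a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
    s = np.sqrt(np.mean((theta - (a * x + b)) ** 2))
    return a, b, s

rng = np.random.default_rng(0)
x_o, n_per_round, n_rounds, eps = 1.5, 20_000, 3, 1e-3

a, b, s = 0.0, 0.0, 1.0
tau = -np.inf                       # round 1: proposal = prior (no truncation)
all_theta, all_x = [], []
for r in range(n_rounds):
    # rejection sampling from the truncated proposal: prior draws are kept
    # only if their density under q(.|x_o) exceeds the threshold tau
    theta = np.empty(0)
    while len(theta) < n_per_round:
        cand = rng.normal(0.0, 1.0, 4 * n_per_round)
        keep = norm_pdf(cand, a * x_o + b, s) > tau
        theta = np.concatenate([theta, cand[keep]])
    theta = theta[:n_per_round]
    x = rng.normal(theta, 1.0)      # run the "simulator"
    all_theta.append(theta); all_x.append(x)
    a, b, s = fit_mle(np.concatenate(all_theta), np.concatenate(all_x))
    # tau = eps-quantile of q(theta|x_o) over fresh samples from q(.|x_o)
    post = rng.normal(a * x_o + b, s, 10_000)
    tau = np.quantile(norm_pdf(post, a * x_o + b, s), eps)

post_mean, post_std = a * x_o + b, s
print(post_mean, post_std)  # roughly x_o/2 = 0.75 and sqrt(0.5), up to misspecification
```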
We overcome these limitations by using ‘truncated’ proposal distributions. This allows us to train
with maximum-likelihood at every round, thereby sidestepping issues of previous SNPE methods.
3 Methodology
3.1 Truncated proposals for SNPE
Given a particular observation $x_o$, we propose to restrict the proposals $\tilde p(\theta)$ to be proportional to the prior $p(\theta)$ at least in the $1-\epsilon$ highest-probability region ($\mathrm{HPR}_\epsilon$, the smallest region that contains $1-\epsilon$ of the mass) of $p(\theta|x_o)$, i.e.,
$$\tilde p(\theta) \propto p(\theta) \cdot 1_{\theta \in \mathcal{M}} \quad \text{with} \quad \mathrm{HPR}_\epsilon(p(\theta|x_o)) \subseteq \mathcal{M}.$$
Thus, $\tilde p(\theta)$ is a ‘truncated’ proposal. The key insight is that, when using such a proposal and $\epsilon = 0$, one can train $q_\phi(\theta|x)$ with maximum likelihood,
$$\min_\phi \mathcal{L} = \min_\phi \; \mathbb{E}_{\theta \sim \tilde p(\theta)}\,\mathbb{E}_{x \sim p(x|\theta)}\left[-\log q_\phi(\theta|x)\right],$$
and $q_\phi(\theta|x_o)$ will still converge to $p(\theta|x_o)$ (proof in Appendix Sec. 6.2).
We estimate $\mathcal{M}$ as the $\mathrm{HPR}_\epsilon$ of the approximate posterior, $\mathcal{M} = \mathrm{HPR}_\epsilon(q_\phi(\theta|x_o))$. Since the maximum-likelihood loss employed to train $q_\phi(\theta|x)$ is support-covering, the $\mathrm{HPR}_\epsilon$ of $q_\phi(\theta|x_o)$ tends to cover the $\mathrm{HPR}_\epsilon$ of $p(\theta|x_o)$ [Bishop and Nasrabadi, 2006].
In order to obtain the $\mathrm{HPR}_\epsilon$ of $q_\phi(\theta|x_o)$, we define a threshold $\tau$ on the approximate posterior density $q_\phi(\theta|x_o)$. To do so, we use a normalizing flow as $q_\phi(\theta|x)$, which allows for closed-form density evaluation and fast sampling. We then approximate the $\mathrm{HPR}_\epsilon$ of $q_\phi(\theta|x_o)$ as
$$\mathrm{HPR}_\epsilon(q_\phi(\theta|x_o)) \approx 1_{q_\phi(\theta|x_o) > \tau}.$$
We chose $\tau$ as the $\epsilon$-quantile of the approximate posterior densities of samples from $q_\phi(\theta|x_o)$, and evaluated TSNPE for $\epsilon = 10^{-3}$, $10^{-4}$, and $10^{-5}$. Values of $\epsilon > 0$ yield a proposal which has smaller support than the current estimate of the posterior; e.g., using $\epsilon = 10^{-3}$ neglects 0.1% of the mass of the approximate-posterior support. Thus, this approach can lead to errors in posterior estimation, e.g., to ‘under-covered’ posteriors (Appendix Sec. 6.10). However, empirically, the error induced by this truncation is negligible, as we will demonstrate on several benchmark tasks. We note that TSNPE can be trained on data pooled from all rounds (Appendix Sec. 6.2). TSNPE is summarized in Alg. 1 (Fig. 2).
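The threshold choice can be sketched as follows, using a standard normal as a stand-in for $q_\phi(\theta|x_o)$ (our toy, not the paper's flow): the $\epsilon$-quantile of densities at posterior samples defines an acceptance region that keeps approximately $1-\epsilon$ of the posterior mass.

```python
# tau is the eps-quantile of the approximate-posterior density evaluated
# at its own samples; the region {theta : q(theta|x_o) > tau} then holds
# ~ (1 - eps) of the posterior mass. Stand-in q: standard normal.
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(1)
eps = 1e-3
samples = rng.normal(size=1_000_000)        # draws from the stand-in q
dens = norm_pdf(samples)
tau = np.quantile(dens, eps)                # eps-quantile of the densities

kept = np.mean(dens > tau)                  # mass inside the HPR estimate
print(tau, kept)  # kept is ~ 1 - eps = 0.999
```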
3.2 Sampling from the truncated proposal
To generate training data for subsequent rounds, we have to draw samples from the truncated proposal $\tilde p(\theta)$; here, we explored rejection sampling and sampling-importance-resampling (SIR) [Rubin, 1988]. For rejection sampling, we sample from the prior, $\theta \sim p(\theta)$, and accept samples only if their probability under the approximate posterior $q_\phi(\theta|x)$ is above the threshold $\tau$.
Figure 3: Diagnostic tool. (a) Parameter $\theta^*$ (green) lies within the $1-\alpha$ confidence region (gray) of the estimated posterior. (b) $\log p(\theta^*|x)$ is above the $1-\alpha$ quantile of posterior samples. (c) $1-\alpha$ versus empirical coverage, averaged over $\theta^*$.
This strategy samples from the truncated proposal exactly, but can fail if the rejection rate becomes too high. To deal with these situations, we used SIR. For each sample from the truncated proposal, SIR draws $K$ samples from the approximate posterior, computes weights $w_i = p(\theta_i)\,1_{\theta_i \in \mathcal{M}} / q_\phi(\theta_i|x)$ for $i = 1, \dots, K$, normalizes the $w_i$ such that they sum to one, draws from a categorical distribution with these weights, $s \sim \mathrm{Categorical}(w)$, and selects the posterior sample with index $s$. SIR requires a fixed sampling budget of $K$ posterior samples per sample from the truncated proposal and returns exact samples from the truncated proposal for $K \to \infty$. Too low values of $K$ lead to too narrow proposals and posterior approximations. When run for a number of rounds, this behaviour reinforces itself and can lead to divergence of TSNPE (Appendix Fig. 13). We thus chose a high value, $K = 1024$. In our experiments, we did not observe poor SIR performance, but we emphasise the importance of using tools to diagnose potential failures of TSNPE (see below) or SIR (e.g., by inspecting the effective sample size, Appendix Sec. 6.12). When SIR fails, methods such as nested sampling, adaptive multilevel splitting, or sequential Monte-Carlo sampling could be viable alternatives [Skilling, 2004, Cérou and Guyader, 2007, Doucet et al., 2001]. We discuss computational costs of rejection sampling and SIR in Appendix Sec. 6.11.
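A sketch of a single SIR draw on a one-dimensional toy; the densities, the interval standing in for $\mathcal{M}$, and all names are our assumptions for illustration.

```python
# One SIR draw targeting the truncated proposal p(theta) * 1[theta in M].
# K candidates come from the approximate posterior q; weights
# w_i = p(theta_i) * 1[theta_i in M] / q(theta_i) are normalized and one
# candidate is kept via a categorical draw.
import numpy as np

def norm_pdf(z, m=0.0, s=1.0):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def sir_draw(rng, K=1024):
    # toy setup: prior N(0,1); approximate posterior q = N(0.75, 0.7);
    # M = interval standing in for the HPR of q
    lo, hi = -1.5, 3.0
    cand = rng.normal(0.75, 0.7, K)          # K draws from q
    w = norm_pdf(cand) * ((cand > lo) & (cand < hi)) / norm_pdf(cand, 0.75, 0.7)
    if w.sum() == 0.0:                       # all candidates fell outside M
        return sir_draw(rng, K)
    w = w / w.sum()                          # normalize weights
    s = rng.choice(K, p=w)                   # categorical draw over candidates
    return cand[s]

rng = np.random.default_rng(2)
draws = np.array([sir_draw(rng) for _ in range(2000)])
print(draws.mean(), draws.std())  # approximates truncated-prior moments
```

Samples with zero weight (outside $\mathcal{M}$) can never be selected, so every returned draw lies inside the truncation region by construction.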
3.3 Coverage diagnostic
In order for the estimated posterior
qφ(θ|xo)
to converge to
p(θ|xo)
, TSNPE requires
supp(p(θ|xo)) HPR(qφ(θ|xo))
, i.e., the estimated posterior must be broader than the true
posterior (proof in Appendix Sec. 6.2). In order to diagnose whether the posterior is, on average,
sufficiently broad, we perform expected coverage tests as proposed in Dalmasso et al. [2020], Miller
et al. [2021], Hermans et al. [2021].
As described in Dalmasso et al. [2020], Rozet et al. [2021] and illustrated in Fig. 3, the coverage of the approximate posterior can be computed as
$$1 - \alpha = \int q_\phi(\theta|x)\, 1\!\left(q_\phi(\theta|x) \geq q_\phi(\theta^*|x)\right) d\theta,$$
where $\theta^*$ is sampled from the truncated proposal and $x$ is the corresponding simulator output. In order to approximate this integral, one has to either evaluate the approximate posterior on a grid [Dalmasso et al., 2020, Hermans et al., 2021] or apply a Monte-Carlo average which involves repeatedly sampling (and evaluating) the (unnormalized) approximate posterior [Miller et al., 2021, Rozet et al., 2021]. The first option does not scale to high-dimensional spaces, whereas the second is computationally expensive for methods estimating likelihoods or likelihood-ratios, which thus require MCMC. In contrast, the TSNPE-posterior can be sampled from and evaluated in closed form, leading to a computationally efficient and scalable diagnostic which can be run after every training round.
Expected coverage can be computed as an average of the coverage across multiple pairs $(\theta^*, x)$ [Miller et al., 2021, Hermans et al., 2021] and should match the confidence level for all confidence levels $(1-\alpha) \in [0, 1]$ (Fig. 3c). We term this procedure of computing the empirical coverage ‘simulation-based coverage calibration’ (SBCC), due to its close connection with SBC [Cook et al., 2006, Talts et al., 2018] (identical under certain conditions, Appendix Sec. 6.6). For TSNPE, it is important that the empirical expected coverage matches the confidence level for high confidence levels (i.e., for small $\alpha$), since overconfidence in these regions would indicate that ground-truth parameters $\theta^*$ are falsely excluded from the $\mathrm{HPR}_\epsilon$. SBCC is summarized in Appendix Alg. 2.
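The Monte-Carlo version of this check can be sketched on a toy problem where the approximate posterior is exactly calibrated, so the diagnostic should pass. This is our own simplification: $\theta^*$ is drawn from the prior rather than a truncated proposal, and $q_\phi$ is replaced by the exact conjugate-Gaussian posterior.

```python
# Expected-coverage (SBCC) sketch: for each pair (theta*, x), the statistic
#   1 - alpha* = mean over posterior samples of 1[q(theta|x) >= q(theta*|x)]
# is uniform on [0,1] for a calibrated posterior, so the empirical coverage
# at each confidence level should match the level itself.
# Toy: prior N(0,1), x|theta ~ N(theta,1), exact posterior N(x/2, 1/2).
import numpy as np

def norm_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(3)
n_pairs, n_post = 5_000, 1_000
theta_star = rng.normal(0.0, 1.0, n_pairs)    # "proposal" draws (here: prior)
x = rng.normal(theta_star, 1.0)               # simulator outputs
m, s = x / 2.0, np.sqrt(0.5)                  # exact posterior parameters

# per-pair statistic: fraction of posterior samples with higher density
post = rng.normal(m[:, None], s, (n_pairs, n_post))
stat = np.mean(
    norm_pdf(post, m[:, None], s) >= norm_pdf(theta_star, m, s)[:, None],
    axis=1,
)

# empirical expected coverage across a grid of confidence levels
levels = np.linspace(0.05, 0.95, 19)
coverage = np.array([(stat <= c).mean() for c in levels])
max_dev = np.abs(coverage - levels).max()
print(max_dev)  # small for a calibrated posterior
```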