
model on each summary statistic
$$x \sim q(x), \tag{5}$$
$$z_j \sim \mathrm{Bernoulli}(\rho), \quad j = 1, \ldots, D, \tag{6}$$
$$y_j \mid x_j, z_j \sim \begin{cases} \mathcal{N}(x_j, \sigma^2), & \text{if } z_j = 0, \\ \mathrm{Cauchy}(x_j, \tau), & \text{if } z_j = 1, \end{cases} \quad j = 1, \ldots, D, \tag{7}$$
where $x \in \mathbb{R}^D$. As the summary statistics have varying scales, we standardise them to have
mean zero and variance one. The Bernoulli variables indicate whether the $j$-th summary statistic is approximately well-specified; we use the probability $\rho = 0.5$ to express the belief that, a priori, a summary statistic is equally likely to be misspecified or well-specified. When a summary statistic is well-specified (i.e., $z_j = 0$), we take the error distribution to be a tight Gaussian (a spike) centred on $x_j$, choosing $\sigma = 0.01$ to enforce consistency between the simulator and the observed data for the $j$-th statistic. In contrast, in the misspecified case (i.e., $z_j = 1$), we take the error model to be the much wider, heavy-tailed Cauchy distribution (a slab). Generally, we might expect misspecification to be relatively subtle, but some model inadequacies can lead to catastrophically misspecified summary statistics. To reflect this, we chose the Cauchy scale $\tau = 0.25$, as it places half the mass within $\pm 0.25$ standard deviations of $x_j$, while the long tails accommodate summary statistics that are highly misspecified. The sparsity-inducing effect of this error model matches what we consider to be a common scenario, namely that a proportion of the summary statistics are jointly consistent with the simulator, whereas others may be incompatible. If we expect a priori that more or fewer of the summary statistics are misspecified, the prior misspecification probability can be adjusted accordingly. By marginalising over $z$, the spike and slab error model can also be written as
$$p(y \mid x) = \prod_{j=1}^{D} \left[ (1 - \rho) \, p(y_j \mid x_j, z_j = 0) + \rho \, p(y_j \mid x_j, z_j = 1) \right], \tag{8}$$
and as such can be viewed as an equally weighted mixture of the Gaussian spike, $p(y_j \mid x_j, z_j = 0)$, and the Cauchy slab, $p(y_j \mid x_j, z_j = 1)$, for each summary statistic. We note that error model hyperparameter tuning approaches could be considered; for example, a reasonable heuristic would be to choose an error model broad enough that the denoised data $\tilde{x}$ tend not to be outliers in $q(x)$ compared to simulations.² However, we chose to keep the hyperparameters consistent across tasks, both to avoid the risk of overfitting to the tasks and to demonstrate that neither strong prior knowledge of the error model nor careful hyperparameter tuning is necessary to yield substantial improvements in performance.
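To make the marginalised form in equation (8) concrete, the per-statistic mixture density can be evaluated directly. The following is a minimal sketch (our own illustration, not the authors' code) using NumPy and SciPy, with the hyperparameter values stated above ($\rho = 0.5$, $\sigma = 0.01$, $\tau = 0.25$):

```python
import numpy as np
from scipy.stats import cauchy, norm


def spike_slab_logpdf(y, x, rho=0.5, sigma=0.01, tau=0.25):
    """Log of the marginalised spike-and-slab error density p(y | x), eq. (8).

    y, x : arrays of length D (observed and simulated summary statistics,
           assumed already standardised to mean zero and variance one).
    """
    log_spike = norm.logpdf(y, loc=x, scale=sigma)  # z_j = 0: tight Gaussian
    log_slab = cauchy.logpdf(y, loc=x, scale=tau)   # z_j = 1: heavy-tailed Cauchy
    # Numerically stable log[(1 - rho) * spike + rho * slab], per statistic.
    per_stat = np.logaddexp(np.log1p(-rho) + log_spike, np.log(rho) + log_slab)
    return per_stat.sum()
```

Because the slab is heavy tailed, a summary statistic lying far from $x_j$ contributes through the Cauchy component rather than driving the joint density towards zero.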
A key advantage of the spike and slab error model is given by the latent variable $z$. Similar to posterior inclusion probabilities in Bayesian regression, the posterior frequency of being in the slab, $\Pr(z_j = 1 \mid y)$, can be used as an indicator of the posterior misspecification probability for the $j$-th summary statistic. By comparing to the prior misspecification probability $\rho$, we can say that if $\Pr(z_j = 1 \mid y) > \rho$, the model provides evidence of misspecification for the $j$-th summary statistic, and if $\Pr(z_j = 1 \mid y) < \rho$, it provides evidence that it is well-specified (Talbott, 2016). For generality, the latent variable $z$ was not included when describing RNPE thus far. However, we can jointly infer the posterior $\hat{p}(x, z \mid y)$ in step 5 of Algorithm 1, by using an MCMC algorithm that supports sampling both continuous and discrete variables. For this purpose, we used mixed Hamiltonian Monte Carlo (HMC) (Zhou, 2020, 2022) from the NumPyro Python package (Phan et al., 2019), which is an adaptation of HMC (Neal et al., 2011; Duane et al., 1987).
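As an illustration of this diagnostic, the posterior misspecification probabilities can be estimated from the MCMC output by averaging the sampled indicators. Below is a minimal sketch (the function name and toy sample array are our own, not from the paper), assuming `z_samples` holds 0/1 draws of $z$, one row per posterior sample and one column per summary statistic:

```python
import numpy as np


def misspecification_probs(z_samples, rho=0.5):
    """Estimate Pr(z_j = 1 | y) as the posterior frequency of the slab.

    z_samples : (n_samples, D) array of 0/1 draws of z from MCMC.
    Returns per-statistic posterior probabilities and a boolean flag that is
    True where the posterior probability exceeds the prior rho, i.e. where
    the model provides evidence of misspecification.
    """
    post_prob = np.asarray(z_samples).mean(axis=0)  # Monte Carlo estimate
    return post_prob, post_prob > rho


# Toy example: 4 posterior samples over D = 3 summary statistics.
z_samples = np.array([[1, 0, 1],
                      [1, 0, 0],
                      [1, 0, 1],
                      [1, 1, 1]])
probs, flagged = misspecification_probs(z_samples)
```

In this toy example the first and third statistics have posterior slab frequencies above $\rho = 0.5$ and would be flagged as showing evidence of misspecification.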
3 Related Work
3.1 Model Criticism in SBI
Posterior predictive checks have been widely used in SBI to compare predictive samples to the observed data (e.g. Durkan et al., 2020; Greenberg et al., 2019; Papamakarios et al., 2019b). Although this is a form of model criticism, a key limitation of this approach is that if a discrepancy is discovered, it may not be clear whether it is due to a failure of the inference procedure or to simulator misspecification. In general, it may be possible to identify the presence of misspecification
using anomaly/novelty detection. One such approach was suggested by Schmitt et al. (2021) in the
² This could, for example, be assessed by estimating highest density regions of $q(x)$, using the density quantile approach from Hyndman (1996).