
Robust Neural Posterior Estimation and Statistical
Model Criticism
Daniel Ward1, Patrick Cannon2, Mark Beaumont3, Matteo Fasiolo1, Sebastian M. Schmon2,4
1School of Mathematics, Bristol University, UK
2Improbable, UK
3School of Biological Sciences, Bristol University, UK
4Department of Mathematical Sciences, Durham University, UK
Abstract
Computer simulations have proven a valuable tool for understanding complex
phenomena across the sciences. However, the utility of simulators for modelling
and forecasting purposes is often restricted by low data quality, as well as practical
limits to model fidelity. In order to circumvent these difficulties, we argue that
modellers must treat simulators as idealistic representations of the true data
generating process, and consequently should thoughtfully consider the risk of
model misspecification. In this work we revisit neural posterior estimation (NPE),
a class of algorithms that enable black-box parameter inference in simulation
models, and consider the implication of a simulation-to-reality gap. While recent
works have demonstrated reliable performance of these methods, the analyses
have been performed using synthetic data generated by the simulator model itself,
and have therefore only addressed the well-specified case. In this paper, we find
that the presence of misspecification, in contrast, leads to unreliable inference
when NPE is used naïvely. As a remedy, we argue that principled scientific inquiry
with simulators should incorporate a model criticism component, to facilitate
interpretable identification of misspecification, and a robust inference component,
to fit ‘wrong but useful’ models. We propose robust neural posterior estimation
(RNPE), an extension of NPE to simultaneously achieve both these aims, through
explicitly modelling the discrepancies between simulations and the observed
data. We assess the approach on a range of artificially misspecified examples, and
find RNPE performs well across the tasks, whereas naïvely using NPE leads to
misleading and erratic posteriors.
1 Introduction
Stochastic simulators have become a ubiquitous modelling tool across the sciences and are regularly
applied to some of the most complex and challenging problems of scientific interest, including climate
change (see e.g. Randall et al., 2007), particle physics (e.g. Brehmer et al., 2020), and the Covid-19
pandemic (e.g. Ferguson et al., 2020). Simulators implicitly define a likelihood function p(x | θ), where x is the simulator output and θ are the simulator parameters. Although running the simulator to sample from the model is straightforward, the inherent complexity of simulators often makes analytic calculation of the likelihood intractable. As a result, classical inference techniques to find the parameter posterior p(θ | x), such as Markov chain Monte Carlo (MCMC) (Metropolis et al., 1953), are infeasible. To overcome this issue, a large family of simulation-based inference (SBI) methods have been developed that allow parameter inference to be performed on arbitrary black-box simulators (see Cranmer et al., 2020). Broadly, these approaches estimate a function that allows access to an
approximate posterior. This can be achieved by approximating the posterior directly (Papamakarios
and Murray, 2016; Greenberg et al., 2019; Lueckmann et al., 2017), or indirectly, via approximating
the likelihood (Papamakarios et al., 2019b) or likelihood-to-evidence ratio (Hermans et al., 2020;
Thomas et al., 2022), from which the posterior can be sampled using MCMC. For estimating the posterior or likelihood, neural posterior estimation (NPE) and neural likelihood estimation (NLE) have been shown to be powerful approaches (Lueckmann et al., 2021); these methods rely on neural network-based conditional density estimators, such as normalising flows, to approximate the likelihood or posterior (Papamakarios et al., 2019a).
[Figure 1 panels: Denoising; Posterior Inference (RNPE, NPE, True); Model Criticism (posterior misspecification probability). Legend: Simulations, Denoised, Observed.]
Figure 1: Overview of the robust neural posterior estimation (RNPE) framework. Through ‘denoising’
the observed data, we can simultaneously perform model criticism to identify how the model is
misspecified, and perform inference robust to model misspecification. See Gaussian example in
section 4.1 for details of the experiment.
To define misspecification, it is necessary to make a clear distinction between the true data generating process and the simulator: we will write y_o for observed data, following some unknown data generating process with distribution denoted p∗, and we will write x ∼ p(x | θ) to denote samples from the simulator. In many applications of SBI, summary statistics are used to reduce the dimensionality of the problem. If this is the case, we use x and y_o to denote the simulated and observed summary statistics, rather than the raw data, and treat the raw data as an unobserved latent variable of the model. The simulator is said to be misspecified if the true data generating process does not fall within the family of distributions defined by the simulator, i.e. p∗ ∉ {p(x | θ); θ ∈ Θ}. Simulators generally form idealised and simplistic representations of a more complex data generating process, and hence – like all statistical models – will be wrong to some extent. This misspecification often results in a discrepancy between simulated and observed data, sometimes termed a “simulation-to-reality gap” (e.g. Miglino et al., 1995). Despite this, the general approach in SBI is to learn a posterior approximation q(θ | x) (either directly, or indirectly), using simulated data, (x, θ) ∼ p(x | θ)p(θ), and to condition on the observed data, q(θ | x = y_o). This approach is only theoretically justified under the assumption that the simulator is well-specified; relying on this approach for misspecified simulators leads to two issues:
(i) As the posterior approximation q(θ | x) is learned using samples from the simulator, we can only expect it to form a reasonable approximation of p(θ | x) for regions well covered by simulations. For misspecified simulators, the observed data y_o may be very unlikely in p(x) = ∫ p(x | θ)p(θ) dθ, in which case the approximation q(θ | x = y_o) often becomes poor (Cannon et al., 2022). In some cases, y_o may even fall outside the support of p(x), which results in NPE attempting to extrapolate to estimate the undefined posterior.

(ii) It has been demonstrated more generally that Bayesian inference is frequently problematic under misspecification. Even for simple models, misspecification can lead to posterior concentration around “bad” models (Grünwald and Langford, 2007; Grünwald and Van Ommen, 2017) and poorly calibrated credible regions (Kleijn and van der Vaart, 2012; Syring and Martin, 2019). These issues will persist even in the unlikely case that q(θ | x) perfectly approximates p(θ | x).
One approach to improve robustness is to incorporate an error model that explicitly models potential discrepancies between the observed data and simulations. A simple method would be to incorporate the error model directly within the simulator itself, for example by applying additive noise to the output so that the observed data has reasonable marginal likelihood under the model (Cranmer et al., 2020). However, this approach has some limitations: i) it is challenging to interpret how the model is misspecified – an obvious choice is to use posterior predictive checks, but poor predictive performance could result from failed inference, limitations of the simulator, or both; ii) retraining the approximate posterior would be required to investigate different error models, which can be computationally expensive; and iii) the error model itself is typically tractable, but this tractability cannot easily be exploited to improve inference once the error model is folded into the black-box simulator.
Our contribution. We propose robust neural posterior estimation (RNPE), an SBI framework that incorporates an error model, p(y | x), to account for discrepancies between simulations and the observed data, whilst avoiding the limitations of naïvely incorporating it directly into the simulator. We explicitly learn to invert the error model for the observed data prior to inference of the parameter posterior, a process we call ‘denoising’. This approach yields the following advantages:
Efficient Model Criticism.
Understanding a model’s limitations is crucial for principled model
development (Box, 1980; Gelman and Shalizi, 2013). By probabilistically inverting the error
process for the observed data, and inferring any associated error model parameters, we can detect
and interpret discrepancies between the simulated and observed data, allowing criticism of the
simulator. This process can be achieved in a manner decoupled from inference, which provides
efficiency benefits, whilst avoiding confounding inference failures with problems arising due to
limitations of the simulator.
Robust Inference. Even after several rounds of model development and model criticism, the best available model will still frequently be misspecified to an appreciable degree. However, it is often still desirable to use ‘wrong but useful’ models for parameter inference and predictions, and hence the inference process should be robust to misspecification. By exploiting the amortisation property of NPE [1], we can ensemble posterior estimates over a set of ‘denoised’ observations, leading to posterior estimates robust to misspecification.

[1] Amortisation refers to the ability of a conditional distribution approximation to condition on arbitrary instances of the conditioning variable, rather than being specialised to a particular instance.
An overview of the RNPE framework can be seen in Fig. 1. We examine the practical value of this
approach for model criticism and robust inference on examples in which we artificially introduce
realistic levels of misspecification.
2 Method
2.1 Error Model
As outlined above, we can explicitly model discrepancies between the observed data and simulations by introducing an error model p(y | x), such that the assumed generative model is

p(θ, x, y) = p(y | x) p(x | θ) p(θ),    (1)

where we assume that the error is independent of the simulator parameters θ, and we treat x as an unobserved latent variable of the model. Since, under (1), θ and y are conditionally independent given x, we can use the equivalent factorisation p(θ, x, y) = p(θ | x) p(x | y) p(y), and marginalise over the latent x to find an expression for the parameter posterior:

p(θ, y) = p(y) ∫ p(x | y) p(θ | x) dx,    (2)

p(θ | y) = ∫ p(x | y) p(θ | x) dx = E_{x∼p(x|y)}[ p(θ | x) ].    (3)
Equation (3) implies that if we had access to p(x | y) and p(θ | x), we could sample from the parameter posterior by first sampling from the posterior over the latent variables, x ∼ p(x | y), and then sampling from p(θ | x). This approach reverses the data generating process from Equation (1), whilst propagating uncertainty throughout: first removing the error from the observed data by denoising, and then finding the associated simulator parameters given the denoised data. A Monte Carlo approximation of the expectation in Equation (3) can also be used to estimate the posterior density p(θ | y), i.e.
p(θ | y) ≈ (1/M) ∑_{m=1}^{M} p(θ | x̃_m),    x̃_1, ..., x̃_M i.i.d. ∼ p(x | y),    (4)
however, we note that in some scenarios this approximation could have high variance. In practice, p(x | y) and p(θ | x) are unknown. We can approximate p(θ | x) by training a normalising flow q(θ | x) on simulated data (x, θ) ∼ p(x | θ) p(θ), as is commonly done for NPE. To approximately sample from p(x | y), we i) specify an error model p(y | x), ii) train a normalising flow, q(x), fitted to samples from the prior predictive distribution of the simulator, x ∼ p(x), and iii) sample x̃ ∼ p̂(x | y), along with any error model parameters, using MCMC, as p̂(x | y) ∝ p(y | x) q(x). We denote samples from p̂(x | y) and q(θ | x) as x̃ and θ̃ to avoid confusion with prior samples and simulations. Pseudo-code for the overall approach is given in Algorithm 1.
Algorithm 1: Robust neural posterior estimation (RNPE)

for i in 1 : N do
    1. Sample θ_i ∼ p(θ)
    2. Simulate x_i ∼ p(x | θ_i)
end
3. Train NPE q(θ | x) on {(θ_i, x_i)}_{i=1}^{N}
4. Train q(x) on {x_i}_{i=1}^{N}
5. Sample x̃_m ∼ p̂(x | y_o) ∝ p(y_o | x) q(x), m = 1, ..., M, using MCMC
6. Sample θ̃_m ∼ q(θ | x̃_m), m = 1, ..., M
return {(θ̃_m, x̃_m)}_{m=1}^{M}, samples drawn approximately from p(θ, x | y_o)
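To make the control flow of Algorithm 1 concrete, below is a minimal Python sketch of the RNPE pipeline. The helpers sample_prior, simulate, fit_conditional_flow, fit_flow and mcmc_denoise are hypothetical placeholders, not part of any particular library; they stand in for the user's prior and simulator, the two normalising-flow fits, and the MCMC denoising step, and error_logpdf is the chosen error density p(y | x) (e.g. the spike and slab model of Section 2.2). Only the overall structure mirrors the algorithm.

import numpy as np

# NOTE: sample_prior, simulate, fit_conditional_flow, fit_flow and mcmc_denoise are
# hypothetical placeholders for the user's prior/simulator, the two flow fits, and
# the MCMC denoising step; they are not from any specific library.

def rnpe(y_obs, error_logpdf, n_sims=10_000, n_denoised=1_000):
    # Steps 1-2: build a training set from the prior predictive distribution.
    thetas = sample_prior(n_sims)                     # theta_i ~ p(theta)
    xs = np.stack([simulate(t) for t in thetas])      # x_i ~ p(x | theta_i)

    # Step 3: train the NPE posterior approximation q(theta | x).
    q_posterior = fit_conditional_flow(xs, thetas)

    # Step 4: train an unconditional flow q(x) on the simulated summaries.
    q_x = fit_flow(xs)

    # Step 5: 'denoise' the observation with MCMC targeting
    # p_hat(x | y_o), proportional to p(y_o | x) q(x).
    x_denoised = mcmc_denoise(y_obs, q_x, error_logpdf, n_denoised)

    # Step 6: draw theta_m ~ q(theta | x_tilde_m) for each denoised summary;
    # the resulting ensemble approximates p(theta | y_o), as in Equation (3).
    theta_samples = np.stack([q_posterior.sample(x) for x in x_denoised])
    return theta_samples, x_denoised

Amortisation of q(θ | x) is what makes step 6 cheap: the same trained flow is conditioned on each denoised x̃_m without any retraining.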
Under this framework, the standard SBI approach can be retrieved as the special case in which p(y | x) = p(x | y) = δ(x − y), where δ is the Dirac delta distribution, meaning that we revert to assuming no discrepancy between the simulator and the data generating process.
The idealised nature of simulation models implies that frequently there will be some characteristics of the observed data that are well captured by simulations, and some aspects in which a discrepancy exists. Using a set of summary statistics for inference, rather than the raw data, naturally captures an interpretable, low-dimensional and diverse set of characteristics of the simulated and observed data. Because of this, we find that in general, reducing the raw data to summary statistics is advantageous for model criticism, as the practitioner can choose summary statistics which align with their beliefs about important model characteristics, and easily interpret any discrepancies (see Section 5). The requirement for domain-specific knowledge to develop summary statistics has been alleviated by recent work (e.g. Fearnhead and Prangle, 2012; Chan et al., 2018; Chen et al., 2020), which seeks to automatically embed the raw data into summary statistics; however, these embeddings often lack a tangible interpretation. Additionally, the embedding methods themselves may not be robust to misspecification. One approach is to use both hand-crafted and automatic summary statistics in tandem. Hereafter, however, we generally perform model criticism and inference using hand-crafted summary statistics, such that x and y_o refer to the simulated and observed summary statistics, not the raw data.
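As a purely illustrative example (not one of the paper's experiments), the snippet below computes a few hand-crafted summaries of a univariate time-series simulator output; because each statistic has a clear meaning, a discrepancy flagged for any of them is straightforward to interpret.

import numpy as np

def summaries(ts):
    # Hand-crafted, interpretable summaries of a univariate time series
    # (an illustrative choice; in practice the statistics should reflect
    # domain knowledge about which model characteristics matter).
    ts = np.asarray(ts, dtype=float)
    lag1 = np.corrcoef(ts[:-1], ts[1:])[0, 1]   # lag-1 autocorrelation
    return np.array([
        ts.mean(),              # overall level
        ts.std(),               # variability
        lag1,                   # temporal dependence
        ts.max() - ts.min(),    # range
    ])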
2.2 Spike and Slab
In practice, the true error model is unknown, and hence the error model should describe our prior
belief over the discrepancy between the data generating process and simulations. Additionally, to
facilitate model criticism, it should ideally allow easy assessment of which summary statistics are
approximately well-specified and which are misspecified. Inspired by sparsity-inducing priors used
in Bayesian regression literature (George and McCulloch, 1993), we use a “spike and slab” error
model on each summary statistic:

x ∼ q(x),    (5)

z_j ∼ Bernoulli(ρ),    j = 1, ..., D,    (6)

y_j | x_j, z_j ∼ { N(x_j, σ²) if z_j = 0;  Cauchy(x_j, τ) if z_j = 1 },    j = 1, ..., D,    (7)
where x ∈ R^D. As the summary statistics have varying scales, we standardise them to have mean zero and variance one. The Bernoulli variables indicate whether the j-th summary statistic is approximately well-specified; we use the probability ρ = 0.5 to express the belief that it is equally likely a summary statistic will be misspecified or well-specified a priori. When a summary statistic is well-specified (i.e., z_j = 0), we take the error distribution to be a tight Gaussian (a spike) centred on x_j, choosing σ = 0.01 to enforce consistency between the simulator and the observed data for the j-th statistic. In contrast, in the misspecified case (i.e., z_j = 1), we take the error model to be the much wider and heavy-tailed Cauchy distribution (a slab). Generally, we might expect misspecification to be relatively subtle, but some model inadequacies can lead to catastrophically misspecified summary statistics. We chose the Cauchy scale τ = 0.25 to reflect this, as it places half the mass within ±0.25 standard deviations of x_j, but the long tails accommodate summary statistics that are highly misspecified. The sparsity-inducing effect of this error model matches what we consider to be a common scenario, namely that a proportion of the summary statistics are jointly consistent with the simulator, whereas others may be incompatible. If we expect a priori more or fewer of the summary statistics to be misspecified, the prior misspecification probability can be adjusted accordingly. By marginalising over z, the spike and slab error model can also be written as

p(y | x) = ∏_{j=1}^{D} [ (1 − ρ) · p(y_j | x_j, z_j = 0) + ρ · p(y_j | x_j, z_j = 1) ],    (8)
and as such can be viewed as an equally weighted mixture of the Gaussian spike, p(y_j | x_j, z_j = 0), and the Cauchy slab, p(y_j | x_j, z_j = 1), for each summary statistic. We note that error model hyperparameter tuning approaches could be considered, e.g. a reasonable heuristic would be to choose an error model that is broad enough that the denoised data x̃ tend not to be outliers in q(x), compared to simulations [2]. However, we chose to keep the hyperparameters consistent across tasks, to avoid the risk of overfitting to the tasks and to demonstrate that neither strong prior knowledge of the error model, nor careful hyperparameter tuning, is necessary to yield substantial improvements in performance.

[2] This could, for example, be assessed by estimating highest density regions of q(x), using the density quantile approach from Hyndman (1996).
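As a minimal sketch (our own function names, using the hyperparameters quoted above: ρ = 0.5, σ = 0.01, τ = 0.25), the marginal error density of Equation (8) and the per-statistic conditional slab probability Pr(z_j = 1 | x_j, y_j) can be evaluated as follows.

import numpy as np
from scipy.stats import cauchy, norm

RHO, SIGMA, TAU = 0.5, 0.01, 0.25   # prior slab probability, spike scale, slab scale

def spike_slab_logpdf(y, x):
    # log p(y | x) from Equation (8): for each standardised summary statistic,
    # a mixture of a tight Gaussian spike and a heavy-tailed Cauchy slab centred on x_j.
    log_spike = np.log(1 - RHO) + norm.logpdf(y, loc=x, scale=SIGMA)
    log_slab = np.log(RHO) + cauchy.logpdf(y, loc=x, scale=TAU)
    return np.logaddexp(log_spike, log_slab).sum()

def conditional_slab_prob(y, x):
    # Pr(z_j = 1 | x_j, y_j): the probability each statistic was generated by the slab
    # (i.e. flagged as misspecified), conditional on a given denoised x.
    w_spike = (1 - RHO) * norm.pdf(y, loc=x, scale=SIGMA)
    w_slab = RHO * cauchy.pdf(y, loc=x, scale=TAU)
    return w_slab / (w_spike + w_slab)

Averaging conditional_slab_prob over MCMC samples of x drawn from p̂(x | y) gives one estimate of the posterior misspecification probabilities Pr(z_j = 1 | y) discussed next.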
A key advantage of the spike and slab error model is given by the latent variable z. Similar to posterior inclusion probabilities in Bayesian regression, the posterior frequency of being in the slab, Pr(z_j = 1 | y), can be used as an indicator of the posterior misspecification probability for the j-th summary statistic. By comparing to the prior misspecification probability ρ, we can say that if Pr(z_j = 1 | y) > ρ, the model provides evidence of misspecification for the j-th summary statistic, and if Pr(z_j = 1 | y) < ρ, it provides evidence that it is well-specified (Talbott, 2016). For the purpose of generality, the latent variable z was not included when describing RNPE thus far. However, we can jointly infer the posterior p̂(x, z | y) in step 5 of Algorithm 1, by using an MCMC algorithm that supports sampling both continuous and discrete variables. We used mixed Hamiltonian Monte Carlo (HMC) (Zhou, 2020, 2022), an adaptation of HMC (Neal et al., 2011; Duane et al., 1987), as implemented in the NumPyro Python package (Phan et al., 2019), for this purpose.
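The paper uses mixed HMC in NumPyro for this joint update; as an alternative illustration of the same target, p̂(x, z | y) ∝ p(y | x, z) p(z) q(x), the sketch below uses a simple Metropolis-within-Gibbs scheme (Gibbs updates for the discrete z_j, random-walk Metropolis updates for x). A standard normal stands in for the flow q(x), a crude default that is only plausible because the summaries are standardised; this is our own simplified sketch, not the authors' implementation, and the step size would need tuning in practice.

import numpy as np
from scipy.stats import cauchy, norm

RHO, SIGMA, TAU = 0.5, 0.01, 0.25

def denoise_mwg(y, q_x_logpdf=None, n_steps=5_000, step=0.05, seed=0):
    # Metropolis-within-Gibbs targeting p_hat(x, z | y) ∝ p(y | x, z) p(z) q(x).
    # q_x_logpdf defaults to a standard normal stand-in for the flow q(x).
    rng = np.random.default_rng(seed)
    if q_x_logpdf is None:
        q_x_logpdf = lambda x: norm.logpdf(x).sum()
    y = np.asarray(y, dtype=float)
    x = y.copy()                                 # initialise the latent at the observation
    xs, zs = [], []

    def loglik(x_val, z_val):
        spike = norm.logpdf(y, loc=x_val, scale=SIGMA)
        slab = cauchy.logpdf(y, loc=x_val, scale=TAU)
        return np.where(z_val == 1, slab, spike).sum()

    for _ in range(n_steps):
        # Gibbs update of each z_j given x: compare spike and slab responsibilities.
        w_spike = (1 - RHO) * norm.pdf(y, loc=x, scale=SIGMA)
        w_slab = RHO * cauchy.pdf(y, loc=x, scale=TAU)
        z = rng.binomial(1, w_slab / (w_spike + w_slab))

        # Random-walk Metropolis update of x given z.
        x_prop = x + step * rng.standard_normal(y.shape)
        log_alpha = (loglik(x_prop, z) + q_x_logpdf(x_prop)
                     - loglik(x, z) - q_x_logpdf(x))
        if np.log(rng.uniform()) < log_alpha:
            x = x_prop

        xs.append(x.copy())
        zs.append(z)

    zs = np.array(zs)
    # Column means estimate Pr(z_j = 1 | y); values above RHO suggest the j-th
    # summary statistic is misspecified. Burn-in should be discarded in practice.
    return np.array(xs), zs, zs.mean(axis=0)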
3 Related Work
3.1 Model Criticism in SBI
Posterior predictive checks have been widely used in SBI to compare predictive samples to the observed data (e.g. Durkan et al., 2020; Greenberg et al., 2019; Papamakarios et al., 2019b). Although this is a form of model criticism, we note that a key limitation of this approach is that, if a discrepancy is discovered, it may not be clear whether this is due to a failure of the inference procedure or due to simulator misspecification. In general, it may be possible to identify the presence of misspecification using anomaly/novelty detection. One such approach was suggested by Schmitt et al. (2021) in the