Truncated proposals for scalable and hassle-free
simulation-based inference
Michael Deistler
University of Tübingen
michael.deistler@uni-tuebingen.de
Pedro J Gonçalves
University of Tübingen
pedro.goncalves@uni-tuebingen.de
Jakob H Macke
University of Tübingen
Max Planck Institute for Intelligent Systems
jakob.macke@uni-tuebingen.de
Abstract
Simulation-based inference (SBI) solves statistical inverse problems by repeatedly
running a stochastic simulator and inferring posterior distributions from model-
simulations. To improve simulation efficiency, several inference methods take a
sequential approach and iteratively adapt the proposal distributions from which
model simulations are generated. However, many of these sequential methods are
difficult to use in practice, both because the resulting optimisation problems can be
challenging and efficient diagnostic tools are lacking. To overcome these issues,
we present Truncated Sequential Neural Posterior Estimation (TSNPE). TSNPE
performs sequential inference with truncated proposals, sidestepping the optimisation
issues of alternative approaches. In addition, TSNPE makes it possible to efficiently
perform coverage tests that scale to complex models with many parameters.
We demonstrate that TSNPE performs on par with previous methods on established
benchmark tasks. We then apply TSNPE to two challenging problems from neuro-
science and show that TSNPE can successfully obtain the posterior distributions,
whereas previous methods fail. Overall, our results demonstrate that TSNPE is
an efficient, accurate, and robust inference method that can scale to challenging
scientific models.
1 Introduction
Computational models are an important tool to understand physical processes underlying empirically
observed phenomena. These models, often implemented as numerical simulators, incorporate
mechanistic knowledge about the physical process underlying data generation, and thereby provide an
interpretable model of empirical observations. In many cases, several parameters of the simulator have
to be inferred from data, e.g., with Bayesian inference. However, performing Bayesian inference in
these models can be difficult: Running the simulator may be computationally expensive, evaluating the
likelihood-function might be computationally infeasible, and the model might not be differentiable. In
order to overcome these limitations, Approximate Bayesian Computation (ABC) methods [Beaumont
et al., 2002, 2009], synthetic likelihood approaches [Wood, 2010], and neural network-based methods
[e.g., Papamakarios and Murray, 2016, Hermans et al., 2020, Thomas et al., 2022] have been
developed.
Equal contribution
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.04815v2 [stat.ML] 10 Nov 2022
Figure 1: APT vs. TSNPE. Top: Prior (gray) and true posterior (black). APT matches the true posterior within the prior bounds but ‘leaks’ into regions without prior support. TSNPE (ours) matches the true posterior.
A subset of neural network-based methods, known as neural posterior
estimation (NPE) [Papamakarios and Murray, 2016, Lueckmann et al.,
2017, Greenberg et al., 2019], train a neural density estimator on simulated
data such that the density estimator directly approximates the posterior.
Unlike other methods, NPE does not require any further Markov-chain
Monte-Carlo (MCMC) or variational inference (VI) steps. As it provides
an amortized approximation of the posterior, which can be used to quickly
evaluate and sample the approximate posterior for any observation, NPE
allows application in time-critical and high-throughput inference scenarios
[Gonçalves et al., 2020, Radev et al., 2020, Dax et al., 2021], as well as fast
application of diagnostic methods which require posterior samples for many
different observations [Cook et al., 2006, Talts et al., 2018]. In addition,
unlike methods targeting the likelihood (e.g., neural likelihood estimation,
NLE [Papamakarios et al., 2019, Lueckmann et al., 2019]), NPE can learn
summary statistics from data and it can use equivariances in the simulations
to improve the quality of inference [Dax et al., 2021, 2022].
If inference is performed for a particular observation $x_o$, the sampling efficiency of NPE can be improved with sequential training schemes: Instead of drawing parameters from the prior distribution, they are drawn adaptively from a proposal (e.g., a posterior estimate obtained with NPE) in order to optimize the posterior accuracy for the particular $x_o$. These procedures are called Sequential Neural Posterior Estimation (SNPE) [Papamakarios and Murray, 2016, Lueckmann et al., 2017, Greenberg et al., 2019] and have been reported to be more simulation-efficient than training the neural network only on parameters sampled from the prior, across a set of benchmark tasks [Lueckmann et al., 2021].
Despite the potential to improve simulation-efficiency, two limitations have impeded a more
widespread adoption of SNPE by practitioners: First, the sequential scheme of SNPE can be unstable.
SNPE requires a modified loss function compared to NPE, and this modification suffers from issues
that can limit its effectiveness on (or even prevent its application to) complex problems (see Sec. 2).
Second, several commonly used diagnostic tools for SBI [Talts et al., 2018, Miller et al., 2021,
Hermans et al., 2021] rely on performing inference across multiple observations. In SNPE (in contrast
to NPE), this requires generating new simulations and network retraining for each observation, which
often prohibits the use of such diagnostic tools [Lueckmann et al., 2021, Hermans et al., 2021].
Here, we introduce Truncated Sequential Neural Posterior Estimation (TSNPE) to overcome these
limitations. TSNPE follows the SNPE formalism, but uses a proposal which is a truncated version
of the prior: TSNPE draws simulations from the prior, but rejects them before simulation if they
lie outside of the support of the approximate posterior. Thus, the proposal is (within its support)
proportional to the prior, which allows us to train the neural network with maximum-likelihood in
every round and, therefore, sidesteps the instabilities (and hence ‘hassle’) of previous SNPE methods.
Our use of truncated proposals is strongly inspired by Blum and François [2010] and Miller et al.
[2020, 2021], who proposed truncated proposals respectively for regression-adjustment approaches
in ABC and for neural ratio estimation (see Discussion). Unlike methods based on likelihood(-ratio)-
estimation [Miller et al., 2021, Hermans et al., 2021], TSNPE allows direct sampling and density
evaluation of the approximate posterior, and thus permits computing expected coverage of the full
posterior quickly (without MCMC) and at every iteration of the algorithm, allowing us to diagnose
failures of the method even for high-dimensional parameter spaces (we term this ‘simulation-based
coverage calibration’ (SBCC), given its close connection with simulation-based calibration, SBC,
Cook et al. [2006], Talts et al. [2018]).
We show that TSNPE is as efficient as the SNPE method ‘Automatic Posterior Transformation’ (APT,
Greenberg et al. [2019]) on several established benchmark problems (Sec. 4.1). We then demonstrate
that for two challenging neuroscience problems, TSNPE—but not APT—can robustly identify the
posterior distributions (Sec. 4.2).
Figure 2: Truncated Sequential Neural Posterior Estimation (TSNPE). The method starts by sampling from the prior, running the simulator, and training a neural density estimator with maximum-likelihood to approximate the posterior. In subsequent rounds, parameters are sampled from the prior, but rejected if they lie outside of the support of the approximate posterior. With these proposals, the neural density estimator can be trained with maximum-likelihood in all rounds.
2 Background
In Neural Posterior Estimation (NPE), parameters are sampled from the prior $p(\theta)$ and simulated (i.e., $x$ is sampled from $p(x|\theta)$). Then, a neural density estimator $q_\phi(\theta|x)$ (in our case a normalizing flow), with learnable parameters $\phi$, is trained to minimize the loss
$$\min_\phi \mathcal{L} = \min_\phi \; \mathbb{E}_{\theta \sim p(\theta)}\,\mathbb{E}_{x \sim p(x|\theta)}\left[-\log q_\phi(\theta|x)\right],$$
which is minimized if and only if, for a sufficiently expressive density estimator, $q_\phi(\theta|x) = p(\theta|x)$ for all $x \in \mathrm{supp}(p(x))$ [Paige and Wood, 2016, Papamakarios and Murray, 2016]. Throughout this study, we refer to training with this loss function as maximum-likelihood training, although the neural density estimator targets the posterior directly.
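As a sanity check of this loss, consider a conjugate Gaussian toy problem where the minimizer can be computed in closed form. The snippet below is our own illustration (not from the paper): within a linear-Gaussian family $q_\phi(\theta|x) = \mathcal{N}(ax + b, \sigma^2)$, minimizing the expected negative log-likelihood reduces to ordinary least squares, and the fit recovers the analytic posterior $\mathcal{N}(x/2, 1/2)$.

```python
# Toy check of the NPE loss on a conjugate Gaussian model:
#   prior:          theta ~ N(0, 1)
#   simulator:      x | theta ~ N(theta, 1)
#   true posterior: theta | x ~ N(x / 2, 1 / 2)
# Within the linear-Gaussian family q(theta|x) = N(a*x + b, sigma^2),
# minimizing E[-log q(theta|x)] over (a, b, sigma) is least-squares
# regression of theta on x, so the fit should give a ~= 0.5, b ~= 0,
# sigma^2 ~= 0.5.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
theta = rng.normal(0.0, 1.0, n)   # draw parameters from the prior
x = rng.normal(theta, 1.0)        # run the "simulator"

# Closed-form maximum-likelihood fit: least squares of theta on x.
A = np.column_stack([x, np.ones(n)])
(a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
sigma2 = np.mean((theta - (a * x + b)) ** 2)

print(a, b, sigma2)  # close to 0.5, 0.0, 0.5
```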
Sequential Neural Posterior Estimation (SNPE) aims to infer the posterior distribution $p(\theta|x_o)$ for a particular observation $x_o$. SNPE initially performs NPE and, thereby, obtains an initial estimate of the posterior distribution. It then samples parameters from a proposal $\tilde p(\theta)$, which is often chosen to be the previously obtained estimate of the posterior, $\tilde p(\theta) = q_\phi(\theta|x_o)$, and retrains the neural density estimator [Papamakarios and Murray, 2016]. This procedure can be repeated for several rounds.
Importantly, if parameters $\theta$ are sampled from the proposal $\tilde p(\theta)$ rather than from the prior $p(\theta)$, the estimator $q_\phi(\theta|x)$ that minimizes the maximum-likelihood loss function no longer converges to the true posterior. If one used the maximum-likelihood loss on data sampled from $\tilde p(\theta)$, i.e., $\mathcal{L} = \mathbb{E}_{\theta \sim \tilde p(\theta)}\,\mathbb{E}_{x \sim p(x|\theta)}\left[-\log q_\phi(\theta|x)\right]$, then $\mathcal{L}$ would be minimized by
$$q_\phi(\theta|x) \propto p(\theta|x)\,\frac{\tilde p(\theta)}{p(\theta)},$$
which is not the true posterior. Multiple schemes have been developed to overcome this [Papamakarios and Murray, 2016, Lueckmann et al., 2017]. The most recent of these methods, Automatic Posterior Transformation (APT, or SNPE-C, in its atomic version) [Greenberg et al., 2019, Durkan et al., 2020], employs a loss that aims to classify the parameter set that generated a particular data point among other parameter sets (details in Appendix Sec. 6.5).
While APT has been reported to significantly outperform previous methods, several studies have also
described cases in which the approach exhibits performance issues: Both the original APT paper
[Greenberg et al., 2019] and Durkan et al. [2020] reported that APT can show ‘leakage’ of posterior
mass outside of bounded priors. We demonstrate this issue on a simple 1-dimensional simulator with
bounded prior (Fig. 1, Appendix Fig. 7). The posterior estimated by APT is only required to match
the true posterior density within the support of the prior (details in Appendix Sec. 6.5). Thus, after
five rounds of APT, while the approximate posterior matches the true posterior within the bounds of
the prior, a substantial fraction of posterior mass lies in regions with zero prior probability. In simple
models, approximate posterior samples that lie outside of the prior bounds can be efficiently rejected.
However, in models with high numbers of parameters, the rejection rate can become so large that
drawing posterior samples which lie inside of the prior bounds is prohibitive. For example, Glöckler
et al. [2022] reported a rejection rate of more than 99.9999% in a model with 31 parameters, thus
requiring approximately one minute to draw a single posterior sample from within the prior bounds.
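The quoted figure is easy to reproduce with back-of-envelope arithmetic; the throughput number below is our own assumption for illustration, not a figure from Glöckler et al. [2022].

```python
# Back-of-envelope for the leakage example: with an acceptance rate of
# 1e-6 (rejection rate 99.9999%), one in-bounds posterior sample needs
# about a million flow draws on average. At an assumed throughput of
# ~17,000 flow samples + prior-bounds checks per second (hypothetical,
# hardware-dependent), that is roughly one minute per accepted sample.
acceptance_rate = 1e-6
draws_per_accept = 1 / acceptance_rate   # mean of the geometric distribution
throughput = 17_000                      # draws per second (assumed)
seconds = draws_per_accept / throughput
print(seconds)
```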
Algorithm 1: TSNPE
Inputs: prior $p(\theta)$, observation $x_o$, simulations per round $N$, number of rounds $R$, $\epsilon$ that defines the highest-probability region (HPR)
Outputs: approximate posterior $q_\phi$.
Initialize: proposal $\tilde p(\theta) = p(\theta)$, dataset $\mathcal{X} = \{\}$
for $r \in [1, \dots, R]$ do
    for $i \in [1, \dots, N]$ do
        $\theta_i \sim \tilde p(\theta)$
        simulate $x_i \sim p(x|\theta_i)$
        add $(\theta_i, x_i)$ to $\mathcal{X}$
    $\phi = \arg\min_\phi \frac{1}{N} \sum_{(\theta_i, x_i) \in \mathcal{X}} -\log q_\phi(\theta_i|x_i)$
    Compute expected coverage$(\tilde p(\theta), q_\phi)$ ; // see Alg. 2
    $\tilde p(\theta) \propto p(\theta) \cdot 1_{\theta \in \mathrm{HPR}_\epsilon}$ ; // see Alg. 3
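A minimal sketch of this loop on a one-dimensional toy problem: a linear-Gaussian family stands in for the normalizing flow, and rejection sampling implements the truncated proposal. All names and the toy model are ours; the simple family is mildly misspecified under truncation, so the recovered posterior is only approximately the analytic one.

```python
# TSNPE loop sketch on the conjugate Gaussian toy (prior N(0,1),
# simulator x|theta ~ N(theta,1), so theta|x_o ~ N(x_o/2, 1/2)).
# The "density estimator" q(theta|x) = N(a*x + b, s^2) is refit by
# maximum likelihood (= least squares) on data pooled over rounds.
import numpy as np

def norm_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def fit_mle(theta, x):
    # maximum-likelihood fit within the linear-Gaussian family
    A = np.column_stack([x, np.ones(len(x))])
    (a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
    s = np.sqrt(np.mean((theta - (a * x + b)) ** 2))
    return a, b, s

rng = np.random.default_rng(0)
x_o, n_per_round, n_rounds, eps = 1.5, 20_000, 3, 1e-3

a, b, s = 0.0, 0.0, 1.0
tau = -np.inf                       # round 1: proposal = prior (no truncation)
all_theta, all_x = [], []
for r in range(n_rounds):
    # rejection sampling from the truncated proposal: prior draws are kept
    # only if their density under q(.|x_o) exceeds the threshold tau
    theta = np.empty(0)
    while len(theta) < n_per_round:
        cand = rng.normal(0.0, 1.0, 4 * n_per_round)
        keep = norm_pdf(cand, a * x_o + b, s) > tau
        theta = np.concatenate([theta, cand[keep]])
    theta = theta[:n_per_round]
    x = rng.normal(theta, 1.0)      # run the "simulator"
    all_theta.append(theta); all_x.append(x)
    a, b, s = fit_mle(np.concatenate(all_theta), np.concatenate(all_x))
    # tau = eps-quantile of q(theta|x_o) over fresh samples from q(.|x_o)
    post = rng.normal(a * x_o + b, s, 10_000)
    tau = np.quantile(norm_pdf(post, a * x_o + b, s), eps)

post_mean, post_std = a * x_o + b, s
print(post_mean, post_std)  # roughly x_o/2 = 0.75 and sqrt(0.5), up to misspecification
```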
We overcome these limitations by using ‘truncated’ proposal distributions. This allows us to train
with maximum-likelihood at every round, thereby sidestepping issues of previous SNPE methods.
3 Methodology
3.1 Truncated proposals for SNPE
Given a particular observation $x_o$, we propose to restrict the proposals $\tilde p(\theta)$ to be proportional to the prior $p(\theta)$ at least in the $1-\epsilon$ highest-probability region ($\mathrm{HPR}_\epsilon$, the smallest region that contains $1-\epsilon$ of the mass) of $p(\theta|x_o)$, i.e.,
$$\tilde p(\theta) \propto p(\theta) \cdot 1_{\theta \in \mathcal{M}} \quad \text{with} \quad \mathrm{HPR}_\epsilon(p(\theta|x_o)) \subseteq \mathcal{M}.$$
Thus, $\tilde p(\theta)$ is a ‘truncated’ proposal. The key insight is that, when using such a proposal and $\epsilon = 0$, one can train $q_\phi(\theta|x)$ with maximum likelihood,
$$\min_\phi \mathcal{L} = \min_\phi \; \mathbb{E}_{\theta \sim \tilde p(\theta)}\,\mathbb{E}_{x \sim p(x|\theta)}\left[-\log q_\phi(\theta|x)\right],$$
and $q_\phi(\theta|x_o)$ will still converge to $p(\theta|x_o)$ (proof in Appendix Sec. 6.2).
We estimate $\mathcal{M}$ as the $\mathrm{HPR}_\epsilon$ of the approximate posterior, $\mathcal{M} = \mathrm{HPR}_\epsilon(q_\phi(\theta|x_o))$. Since the maximum-likelihood loss employed to train $q_\phi(\theta|x)$ is support-covering, the $\mathrm{HPR}_\epsilon$ of $q_\phi(\theta|x_o)$ tends to cover the $\mathrm{HPR}_\epsilon$ of $p(\theta|x_o)$ [Bishop and Nasrabadi, 2006].
In order to obtain the $\mathrm{HPR}_\epsilon$ of $q_\phi(\theta|x_o)$, we define a threshold $\tau$ on the approximate posterior density $q_\phi(\theta|x_o)$. To do so, we use a normalizing flow as $q_\phi(\theta|x)$, which allows for closed-form density evaluation and fast sampling. We then approximate the $\mathrm{HPR}_\epsilon$ of $q_\phi(\theta|x_o)$ as
$$\mathrm{HPR}_\epsilon(q_\phi(\theta|x_o)) \approx 1_{q_\phi(\theta|x_o) > \tau}.$$
We chose $\tau$ as the $\epsilon$-quantile of the approximate posterior densities of samples from $q_\phi(\theta|x_o)$, and evaluated TSNPE for $\epsilon = 10^{-3}$, $10^{-4}$, and $10^{-5}$. Values of $\epsilon > 0$ yield a proposal which has smaller support than the current estimate of the posterior; e.g., using $\epsilon = 10^{-3}$ neglects 0.1% of the mass of the approximate-posterior support. Thus, this approach can lead to errors in posterior estimation, e.g., to ‘under-covered’ posteriors (Appendix Sec. 6.10). However, empirically, the error induced by this truncation is negligible, as we will demonstrate on several benchmark tasks. We note that TSNPE can be trained on data pooled from all rounds (Appendix Sec. 6.2). TSNPE is summarized in Alg. 1 (Fig. 2).
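The threshold choice can be sketched as follows, using a standard normal as a stand-in for $q_\phi(\theta|x_o)$ (our toy, not the paper's flow): the $\epsilon$-quantile of densities at posterior samples defines an acceptance region that keeps approximately $1-\epsilon$ of the posterior mass.

```python
# tau is the eps-quantile of the approximate-posterior density evaluated
# at its own samples; the region {theta : q(theta|x_o) > tau} then holds
# ~ (1 - eps) of the posterior mass. Stand-in q: standard normal.
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(1)
eps = 1e-3
samples = rng.normal(size=1_000_000)        # draws from the stand-in q
dens = norm_pdf(samples)
tau = np.quantile(dens, eps)                # eps-quantile of the densities

kept = np.mean(dens > tau)                  # mass inside the HPR estimate
print(tau, kept)  # kept is ~ 1 - eps = 0.999
```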
3.2 Sampling from the truncated proposal
To generate training data for subsequent rounds, we have to draw samples from the truncated proposal $\tilde p(\theta)$; here, we explored rejection sampling and sampling-importance-resampling (SIR) [Rubin, 1988]. For rejection sampling, we sample from the prior, $\theta \sim p(\theta)$, and accept samples only if their probability under the approximate posterior $q_\phi(\theta|x)$ is above the threshold $\tau$.
Figure 3: Diagnostic tool. (a) Parameter $\theta^*$ (green) lies within the $1-\alpha$ confidence region (gray) of the estimated posterior. (b) $\log p(\theta^*|x)$ is above the $1-\alpha$ quantile of posterior samples. (c) $1-\alpha$ versus empirical coverage, averaged over $\theta^*$.
This strategy samples from the truncated proposal exactly, but can fail if the rejection rate becomes too high. To deal with these situations, we used SIR. For each sample from the truncated proposal, SIR draws $K$ samples from the approximate posterior, computes weights $w_i = p(\theta_i)\,1_{\theta_i \in \mathcal{M}} / q_\phi(\theta_i|x)$ for $i = 1, \dots, K$, normalizes the $w_i$ such that they sum to one, draws from a categorical distribution with these weights, $s \sim \mathrm{Categorical}(w)$, and selects the posterior sample with index $s$. SIR requires a fixed sampling budget of $K$ posterior samples per sample from the truncated proposal and returns exact samples from the truncated proposal for $K \to \infty$. Too low values of $K$ lead to too narrow proposals and posterior approximations. When run for a number of rounds, this behaviour reinforces itself and can lead to divergence of TSNPE (Appendix Fig. 13). We thus chose a high value, $K = 1024$. In our experiments, we did not observe poor SIR performance, but we emphasise the importance of using tools to diagnose potential failures of TSNPE (see below) or SIR (e.g., by inspecting the effective sample size, Appendix Sec. 6.12). When SIR fails, methods such as nested sampling, adaptive multilevel splitting, or sequential Monte-Carlo sampling could be viable alternatives [Skilling, 2004, Cérou and Guyader, 2007, Doucet et al., 2001]. We discuss computational costs of rejection sampling and SIR in Appendix Sec. 6.11.
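A sketch of a single SIR draw on a one-dimensional toy; the densities, the interval standing in for $\mathcal{M}$, and all names are our assumptions for illustration.

```python
# One SIR draw targeting the truncated proposal p(theta) * 1[theta in M].
# K candidates come from the approximate posterior q; weights
# w_i = p(theta_i) * 1[theta_i in M] / q(theta_i) are normalized and one
# candidate is kept via a categorical draw.
import numpy as np

def norm_pdf(z, m=0.0, s=1.0):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def sir_draw(rng, K=1024):
    # toy setup: prior N(0,1); approximate posterior q = N(0.75, 0.7);
    # M = interval standing in for the HPR of q
    lo, hi = -1.5, 3.0
    cand = rng.normal(0.75, 0.7, K)          # K draws from q
    w = norm_pdf(cand) * ((cand > lo) & (cand < hi)) / norm_pdf(cand, 0.75, 0.7)
    if w.sum() == 0.0:                       # all candidates fell outside M
        return sir_draw(rng, K)
    w = w / w.sum()                          # normalize weights
    s = rng.choice(K, p=w)                   # categorical draw over candidates
    return cand[s]

rng = np.random.default_rng(2)
draws = np.array([sir_draw(rng) for _ in range(2000)])
print(draws.mean(), draws.std())  # approximates truncated-prior moments
```

Samples with zero weight (outside $\mathcal{M}$) can never be selected, so every returned draw lies inside the truncation region by construction.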
3.3 Coverage diagnostic
In order for the estimated posterior
qφ(θ|xo)
to converge to
p(θ|xo)
, TSNPE requires
supp(p(θ|xo)) HPR(qφ(θ|xo))
, i.e., the estimated posterior must be broader than the true
posterior (proof in Appendix Sec. 6.2). In order to diagnose whether the posterior is, on average,
sufficiently broad, we perform expected coverage tests as proposed in Dalmasso et al. [2020], Miller
et al. [2021], Hermans et al. [2021].
As described in Dalmasso et al. [2020], Rozet et al. [2021] and illustrated in Fig. 3, the coverage of the approximate posterior can be computed as
$$1 - \alpha = \int q_\phi(\theta|x)\, 1\!\left(q_\phi(\theta|x) \geq q_\phi(\theta^*|x)\right) d\theta,$$
where $\theta^*$ is sampled from the truncated proposal and $x$ is the corresponding simulator output. In order to approximate this integral, one has to either evaluate the approximate posterior on a grid [Dalmasso et al., 2020, Hermans et al., 2021] or apply a Monte-Carlo average which involves repeatedly sampling (and evaluating) the (unnormalized) approximate posterior [Miller et al., 2021, Rozet et al., 2021]. The first option does not scale to high-dimensional spaces, whereas the second is computationally expensive for methods estimating likelihoods or likelihood-ratios, which thus require MCMC. In contrast, the TSNPE-posterior can be sampled from and evaluated in closed form, leading to a computationally efficient and scalable diagnostic which can be run after every training round.
Expected coverage can be computed as an average of the coverage across multiple pairs $(\theta^*, x)$ [Miller et al., 2021, Hermans et al., 2021] and should match the confidence level for all confidence levels $(1-\alpha) \in [0, 1]$ (Fig. 3c). We term this procedure of computing the empirical coverage ‘simulation-based coverage calibration’ (SBCC), due to its close connection with SBC [Cook et al., 2006, Talts et al., 2018] (identical under certain conditions, Appendix Sec. 6.6). For TSNPE, it is important that the empirical expected coverage matches the confidence level for high confidence levels (i.e., for small $\alpha$), since overconfidence in these regions would indicate that ground-truth parameters $\theta^*$ are falsely excluded from the $\mathrm{HPR}_\epsilon$. SBCC is summarized in Appendix Alg. 2.
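The Monte-Carlo version of this check can be sketched on a toy problem where the approximate posterior is exactly calibrated, so the diagnostic should pass. This is our own simplification: $\theta^*$ is drawn from the prior rather than a truncated proposal, and $q_\phi$ is replaced by the exact conjugate-Gaussian posterior.

```python
# Expected-coverage (SBCC) sketch: for each pair (theta*, x), the statistic
#   1 - alpha* = mean over posterior samples of 1[q(theta|x) >= q(theta*|x)]
# is uniform on [0,1] for a calibrated posterior, so the empirical coverage
# at each confidence level should match the level itself.
# Toy: prior N(0,1), x|theta ~ N(theta,1), exact posterior N(x/2, 1/2).
import numpy as np

def norm_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(3)
n_pairs, n_post = 5_000, 1_000
theta_star = rng.normal(0.0, 1.0, n_pairs)    # "proposal" draws (here: prior)
x = rng.normal(theta_star, 1.0)               # simulator outputs
m, s = x / 2.0, np.sqrt(0.5)                  # exact posterior parameters

# per-pair statistic: fraction of posterior samples with higher density
post = rng.normal(m[:, None], s, (n_pairs, n_post))
stat = np.mean(
    norm_pdf(post, m[:, None], s) >= norm_pdf(theta_star, m, s)[:, None],
    axis=1,
)

# empirical expected coverage across a grid of confidence levels
levels = np.linspace(0.05, 0.95, 19)
coverage = np.array([(stat <= c).mean() for c in levels])
max_dev = np.abs(coverage - levels).max()
print(max_dev)  # small for a calibrated posterior
```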