Efficient identification of informative features in
simulation-based inference
Jonas Beck
University of Tübingen
jonas.beck@uni-tuebingen.de
Michael Deistler
University of Tübingen
michael.deistler@uni-tuebingen.de
Yves Bernaerts
University of Tübingen
yves.bernaerts@uni-tuebingen.de
Jakob H. Macke
University of Tübingen
Max Planck Institute for Intelligent Systems
jakob.macke@uni-tuebingen.de
Philipp Berens
University of Tübingen
philipp.berens@uni-tuebingen.de
Abstract
Simulation-based Bayesian inference (SBI) can be used to estimate the parameters
of complex mechanistic models given observed model outputs without requiring
access to explicit likelihood evaluations. A prime example for the application of
SBI in neuroscience involves estimating the parameters governing the response
dynamics of Hodgkin-Huxley (HH) models from electrophysiological measure-
ments, by inferring a posterior over the parameters that is consistent with a set of
observations. To this end, many SBI methods employ a set of summary statistics or
scientifically interpretable features to estimate a surrogate likelihood or posterior.
However, currently, there is no way to identify how much each summary statistic
or feature contributes to reducing posterior uncertainty. To address this challenge,
one could simply compare the posteriors with and without a given feature included
in the inference process. However, for large or nested feature sets, this would
necessitate repeatedly estimating the posterior, which is computationally expensive
or even prohibitive. Here, we provide a more efficient approach based on the SBI
method neural likelihood estimation (NLE): We show that one can marginalize
the trained surrogate likelihood post-hoc before inferring the posterior to assess
the contribution of a feature. We demonstrate the usefulness of our method by
identifying the most important features for inferring parameters of an example HH
neuron model. Beyond neuroscience, our method is generally applicable to SBI workflows in other scientific fields that rely on data features for inference.
1 Introduction
Mechanistic models are an elegant way to encode scientific knowledge about the world in the form of
numerical simulations. They include models such as Kepler’s laws of planetary motion [1], the SEIR
model [2] for describing the spread of infectious diseases or the Hodgkin-Huxley (HH) model for the
dynamics of action potentials in neurons [3]. Efficiently constraining the parameters of such models by
measurements is a key problem in many disciplines [4–7]. Since these models give rise to intractable
likelihood functions, however, classical likelihood-based Bayesian methods such as variational
inference [8] or Markov Chain Monte Carlo [9] cannot be used directly.
[Figure 1: schematic of the NLE and FSLM workflows (steps 1–4); panels show the prior over parameters 1 and 2, the mechanistic model with simulated data and an observation (membrane voltage in mV over time in ms), the MDN, the (marginal) surrogate likelihood over features 1 and 2, and the resulting posterior over parameters 1 and 2.]
Figure 1: Feature Selection Through Likelihood Marginalization (FSLM) is a method to identify informative features in simulation-based inference (SBI). It builds on Neural Likelihood Estimation (NLE) with mixture density networks (MDN). This requires a prior over the parameter space, a mechanistic model to simulate data and an observed data point. 1. Parameters are sampled from the prior and used to simulate a synthetic dataset. 2. An MDN learns the probabilistic relationship between data (or data features) and underlying parameters, in the form of a tractable surrogate likelihood. 3. After the MDN has learned to approximate the likelihood, it can be conditioned on the observation to yield a likelihood estimate that is parameterized as a mixture of Gaussians. 4. This surrogate likelihood is then combined with the prior distribution to obtain the posterior distribution, i.e. the space of parameters consistent with both prior knowledge and data. Consistent parameters are assigned a high, inconsistent parameters a low probability. The naive approach for identifying the importance of features would be to repeat step 2 for different sets of data features (grey). With FSLM, the likelihood estimate can be marginalized post-hoc and a single estimate is therefore sufficient to obtain multiple posterior distributions (black).
There are algorithms, such as ABC-MCMC [10] or pseudo-marginal methods [11], that deal with this problem; however, they can have slow convergence rates and are computationally expensive. To overcome this
problem, several techniques collectively known as simulation-based inference (SBI) have recently
been developed [12]. Leveraging the ability to simulate the model, these techniques obtain estimates
of the likelihood or posterior from simulated data. The most recent of these algorithms such as neural
likelihood estimation (NLE) [13] or neural posterior estimation (NPE) [14–16] employ state-of-the-art
neural density estimators to learn tractable surrogates of these functions to be evaluated instead of the
real quantities. To this end, summary statistics or features capturing essential aspects of the model’s
dynamics are defined by the scientist to reduce the high-dimensional model output to a manageable scale, in turn decreasing the problem complexity and computation costs. This is often done by hand to
emphasize specific aspects of the data or to aid scientific interpretation [12], but can also be automated
[17–23].
In neuroscience, many different approaches have been developed in order to find suitable parameters
of models of neural activity [4, 5, 24–29]. SBI approaches have been used to infer parameters
in biophysical neuron models from measurements of neural activity in a Bayesian way [15, 30,
31]. For inference of HH models from electrophysiological data, features such as action potential
threshold or width and resting membrane potential or spike count can be used, reflecting measures
that electrophysiologists use to characterize recorded neurons.
For scientific interpretation of SBI results, it is often of interest which features, or combinations of
features, have the biggest impact on the posterior and which parameters they affect specifically. To
this end, one could compare the posterior uncertainty estimated with and without including a specific
feature in the SBI method — the increase in uncertainty resulting from not relying on that feature
can be used as a measure of its importance and on average is equivalent to its mutual information.
Of course, this approach can also be applied to whole subsets of features. To evaluate an entire set
of features exhaustively would require re-estimating the posterior many times (Fig. 1, grey), which
would scale prohibitively with the number of features. To address this issue, we here introduce a
method called Feature Selection Through Likelihood Marginalization (FSLM) to compute posteriors
for arbitrary subsets of features, without the need for repeated training (Fig. 1, black). To achieve
this, we here use NLE [32] and exploit the marginalization properties of mixture density networks (MDNs) [33], which can be used with NLE as a density estimator: instead of re-estimating the
surrogate likelihood from scratch, we marginalize it analytically with respect to a given (set of)
features before applying Bayes rule to obtain the posterior estimate. This way, we can efficiently
compare the posterior uncertainty with and without including a certain feature.
For a simple linear Gaussian model and non-linear HH models, we show that the obtained posterior
estimates are as accurate and robust as when repeatedly retraining density estimators from scratch,
despite being much faster to compute. We then apply our algorithm to study which features are the
most useful for constraining posteriors over parameters of the HH model. Our tool also allows us to study which model parameter is affected by which summary feature. Finally, we suggest a greedy
feature selection strategy to select useful features from a predefined set.
2 Methods
2.1 Neural Likelihood Estimation (NLE)
Neural Likelihood Estimation [32] is an SBI method which approximates the likelihood $p(x|\theta)$ of the data $x$ given the model parameters $\theta$ by training a conditional neural density estimator on data generated from simulations of a mechanistic model. Using Bayes rule, the approximate likelihood can then be used to obtain an estimate of the posterior (Fig. 1). Unlike NPE [15, 16], NLE requires an additional sampling or inference step [34] to obtain the posterior distribution. However, it allows access to the intermediate likelihood approximation, a property we will later exploit to develop our efficient feature selection algorithm (see Sec. 2.3).
First, a set of $N$ parameters is sampled from the prior distribution, $\theta_n \sim p(\theta)$, $n \in \{1, \ldots, N\}$. With these parameters, the simulator is run to implicitly sample from the model's likelihood function according to $x_n \sim p(x|\theta_n)$. Here, $x_n = (x_1, \ldots, x_{N_f})$ are feature vectors that are usually taken to be a function of the simulator output $s$, with the individual features $x_i = f_i(s)$, where $i \in \{1, \ldots, N_f\}$, rather than the output directly. For mechanistic models whose output is a time series, $f$ also reduces the dimensionality of the data to a set of lower-dimensional data features. The resulting training data $\{x_n, \theta_n\}_{1:N} \sim p(x, \theta)$ can then be used to train a conditional density estimator $q_\phi(x|\theta)$, parameterized by $\phi$, to approximate the likelihood function. Here, we use a Mixture Density Network for this task [33], the output density of which is parameterized as a mixture of Gaussians, which can be marginalized analytically. Thus, $\hat{p}(x|\theta) = \sum_k \pi_k \mathcal{N}(\mu_k, \Sigma_k)$, with the parameters $(\mu_k, \Sigma_k, \pi)$ being non-linear functions of the inputs $\theta$ and the network parameters $\phi$.
The parameters $\phi$ are optimized by maximizing the log-likelihood $\mathcal{L}(x, \theta) = \frac{1}{N} \sum_n \log q_\phi(x_n|\theta_n)$ over the training data with respect to $\phi$. As the number of training samples goes to infinity, this is equivalent to maximizing the negative Kullback-Leibler (KL) divergence between the true and approximate likelihood for every $x \in \operatorname{supp}(p(x))$ [32]:

$$\mathbb{E}_{p(\theta,x)}[\log q_\phi(x|\theta)] = -\mathbb{E}_{p(\theta)}\left[D_{KL}\big(p(x|\theta) \,\|\, q_\phi(x|\theta)\big)\right] + \mathrm{const.} \tag{1}$$
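To make this objective concrete, the following sketch shows one way to write such a mixture-density loss in PyTorch; the module net and its output shapes are hypothetical stand-ins rather than the architecture used in the paper.

import torch
from torch.distributions import Categorical, MultivariateNormal, MixtureSameFamily

def mdn_nll(net, theta, x):
    # Negative average log-likelihood: -1/N * sum_n log q_phi(x_n | theta_n).
    # `net` (hypothetical) maps theta to mixture logits, means and Cholesky factors
    # with shapes (N, K), (N, K, D) and (N, K, D, D), respectively.
    logits, mu, scale_tril = net(theta)
    q = MixtureSameFamily(
        mixture_distribution=Categorical(logits=logits),
        component_distribution=MultivariateNormal(loc=mu, scale_tril=scale_tril),
    )
    return -q.log_prob(x).mean()  # minimized with respect to the network parameters phi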
After obtaining such a tractable likelihood surrogate, Bayes rule can be used to obtain an estimate of the posterior conditioned on an observation $x_o$ (see Eq. 2), for instance via Markov Chain Monte Carlo (MCMC):

$$\hat{p}(\theta|x_o) \propto q_\phi(x_o|\theta)\, p(\theta). \tag{2}$$
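In practice, this NLE workflow can be run with the public sbi package on which our implementation builds (see Sec. 2.3). The sketch below is schematic and version-dependent rather than the exact code of the paper; the toy simulator and observation are placeholders.

import torch
from sbi.inference import SNLE          # exact class name depends on the sbi version
from sbi.utils import BoxUniform

def simulator(theta):
    # toy stand-in for a mechanistic model plus feature extraction
    return theta + 0.1 * torch.randn_like(theta)

prior = BoxUniform(low=-5.0 * torch.ones(3), high=5.0 * torch.ones(3))
theta = prior.sample((10_000,))          # sample parameters from the prior
x = simulator(theta)                     # simulate the corresponding features
x_o = torch.zeros(3)                     # placeholder observed feature vector

inference = SNLE(prior=prior, density_estimator="mdn")
likelihood_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior()  # combines q_phi with the prior (Eq. 2)
samples = posterior.sample((1_000,), x=x_o)  # MCMC samples from the posterior estimate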
2.2 A naive algorithm for quantifying feature importance
Given this posterior estimate, we would now like to answer the following question: for which feature $x_i$ from a vector of features $x$ does the uncertainty of the posterior estimate $\hat{p}(\theta|x)$ increase the most when it is ignored? For a single feature, a naive and costly algorithm does the following: iterate over $x$ to obtain $x_{\setminus i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{N_f})$, train a total of $N_f + 1$ density estimators to obtain the likelihoods $\hat{p}(x_{\setminus i}|\theta)$ and $\hat{p}(x|\theta)$, sample their associated posteriors, and compare $\hat{p}(\theta|x_{\setminus i})$ to the reference posterior $\hat{p}(\theta|x)$ with every feature present. The same procedure can also be applied to quantify the contribution of any arbitrary subset of features. As this procedure requires estimating the likelihood based on the reduced feature set from scratch for each feature (set) (Fig. 1, grey arrows), it is computationally costly. To quantify the contribution of different features more efficiently, we would thus need a way to avoid re-estimating the posterior with each feature left out.
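To make the cost of this baseline explicit, the sketch below spells out the loop; train_nle, sample_posterior and uncertainty are hypothetical placeholders for the training, sampling and uncertainty measures described in Sec. 2.1 and Sec. 2.4.

def naive_feature_importance(theta, x, x_o, train_nle, sample_posterior, uncertainty):
    # Naive baseline: one density estimator per left-out feature, N_f + 1 in total.
    n_features = x.shape[1]
    q_full = train_nle(theta, x)                         # reference surrogate likelihood
    ref = uncertainty(sample_posterior(q_full, x_o))     # uncertainty of p_hat(theta | x)
    scores = {}
    for i in range(n_features):
        keep = [j for j in range(n_features) if j != i]  # feature set x_{\i}
        q_i = train_nle(theta, x[:, keep])               # retrain from scratch
        posterior_i = sample_posterior(q_i, x_o[keep])
        scores[i] = uncertainty(posterior_i) / ref       # relative increase in uncertainty
    return scores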
[Figure 2: panels a and b show 1D and 2D posterior marginals over θ0, θ1, θ2 (axis range −5 to 5) for p(θ|x0, x1, x2, x3) and p(θ|x1, x2, x3), comparing the ground truth, NLE+MDN and FSLM, with θo marked; panel c shows heatmaps of the IQR ratio (removed feature vs. model parameters, colour scale roughly 1–14) for the ground-truth posteriors and FSLM.]
Figure 2: FSLM can accurately compute posteriors with one feature marginalized out for a linear model described in Eq. 3: a. 1D and 2D marginals of the full posterior estimated by NLE and the ground truth, which can be analytically computed. b. Posterior distribution when $x_0$ is removed from the feature set. FSLM is as accurate as re-estimating NLE on the reduced feature set. c. Increase in the uncertainty of the marginal posterior distributions measured using the IQR ratio (as defined in Sec. 2.4) for each feature, for the analytical posteriors and those computed using FSLM. The joint influence of $x_1$ and $x_2$ on $\theta_2$, as per our model, is clearly visible.
2.3 Efficient quantification of feature importance through post-hoc likelihood marginalization
To more efficiently estimate posteriors for any subset of features, we propose to marginalize the
likelihood estimate obtained via NLE post-hoc, as opposed to training a separate density estimator
for every new set of features to obtain the marginal estimates (Sec. 2.2, Fig. 1). We refer to this
algorithm as efficient Feature Selection through Likelihood Marginalization (FSLM).
Suppose we want to evaluate the joint increase in posterior uncertainty for a subset of features. We partition our feature vector $x = [x_1, x_2]$, such that $x_2$ contains the features to be removed from the inference. To avoid the need for training the surrogate likelihood $q_\phi$ for such a feature subset, we can rewrite the posterior with respect to $x_1$ as

$$\hat{p}(\theta|x_1) \propto q_\phi(x_1|\theta)\, p(\theta) = \int q_\phi(x_1, x_2|\theta)\, dx_2 \; p(\theta). \tag{3}$$
Since we parameterize the approximate likelihood $q_\phi$ as an MDN, we can perform this marginalization analytically: for a $K$-component MDN, the parameters of each mixture component can be partitioned analogously to $x$ [35, Chapter 2.3.2]:

$$\mu_k = [\mu_{k,1}, \mu_{k,2}], \qquad \Sigma_k = \begin{pmatrix} \Sigma_{k,11} & \Sigma_{k,12} \\ \Sigma_{k,21} & \Sigma_{k,22} \end{pmatrix}, \qquad k \in \{1, \ldots, K\}. \tag{4}$$
Then the marginalization with respect to $x_2$ results in the following posterior distribution

$$\hat{p}(\theta|x_1) \propto q_\phi(x_1|\theta)\, p(\theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_1 \mid \mu_{k,1}, \Sigma_{k,11})\; p(\theta), \tag{5}$$

where $\pi = \pi(\theta)$, $\mu = \mu(\theta)$ and $\Sigma = \Sigma(\theta)$. FSLM can thus sample from the posterior given arbitrary feature subsets without estimating the surrogate likelihood $q_\phi$ from scratch.
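As a minimal sketch (not the released implementation), the analytic marginalization in Eq. (5) amounts to slicing each component's mean and covariance at the retained feature indices:

import numpy as np
from scipy.stats import multivariate_normal

def marginal_mog_logpdf(x1, pi, mu, Sigma, keep):
    # pi: (K,), mu: (K, D), Sigma: (K, D, D) as returned by the MDN for a fixed theta;
    # keep: indices of the retained features x1. Returns log q_phi(x1 | theta).
    keep = np.asarray(keep)
    log_terms = []
    for k in range(len(pi)):
        mu_k1 = mu[k][keep]                          # mu_{k,1}
        Sigma_k11 = Sigma[k][np.ix_(keep, keep)]     # Sigma_{k,11}
        log_terms.append(np.log(pi[k])
                         + multivariate_normal.logpdf(x1, mean=mu_k1, cov=Sigma_k11))
    return np.logaddexp.reduce(log_terms)            # log sum_k pi_k N(x1 | mu_{k,1}, Sigma_{k,11})

Adding $\log p(\theta)$ to this quantity yields the unnormalized log-posterior of Eq. (5), which can be handed directly to an MCMC sampler.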
We implement FSLM using Python 3.8, building on top of the public sbi library [36]. All the code
is available at github.com/berenslab/fslm_repo. All computations were done on an internal cluster
running Intel(R) Xeon(R) Gold 6226R CPUs @ 2.90GHz.
2.4 Measures of posterior uncertainty
Assuming the NLE procedure converges and the density estimator is sufficiently flexible, differences in posterior uncertainty can be ascribed to differences in the information content of features or to noise in the data. Given that the noise in the data remains constant, relative differences in posterior uncertainty can be used to assess the contributions of individual summary features. Hence, we can identify informative summary statistics by quantifying the change in uncertainty of $\hat{p}(\theta|x_{\setminus i})$ with respect to $\hat{p}(\theta|x)$. For this we use two metrics:
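One of them, the IQR ratio used in Fig. 2, can be computed directly from posterior samples; the sketch below assumes it is the elementwise ratio of the marginal interquartile ranges of the reduced and reference posteriors (the helper name is ours).

import numpy as np

def iqr_ratio(samples_without, samples_full):
    # samples_*: arrays of shape (n_samples, n_params); values > 1 indicate that
    # removing the feature increased the marginal posterior uncertainty.
    def iqr(s):
        q75, q25 = np.percentile(s, [75, 25], axis=0)
        return q75 - q25
    return iqr(samples_without) / iqr(samples_full)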