
(MDNs) [33], which can be used with NLE as a density estimator: instead of re-estimating the
surrogate likelihood from scratch, we marginalize it analytically with respect to a given (set of)
features before applying Bayes rule to obtain the posterior estimate. This way, we can efficiently
compare the posterior uncertainty with and without including a certain feature.
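In the notation introduced below (Sec. 2.1 and 2.2), where $\hat{p}(x \mid \theta)$ denotes the surrogate likelihood and $x_{\setminus i}$ the feature vector with feature $x_i$ removed, this amounts to
$$\hat{p}(x_{\setminus i} \mid \theta) = \int \hat{p}(x \mid \theta)\, \mathrm{d}x_i, \qquad \hat{p}(\theta \mid x_{o, \setminus i}) \propto \hat{p}(x_{o, \setminus i} \mid \theta)\, p(\theta),$$
where the integral is available in closed form when the surrogate is a Gaussian mixture.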
For a simple linear Gaussian model and non-linear HH models, we show that the obtained posterior
estimates are as accurate and robust as those obtained by repeatedly retraining density estimators from scratch, while being much faster to compute. We then apply our algorithm to study which features are most useful for constraining posteriors over parameters of the HH model. Our tool also allows us to study which model parameters are affected by which summary features. Finally, we suggest a greedy
feature selection strategy to select useful features from a predefined set.
2 Methods
2.1 Neural Likelihood Estimation (NLE)
Neural Likelihood Estimation [32] is an SBI method which approximates the likelihood $p(x \mid \theta)$ of the data $x$ given the model parameters $\theta$ by training a conditional neural density estimator on data
generated from simulations of a mechanistic model. Using Bayes rule, the approximate likelihood
can then be used to obtain an estimate of the posterior (Fig. 1). Unlike NPE [15, 16], NLE requires
an additional sampling or inference step [34] to obtain the posterior distribution. However, it allows
access to the intermediate likelihood approximation, a property we will later exploit to develop our
efficient feature selection algorithm (see Sec. 2.3).
First, a set of $N$ parameters is sampled from the prior distribution, $\theta_n \sim p(\theta)$, $n \in \{1, \dots, N\}$. With these parameters, the simulator is run to implicitly sample from the model's likelihood function according to $x_n \sim p(x \mid \theta_n)$. Here, $x_n = (x_1, \dots, x_{N_f})$ are feature vectors that are usually taken to be a function of the simulator output $s$, with the individual features $x_i = f_i(s)$, where $i \in \{1, \dots, N_f\}$, rather than the output directly. For mechanistic models whose output is a time series, $f$ also reduces the dimensionality of the data to a set of lower-dimensional data features. The resulting training data $\{x_n, \theta_n\}_{1:N} \sim p(x, \theta)$ can then be used to train a conditional density estimator $q_\phi(x \mid \theta)$, parameterized by $\phi$, to approximate the likelihood function. Here, we use a Mixture Density Network for this task [33], whose output density is parameterized by a mixture of Gaussians, which can be marginalized analytically. Thus,
$$\hat{p}(x \mid \theta) = \sum_k \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k),$$
with the parameters $(\mu_k, \Sigma_k, \pi)$ being non-linear functions of the inputs $\theta$ and the network parameters $\phi$.
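For a fixed $\theta$, the MDN thus returns explicit mixture parameters, and marginalizing out a subset of features only requires dropping the corresponding entries of each $\mu_k$ and the corresponding rows and columns of each $\Sigma_k$, while the mixture weights stay unchanged. A minimal numpy sketch of this bookkeeping (function and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_logpdf(x, weights, means, covs):
    """log p(x) of a Gaussian mixture  sum_k pi_k N(x; mu_k, Sigma_k)."""
    densities = [w * multivariate_normal(mean=m, cov=C).pdf(x)
                 for w, m, C in zip(weights, means, covs)]
    return np.log(np.sum(densities))

def marginal_mixture_logpdf(x_keep, keep, weights, means, covs):
    """Analytic marginal over the retained feature indices `keep`:
    slice each component mean and covariance; the weights are unchanged."""
    means_m = [np.asarray(m)[keep] for m in means]
    covs_m = [np.asarray(C)[np.ix_(keep, keep)] for C in covs]
    return mixture_logpdf(x_keep, weights, means_m, covs_m)
```

Evaluating the marginal surrogate likelihood of a reduced feature vector for many values of $\theta$, e.g. along an MCMC chain, then only requires this slicing step rather than training a new density estimator.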
The parameters $\phi$ are optimized by maximizing the average log-likelihood $\mathcal{L}(\phi) = \frac{1}{N} \sum_n \log q_\phi(x_n \mid \theta_n)$ over the training data with respect to $\phi$. As the number of training samples goes to infinity, this is equivalent to maximizing the negative Kullback-Leibler (KL) divergence between the true and approximate likelihood for every $\theta$ in the support of the prior [32]:
$$\mathbb{E}_{p(\theta, x)}\left[\log q_\phi(x \mid \theta)\right] = -\mathbb{E}_{p(\theta)}\left[D_{\mathrm{KL}}\big(p(x \mid \theta) \,\|\, q_\phi(x \mid \theta)\big)\right] + \mathrm{const.} \quad (1)$$
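As a concrete illustration of this objective, the sketch below defines an MDN with diagonal-covariance Gaussian components (a simplification for brevity; the covariance parameterization used in the paper is not specified here) and the corresponding negative average log-likelihood loss. Architecture sizes and names are assumptions for illustration only:

```python
import torch
from torch import nn
from torch.distributions import Categorical, MultivariateNormal, MixtureSameFamily

class MDN(nn.Module):
    """Conditional Mixture Density Network: maps theta to the parameters of a
    Gaussian mixture over the feature vector x (diagonal covariances for brevity)."""
    def __init__(self, dim_theta, dim_x, n_components=5, hidden=50):
        super().__init__()
        self.dim_x, self.K = dim_x, n_components
        self.body = nn.Sequential(nn.Linear(dim_theta, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.to_logits = nn.Linear(hidden, n_components)           # mixture weights pi_k
        self.to_means = nn.Linear(hidden, n_components * dim_x)    # component means mu_k
        self.to_log_std = nn.Linear(hidden, n_components * dim_x)  # component scales

    def forward(self, theta):
        h = self.body(theta)
        means = self.to_means(h).view(-1, self.K, self.dim_x)
        stds = self.to_log_std(h).view(-1, self.K, self.dim_x).exp()
        components = MultivariateNormal(means, scale_tril=torch.diag_embed(stds))
        return MixtureSameFamily(Categorical(logits=self.to_logits(h)), components)

def nll(mdn, theta_batch, x_batch):
    """Negative average log-likelihood  -1/N sum_n log q_phi(x_n | theta_n)."""
    return -mdn(theta_batch).log_prob(x_batch).mean()

# One optimization step on a training batch (shapes: theta [N, dim_theta], x [N, dim_x]):
# optimizer = torch.optim.Adam(mdn.parameters(), lr=1e-3)
# optimizer.zero_grad(); nll(mdn, theta_batch, x_batch).backward(); optimizer.step()
```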
After obtaining such a tractable likelihood surrogate, Bayes rule can be used to obtain an estimate of the posterior conditioned on an observation $x_o$ (see Eq. 2), for instance via Markov Chain Monte Carlo (MCMC):
$$\hat{p}(\theta \mid x_o) \propto q_\phi(x_o \mid \theta)\, p(\theta) \quad (2)$$
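The specific sampler is not prescribed here. As an illustrative stand-in, a random-walk Metropolis sampler targeting the unnormalized posterior of Eq. 2 could look as follows; the two callables are placeholders for the trained surrogate log-likelihood $\log q_\phi(x_o \mid \theta)$ and the log-prior:

```python
import numpy as np

def sample_posterior_mh(log_surrogate_lik, log_prior, x_o, theta0,
                        n_steps=5000, step=0.1, rng=None):
    """Random-walk Metropolis targeting  q_phi(x_o | theta) * p(theta)  (Eq. 2)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    logp = log_surrogate_lik(x_o, theta) + log_prior(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.shape)
        logp_prop = log_surrogate_lik(x_o, proposal) + log_prior(proposal)
        if np.log(rng.uniform()) < logp_prop - logp:   # accept with prob min(1, ratio)
            theta, logp = proposal, logp_prop
        samples.append(theta.copy())
    return np.array(samples)
```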
2.2 A naive algorithm for quantifying feature importance
Given this posterior estimate, we would now like to answer the following question: for which feature $x_i$ from a vector of features $x$ does the uncertainty of the posterior estimate $\hat{p}(\theta \mid x)$ increase the most when it is ignored? For a single feature, a naive and costly algorithm does the following: iterate over $x$ to obtain $x_{\setminus i} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_{N_f})$, train a total of $N_f + 1$ density estimators to obtain the likelihoods $\hat{p}(x_{\setminus i} \mid \theta)$ and $\hat{p}(x \mid \theta)$, sample their associated posteriors, and compare $\hat{p}(\theta \mid x_{\setminus i})$ to the reference posterior $\hat{p}(\theta \mid x)$ with every feature present. The same procedure can also be applied to
quantify the contribution of any arbitrary subset of features. As this procedure requires estimating the likelihood of the reduced feature set from scratch for each feature (set) (Fig. 1, grey arrows), it is computationally costly. To quantify the contribution of different features more efficiently, we thus need a way to avoid re-training the likelihood estimator each time a feature is left out.
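For concreteness, a minimal sketch of the naive baseline described above is given below. Here, `train_estimator` and `sample_posterior` are placeholders for an NLE training routine and a posterior sampler (e.g. the MCMC sketch above), and the ratio of marginal posterior standard deviations is just one possible way to quantify the increase in uncertainty:

```python
import numpy as np

def naive_feature_importance(theta_train, x_train, x_o, train_estimator, sample_posterior):
    """Naive baseline: retrain one density estimator per left-out feature (N_f + 1 in total)
    and compare posterior spread against the reference posterior that uses all features.

    train_estimator(x, theta)  -> surrogate likelihood trained on the given features
    sample_posterior(q, x_o)   -> posterior samples for observation x_o (e.g. via MCMC)
    """
    n_features = x_train.shape[1]

    # Reference posterior with every feature present.
    q_full = train_estimator(x_train, theta_train)
    ref_std = sample_posterior(q_full, x_o).std(axis=0)

    importance = {}
    for i in range(n_features):
        keep = [j for j in range(n_features) if j != i]        # x_{\i}: drop feature i
        q_i = train_estimator(x_train[:, keep], theta_train)   # retrain from scratch
        std_i = sample_posterior(q_i, np.asarray(x_o)[keep]).std(axis=0)
        importance[i] = std_i / ref_std                        # >1: posterior widens without feature i
    return importance
```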