AUTOCORRELATED MEASUREMENT PROCESSES AND INFERENCE FOR ORDINARY DIFFERENTIAL EQUATION MODELS OF BIOLOGICAL SYSTEMS
A PREPRINT
Ben Lambert
Department of Mathematics
University of Exeter
Exeter, UK
ben.c.lambert@gmail.com
Chon Lok Lei
Institute of Translational Medicine
Faculty of Health Sciences
University of Macau
Macau, China
chonloklei@um.edu.mo
Martin Robinson
Department of Computer Science
University of Oxford
martin.robinson@cs.ox.ac.uk
Michael Clerx
School of Mathematical Sciences
University of Nottingham
michael.clerx@nottingham.ac.uk
Richard Creswell
Department of Computer Science
University of Oxford
richard.creswell@hertford.ox.ac.uk
Sanmitra Ghosh
MRC Biostatistics Unit
University of Cambridge
sanmitra.ghosh@mrc-bsu.cam.ac.uk
Simon Tavener
Department of Mathematics
Colorado State University
tavener@math.colostate.edu
David J. Gavaghan
Department of Computer Science
University of Oxford
david.gavaghan@cs.ox.ac.uk
October 5, 2022
1 Abstract
Ordinary differential equation models are used to describe dynamic processes across biology. To perform likelihood-
based parameter inference on these models, it is necessary to specify a statistical process representing the contribution
of factors not explicitly included in the mathematical model. For this, independent Gaussian noise is commonly
chosen, with its use so widespread that researchers typically provide no explicit justification for this choice. This noise
model assumes ‘random’ latent factors affect the system in an ephemeral fashion, resulting in unsystematic deviations of observables from their modelled counterparts. However, like the deterministically modelled parts of a system, these
latent factors can have persistent effects on observables. Here, we use experimental data from dynamical systems drawn
from cardiac physiology and electrochemistry to demonstrate that highly persistent differences between observations
and modelled quantities can occur. Considering the case when persistent noise arises due only to measurement
imperfections, we use the Fisher information matrix to quantify how uncertainty in parameter estimates is artificially
reduced when erroneously assuming independent noise. We present a workflow to diagnose persistent noise from model
fits and describe how to remodel accounting for correlated errors.
arXiv:2210.01592v1 [stat.ME] 4 Oct 2022
2 Introduction
Ordinary differential equation (ODE) models are used throughout biology, typically to describe dynamic processes.
Amidst a huge range of applications, ODEs are used to describe the transmission dynamics of infectious diseases [1];
they can represent the dynamics of enzyme-catalysed reactions [2]; and can explain the formation of action potentials in
neurons [3]. In ODE models, the evolution of a system depends only on its current state and a set of input parameters,
which determine how individual components of the system interact. The parameters of ODE models in biological
systems are typically not directly measurable and must be inferred from data. In this paper, we consider the assumptions
underpinning inference of parameters from biological data.
A typical ODE model for a dynamic process may be written:

$$\frac{\mathrm{d}x}{\mathrm{d}t} = h(t, x, \theta), \quad t \in (0, T], \qquad x(0; \theta) = x_0, \tag{1}$$

where $x(t;\theta) \in \mathbb{R}^n$ is the state of the system, $\theta \in \mathbb{R}^m$ are the parameters of the system, $t$ denotes time, $h(t, x, \theta)$ can be a function of time, state and parameters, and $x_0 \in \mathbb{R}^n$ is the initial state.
We suppose that an ODE model is proposed to explain a dataset: $\{\tilde{y}(t_i)\}_{i=1}^{N}$, where $\tilde{y}(t_i) \in \mathbb{R}^l$ and $l \leq n$. By fitting the model to these data, an analyst hopes to recover estimates of the parameters, $\theta$, which incorporate uncertainty.
ODE models typically do not explain all variation within a dataset because they are approximations of the underlying
processes, meant only to capture the most dominant characteristics of variation. Particularly in biology, the measurement
of the system itself is also imperfect: measurement apparatus has a finite resolution and may provide indirect measures
of the quantity of interest, and human errors may also contribute noise to observations. Because of these factors, a
random error process is hypothesised to connect noisy observations with the ODE solution. This may be written:
$$y(t_i) = g(x(t_i)) + \epsilon(t_i), \tag{2}$$

where $g: \mathbb{R}^n \to \mathbb{R}^l$ allows a measured quantity to be a function of the ODE solution. In eq. (2), $\epsilon(t_i)$ is a random variable that represents both the effects of model misspecification and measurement noise.
The canonical assumption for the error terms is that they represent independent and identically distributed (IID) draws from a normal distribution [4–9]: $\epsilon(t_i) \overset{\text{IID}}{\sim} \mathcal{N}(0, \sigma)$, where $\sigma > 0$ characterises the width of this distribution. The IID normality assumption is so widespread that it is typically stated without justification.
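To make eqs. (1)–(2) and this canonical noise assumption concrete, the following minimal Python sketch simulates an illustrative instance; the logistic growth model, parameter values and noise level are hypothetical stand-ins chosen for illustration, not quantities taken from this paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative instance of eqs. (1)-(2): logistic growth, with g the
# identity map and IID Gaussian measurement noise. All values below
# (r, K, x0, sigma) are hypothetical stand-ins.
def h(t, x, theta):
    r, K = theta
    return r * x * (1 - x / K)  # dx/dt = h(t, x, theta)

theta = (1.0, 10.0)             # growth rate r and carrying capacity K
x0, T = [0.5], 10.0
t_obs = np.linspace(0.0, T, 50)

# Solve eq. (1) numerically on the observation grid.
sol = solve_ivp(h, (0.0, T), x0, args=(theta,), t_eval=t_obs, rtol=1e-8)

# Eq. (2) under the canonical IID assumption: y(t_i) = x(t_i) + eps(t_i).
rng = np.random.default_rng(0)
sigma = 0.2
y_obs = sol.y[0] + rng.normal(0.0, sigma, size=t_obs.size)
```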
The normality assumption may be justified on the basis of a central limit theorem if it is thought that a series of
independent or weakly dependent random variables – representing different characteristics of measurement and
misspecification processes – contribute additively to the overall errors; it may also be reasonable since the normal
distribution emerges from a disparate range of processes representing measurement imperfections [10, chapter 7]. But,
if there is strong correlation between these constituent parts, then a distribution with heavier tails, such as a Student-t
distribution or a Huber distribution, is more appropriate [11].
An IID normal distribution can also be justified by invoking the principle of maximum entropy [10, 12]. This principle
roughly states that a probability distribution representing the outcomes of a process of interest should be chosen to include as little information as possible about the process, subject to known constraints. If only the mean and variance of the
outcomes of a process are known, and there is thought to be zero correlation between errors, then it can be shown that
an IID normal distribution is the probability distribution that makes the fewest additional assumptions [10, chapter 7].
But it is unclear how applicable this is to the error distribution for ODEs, since we typically know only that the mean
of the error distribution is zero, and our empirical examples indicate that the independence assumption may be an
unreasonable null hypothesis. In particular, if there is thought to be autocorrelation in the noise, then a multivariate
normal over the errors is the distribution with maximum entropy.
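For concreteness: if, in addition to a zero mean and common variance, lag-dependent correlations are imposed as constraints, the maximum-entropy distribution is a multivariate normal whose covariance matrix encodes those correlations. For instance, for the stationary AR(1) error process introduced in §3.1, the implied covariance (a standard result, stated here purely for illustration) is:

$$\epsilon = (\epsilon(t_1), \ldots, \epsilon(t_N))^\top \sim \mathcal{N}(0, \Sigma), \qquad \Sigma_{ij} = \frac{\sigma^2}{1-\rho^2}\,\rho^{|i-j|}.$$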
There are two general causes of autocorrelation in the errors: misspecification of the model and poor temporal resolution
of the measurement process [12]. In Fig. 1A, we illustrate how misspecifying an ODE model can lead to autocorrelated
errors. This figure shows the outputs of two dynamic models as solid (model A) and dashed (model B) lines. We
suppose that there is no measurement noise and that the data (arrow tips) are generated by model A. In attempting to fit
these data, suppose model B is mistakenly chosen, and its best fitting line is as shown in this panel. There are manifold
ways in which a model can be misspecified: the assumed functional form governing interactions between variables
can be incorrect; important variables can be left out of the model entirely; a deterministic model may be used when
a stochastic one is more appropriate; and so on. In this example, any of these issues could conceivably result in the
differences between model A and model B, and, by choosing model B, this misspecification results in residuals (shown
as arrows) exhibiting positive autocorrelation.
There is a huge literature devoted to accounting for model misspecification during inference (see, for example, [13–16]),
and this remains an active area of research. In this paper, however, we focus only on the impact of assumptions around
measurement noise, since, as we demonstrate, these can have dramatic effects on inference even in the absence of model
misspecification. To exemplify how measurement process imperfections can lead to autocorrelation, suppose again that
model A is the true model of nature, and that we (correctly) use it as part of our model of the data generating process.
Also, suppose that the measuring apparatus is imperfect, producing noisy observations that may differ from the true
underlying state, and has finite temporal resolution, meaning it struggles to capture changes in output over shorter time
scales. In Figs. 1B&C, we show the model solutions (solid lines) and the values that would be measured if using a very
fine temporal gridding (dashed lines). A consequence of this smooth measurement process is that the more observations
per unit time are taken, the greater the degree of autocorrelation in residuals. In Fig. 1B, we show coarse observations
of the system of interest as indicated by the horizontal positioning of the vertical arrows. In this case, since observations
are sufficiently separated in time, there is relatively low persistence in residuals. In Fig. 1C, we take more observations
of the same process, which produces positively autocorrelated residuals.
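A small simulation illustrates this effect. Here we caricature a sluggish measurement apparatus by giving the residuals an exponentially decaying correlation structure with a (hypothetical) correlation time tau, so that observations spaced dt apart correlate as exp(-dt / tau); all numbers are illustrative only, not a model of any apparatus used in this paper.

```python
import numpy as np

rng = np.random.default_rng(7)

def residuals(n_obs, dt, tau=1.0, sigma=0.1):
    """Residuals of a smooth measurement process observed every dt time
    units: successive residuals correlate as rho = exp(-dt / tau)."""
    rho = np.exp(-dt / tau)
    innov_sd = sigma * np.sqrt(1.0 - rho**2)  # keeps marginal variance fixed
    r = np.zeros(n_obs)
    for i in range(1, n_obs):
        r[i] = rho * r[i - 1] + rng.normal(0.0, innov_sd)
    return r

def lag1_autocorrelation(r):
    r = r - r.mean()
    return np.dot(r[1:], r[:-1]) / np.dot(r, r)

# Coarse grid (cf. panel B): widely spaced observations, residuals nearly independent.
print(lag1_autocorrelation(residuals(5000, dt=2.0)))   # ~ exp(-2.0) = 0.14
# Fine grid (cf. panel C): dense observations, strongly autocorrelated residuals.
print(lag1_autocorrelation(residuals(5000, dt=0.1)))   # ~ exp(-0.1) = 0.90
```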
Intuitively, when the measurement process is positively autocorrelated, each observation conveys less information
about the system than when the observations are uncorrelated. So misrepresenting an autocorrelated error process
with one assuming independence can lead to overly confident parameter estimates. This is a well-known result in
regression modelling [17], and, since fitting ODE models to data is just nonlinear regression, these results should also
apply to inference for these model types. We show this in the inset panels in Figs. 1B&C: here, the orange lines show
(illustrative) posterior distributions resultant from modelling the measurement process correctly; the green lines show
the distributions when modelling the measurements assuming independence amongst them. In Fig. 1B, where the
measurements are widely spaced, there is little difference in the recovered posteriors due to the limited autocorrelation.
In Fig. 1C, failure to account for autocorrelation results in a posterior with too little variance.
We originally became interested in the impact of measurement autocorrelation on parameter estimation when attempting
inference for a model of an electrochemistry experiment. Specifically, we noticed that the estimates obtained were
unrealistically precise when assuming an IID normal error model, and the errors were autocorrelated. This led us to
consider how this phenomenon might be more generally applicable and whether there were guiding principles for how the
degree of overconfidence depends on measurement autocorrelation. Thus, in this paper, we explore how measurement
autocorrelation affects the precision of estimates. Previous work, in the context of modelling physical systems, has
derived straightforward expressions for parameter uncertainty for a dataset consisting only of two time points with an
accordingly simple error model [12]. Here, we consider a much more general setting where the models are nonlinear
ODEs, which is typical in biological systems analysis, and the measurement process can be any one of a wide class of
stochastic processes. We also account for the bias in the estimates of the standard deviation of the noise when fitting a
model assuming IID Gaussian errors, which is important to ensure correct estimates of the degree of overconfidence.
Using simulated data from ODE models, we demonstrate the validity of our analytical results. Using experimental data
from cardiac physiology and electrochemistry, we show that highly persistent differences between observations and
modelled quantities can occur. Whilst only illustrative, these results hint that overconfidence in parameter estimates
may not be uncommon. In addition, we provide a workflow for diagnosing and accounting for autocorrelated errors
when fitting an ODE model to data.
3 Effect of autocorrelated noise on parameter estimate uncertainty
In this section, we use mathematical analysis to evaluate the effect on parameter estimates of not accounting for
autocorrelation when present. To do so, we first calculate “true” parameter uncertainties obtained when specifying
a persistent error model. We then calculate “false” uncertainties obtained when assuming independent errors. To
derive these quantities, we calculate the Fisher Information matrix (FIM) in both circumstances. This analysis shows
that uncertainty in parameter estimates is understated when (falsely) assuming independent errors, with the degree of
overconfidence increasing along with the persistence of the true errors. We call the ratio of true parameter estimate
variance to that estimated assuming independent errors the “variance inflation ratio” (VIR).
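In symbols, for a scalar parameter $\theta$ (notation ours, for clarity):

$$\mathrm{VIR} = \frac{\operatorname{var}_{\text{true}}(\hat{\theta})}{\operatorname{var}_{\text{indep}}(\hat{\theta})},$$

where the numerator is the estimator variance under the correctly specified (autocorrelated) error model and the denominator is the variance reported when independence is assumed.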
In §3.1, we estimate the VIR for the mean parameter of a simple model with constant mean, when the actual error process is persistent and described by an autoregressive order-one (AR(1)) process. Calculating the VIR for the constant mean model is straightforward but provides a useful guide when examining more realistic cases. In §3.2, we consider a nonlinear ODE model with AR(1) measurement noise. In §3.3, we explore the consequences of more ephemeral autocorrelations by calculating the VIR for the constant mean model with moving average order-one (MA(1)) errors. Realistic noise processes are likely, in fact, to be combinations of persistent and transient correlated noise, and in §3.4, we give formulae for computation of VIRs in this, more general, case.
[Figure 1 appears here, with three panels: A. ODE model misspecification; B. Measurement persistence: coarse grid; C. Measurement persistence: fine grid. Time is on each horizontal axis, and panels B and C contain inset posterior distributions over θ under autocorrelated and independent error assumptions.]

Figure 1: Causes of autocorrelated noise. Panel A shows how using a logistic model when, in fact, a Gompertz model is correct, results in autocorrelated noise; panels B and C show how an imperfect measurement process can lead to different characteristic residual noise processes: in panel B, the measurements are taken using a coarse grid; in panel C, the measurements are taken using a fine grid. In both, residuals are depicted by black arrows. The inset plots show representative posterior distributions under different assumptions about the measurement process.
3.1 Constant mean model
In what follows, we assume a time series framework where, at time $t$, observed data, $x(t)$, differs from its true constant value, $\mu$, by an additive random component,

$$x(t) = \mu + \epsilon(t), \tag{3}$$

where $\epsilon(t)$ is a zero-mean error random process such that $\mathbb{E}[x(t)] = \mu$.
There are a number of ways that measurement errors may be autocorrelated, and, in this paper, we consider a range.
To begin, we consider AR(1) errors, in which there are persistent deviations between the observations and the true
values of a process. This could occur, for instance, if a measurement apparatus responds slowly to changes in a system,
meaning observations taken closer together are likely to be correlated due to measurement imperfections. An AR(1)
process can be represented mathematically by:
(t) = ρ(t1) + ν(t),(4)
where $\nu(t) \overset{\text{IID}}{\sim} \mathcal{N}(0, \sigma)$, and $-1 < \rho < 1$ characterises the degree of autocorrelation: positive values indicating positive autocorrelation, and similarly so for negative values.
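As a minimal sketch (parameter values hypothetical), the AR(1) error process of eq. (4) and the constant mean model of eq. (3) can be simulated as follows; later snippets in this section reuse simulate_ar1_errors.

```python
import numpy as np

def simulate_ar1_errors(T, rho, sigma, rng):
    """Draw T values of the AR(1) error process of eq. (4), with the
    process started from eps(0) = 0."""
    eps = np.zeros(T + 1)
    for t in range(1, T + 1):
        eps[t] = rho * eps[t - 1] + rng.normal(0.0, sigma)
    return eps[1:]

rng = np.random.default_rng(42)
mu, rho, sigma, T = 2.0, 0.9, 0.25, 1000          # hypothetical values
x = mu + simulate_ar1_errors(T, rho, sigma, rng)  # eq. (3)
```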
We first derive the true (asymptotic) variance of the maximum likelihood estimator of $\mu$ when assuming an AR(1) error process in accordance with the true generating process. To do so, we use the log-likelihood to determine the diagonal element of the FIM corresponding to $\mu$ when we assume $\rho$ is known. To write down the log-likelihood, we require an expression for $\nu(t)$ in terms of the observables and parameters of the system, which can be obtained by multiplying $x(t-1)$, given by eq. (3), by $\rho$ and subtracting it from $x(t)$, resulting in: $\nu(t) = x(t) - \rho x(t-1) - \mu(1 - \rho)$.
Since $\nu(t)$ is distributed as an independent Gaussian, the log-likelihood of the model for a sample of observations $x(t): t \in [0, 1, 2, \ldots, T]$ is given by,

$$\mathcal{L} = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\left(x(t) - \rho x(t-1) - \mu(1-\rho)\right)^2. \tag{5}$$
Here, for simplicity, we have assumed that $\nu(0) = 0$ is fixed and known; §4.3 describes an alternative likelihood that does not make this assumption.
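Eq. (5) translates directly into code. The sketch below is illustrative only, and assumes the observations are held in a NumPy array x, with x[0] the value at t = 0.

```python
import numpy as np

def ar1_log_likelihood(x, mu, rho, sigma):
    """Conditional log-likelihood of eq. (5): AR(1) errors around a
    constant mean mu, conditioning on the first observation x[0]."""
    nu = x[1:] - rho * x[:-1] - mu * (1.0 - rho)  # innovations nu(t)
    T = nu.size
    return (-0.5 * T * np.log(2.0 * np.pi)
            - 0.5 * T * np.log(sigma**2)
            - np.sum(nu**2) / (2.0 * sigma**2))
```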
The second derivative of eq. (5) with respect to $\mu$ yields the relevant diagonal element of the FIM,

$$I_{\mu,\mu} = -\mathbb{E}\left[\frac{\partial^2 \mathcal{L}}{\partial \mu^2}\right] = \frac{T(1-\rho)^2}{\sigma^2}. \tag{6}$$

The Cramér–Rao lower bound (CRLB) is the asymptotic variance of the maximum likelihood estimator of $\mu$. Because the off-diagonal elements of the FIM are zero, the CRLB is then given by the reciprocal of the RHS of eq. (6),

$$\operatorname{var}(\hat{\mu}) = \frac{\sigma^2}{T(1-\rho)^2}. \tag{7}$$
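Eq. (7) can be checked numerically: setting $\partial \mathcal{L}/\partial \mu = 0$ in eq. (5) gives the closed-form estimator $\hat{\mu} = \sum_{t=1}^{T}(x(t) - \rho x(t-1))/(T(1-\rho))$, whose sampling variance can be compared with the bound. A sketch with hypothetical values, assuming simulate_ar1_errors from the earlier snippet is in scope:

```python
# Monte Carlo check of eq. (7), reusing simulate_ar1_errors from above.
import numpy as np

rng = np.random.default_rng(0)
mu, rho, sigma, T = 1.0, 0.7, 0.5, 200   # hypothetical values
mu_hats = []
for _ in range(5000):
    x = mu + simulate_ar1_errors(T, rho, sigma, rng)
    # Prepend the (noise-free, since eps(0) = 0) initial observation x(0) = mu,
    # then apply the closed-form MLE derived from eq. (5).
    x = np.concatenate(([mu], x))
    mu_hats.append(np.sum(x[1:] - rho * x[:-1]) / (T * (1.0 - rho)))

print(np.var(mu_hats))                  # empirical variance of the MLE
print(sigma**2 / (T * (1.0 - rho)**2))  # CRLB of eq. (7); the two should agree
```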
We next derive the variance of the maximum likelihood estimator of $\mu$ when incorrectly assuming independent errors: $\epsilon(t) \overset{\text{IID}}{\sim} \mathcal{N}(0, \sigma')$. Under this false model, eq. (7) indicates that the variance of maximum likelihood estimators is given by,

$$\operatorname{var}(\tilde{\mu}) = \frac{\sigma'^2}{T}. \tag{8}$$
To meaningfully compare $\operatorname{var}(\tilde{\mu})$ with $\operatorname{var}(\hat{\mu})$, it is necessary to compare estimates of $\sigma'$, the standard deviation of noise for the false error model, with $\sigma$, the standard deviation of $\nu(t)$ in eq. (4). To do so, we first compute the variance of the (true) AR(1) errors. This can be done by taking the variance of both sides of eq. (4),

$$\operatorname{var}(\epsilon(t)) = \rho^2 \operatorname{var}(\epsilon(t-1)) + \operatorname{var}(\nu(t)). \tag{9}$$

Assuming the error process has a constant variance, eq. (9) can be rearranged to yield:

$$\operatorname{var}(\epsilon(t)) = \frac{\sigma^2}{1-\rho^2}. \tag{10}$$
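A quick numerical check of eq. (10), again reusing simulate_ar1_errors with hypothetical values:

```python
# Empirical variance of a long AR(1) draw vs. eq. (10).
import numpy as np

eps = simulate_ar1_errors(200_000, rho=0.8, sigma=0.3,
                          rng=np.random.default_rng(3))
print(eps.var())                # empirical variance of the error process
print(0.3**2 / (1.0 - 0.8**2))  # sigma^2 / (1 - rho^2) = 0.25
```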
The false error model variance will broadly match the true process variance (otherwise there would be a mismatch between the width of the true and estimated error process), meaning $\sigma'^2 \approx \sigma^2/(1-\rho^2)$. Substituting this expression into eq. (8) and comparing with eq. (7), we see that true model parameter uncertainty exceeds that obtained from the false model whenever,

$$\frac{\sigma^2}{T(1-\rho)^2} > \frac{\sigma^2}{T(1-\rho^2)}, \tag{11}$$

which is true when $0 < \rho < 1$. The VIR is given by the ratio of the true error uncertainty to that estimated under the false model,

$$\mathrm{VIR}(\rho) = \frac{1+\rho}{1-\rho} = 1 + \frac{2\rho}{1-\rho}, \tag{12}$$
which is monotonically increasing with $\rho$ throughout $0 < \rho < 1$ (see Figure 2A), with $\lim_{\rho \to 1} \mathrm{VIR}(\rho) = \infty$. Intuitively, as autocorrelation increases, each sample conveys less information about the underlying process, and parameter estimates have higher variance. Mischaracterising data as independent, therefore, leads to overly precise estimates.

In our experience, and through the results we present in §5, positive autocorrelation (where $\rho > 0$) seems to occur more commonly in systems. If negative autocorrelation does, however, occur, eq. (12) indicates that assuming independent noise will produce estimators with inflated variance, and, hence, $\mathrm{VIR} < 1$ (see Figure 2A).
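The overconfidence predicted by eq. (12) can also be demonstrated directly: simulate many AR(1) datasets, record the variance the naive IID model reports for the sample mean ($\hat{\sigma}'^2/T$) alongside the estimator's actual sampling variance, and compare their ratio with the VIR. A sketch with hypothetical values, again assuming simulate_ar1_errors from the earlier snippet is in scope:

```python
# Monte Carlo illustration of eq. (12).
import numpy as np

rng = np.random.default_rng(1)
mu, rho, sigma, T = 2.0, 0.8, 0.3, 500   # hypothetical values
means, claimed_vars = [], []
for _ in range(2000):
    x = mu + simulate_ar1_errors(T, rho, sigma, rng)
    means.append(x.mean())                  # MLE of mu under the (false) IID model
    claimed_vars.append(x.var(ddof=1) / T)  # variance the IID model reports, eq. (8)

actual = np.var(means)            # true sampling variance of the estimator
claimed = np.mean(claimed_vars)   # average variance claimed under independence
print(actual / claimed)           # should be close to the VIR...
print((1.0 + rho) / (1.0 - rho))  # ...which eq. (12) gives as 9 for rho = 0.8
```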