Nonlinear System Identification
Learning while respecting physical models
using a sequential Monte Carlo method
Anna Wigren, Johan Wågberg, Fredrik Lindsten, Adrian G. Wills, Thomas B. Schön
Please cite this version:
Anna Wigren, Johan Wågberg, Fredrik Lindsten, Adrian G. Wills, Thomas B. Schön. “Non-
linear System Identification: Learning While Respecting Physical Models Using a Sequential
Monte Carlo Method.” In: IEEE Control Systems Magazine 42.1 (2022). ©2022 IEEE, pp.
75–102
@article{Wigren2022,
  author  = {Wigren, Anna and W{\aa}gberg, Johan and Lindsten, Fredrik and Wills, Adrian G. and Sch{\"o}n, Thomas B.},
  journal = {IEEE Control Systems Magazine},
  title   = {Nonlinear System Identification: Learning While Respecting Physical Models Using a Sequential Monte Carlo Method},
  year    = {2022},
  volume  = {42},
  number  = {1},
  pages   = {75-102},
  doi     = {10.1109/MCS.2021.3122269},
}
A note on the structure of the article:
The published version of this article consists of a main text that provides the essential content
and multiple sidebars with additional information, either in the form of examples or a back-
ground with further technical details. The same structure has been adopted in this version of
the article. Sidebars are indicated by grey boxes and are referenced from the main text using
double quotation marks, i.e. “Background: Markov chain Monte Carlo” refers to the sidebar
on Markov chain Monte Carlo. The sidebars are placed at the end of the section where they
are first referenced.
arXiv:2210.14684v1 [stat.CO] 26 Oct 2022
Nonlinear System Identification
Learning while respecting physical models
using a sequential Monte Carlo method
Anna Wigren1, Johan Wågberg2, Fredrik Lindsten3, Adrian G. Wills4, and Thomas B. Schön5
1,2,5Department of Information Technology, Uppsala University
3Department of Computer and Information Science, Linköping University
4School of Engineering, University of Newcastle
Abstract
Identification of nonlinear systems is a challenging problem. Physical knowledge of
the system can be used in the identification process to significantly improve the predictive
performance by restricting the space of possible mappings from the input to the output.
Typically, the physical models contain unknown parameters that must be learned from
data. Classical methods often restrict the possible models or have to resort to approxima-
tions of the model that introduce biases. Sequential Monte Carlo methods enable learning
without introducing any bias for a more general class of models. In addition, they can also
be used to approximate a posterior distribution of the model parameters in a Bayesian
setting. This article provides a general introduction to sequential Monte Carlo and shows
how it naturally fits in system identification by giving examples of specific algorithms.
The methods are illustrated on two systems: a system with two cascaded water tanks
with possible overflow in both tanks and a compartmental model for the spreading of a
disease.
1 Introduction
The modern world contains an immense number of different and interacting systems, from the
evolution of weather systems to variations in the stock market, autonomous vehicles interacting
with their environment and the spread of diseases. For society to function, it is essential to
understand the behavior of the world, so that informed decisions can be made that are based on
likely future outcomes. For instance, consider the spread of a new disease like the coronavirus.
It is of great importance to be able to predict the number of people that will be infected
at different points in time to ensure that appropriate healthcare facilities are available. It is
anna.wigren@it.uu.se
johan.wagberg@it.uu.se
fredrik.lindsten@liu.se
adrian.wills@newcastle.edu.au
thomas.schon@it.uu.se
also of interest to be able to make decisions based on accurate information to best attenuate
the spread of disease. Moreover, understanding specific attributes of the disease, such as the
incubation time, the number of unreported cases, and how certain we are about this knowledge
are also crucial.
These types of applications are examples of so-called dynamic systems, which are the focus
of this article. Dynamic systems have the property that the future system response depends on
the past system response [1]. Capturing these types of dynamic phenomena can be achieved
using mathematical models, which offer a concrete mechanism for making predictions and
supporting decisions. The extreme flexibility and versatility of mathematics affords modeling
of highly disparate dynamic behavior. However, it also creates a challenge, since it is not
always obvious how to choose an appropriate model. This diversity is perhaps best illustrated
by contrasting examples.
Consider the modeling of rigid-body vehicle dynamics, such as the motion of a car or a
plane. In this case, it is possible to exploit prior knowledge of the system and adopt a classical
mechanics approach to derive Newton-Euler equations of motion for each application [2]. The
mathematical model structure is largely determined by knowledge of the physical system, and
the model will depend on certain parameters such as mass and inertia terms, and damping and
friction coefficients. In many cases, these parameter values can be difficult to obtain based on
first principles approaches alone. It is important to also note that some parameters may have
feasible ranges, such as mass terms being nonnegative, which is also a form of prior knowledge.
Contrasting this type of model, it is also possible to employ highly flexible and general
model structures to describe dynamic systems, such as deep neural networks (DNNs) or Gaus-
sian processes (GPs) [3, 4, 5]. The flexibility of the DNN model class stems from the general
construction of the model, which involves potentially many layers of interacting nonlinear func-
tions. Importantly, these interactions are allowed to adapt for each new application, since they
rely on coefficients/parameters that are free to change values. In the case of GP, the model
structure is also highly flexible, nonparametric and adapted based on available data. For ei-
ther of these flexible model classes, it is more challenging to impose prior system knowledge.
However, some progress is being made along these lines [6, 7].
Irrespective of the type of model, there are unknown quantities that must be determined,
which are often inferred from observations from the system that is being modeled. There are
many different approaches for extracting or estimating these unknown values from observed
system data [8, 9, 10, 11, 12]. Among the many possibilities, this article concentrates on two
commonly used and complementary approaches. In particular, the presented inference methods
are grouped according to two main attributes: 1) the assumptions made about how to model
unknown parameters, and, 2) what should be estimated in addition to the parameters.
More specifically, if the model parameters are assumed to be deterministic variables, then
this results in a frequentist inference perspective, where the so-called maximum likelihood
(ML) approach has proven to be highly successful in providing accurate point estimates of
the parameters [9, 10]. Alternatively, if uncertainty about the parameter values is incorporated
by treating them as random variables, then this results in the so-called Bayesian perspective,
where the posterior distribution of the parameters is the object of interest [11]. An attractive
attribute of the Bayesian approach is that it provides quantification of uncertainty, which is
essential when making decisions based on the associated models. Otherwise, decisions may be
executed based on misplaced confidence. It is also worth mentioning that there is a connection
between these two approaches by considering so-called maximum a posteriori methods [13].
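The distinction can be made concrete on a toy problem, chosen here purely for illustration and not taken from the article: estimating the mean of Gaussian observations with known variance. The ML approach returns a single point estimate, while the Bayesian approach with a conjugate Gaussian prior returns a full posterior distribution that quantifies the remaining uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (illustrative): estimate the mean theta of
# y_t ~ N(theta, sigma^2) from N observations, with sigma known.
sigma = 1.0
theta_true = 2.0
y = theta_true + sigma * rng.standard_normal(50)

# Frequentist / maximum likelihood: a single point estimate.
theta_ml = y.mean()

# Bayesian: with a conjugate Gaussian prior N(mu0, tau0^2), the
# posterior over theta is Gaussian, with closed-form mean and variance.
mu0, tau0 = 0.0, 10.0
post_var = 1.0 / (1.0 / tau0**2 + len(y) / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + y.sum() / sigma**2)

print(theta_ml, post_mean, np.sqrt(post_var))
```

With a vague prior the posterior mean nearly coincides with the ML estimate, but the posterior standard deviation additionally reports how confident that estimate is. For nonlinear state-space models no such closed form exists, which is where the computational tools of this article come in.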
Regardless of adopting the frequentist or Bayesian perspective, it is rare that the estimates
can be provided analytically. This article provides computational tools for calculating these
estimates in the remaining cases where analytical solutions are not available. Towards comput-
ing them, it is essential to both the frequentist and Bayesian approaches that certain integrals
can be evaluated. While the details will be explained in subsequent sections, it suffices for
now to mention that computing these integrals is generally intractable [14].
An overarching theme of this article is to approximate intractable integrals by employing
carefully tailored Monte-Carlo integration techniques that result in tractable weighted sums.
More precisely, the sequential nature of the dynamic models lends itself to the so-called se-
quential Monte Carlo (SMC) methods [15, 16], which will be explored in much more detail as
the article progresses. Furthermore, these SMC methods are employed both within frequentist
and Bayesian approaches, resulting in algorithms that are applicable to a wide range of mod-
eling problems. An attractive property of the SMC methods is that they also offer asymptotic
convergence guarantees, which are not offered by other approximation methods in general [14].
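The core idea behind these methods, replacing an intractable expectation with a tractable weighted sum over samples, can be sketched with plain self-normalized importance sampling, the static building block that SMC applies sequentially over time. The target and proposal below are illustrative choices, not from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

def importance_sampling_estimate(f, log_p, sample_q, log_q, n=100_000):
    """Approximate E_p[f(x)] by a weighted sum of samples from proposal q.

    log_p and log_q may omit their normalizing constants, since constants
    cancel when the weights are self-normalized.
    """
    x = sample_q(n)
    log_w = log_p(x) - log_q(x)        # unnormalized log-weights
    w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
    w /= w.sum()                       # self-normalized weights
    return np.sum(w * f(x))

# Example: E[x^2] under N(0,1) (true value 1), using proposal N(0, 2^2).
est = importance_sampling_estimate(
    f=lambda x: x**2,
    log_p=lambda x: -0.5 * x**2,
    sample_q=lambda n: 2.0 * rng.standard_normal(n),
    log_q=lambda x: -0.5 * (x / 2.0)**2,
)
print(est)  # close to 1
```

SMC extends this idea to sequences: at each time step particles are propagated, reweighted, and resampled, so the weighted sum tracks a distribution that evolves with the dynamic model.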
These SMC techniques are also highly suitable to the situation where prior knowledge of
the system is available, such as knowledge of the physical system, model structure, and possi-
bly feasible ranges for unknown parameter values. The main benefit of using SMC techniques
is that they are applicable to general nonlinear systems, without modification of the prior
knowledge or assumptions. This allows the separation of modeling from the inference method,
which provides the modeler more freedom in adding domain-specific prior knowledge. In con-
trast, many alternative approaches require—either explicitly or implicitly—that the problem
satisfies certain restrictive assumptions, such as Gaussian noise corruption. The aim of the
article is to present computational tools for estimating general nonlinear dynamic systems,
while adhering to prior system knowledge without modification.
The article first presents the type of models considered and provides two examples that
illustrate how physical insight about the system can be transformed to a mathematical model
suited for statistical inference. These two examples are used throughout to illustrate the vari-
ous methods. Following this, the identification problem is introduced, and the key expressions
needed for learning are highlighted. This leads to an introduction of SMC specifically targeted
for offline system identification. The remainder of the article presents identification algorithms
where SMC plays an integral part. Both optimization-based learning methods and probabilis-
tic methods where posterior distributions are computed are considered and applied to the
example models. The article also gives a short introduction to probabilistic programming, a
tool that can significantly reduce the complexity of trying out different models and inference
methods.
2 Modeling
Mathematical modeling is applicable to a wide range of problems spanning many areas of
science and engineering. As such, it is important to restrict attention to the particular model
class of interest to this article, namely, discrete-time state-space models for dynamic systems.
These types of models have a long and fruitful history in the fields of physics and engineering,
originating in the phase-space ideas from physics [17]. The essential idea is that the dynamic
behavior of the model is determined by the current state of the model, which is a vector
belonging to a so-called state space. It is important to mention that the states should be
associated with the model, rather than the real-world phenomena. The latter has no particular
concern for states or any other modeling choices, including the model structure and associated
parameters.
It is essential to connect observations from the real-world phenomena to the state-space
model, since this is the primary purpose of modeling, and so that the model can be adapted
to best match observations. These ideas are made more concrete in the subsequent section,
which introduces the state-space model of interest in this article, and presents two concrete
examples to illustrate this modeling approach.
2.1 Probabilistic formulation of the state-space model
To make the modeling ideas discussed above concrete, it is necessary to introduce some nota-
tion. To that end, the model state is denoted $x_t$, where the subscript $t$ indicates the current discrete time instant. Observations from the system are denoted $y_t$, and inputs to the system are denoted $u_t$. It is typical to express the connection between model and observations via the state-space equations

$x_t = f_t(x_{t-1}, u_t, w_t, \theta)$,   (1a)
$y_t = g_t(x_t, u_t, e_t, \theta)$.   (1b)
In the above, the function $f_t$ explains how the state evolves over time, and $g_t$ relates the model state to the system observations. The parameter vector $\theta$ allows the functions to depend on some possibly unknown parameters, and $w_t$ and $e_t$ are noise terms to account for uncertainty. As an example, the functional form of a linear-Gaussian state-space model is

$x_t = A x_{t-1} + B u_t + w_t, \quad w_t \sim \mathcal{N}(w_t \mid 0, Q)$,   (2a)
$y_t = C x_t + D u_t + e_t, \quad e_t \sim \mathcal{N}(e_t \mid 0, R)$,   (2b)

where $A$ is a transition matrix, $B$ is an input matrix, $C$ is an observation matrix, $D$ is a feedforward matrix, and $w_t$ and $e_t$ are independent and identically distributed Gaussian noise with zero mean and covariance matrices $Q$ and $R$, respectively. The unknown parameters of this model are the transition matrix $A$, the input matrix $B$, the observation matrix $C$, the feedforward matrix $D$, and the covariance matrices $Q$ and $R$. The notation $\mathcal{N}(z \mid \mu, \Sigma)$ is used to denote a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ for the variable $z$.
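As an illustrative sketch, simulating the linear-Gaussian model (2) forward in time for a scalar state can look as follows; the numerical values for $A$, $B$, $C$, $D$, $Q$, $R$, and the input signal are arbitrary choices, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters for the scalar case of model (2).
A, B, C, D = 0.9, 0.5, 1.0, 0.0
Q, R = 0.1, 0.5                      # process and measurement noise variances
T = 100
u = np.sin(0.1 * np.arange(T))       # some known input signal

x = np.zeros(T)                      # latent states
y = np.zeros(T)                      # observations
x_prev = 0.0                         # initial state
for t in range(T):
    w = rng.normal(0.0, np.sqrt(Q))  # w_t ~ N(0, Q)
    e = rng.normal(0.0, np.sqrt(R))  # e_t ~ N(0, R)
    x[t] = A * x_prev + B * u[t] + w # state evolution (2a)
    y[t] = C * x[t] + D * u[t] + e   # observation (2b)
    x_prev = x[t]
```

Only the sequence `y` (and the input `u`) would be available in practice; the states `x` are latent, and the identification problem is to recover the parameters, and possibly the states, from such data.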
This article uses a more general, probabilistic, form of the state-space model, where the
essential idea remains the same: The state holds the information required to determine the
state evolution. The main difference is the manner in which this is expressed. For probabilistic
state-space models, the time evolution and measurement relationships are captured via the
conditional probability distributions

$x_t \sim p(x_t \mid x_{t-1}, u_t, \theta)$,   (3a)
$y_t \sim p(y_t \mid x_t, u_t, \theta)$,   (3b)

with transition density $p(x_t \mid x_{t-1}, u_t, \theta)$ and observation density $p(y_t \mid x_t, u_t, \theta)$, parameterized by an unknown parameter $\theta$. A probabilistic state-space model can, equivalently, be represented graphically as a probabilistic graphical model. Figure 1 illustrates the graphical