Nonlinear System Identification
Learning while respecting physical models
using a sequential Monte Carlo method
Anna Wigren, Johan Wågberg, Fredrik Lindsten, Adrian G. Wills, Thomas B. Schön
Please cite this version:
Anna Wigren, Johan Wågberg, Fredrik Lindsten, Adrian G. Wills, Thomas B. Schön. “Non-
linear System Identification: Learning While Respecting Physical Models Using a Sequential
Monte Carlo Method.” In: IEEE Control Systems Magazine 42.1 (2022). ©2022 IEEE, pp.
75–102
@article{Wigren2022,
  author  = {Wigren, Anna and W{\aa}gberg, Johan and Lindsten, Fredrik and Wills, Adrian G. and Sch{\"o}n, Thomas B.},
  journal = {IEEE Control Systems Magazine},
  title   = {Nonlinear System Identification: Learning While Respecting Physical Models Using a Sequential Monte Carlo Method},
  year    = {2022},
  volume  = {42},
  number  = {1},
  pages   = {75-102},
  doi     = {10.1109/MCS.2021.3122269},
}
A note on the structure of the article:
The published version of this article consists of a main text that provides the essential content
and multiple sidebars with additional information, either in the form of examples or a back-
ground with further technical details. The same structure has been adopted in this version of
the article. Sidebars are indicated by grey boxes and are referenced from the main text using
double quotation marks, i.e. “Background: Markov chain Monte Carlo” refers to the sidebar
on Markov chain Monte Carlo. The sidebars are placed at the end of the section where they
are first referenced.
arXiv:2210.14684v1 [stat.CO] 26 Oct 2022
Nonlinear System Identification
Learning while respecting physical models
using a sequential Monte Carlo method
Anna Wigren1, Johan Wågberg2, Fredrik Lindsten3, Adrian G. Wills4, and Thomas B. Schön5
1,2,5Department of Information Technology, Uppsala University
3Department of Computer and Information Science, Linköping University
4School of Engineering, University of Newcastle
Abstract
Identification of nonlinear systems is a challenging problem. Physical knowledge of
the system can be used in the identification process to significantly improve the predictive
performance by restricting the space of possible mappings from the input to the output.
Typically, the physical models contain unknown parameters that must be learned from
data. Classical methods often restrict the possible models or have to resort to approxima-
tions of the model that introduce biases. Sequential Monte Carlo methods enable learning
without introducing any bias for a more general class of models. In addition, they can also
be used to approximate a posterior distribution of the model parameters in a Bayesian
setting. This article provides a general introduction to sequential Monte Carlo and shows
how it naturally fits in system identification by giving examples of specific algorithms.
The methods are illustrated on two systems: a system with two cascaded water tanks
with possible overflow in both tanks and a compartmental model for the spreading of a
disease.
1 Introduction
The modern world contains an immense number of different and interacting systems, from the
evolution of weather systems to variations in the stock market, autonomous vehicles interacting
with their environment and the spread of diseases. For society to function, it is essential to
understand the behavior of the world, so that informed decisions can be made that are based on
likely future outcomes. For instance, consider the spread of a new disease like the coronavirus.
It is of great importance to be able to predict the number of people that will be infected
at different points in time to ensure that appropriate healthcare facilities are available. It is
anna.wigren@it.uu.se
johan.wagberg@it.uu.se
fredrik.lindsten@liu.se
adrian.wills@newcastle.edu.au
thomas.schon@it.uu.se
also of interest to be able to make decisions based on accurate information to best attenuate
the spread of disease. Moreover, understanding specific attributes of the disease, such as the
incubation time, the number of unreported cases, and how certain we are about this knowledge
are also crucial.
These types of applications are examples of so-called dynamic systems, which are the focus
of this article. Dynamic systems have the property that the future system response depends on
the past system response [1]. Capturing these types of dynamic phenomena can be achieved
using mathematical models, which offer a concrete mechanism for making predictions and
supporting decisions. The extreme flexibility and versatility of mathematics affords modeling
of highly disparate dynamic behavior. However, it also creates a challenge, since it is not
always obvious how to choose an appropriate model. This diversity is perhaps best illustrated
by contrasting examples.
Consider the modeling of rigid-body vehicle dynamics, such as the motion of a car or a
plane. In this case, it is possible to exploit prior knowledge of the system and adopt a classical
mechanics approach to derive Newton-Euler equations of motion for each application [2]. The
mathematical model structure is largely determined by knowledge of the physical system, and
the model will depend on certain parameters such as mass and inertia terms, and damping and
friction coefficients. In many cases, these parameter values can be difficult to obtain based on
first principles approaches alone. It is important to also note that some parameters may have
feasible ranges, such as mass terms being nonnegative, which is also a form of prior knowledge.
Contrasting this type of model, it is also possible to employ highly flexible and general
model structures to describe dynamic systems, such as deep neural networks (DNNs) or Gaus-
sian processes (GPs) [3, 4, 5]. The flexibility of the DNN model class stems from the general
construction of the model, which involves potentially many layers of interacting nonlinear func-
tions. Importantly, these interactions are allowed to adapt for each new application, since they
rely on coefficients/parameters that are free to change values. In the case of GP, the model
structure is also highly flexible, nonparametric and adapted based on available data. For ei-
ther of these flexible model classes, it is more challenging to impose prior system knowledge.
However, some progress is being made along these lines [6, 7].
Irrespective of the type of model, there are unknown quantities that must be determined,
which are often inferred from observations from the system that is being modeled. There are
many different approaches for extracting or estimating these unknown values from observed
system data [8, 9, 10, 11, 12]. Among the many possibilities, this article concentrates on two
commonly used and complementary approaches. In particular, the presented inference methods
are grouped according to two main attributes: 1) the assumptions made about how to model
unknown parameters, and, 2) what should be estimated in addition to the parameters.
More specifically, if the model parameters are assumed to be deterministic variables, then
this results in a frequentist inference perspective, where the so-called maximum likelihood
(ML) approach has proven to be highly successful in providing accurate point estimates of
the parameters [9, 10]. Alternatively, if uncertainty about the parameter values is incorporated
by treating them as random variables, then this results in the so-called Bayesian perspective,
where the posterior distribution of the parameters is the object of interest [11]. An attractive
attribute of the Bayesian approach is that it provides quantification of uncertainty, which is
essential when making decisions based on the associated models. Otherwise, decisions may be
executed based on misplaced confidence. It is also worth mentioning that there is a connection
between these two approaches by considering so-called maximum a posteriori methods [13].
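The distinction can be made concrete on a toy problem, chosen here purely for illustration and not taken from the article: estimating the mean of Gaussian observations with known variance. The ML approach returns a single point estimate, while the Bayesian approach with a conjugate Gaussian prior returns a full posterior distribution that quantifies the remaining uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (illustrative): estimate the mean theta of
# y_t ~ N(theta, sigma^2) from N observations, with sigma known.
sigma = 1.0
theta_true = 2.0
y = theta_true + sigma * rng.standard_normal(50)

# Frequentist / maximum likelihood: a single point estimate.
theta_ml = y.mean()

# Bayesian: with a conjugate Gaussian prior N(mu0, tau0^2), the
# posterior over theta is Gaussian, with closed-form mean and variance.
mu0, tau0 = 0.0, 10.0
post_var = 1.0 / (1.0 / tau0**2 + len(y) / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + y.sum() / sigma**2)

print(theta_ml, post_mean, np.sqrt(post_var))
```

With a vague prior the posterior mean nearly coincides with the ML estimate, but the posterior standard deviation additionally reports how confident that estimate is. For nonlinear state-space models no such closed form exists, which is where the computational tools of this article come in.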
Regardless of adopting the frequentist or Bayesian perspective, it is rare that the estimates
can be provided analytically. This article provides computational tools for calculating these
estimates in the remaining cases where analytical solutions are not available. Towards comput-
ing them, it is essential to both the frequentist and Bayesian approaches that certain integrals
can be evaluated. While the details will be explained in subsequent sections, it suffices for
now to mention that computing these integrals is generally intractable [14].
An overarching theme of this article is to approximate intractable integrals by employing
carefully tailored Monte-Carlo integration techniques that result in tractable weighted sums.
More precisely, the sequential nature of the dynamic models lends itself to the so-called se-
quential Monte Carlo (SMC) methods [15, 16], which will be explored in much more detail as
the article progresses. Furthermore, these SMC methods are employed both within frequentist
and Bayesian approaches, resulting in algorithms that are applicable to a wide range of mod-
eling problems. An attractive property of the SMC methods is that they also offer asymptotic
convergence guarantees, which are not offered by other approximation methods in general [14].
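The core idea behind these methods, replacing an intractable expectation with a tractable weighted sum over samples, can be sketched with plain self-normalized importance sampling, the static building block that SMC applies sequentially over time. The target and proposal below are illustrative choices, not from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

def importance_sampling_estimate(f, log_p, sample_q, log_q, n=100_000):
    """Approximate E_p[f(x)] by a weighted sum of samples from proposal q.

    log_p and log_q may omit their normalizing constants, since constants
    cancel when the weights are self-normalized.
    """
    x = sample_q(n)
    log_w = log_p(x) - log_q(x)        # unnormalized log-weights
    w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
    w /= w.sum()                       # self-normalized weights
    return np.sum(w * f(x))

# Example: E[x^2] under N(0,1) (true value 1), using proposal N(0, 2^2).
est = importance_sampling_estimate(
    f=lambda x: x**2,
    log_p=lambda x: -0.5 * x**2,
    sample_q=lambda n: 2.0 * rng.standard_normal(n),
    log_q=lambda x: -0.5 * (x / 2.0)**2,
)
print(est)  # close to 1
```

SMC extends this idea to sequences: at each time step particles are propagated, reweighted, and resampled, so the weighted sum tracks a distribution that evolves with the dynamic model.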
These SMC techniques are also highly suitable to the situation where prior knowledge of
the system is available, such as knowledge of the physical system, model structure, and possi-
bly feasible ranges for unknown parameter values. The main benefit of using SMC techniques
is that they are applicable to general nonlinear systems, without modification of the prior
knowledge or assumptions. This allows the separation of modeling from the inference method,
which provides the modeler more freedom in adding domain-specific prior knowledge. In con-
trast, many alternative approaches require—either explicitly or implicitly—that the problem
satisfies certain restrictive assumptions, such as Gaussian noise corruption. The aim of the
article is to present computational tools for estimating general nonlinear dynamic systems,
while adhering to prior system knowledge without modification.
The article first presents the type of models considered and provides two examples that
illustrate how physical insight about the system can be transformed to a mathematical model
suited for statistical inference. These two examples are used throughout to illustrate the vari-
ous methods. Following this, the identification problem is introduced, and the key expressions
needed for learning are highlighted. This leads to an introduction of SMC specifically targeted
for offline system identification. The remainder of the article presents identification algorithms
where SMC plays an integral part. Both optimization-based learning methods and probabilis-
tic methods where posterior distributions are computed are considered and applied to the
example models. The article also gives a short introduction to probabilistic programming, a
tool that can significantly reduce the complexity of trying out different models and inference
methods.
2 Modeling
Mathematical modeling is applicable to a wide range of problems spanning many areas of
science and engineering. As such, it is important to restrict attention to the particular model
class of interest to this article, namely, discrete-time state-space models for dynamic systems.
These types of models have a long and fruitful history in the fields of physics and engineering,
originating in the phase-space ideas from physics [17]. The essential idea is that the dynamic
behavior of the model is determined by the current state of the model, which is a vector
belonging to a so-called state space. It is important to mention that the states should be
associated with the model, rather than the real-world phenomena. The latter has no particular
concern for states or any other modeling choices, including the model structure and associated
parameters.
It is essential to connect observations from the real-world phenomena to the state-space
model, since this is the primary purpose of modeling, and so that the model can be adapted
to best match observations. These ideas are made more concrete in the subsequent section,
which introduces the state-space model of interest in this article, and presents two concrete
examples to illustrate this modeling approach.
2.1 Probabilistic formulation of the state-space model
To make the modeling ideas discussed above concrete, it is necessary to introduce some nota-
tion. To that end, the model state is denoted $x_t$, where the subscript $t$ indicates the current discrete time instant. Observations from the system are denoted $y_t$, and inputs to the system are denoted $u_t$. It is typical to express the connection between model and observations via the state-space equations

$x_t = f_t(x_{t-1}, u_t, w_t, \theta)$,   (1a)
$y_t = g_t(x_t, u_t, e_t, \theta)$.   (1b)
In the above, the function $f_t$ explains how the state evolves over time, and $g_t$ relates the model state to the system observations. The parameter vector $\theta$ allows the functions to depend on some possibly unknown parameters, and $w_t$ and $e_t$ are noise terms to account for uncertainty. As an example, the functional form of a linear-Gaussian state-space model is

$x_t = A x_{t-1} + B u_t + w_t, \quad w_t \sim \mathcal{N}(w_t \mid 0, Q)$,   (2a)
$y_t = C x_t + D u_t + e_t, \quad e_t \sim \mathcal{N}(e_t \mid 0, R)$,   (2b)

where $A$ is a transition matrix, $B$ is an input matrix, $C$ is an observation matrix, $D$ is a feedforward matrix, and $w_t$ and $e_t$ are independent and identically distributed Gaussian noise with zero mean and covariance matrices $Q$ and $R$, respectively. The unknown parameters of this model are the transition matrix $A$, the input matrix $B$, the observation matrix $C$, the feedforward matrix $D$, and the covariance matrices $Q$ and $R$. The notation $\mathcal{N}(z \mid \mu, \Sigma)$ is used to denote a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ for the variable $z$.
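As an illustrative sketch, simulating the linear-Gaussian model (2) forward in time for a scalar state can look as follows; the numerical values for $A$, $B$, $C$, $D$, $Q$, $R$, and the input signal are arbitrary choices, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters for the scalar case of model (2).
A, B, C, D = 0.9, 0.5, 1.0, 0.0
Q, R = 0.1, 0.5                      # process and measurement noise variances
T = 100
u = np.sin(0.1 * np.arange(T))       # some known input signal

x = np.zeros(T)                      # latent states
y = np.zeros(T)                      # observations
x_prev = 0.0                         # initial state
for t in range(T):
    w = rng.normal(0.0, np.sqrt(Q))  # w_t ~ N(0, Q)
    e = rng.normal(0.0, np.sqrt(R))  # e_t ~ N(0, R)
    x[t] = A * x_prev + B * u[t] + w # state evolution (2a)
    y[t] = C * x[t] + D * u[t] + e   # observation (2b)
    x_prev = x[t]
```

Only the sequence `y` (and the input `u`) would be available in practice; the states `x` are latent, and the identification problem is to recover the parameters, and possibly the states, from such data.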
This article uses a more general, probabilistic, form of the state-space model, where the
essential idea remains the same: The state holds the information required to determine the
state evolution. The main difference is the manner in which this is expressed. For probabilistic
state-space models, the time evolution and measurement relationships are captured via the
conditional probability distributions

$x_t \sim p(x_t \mid x_{t-1}, u_t, \theta)$,   (3a)
$y_t \sim p(y_t \mid x_t, u_t, \theta)$,   (3b)

with transition density $p(x_t \mid x_{t-1}, u_t, \theta)$ and observation density $p(y_t \mid x_t, u_t, \theta)$, parameterized by an unknown parameter $\theta$. A probabilistic state-space model can, equivalently, be represented graphically as a probabilistic graphical model. Figure 1 illustrates the graphical