Deep Subspace Encoders for Nonlinear System Identification
Gerben I. Beintema^a, Maarten Schoukens^a, Roland Toth^{a,b}
^a Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
^b Systems and Control Laboratory, Institute for Computer Science and Control, Budapest, Hungary.
Abstract
Using Artificial Neural Networks (ANNs) for nonlinear system identification has proven to be a promising approach, but despite
all recent research efforts, many practical and theoretical problems still remain open. Specifically, noise handling and noise
models, together with issues of consistency and reliable estimation under minimisation of the prediction error, are the most severe problems. The
latter comes with numerous practical challenges such as explosion of the computational cost in terms of the number of data
samples and the occurrence of instabilities during optimization. In this paper, we aim to overcome these issues by proposing
a method which uses a truncated prediction loss and a subspace encoder for state estimation. The truncated prediction loss
is computed by selecting multiple truncated subsections from the time series and computing the average prediction loss.
To obtain a computationally efficient estimation method that minimizes the truncated prediction loss, a subspace encoder
represented by an artificial neural network is introduced. This encoder aims to approximate the state reconstructability map
of the estimated model to provide an initial state for each truncated subsection given past inputs and outputs. By theoretical
analysis, we show that, under mild conditions, the proposed method is locally consistent, increases optimization stability, and
achieves increased data efficiency by allowing for overlap between the subsections. Lastly, we provide practical insights and
user guidelines employing a numerical example and state-of-the-art benchmark results.
Key words: System identification, Nonlinear state-space modeling, Subspace identification, Deep learning.
1 Introduction
While linear system identification offers both a strongly
developed theoretical framework and broadly applica-
ble computational tools, identification of nonlinear sys-
tems remains challenging. The wide range of nonlinear
behaviours that appear in engineering, ranging from
mechatronic systems to chemical and biological systems,
poses a challenge in developing generically applicable
model structures and identification methods [1]. Hence,
numerous nonlinear system identification methods have
been proposed over the last decades. Amongst the
most popular ones are linear parameter-varying [2,3],
Volterra [4,5], NAR(MA)X [6], block-oriented [7,8], and
nonlinear state-space [9–16] approaches.
In this paper, we consider the problem of identifying
Email addresses: g.i.beintema@tue.nl (Gerben I.
Beintema), m.schoukens@tue.nl (Maarten Schoukens),
r.toth@tue.nl (Roland Toth).
1 Implementation of the proposed SUBNET method is available at https://github.com/GerbenBeintema/deepSI and the implementation of the simulation study is available at GerbenBeintema/encoder-automatica-experiments.
2 The research was partly funded by the Eötvös Loránd Research Network (grant number: SA-77/2021).
nonlinear systems using nonlinear state-space (NL-SS)
models since they can represent a broad range of dy-
namic behaviours and are well applicable for multiple-
input multiple-output (MIMO) systems [1]. However, es-
timation of NL-SS models is rather challenging as the
state-variables are often not measurable (hidden Markov
model) and the associated optimisation-based training
process is prone to local minima and model/gradient
instability [17]. Furthermore, the associated nonlinear
state-transition and output functions rapidly grow in
complexity with a growing number of states and inputs.
If these are parametrized as a linear combination of basis
functions, e.g., polynomials as in [10,18], then this often
leads to an explosion in the number of parameters needed
to capture the system dynamics. Also, probabilistic methods
such as [11] can become computationally burdensome
with increasing numbers of states and inputs or train-
ing sequence lengths. Hence, an efficient representation
approach for the nonlinearities and a novel estimation
concept are required for NL-SS identification.
Deep learning and artificial neural networks (ANNs) are
uniquely suited to approach the NL-SS identification
challenges as they have been shown theoretically and
practically to be able to model complex data relations
while being computationally scalable to large datasets.
Although these benefits inspired the use of state-space
neural network models two decades ago [19], fully ex-
ploiting these properties in NL-SS identification without
major downsides is still an open problem. For instance,
careful initialization of the neural network weights and
biases partially mitigates the risk of local minima dur-
ing optimization, but requires additional information,
e.g., estimating a linear approximate model of the
system [12]. Additionally, [20] has shown that multiple
shooting smooths the cost function, reducing the num-
ber of local minima and improving optimization stabil-
ity, which has given rise to the use of truncated simu-
lation error cost for ANN-based NL-SS estimation [21].
However, the use of multiple shooting approaches comes
with the challenge of estimating a potentially large num-
ber of unknown initial states for each subsection, re-
sulting in a complexity increase of the optimisation. To
overcome this problem, auto-encoders have been investi-
gated to jointly estimate the model state and the under-
lying state-space functions using one-step-ahead predic-
tion cost [13]. However, these approaches fall short of giving accurate long-term predictions due to incorrect noise handling, they require tuning of sensitive hyperparameters in the composite auto-encoder/prediction-error loss function, and they lack consistency guarantees.
To overcome these challenges, this paper enhances the
subspace encoder-based method for identification of
state-space (SS) neural networks first introduced in [14]
with an innovation noise model and proves its consistency
properties. The nonlinear SS model is parametrized
with ANNs for flexibility and efficiency in representing
the often complex and high-dimensional state-transition
and output functions. The model is estimated under
a truncated prediction loss, evaluated on short subsec-
tions. Similarly to multiple shooting, these subsections
further improve computational scalability and opti-
mization stability, thereby reducing the importance of
parameter initialization. The internal state at the start
of each subsection is obtained using a nonlinear subspace
encoder which approximates the reconstructability map
of the SS model and further improves computational
scalability and data efficiency. The state-transition and
output functions of the SS model and the encoder are
simultaneously estimated based on the aforementioned
truncated prediction loss function. Finally, batch opti-
mization and early stopping are employed to further
improve the performance of the proposed identification
scheme. We demonstrate that the resulting nonlinear
state-space identification method is robust w.r.t. model
and gradient instability during training, has a rela-
tively small number of hyperparameters, and obtains
state-of-the-art results on benchmark examples.
To summarize, our main contributions are:
• a novel ANN-based NL-SS identification algorithm that provides reliable and computationally efficient data-driven modelling, even in the presence of innovation noise disturbances;
• efficient use of a multiple-shooting-based formulation of the prediction loss via co-estimation of an encoder function representing the reconstructability map of the nonlinear model (computational efficiency);
• a proof that the proposed estimator is consistent (statistical validity) and enhances smoothness of the cost function (optimisation efficiency);
• guidelines for the choice of hyperparameters and a detailed comparison of the proposed method to the state-of-the-art on a widely used identification benchmark.
The paper is structured as follows: Section 2 introduces
the considered data-generating system and identifica-
tion problem. Section 3 discusses the proposed subspace
encoder method in detail and provides some user guide-
lines. We theoretically prove multiple key properties of
the proposed method in Section 4, and demonstrate
state-of-the-art performance of the method on a simula-
tion example and the Wiener–Hammerstein benchmark
in Sections 5-6, followed by the conclusions in Section 7.
2 Problem setting and preliminaries
2.1 Data-generating system
Consider a discrete-time system with innovation noise that can be represented by the state-space description:
$$x_{k+1} = f(x_k, u_k, e_k), \tag{1a}$$
$$y_k = h(x_k) + e_k, \tag{1b}$$
where $k \in \mathbb{Z}$ is the discrete time, $e$ is an i.i.d. white noise process with finite variance $\Sigma_e \in \mathbb{R}^{n_y \times n_y}$, and $u$ is a quasi-stationary input process, independent of $e$, taking values in $\mathbb{R}^{n_u}$ at each time moment $k$. Additionally, $x$ and $y$ are the state and output processes, taking values in $\mathbb{R}^{n_x}$ and $\mathbb{R}^{n_y}$ respectively. The functions $f : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_y} \to \mathbb{R}^{n_x}$ and $h : \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$, i.e., the state-transition and output functions, are considered to be bounded, deterministic maps. Without loss of generality, we can assume that $h$ does not contain a direct feedthrough term. By assuming various structures for $f$ and $h$, many well-known noise structures can be obtained, such as nonlinear output error (NOE), nonlinear auto-regressive with exogenous input (NARX), nonlinear auto-regressive with moving average exogenous input (NARMAX), and nonlinear Box-Jenkins (NBJ) [22]. For instance, if $f$ does not depend on $e_k$, then an NL-SS model with an OE noise structure is obtained.
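As a concrete illustration, the following Python sketch simulates a scalar instance of (1) and collects the IO data set $D_N$; the particular $f$, $h$, noise level, and input distribution are arbitrary choices made for this example, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar choices for f and h (nx = nu = ny = 1); any bounded,
# deterministic maps would do.
def f(x, u, e):
    return 0.8 * np.tanh(x) + 0.3 * u + 0.1 * e   # innovation enters the state

def h(x):
    return x + 0.5 * x**2                          # no direct feedthrough

N = 1000
u = rng.uniform(-1.0, 1.0, N)          # quasi-stationary input, independent of e
e = 0.05 * rng.standard_normal(N)      # i.i.d. white innovation noise
x = 0.0                                # (generally unknown) initial state x_1

y = np.empty(N)
for k in range(N):
    y[k] = h(x) + e[k]                 # (1b)
    x = f(x, u[k], e[k])               # (1a)

D_N = list(zip(u, y))                  # ordered IO data set for identification
```

Since $e_k$ enters both (1a) and (1b), the sketch realises a genuine innovation noise structure; dropping the $e$ argument of $f$ would reduce it to an OE structure, as noted above.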
For a given sampled excitation sequence $\{u_k\}_{k=1}^{N}$ and a potentially unknown initial state $x_1 \in \mathbb{R}^{n_x}$, the obtained response of the considered system (1), in terms of a sample-path realisation, is collected into an ordered input-output (IO) data set $D_N = \{(u_k, y_k)\}_{k=1}^{N}$ used for identification. To avoid unnecessary clutter, we will not use different notation for random variables, such as $y_k$ defined by (1), and their sampled values, but at places where confusion might arise, we will specify which notion is used.
2.2 Identification problem
Based on the given data sequence $D_N$, our objective is to identify the dynamic relation (1), which boils down to the estimation of $f$ and $h$. Note that these functions cannot be estimated directly, as $x$ and $e$ are not measured. To accomplish our objective, notice that $e_k = y_k - h(x_k)$ based on (1), hence, by substitution, we get
$$x_{k+1} = f(x_k, u_k, y_k - h(x_k)) = \tilde{f}(x_k, u_k, y_k). \tag{2}$$
Then, for $n \geq 1$, we can write
$$y_k = h(x_k) + e_k, \tag{3a}$$
$$y_{k+1} = (h \circ \tilde{f})(x_k, u_k^k, y_k^k) + e_{k+1}, \tag{3b}$$
$$\vdots$$
$$y_{k+n} = (h \circ^n \tilde{f})(x_k, u_k^{k+n-1}, y_k^{k+n-1}) + e_{k+n}, \tag{3c}$$
where $\circ$ stands for function concatenation on the state argument, $\circ^n$ means $n$-times recursive repetition of $\circ$ (e.g., $h \circ^2 \tilde{f} = h \circ \tilde{f} \circ \tilde{f}$), and $u_k^{k+n-1} = [\,u_k^\top \cdots u_{k+n-1}^\top]^\top$ with $y_k^{k+n-1}$ similarly defined. More compactly:
$$y_k^{k+n} = \Gamma_n(x_k, u_k^{k+n-1}, y_k^{k+n-1}) + e_k^{k+n}. \tag{4}$$
Note that the noise sequence $e_k^{k+n}$ is not available in practice, hence, Eq. (4) cannot be directly used in estimation. To overcome this problem, we can exploit the i.i.d. white noise assumption on $e_k$ and calculate the expectation of (4) w.r.t. $e$, conditioned on the available past data and the initial state $x_k$:
$$\hat{y}_k^{k+n} = \mathbb{E}_e\big[\,y_k^{k+n} \mid u_k^{k+n-1}, y_k^{k+n-1}, x_k\,\big] = \Gamma_n(x_k, u_k^{k+n-1}, y_k^{k+n-1}), \tag{5}$$
which is the so-called one-step-ahead predictor associated with (1) and can be computed for the entire sample-path realisation in $D_N$, i.e., $\hat{y}_1^N = \Gamma_N(x_1, u_1^{N-1}, y_1^{N-1})$, or, for a specific sample, as $\hat{y}_n = \gamma_n(x_1, u_1^{n-1}, y_1^{n-1})$ with $\gamma_n = (h \circ^n \tilde{f})$. We can exploit (5) to define the estimator by introducing a parametrized form $\Gamma_{N,\theta}$ of the predictor in terms of $f_\theta : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_y} \to \mathbb{R}^{n_x}$ and $h_\theta : \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$, defined by the parameters $\theta \in \Theta \subseteq \mathbb{R}^{n_\theta}$. The classical way to estimate the parameter vector $\theta$ based on a given data set $D_N$, and to ensure that $f_\theta$ and $h_\theta$ accurately represent (1), is to minimize the $\ell_2$ loss of the prediction error $\hat{e}_k = y_k - \hat{y}_k$ between the measured samples $y_k$ and the response $\hat{y}_k$ predicted by $\Gamma_{N,\theta}$:
$$V^{\mathrm{pred}}_{D_N}(\theta) = \frac{1}{N}\sum_{k=1}^{N} \|y_k - \hat{y}_k\|_2^2, \tag{6}$$
where the initial state $x_1$ is a parameter which is co-estimated with $\theta$. In case $f_\theta$ does not depend on $\hat{e}_k$, which corresponds to an OE noise structure, (6) is equal to the well-known simulation error loss function.
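To make the predictor and criterion tangible, here is a minimal Python sketch of evaluating (6); the scalar parametrizations of $f_\theta$ and $h_\theta$ are placeholders for illustration only, and the rollout uses the $\tilde{f}$ form of (2), i.e., the measured $y_k$ enters the state update through $\hat{e}_k$:

```python
import numpy as np

def prediction_loss(theta, u, y, x1):
    # V^pred_{D_N}(theta) of (6): mean squared one-step-ahead prediction error.
    # theta parametrizes placeholder scalar f_theta, h_theta; x1 is the
    # co-estimated initial state.
    a, b, c = theta
    f_theta = lambda x, uk, ek: a * np.tanh(x) + b * uk + c * ek
    h_theta = lambda x: x

    x_hat, loss = x1, 0.0
    for k in range(len(y)):
        y_hat = h_theta(x_hat)               # (7b): one-step-ahead prediction
        e_hat = y[k] - y_hat                 # prediction error drives noise channel
        loss += (y[k] - y_hat) ** 2
        x_hat = f_theta(x_hat, u[k], e_hat)  # (7a), i.e., the tilde-f form of (2)
    return loss / len(y)
```

In the classical approach, this loss is minimized jointly over $\theta$ and $x_1$ with a gradient-based optimizer; note that a single evaluation already costs $O(N)$ sequential steps and its gradient must be back-propagated through the full recursion, which foreshadows the scalability and stability issues discussed below.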
The parametrized predictor $\Gamma_{N,\theta}$ can also be written in a state-space form:
$$\hat{x}_{k+1} = f_\theta(\hat{x}_k, u_k, \hat{e}_k), \tag{7a}$$
$$\hat{y}_k = h_\theta(\hat{x}_k), \tag{7b}$$
where $\hat{x}$ and $\hat{y}$ are the predicted state and predicted output, taking values in $\mathbb{R}^{n_x}$ and $\mathbb{R}^{n_y}$ respectively, while $\hat{e}$ is the prediction error. In fact, (7) qualifies as the model structure used to estimate (1) through the minimization of the identification criterion (6).
In the sequel, we will consider $f_\theta$ and $h_\theta$ to be multi-layer artificial neural networks (ANNs), parametrized in $\theta$, where hidden layer $i$ is composed of $m_i$ activation functions $\phi : \mathbb{R} \to \mathbb{R}$ in the form $z_{i,j} = \phi\big(\sum_{l=1}^{m_{i-1}} \theta_{\mathrm{w},i,j,l}\, z_{i-1,l} + \theta_{\mathrm{b},i,j}\big)$, where $z_i = \mathrm{col}(z_{i,1}, \ldots, z_{i,m_i})$ is the latent variable representing the output of layer $1 \leq i \leq q$. Here, $\mathrm{col}(\cdot)$ denotes the composition of a column vector. For $f_\theta$ with $q$ hidden layers and linear input and output layers, this means $f_\theta(\hat{x}_k, u_k, \hat{e}_k) = \theta_{\mathrm{w},q+1} z_q(k) + \theta_{\mathrm{b},q+1}$ with $z_0(k) = \mathrm{col}(\hat{x}_k, u_k, \hat{e}_k)$. The parameters of the state-transition and output functions of (7) are collected in $\theta$. Furthermore, for the remainder of this paper, we will assume that $f_\theta$ and $h_\theta$ are Lipschitz continuous. Note that this assumption is not restrictive for commonly used neural network structures, since the activation functions (ReLU, tanh, sigmoid, ...) used for $\phi$ are Lipschitz continuous. Under these considerations, model structure (7) represents a recurrent neural network and is also called a state-space (SS) ANN in the literature [12, 19].
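For concreteness, a minimal PyTorch sketch of such an SS-ANN with $q = 2$ hidden layers and linear input/output layers; the layer widths and dimensions are illustrative, and tanh keeps the Lipschitz assumption satisfied:

```python
import torch
import torch.nn as nn

nx, nu, ny, hidden = 4, 1, 1, 64   # illustrative dimensions

# f_theta : R^{nx} x R^{nu} x R^{ny} -> R^{nx}, two hidden tanh layers,
# linear input and output layers
f_theta = nn.Sequential(
    nn.Linear(nx + nu + ny, hidden), nn.Tanh(),
    nn.Linear(hidden, hidden), nn.Tanh(),
    nn.Linear(hidden, nx),
)

# h_theta : R^{nx} -> R^{ny}, no direct feedthrough (no u input)
h_theta = nn.Sequential(
    nn.Linear(nx, hidden), nn.Tanh(),
    nn.Linear(hidden, ny),
)

# one recursion step of model (7) with z0 = col(x_hat, u_k, e_hat)
x_hat = torch.zeros(nx)
u_k, e_hat = torch.zeros(nu), torch.zeros(ny)
x_next = f_theta(torch.cat([x_hat, u_k, e_hat]))   # (7a)
y_hat = h_theta(x_hat)                              # (7b)
```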
By using the ANNs $f_\theta$ and $h_\theta$, one can directly compose the feedforward predictor network $\Gamma_{N,\theta}$ and attempt to minimise (6) directly. However, this blunt approach can run into considerable difficulties. In ANN-based identification, minimizing the simulation error, which is a special case of (6) under an OE noise structure, has been observed to result in accurate models [1], but its major shortcoming is that the computational cost scales at least linearly with $N$. Furthermore, optimization of this cost function is sensitive to local minima, and gradient-based methods commonly display unstable behaviour [20]. Hence, the problem that we aim to solve in this paper is twofold: (i) achieve consistent estimation of (1) under innovation noise conditions using the parametrized SS-ANN model (7) and the one-step-ahead prediction loss (6), and (ii) provide a consistent estimator that drastically reduces the involved computational cost and ensures implementability.
3 The subspace encoder method
This section introduces the proposed subspace encoder
method that addresses many of the challenges encoun-
tered when using classical prediction or simulation error
identification approaches for nonlinear state-space mod-
els. The proposed approach builds on two main ingredients: a cost function based on a truncated prediction loss, and a subspace encoder linked to the concept of state reconstructability; a minimal sketch of how the two interact is given below.
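Before the formal development, the following PyTorch sketch conveys the idea: an encoder network maps a window of $n_a$ past outputs and $n_b$ past inputs to an initial state for each subsection, and the loss averages the prediction error over many length-$T$ subsections drawn from the time series. All names and sizes, as well as the OE-type rollout (the $\hat{e}$ channel is omitted for brevity), are illustrative assumptions, not the exact formulation derived in this section:

```python
import torch
import torch.nn as nn

nx, nu, ny, T, na, nb = 4, 1, 1, 30, 10, 10   # illustrative sizes

encoder = nn.Sequential(                       # past IO window -> initial state
    nn.Linear(na * ny + nb * nu, 64), nn.Tanh(),
    nn.Linear(64, nx),
)
f_theta = nn.Sequential(nn.Linear(nx + nu, 64), nn.Tanh(), nn.Linear(64, nx))
h_theta = nn.Sequential(nn.Linear(nx, 64), nn.Tanh(), nn.Linear(64, ny))

def truncated_loss(u, y, starts):
    # u: (N, nu) tensor, y: (N, ny) tensor; each start index k must satisfy
    # k >= max(na, nb) and k + T <= N. Subsections may overlap.
    loss = 0.0
    for k in starts:
        past = torch.cat([y[k - na:k].reshape(-1), u[k - nb:k].reshape(-1)])
        x_hat = encoder(past)                  # approximate reconstructability map
        for t in range(T):
            y_hat = h_theta(x_hat)             # predicted output
            loss = loss + ((y[k + t] - y_hat) ** 2).sum()
            x_hat = f_theta(torch.cat([x_hat, u[k + t]]))   # state update
    return loss / (len(starts) * T)
```

Because one optimization step only touches a batch of subsections of length $T$, its cost is independent of $N$, and since the subsections may overlap, each sample can contribute to several of them, improving data efficiency.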