Although these benefits inspired the use of state-space
neural network models two decades ago [19], fully ex-
ploiting these properties in NL-SS identification without
major downsides is still an open problem. For instance,
careful initialization of the neural network weights and
biases partially mitigates the risk of local minima dur-
ing optimization, but requires additional information,
e.g., estimation of an approximate linear model of the
system [12]. Additionally, [20] has shown that multiple
shooting smooths the cost function, reducing the num-
ber of local minima and improving optimization stabil-
ity, which has given rise to the use of a truncated simulation error cost for ANN-based NL-SS estimation [21].
However, the use of multiple shooting approaches comes
with the challenge of estimating a potentially large num-
ber of unknown initial states for each subsection, re-
sulting in a complexity increase of the optimisation. To
overcome this problem, auto-encoders have been investi-
gated to jointly estimate the model state and the under-
lying state-space functions using one-step-ahead predic-
tion cost [13]. However, these approaches fall short of giving accurate long-term predictions due to incorrect noise handling, require the tuning of sensitive hyperparameters in the composite auto-encoder/prediction-error loss function, and lack consistency guarantees.
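To make the multiple-shooting idea above concrete, the following minimal sketch (our own illustration, not code from the cited works; the names `f`, `h`, `theta`, and `T` are placeholders) shows a truncated simulation error cost in which the data is split into subsections and each subsection is simulated from its own unknown initial state, illustrating the extra parameters that must be estimated:

```python
def truncated_sim_loss(f, h, theta, x0_sections, u, y, T):
    """Multiple-shooting style loss: split the data into subsections of
    length T, simulate each from its own (estimated) initial state, and
    sum the squared output errors. x0_sections holds one unknown initial
    state per subsection, which is where the optimisation complexity grows."""
    N = len(u)
    starts = range(0, N - T + 1, T)
    loss = 0.0
    for x0, s in zip(x0_sections, starts):
        x = x0
        for k in range(s, s + T):
            y_hat = h(x, theta)               # predicted output at time k
            loss += float((y[k] - y_hat) ** 2)
            x = f(x, u[k], theta)             # deterministic (OE-type) rollout
    return loss / N
```

With the true initial state of every subsection, the loss is exactly zero on noiseless data, which is the property that makes the subsection boundaries "stitch" correctly once the states are estimated well.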
To overcome these challenges, this paper enhances the
subspace encoder-based method for identification of
state-space (SS) neural networks first introduced in [14]
with an innovation noise model and proves consistency
properties. The nonlinear SS model is parametrized
with ANNs for flexibility and efficiency in representing
the often complex and high-dimensional state-transition
and output functions. The model is estimated under
a truncated prediction loss, evaluated on short subsec-
tions. Similarly to multiple shooting, these subsections
further improve computational scalability and opti-
mization stability, thereby reducing the importance of
parameter initialization. The internal state at the start
of each subsection is obtained using a nonlinear subspace
encoder which approximates the reconstructability map
of the SS model and further improves computational
scalability and data efficiency. The state-transition and
output functions of the SS model and the encoder are
simultaneously estimated based on the aforementioned
truncated prediction loss function. Finally, batch opti-
mization and early stopping are employed to further
improve the performance of the proposed identification
scheme. We demonstrate that the resulting nonlinear
state-space identification method is robust w.r.t. model
and gradient instability during training, has a rela-
tively small number of hyperparameters, and obtains
state-of-the-art results on benchmark examples.
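The core structural idea described above can be sketched as follows. This is our own simplified illustration under stated assumptions (a one-layer `tanh` network as a toy stand-in for the encoder, a hypothetical weight matrix `W`, and placeholder `f` and `h`), not the paper's actual parametrization:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(u_past, y_past, W):
    """Toy stand-in for the subspace encoder: a one-layer network mapping
    a window of n_past past inputs and outputs to a subsection's initial
    state, replacing the free initial-state variables of plain multiple
    shooting. W is a hypothetical weight matrix."""
    z = np.concatenate([u_past, y_past])
    return np.tanh(W @ z)

def encoder_rollout_loss(f, h, W, u, y, n_past, T):
    """Truncated prediction loss where every subsection's initial state is
    produced by the shared encoder instead of being a separate unknown."""
    N = len(u)
    loss, count = 0.0, 0
    for s in range(n_past, N - T + 1, T):
        x = encoder(u[s - n_past:s], y[s - n_past:s], W)
        for k in range(s, s + T):
            loss += float(np.sum((y[k] - h(x)) ** 2))
            x = f(x, u[k])
            count += 1
    return loss / max(count, 1)
```

Because the encoder is shared across all subsections, the number of decision variables no longer grows with the number of subsections, which is the computational advantage over plain multiple shooting noted above.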
To summarize, our main contributions are
• A novel ANN-based NL-SS identification algorithm that, even in the presence of innovation noise disturbances, provides reliable and computationally efficient data-driven modelling;
• Efficient use of a multiple-shooting-based formulation of the prediction loss via co-estimation of an encoder function representing the reconstructability map of the nonlinear model (computational efficiency);
• Proving that the proposed estimator is consistent (statistical validity) and enhances smoothness of the cost function (optimisation efficiency);
• Guidelines for the choice of hyperparameters and a detailed comparison of the proposed method to the state-of-the-art on a widely used identification benchmark.
The paper is structured as follows: Section 2 introduces
the considered data-generating system and identifica-
tion problem. Section 3 discusses the proposed subspace
encoder method in detail and provides some user guide-
lines. We theoretically prove multiple key properties of
the proposed method in Section 4, and demonstrate
state-of-the-art performance of the method on a simula-
tion example and the Wiener–Hammerstein benchmark
in Sections 5-6, followed by the conclusions in Section 7.
2 Problem setting and preliminaries
2.1 Data-generating system
Consider a discrete-time system with innovation noise
that can be represented by the state-space description:
x_{k+1} = f(x_k, u_k, e_k), (1a)
y_k = h(x_k) + e_k, (1b)
where k ∈ Z is the discrete time, e is an i.i.d. white noise process with finite covariance Σ_e ∈ R^{n_y × n_y}, and u is a quasi-stationary input process independent of e and taking values in R^{n_u} at each time moment k. Additionally, x and y are the state and output processes, taking values in R^{n_x} and R^{n_y}, respectively. The functions f : R^{n_x} × R^{n_u} × R^{n_y} → R^{n_x} and h : R^{n_x} → R^{n_y}, i.e., the state-transition and output functions, are considered to be bounded, deterministic maps. Without loss of generality, we can assume that h does not contain a direct feedthrough term. By assuming various structures for f and h, many well-known noise structures can be obtained, such as nonlinear output error (NOE), nonlinear auto-regressive with exogenous input (NARX), nonlinear auto-regressive moving average with exogenous input (NARMAX), and nonlinear Box-Jenkins (NBJ) [22]. For instance, if f does not depend on e_k, then a NL-SS model with an OE noise structure is obtained.
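A sample path of the innovation-noise system (1) can be generated as in the sketch below. The specific `f` and `h` used in the test are illustrative choices of ours, not functions from the paper; the structure follows (1a)-(1b), with the same innovation e_k entering both the state update and the output:

```python
import numpy as np

def simulate(f, h, u, x1, sigma_e, rng):
    """Generate one sample path of the innovation-noise state-space
    system (1): x_{k+1} = f(x_k, u_k, e_k), y_k = h(x_k) + e_k."""
    x, ys = x1, []
    for uk in u:
        e = rng.normal(0.0, sigma_e)
        ys.append(h(x) + e)   # (1b): the innovation enters the output directly
        x = f(x, uk, e)       # (1a): the same innovation also drives the state
    return np.array(ys)
```

Setting sigma_e = 0, or choosing an f that ignores its third argument, recovers the deterministic OE-type special case mentioned above.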
For a given sampled excitation sequence {u_k}_{k=1}^N and a potentially unknown initial state x_1 ∈ R^{n_x}, the obtained response of the considered system (1), in terms of a sample-path realisation, is collected into an ordered input-output (IO) data set D_N = {(u_k, y_k)}_{k=1}^N used for identification. To avoid unnecessary clutter, we will not use different notation for random variables, such as y_k defined by (1), and their sampled values, but at places where confusion might arise, we will specify which notion is used.