Although these benefits inspired the use of state-space
neural network models two decades ago [19], fully ex-
ploiting these properties in NL-SS identification without
major downsides is still an open problem. For instance,
careful initialization of the neural network weights and
biases partially mitigates the risk of local minima dur-
ing optimization, but requires additional information,
e.g., estimation of an approximate linear model of the
system [12]. Additionally, [20] has shown that multiple
shooting smooths the cost function, reducing the num-
ber of local minima and improving optimization stabil-
ity, which has given rise to the use of a truncated simulation error cost for ANN-based NL-SS estimation [21].
However, the use of multiple shooting approaches comes
with the challenge of estimating a potentially large num-
ber of unknown initial states for each subsection, re-
sulting in a complexity increase of the optimisation. To
overcome this problem, auto-encoders have been investi-
gated to jointly estimate the model state and the under-
lying state-space functions using one-step-ahead predic-
tion cost [13]. However, these approaches fall short of giving accurate long-term predictions due to incorrect noise handling, require the tuning of sensitive hyperparameters in the composite auto-encoder/prediction-error loss function, and lack consistency guarantees.
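To make the multiple-shooting idea above concrete, the following minimal sketch (our own illustration, not code from the cited works; the names `f`, `h`, `theta`, and `T` are placeholders) shows a truncated simulation error cost in which the data is split into subsections and each subsection is simulated from its own unknown initial state, illustrating the extra parameters that must be estimated:

```python
def truncated_sim_loss(f, h, theta, x0_sections, u, y, T):
    """Multiple-shooting style loss: split the data into subsections of
    length T, simulate each from its own (estimated) initial state, and
    sum the squared output errors. x0_sections holds one unknown initial
    state per subsection, which is where the optimisation complexity grows."""
    N = len(u)
    starts = range(0, N - T + 1, T)
    loss = 0.0
    for x0, s in zip(x0_sections, starts):
        x = x0
        for k in range(s, s + T):
            y_hat = h(x, theta)               # predicted output at time k
            loss += float((y[k] - y_hat) ** 2)
            x = f(x, u[k], theta)             # deterministic (OE-type) rollout
    return loss / N
```

With the true initial state of every subsection, the loss is exactly zero on noiseless data, which is the property that makes the subsection boundaries "stitch" correctly once the states are estimated well.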
To overcome these challenges, this paper enhances the
subspace encoder-based method for identification of
state-space (SS) neural networks first introduced in [14]
with an innovation noise model and proves consistency
properties. The nonlinear SS model is parametrized
with ANNs for flexibility and efficiency in representing
the often complex and high-dimensional state-transition
and output functions. The model is estimated under
a truncated prediction loss, evaluated on short subsec-
tions. Similarly to multiple shooting, these subsections
further improve computational scalability and opti-
mization stability, thereby reducing the importance of
parameter initialization. The internal state at the start
of each subsection is obtained using a nonlinear subspace
encoder which approximates the reconstructability map
of the SS model and further improves computational
scalability and data efficiency. The state-transition and
output functions of the SS model and the encoder are
simultaneously estimated based on the aforementioned
truncated prediction loss function. Finally, batch opti-
mization and early stopping are employed to further
improve the performance of the proposed identification
scheme. We demonstrate that the resulting nonlinear
state-space identification method is robust w.r.t. model
and gradient instability during training, has a rela-
tively small number of hyperparameters, and obtains
state-of-the-art results on benchmark examples.
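The core structural idea described above can be sketched as follows. This is our own simplified illustration under stated assumptions (a one-layer `tanh` network as a toy stand-in for the encoder, a hypothetical weight matrix `W`, and placeholder `f` and `h`), not the paper's actual parametrization:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(u_past, y_past, W):
    """Toy stand-in for the subspace encoder: a one-layer network mapping
    a window of n_past past inputs and outputs to a subsection's initial
    state, replacing the free initial-state variables of plain multiple
    shooting. W is a hypothetical weight matrix."""
    z = np.concatenate([u_past, y_past])
    return np.tanh(W @ z)

def encoder_rollout_loss(f, h, W, u, y, n_past, T):
    """Truncated prediction loss where every subsection's initial state is
    produced by the shared encoder instead of being a separate unknown."""
    N = len(u)
    loss, count = 0.0, 0
    for s in range(n_past, N - T + 1, T):
        x = encoder(u[s - n_past:s], y[s - n_past:s], W)
        for k in range(s, s + T):
            loss += float(np.sum((y[k] - h(x)) ** 2))
            x = f(x, u[k])
            count += 1
    return loss / max(count, 1)
```

Because the encoder is shared across all subsections, the number of decision variables no longer grows with the number of subsections, which is the computational advantage over plain multiple shooting noted above.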
To summarize, our main contributions are
• A novel ANN-based NL-SS identification algorithm that, even in the presence of innovation noise disturbances, provides reliable and computationally efficient data-driven modelling;
• Efficient use of a multiple-shooting-based formulation of the prediction loss via co-estimation of an encoder function representing the reconstructability map of the nonlinear model (computational efficiency);
• Proving that the proposed estimator is consistent (statistical validity) and enhances smoothness of the cost function (optimisation efficiency);
• Guidelines for the choice of hyperparameters and a detailed comparison of the proposed method to the state-of-the-art on a widely used identification benchmark.
The paper is structured as follows: Section 2 introduces
the considered data-generating system and identifica-
tion problem. Section 3 discusses the proposed subspace
encoder method in detail and provides some user guide-
lines. We theoretically prove multiple key properties of
the proposed method in Section 4, and demonstrate
state-of-the-art performance of the method on a simula-
tion example and the Wiener–Hammerstein benchmark
in Sections 5-6, followed by the conclusions in Section 7.
2 Problem setting and preliminaries
2.1 Data-generating system
Consider a discrete-time system with innovation noise
that can be represented by the state-space description:
x_{k+1} = f(x_k, u_k, e_k), (1a)
y_k = h(x_k) + e_k, (1b)
where k ∈ Z is the discrete time, e is an i.i.d. white noise process with finite covariance Σ_e ∈ R^{n_y × n_y}, and u is a quasi-stationary input process independent of e and taking values in R^{n_u} at each time moment k. Additionally, x and y are the state and output processes, taking values in R^{n_x} and R^{n_y}, respectively. The functions f : R^{n_x} × R^{n_u} × R^{n_y} → R^{n_x} and h : R^{n_x} → R^{n_y}, i.e., the state-transition and output functions, are considered to be bounded, deterministic maps. Without loss of generality, we can assume that h does not contain a direct feedthrough term. By assuming various structures for f and h, many well-known noise structures can be obtained, such as nonlinear output error (NOE), nonlinear auto-regressive with exogenous input (NARX), nonlinear auto-regressive moving average with exogenous input (NARMAX), and nonlinear Box-Jenkins (NBJ) [22]. For instance, if f does not depend on e_k, then a NL-SS model with an OE noise structure is obtained.
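A sample path of the innovation-noise system (1) can be generated as in the sketch below. The specific `f` and `h` used in the test are illustrative choices of ours, not functions from the paper; the structure follows (1a)-(1b), with the same innovation e_k entering both the state update and the output:

```python
import numpy as np

def simulate(f, h, u, x1, sigma_e, rng):
    """Generate one sample path of the innovation-noise state-space
    system (1): x_{k+1} = f(x_k, u_k, e_k), y_k = h(x_k) + e_k."""
    x, ys = x1, []
    for uk in u:
        e = rng.normal(0.0, sigma_e)
        ys.append(h(x) + e)   # (1b): the innovation enters the output directly
        x = f(x, uk, e)       # (1a): the same innovation also drives the state
    return np.array(ys)
```

Setting sigma_e = 0, or choosing an f that ignores its third argument, recovers the deterministic OE-type special case mentioned above.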
For a given sampled excitation sequence {u_k}_{k=1}^N and a potentially unknown initial state x_1 ∈ R^{n_x}, the obtained response of the considered system (1), in terms of a sample-path realisation, is collected into an ordered input-output (IO) data set D_N = {(u_k, y_k)}_{k=1}^N used for identification. To avoid unnecessary clutter, we will not use different notation for random variables, such as y_k defined by (1), and their sampled values, but at places where confusion might arise, we will specify which notion is used.