Catch-22s of Reservoir Computing
Yuanzhao Zhang1and Sean P. Cornelius2
1Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
2Department of Physics, Toronto Metropolitan University, Toronto, ON, M5B 2K3, Canada
Reservoir Computing (RC) is a simple and efficient model-free framework for forecasting the
behavior of nonlinear dynamical systems from data. Here, we show that there exist commonly-
studied systems for which leading RC frameworks struggle to learn the dynamics unless key in-
formation about the underlying system is already known. We focus on the important problem of
basin prediction—determining which attractor a system will converge to from its initial conditions.
First, we show that the predictions of standard RC models (echo state networks) depend critically
on warm-up time, requiring a warm-up trajectory containing almost the entire transient in order
to identify the correct attractor. Accordingly, we turn to Next-Generation Reservoir Computing
(NGRC), an attractive variant of RC that requires negligible warm-up time. By incorporating the
exact nonlinearities in the original equations, we show that NGRC can accurately reconstruct intri-
cate and high-dimensional basins of attraction, even with sparse training data (e.g., a single transient
trajectory). Yet, a tiny uncertainty in the exact nonlinearity can render prediction accuracy no bet-
ter than chance. Our results highlight the challenges faced by data-driven methods in learning
the dynamics of multistable systems and suggest potential avenues to make these approaches more
robust.
I. INTRODUCTION
Reservoir Computing (RC) [1-12] is a machine learning
framework for time-series prediction based on recurrent
neural networks. Because only the output layer needs to
be modified, RC is extremely efficient to train. Despite
its simplicity, recent studies have shown that RC can be
extremely powerful when it comes to learning unknown
dynamical systems from data [13]. Specifically, RC has
been used to reconstruct attractors [14,15], calculate
Lyapunov exponents [16], infer bifurcation diagrams [17],
and even predict the basins of unseen attractors [18,19].
These advances open the possibility of using RC to improve
climate modeling [20], create digital twins [21], anticipate
synchronization [22, 23], predict tipping points [24, 25],
and infer network connections [26].
Since the landmark paper demonstrating RC’s ability
to predict spatiotemporally chaotic systems from data
[13], there has been a flurry of efforts to understand the
success as well as the limitations of RC [27-36]. As a
result, more sophisticated architectures have been devel-
oped to extend the capability of the original framework,
such as hybrid [37], parallel [38,39], and symmetry-aware
[40] RC schemes.
One particularly promising variant of RC was proposed
in 2021 and named Next Generation Reservoir Comput-
ing (NGRC) [41]. There, instead of having a nonlinear
reservoir and a linear output layer, one has a linear reser-
voir and a nonlinear output layer [42]. These differences,
though subtle, confer several advantages. First, NGRC
requires no random matrices and thus has far fewer
hyperparameters that need to be optimized. Moreover,
each NGRC prediction needs exceedingly few data points
to initiate (as opposed to thousands of data points in
standard RC), which is especially useful when predicting
the basins of attraction of multistable dynamical systems
[43].
Understanding the basin structure is of fundamental
importance for dynamical systems with multiple attrac-
tors. Such systems include neural networks [44,45], gene
regulatory networks [46,47], differentiating cells [48,49],
and power grids [50,51]. Basins of attraction provide
a mapping from initial conditions to attractors and, in
the face of noise or perturbations, tell us the robustness
of each stable state. Despite their importance, basins
have not been well studied from a machine learning per-
spective, with most methods for data-driven modeling of
dynamical systems currently focusing on systems with a
single attractor.
In this Article, we show that the success of standard
RC in predicting the dynamics of multistable systems
can depend critically on having access to long initializa-
tion trajectories, while the performance of NGRC can
be extremely sensitive to the choice of readout nonlin-
earity. It has been observed that, for each new initial
condition, a standard RC model needs to be “warmed
up” with thousands of data points before it can start
making predictions [43]. In practice, such data will not
exist for most initial conditions. Even when they do ex-
ist, we demonstrate that the warm-up time series would
often have already approached the attractor, rendering
predictions unnecessary [52]. In contrast, NGRC can eas-
ily reproduce highly intermingled and high-dimensional
basins with minimal warm-up, provided the exact non-
linearity in the underlying equations is known. However,
a 1% uncertainty on that nonlinearity can already make
the NGRC basin predictions barely outperform random
guesses. Given this extreme sensitivity, even if one had
partial (but imprecise) knowledge of the underlying sys-
tem, a hybrid scheme combining NGRC and such knowl-
edge would still struggle in making reliable predictions.
The rest of the paper is organized as follows. In
Section II, we introduce the first model system under
study—the magnetic pendulum, which is representative
arXiv:2210.10211v3 [cs.LG] 25 Sep 2023
of the difficulties of basin prediction in real nonlinear sys-
tems. In Sections III-IV, we apply standard RC to this
system, showing that accurate predictions rely heavily
on the length of the warm-up trajectory. We thus turn
to Next Generation Reservoir Computing, giving a brief
overview of its implementation in Section V. We present
our main results in Section VI, where we characterize the
effect of readout nonlinearity on NGRC’s ability to pre-
dict the basins of the magnetic pendulum. We further
support our findings using coupled Kuramoto oscillators
in Section VII, which can have a large number of coex-
isting high-dimensional basins. Finally, we discuss the
implications of our results and suggest avenues for future
research in Section VIII.
II. THE MAGNETIC PENDULUM
For concreteness, we focus on the magnetic pendu-
lum [53] as a representative model. It is mechanistically
simple—being low-dimensional and generated by simple
physical laws—and yet captures all characteristics of the
basin prediction problem in general: the system is multi-
stable and predicting which attractor a given initial con-
dition will go to is nontrivial.
The system consists of an iron bob suspended by a
massless rod above three identical magnets, located at
the vertices of an equilateral triangle in the (x, y) plane
(Fig. 1). The bob moves under the influence of gravity,
drag due to air friction, and the attractive forces of the
magnets. For simplicity, we treat the magnets as mag-
netic point charges and assume that the length of the
pendulum rod is much greater than the distance between
the magnets, allowing us to describe the dynamics using
a small-angle approximation.
The resulting dimensionless equations of motion for the
pendulum bob are
\ddot{x} = -\omega_0^2 x - a\dot{x} + \sum_{i=1}^{3} \frac{\tilde{x}_i - x}{D(\tilde{x}_i, \tilde{y}_i)^3},   (1)

\ddot{y} = -\omega_0^2 y - a\dot{y} + \sum_{i=1}^{3} \frac{\tilde{y}_i - y}{D(\tilde{x}_i, \tilde{y}_i)^3},   (2)
where (x̃_i, ỹ_i) are the coordinates of the ith magnet, ω_0 is
the pendulum's natural frequency, and a is the damping
coefficient. Here, D(x̃, ỹ) denotes the distance between
the bob and a given point (x̃, ỹ) in the magnets' plane:

D(\tilde{x}, \tilde{y}) = \sqrt{(\tilde{x} - x)^2 + (\tilde{y} - y)^2 + h^2},   (3)

where h is the bob's height above the plane. The system's
four-dimensional state is thus x = (x, y, ẋ, ẏ)^T.
We take the (x, y) coordinates of the magnets to be
(1/√3, 0), (−1/(2√3), 1/2), and (−1/(2√3), −1/2). Unless stated
otherwise, we set ω_0 = 0.5, a = 0.2, and h = 0.2 in our
simulations. These values are representative of all cases
for which the magnetic pendulum has exactly three stable
fixed points, corresponding to the bob being at rest and
pointed toward one of the three magnets.

FIG. 1. Magnetic pendulum with three fixed-point
attractors and the corresponding basins of attraction.
(Left) Illustration of the magnetic pendulum system. Three
magnets are placed on a flat surface, each drawn in the color
we use to denote the corresponding basin of attraction. The
hollow circle indicates the (x, y) coordinates of the pendulum
bob, which together with the velocity (ẋ, ẏ) fully specify the
system's state. (Right) Basins of attraction for the region of
initial conditions under study, namely states of zero initial
velocity with −1.5 ≤ x_0, y_0 ≤ 1.5.
Previous studies have largely focused on chaotic dy-
namics as a stress test of RC’s capabilities [2,6,8,9,
13,16,17,25,41]. Here we take a different approach.
With non-zero damping, the magnetic pendulum dynam-
ics is autonomous and dissipative, meaning all trajecto-
ries must eventually converge to a fixed point. Except
on a set of initial conditions of measure zero, this will be
one of the three stable fixed points identified earlier. Yet
predicting which attractor a given initial condition will
go to can be far from straightforward, with the pendu-
lum wandering in an erratic transient before eventually
settling to one of the three magnets [53]. This manifests
as complicated basins of attraction with a “pseudo” (fat)
fractal structure (Fig. 1). We can control the “fractal-
ness” of the basins by, for example, varying the height
of the pendulum h. This generates basins with tunable
complexity to test the performance of (NG)RC.
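The model above is straightforward to simulate directly. The following sketch (our own minimal implementation, not the authors' code; function names and the integration horizon are our choices) integrates Eqs. (1)-(3) with fixed-step RK4 and labels the basin of an initial condition by the magnet nearest the final state:

```python
import numpy as np

# Magnet (x, y) positions at the vertices of an equilateral triangle (see text).
MAGNETS = np.array([[1 / np.sqrt(3), 0.0],
                    [-1 / (2 * np.sqrt(3)), 0.5],
                    [-1 / (2 * np.sqrt(3)), -0.5]])
OMEGA0, A, H = 0.5, 0.2, 0.2  # natural frequency, damping, bob height

def pendulum_rhs(state):
    """Right-hand side of Eqs. (1)-(2); state = (x, y, xdot, ydot)."""
    x, y, xd, yd = state
    ax = -OMEGA0**2 * x - A * xd
    ay = -OMEGA0**2 * y - A * yd
    for mx, my in MAGNETS:
        D3 = ((mx - x)**2 + (my - y)**2 + H**2) ** 1.5  # D(x~_i, y~_i)^3, Eq. (3)
        ax += (mx - x) / D3
        ay += (my - y) / D3
    return np.array([xd, yd, ax, ay])

def integrate(state, dt=0.01, T=100.0):
    """Fixed-step RK4 integration of the pendulum dynamics."""
    for _ in range(int(T / dt)):
        k1 = pendulum_rhs(state)
        k2 = pendulum_rhs(state + 0.5 * dt * k1)
        k3 = pendulum_rhs(state + 0.5 * dt * k2)
        k4 = pendulum_rhs(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return state

def basin_label(x0, y0):
    """Index of the magnet that the trajectory from (x0, y0, 0, 0) settles near."""
    final = integrate(np.array([x0, y0, 0.0, 0.0]))
    return int(np.argmin(np.linalg.norm(MAGNETS - final[:2], axis=1)))
```

Sweeping `basin_label` over a grid of (x_0, y_0) with zero initial velocity reproduces basin maps like the right panel of Fig. 1.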
III. IMPLEMENTATION OF STANDARD RC
Consider a dynamical system whose n-dimensional
state x obeys a set of n autonomous differential equations
of the form

\dot{x} = f(x).   (4)
In general, the goal of reservoir computing is to approxi-
mate the flow of Eq. (4) in discrete time by a map of the
form
x_{t+1} = F(x_t).   (5)
Here, the index t runs over a set of discrete times separated
by Δt time units of the real system, where Δt is a
timescale hyperparameter generally chosen to be smaller
than the characteristic timescale(s) of Eq. (4).
In standard RC, one views the state of the real system
as a linear readout from an auxiliary reservoir system,
whose state is an N_r-dimensional vector r_t. Specifically,

x_t = W \cdot r_t,   (6)

where W is an n × N_r matrix of trainable output
weights. The reservoir system is generally much higher-dimensional
(N_r ≫ n), and its dynamics obey

r_{t+1} = (1 - \alpha) r_t + \alpha f(W_r \cdot r_t + W_{in} \cdot u_t + b).   (7)

Here W_r is the N_r × N_r reservoir matrix, W_in is the
N_r × n input matrix, and b is an N_r-dimensional bias
vector. The input u_t is an n-dimensional vector that
represents either a state of the real system (u_t = x_t)
during training or the model's own output (u_t = W · r_t)
during prediction. The nonlinear activation function f is
applied elementwise, where we adopt the standard choice
of f(·) = tanh(·). Finally, 0 < α ≤ 1 is the so-called leaky
coefficient, which controls the inertia of the reservoir
dynamics.
In general, only the output matrix W is trained, with
W_r, W_in, and b generated randomly from appropriate
ensembles. We follow best practices [54] and previous
studies in generating these latter components, specifically:

- W_r is the weighted adjacency matrix of a directed
Erdős–Rényi graph on N_r nodes. The link probability
is 0 < q ≤ 1, and we allow for self-loops.
We first draw the link weights uniformly and independently
from [−1, 1], and then normalize them
so that W_r has a specified spectral radius ρ > 0.
Here, q and ρ are hyperparameters.

- W_in is a dense matrix, whose entries are initially
drawn uniformly and independently from [−1, 1].
In the magnetic pendulum, the state x_t (and hence
the input term u_t in Eq. (7)) is of the form
(x, y, ẋ, ẏ)^T. To allow for different characteristic
scales of the position vs. velocity dynamics, we scale
the first two columns of W_in by s_x and the last two
columns by s_v, where s_x, s_v > 0 are scale hyperparameters.

- b has its entries drawn uniformly and independently
from [−s_b, s_b], where s_b > 0 is a scale hyperparameter.
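As a concrete illustration, the random components above could be generated as follows. This is a minimal sketch with placeholder default values (not the optimized hyperparameters from the paper's Table S2), and the function name is ours:

```python
import numpy as np

def make_reservoir(Nr=300, n=4, q=0.03, rho=0.9, sx=1.0, sv=1.0, sb=1.0, seed=0):
    """Draw W_r, W_in, and b as described in the text."""
    rng = np.random.default_rng(seed)
    # Directed Erdos-Renyi adjacency (self-loops allowed), with surviving
    # link weights drawn uniformly from [-1, 1].
    Wr = rng.uniform(-1, 1, size=(Nr, Nr)) * (rng.random((Nr, Nr)) < q)
    # Rescale so the spectral radius of W_r equals rho.
    Wr *= rho / np.max(np.abs(np.linalg.eigvals(Wr)))
    # Dense input matrix; position columns scaled by sx, velocity columns by sv.
    Win = rng.uniform(-1, 1, size=(Nr, n))
    Win[:, :2] *= sx
    Win[:, 2:] *= sv
    # Bias vector with entries in [-sb, sb].
    b = rng.uniform(-sb, sb, size=Nr)
    return Wr, Win, b
```

Note that drawing a dense weight matrix and masking it is wasteful for very sparse reservoirs, but keeps the sketch short.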
Training. To train an RC model from a given initial
condition x_0, we first integrate the real dynamics (4) to
obtain N_train additional states {x_t}, t = 1, ..., N_train. We then
iterate the reservoir dynamics (7) N_train times from
r_0 = 0, using the training data as inputs (u_t = x_t). This
produces a corresponding sequence of reservoir states,
{r_t}, t = 1, ..., N_train. Finally, we solve for the output weights
W that render Eq. (6) the best fit to the training data
using Ridge Regression with Tikhonov regularization:

W = X R^T \left(R R^T + \lambda I\right)^{-1}.   (8)

Here, X (R) is a matrix whose columns are the x_t (r_t)
for t = 1, ..., N_train, I is the identity matrix, and λ > 0 is
a regularization coefficient that prevents ill-conditioning
of the weights, which can be symptomatic of overfitting
the data.
Prediction. To simulate a trained RC model from a
given initial condition x_0, we first integrate the true
dynamics (4) forward in time to obtain a total of N_warmup ≥ 0
states {x_t}, t = 1, ..., N_warmup. During the first N_warmup
iterations of the discrete dynamics (7), the input term comes
from the real trajectory, i.e., u_t = x_t. Thereafter, we
replace the input with the model's own output at the
previous iteration (u_t = W · r_t). This creates a closed-loop
system from Eq. (7), which we iterate without further
input from the real system.
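The training and prediction phases above can be sketched as follows. This is our own schematic implementation of Eqs. (6)-(8) (rows, rather than columns, index time samples here, so the regression formula is transposed accordingly); it is not the authors' code:

```python
import numpy as np

def drive(r0, inputs, Wr, Win, b, alpha):
    """Iterate the leaky reservoir update, Eq. (7), over a sequence of inputs."""
    r, states = r0, []
    for u in inputs:
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
        states.append(r)
    return np.array(states)

def train_readout(X, R, lam=1e-8):
    """Ridge-regression output weights, Eq. (8).
    X: (Ntrain, n) real states; R: (Ntrain, Nr) reservoir states (rows = samples)."""
    return X.T @ R @ np.linalg.inv(R.T @ R + lam * np.eye(R.shape[1]))

def predict(W, warmup, Nsteps, Wr, Win, b, alpha):
    """Open-loop warmup on real states, then closed-loop rollout (u_t = W r_t)."""
    r = np.zeros(Wr.shape[0])
    for u in warmup:                       # warmup phase: u_t = x_t
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
    outputs = []
    for _ in range(Nsteps):                # autonomous phase: u_t = W r_t
        u = W @ r
        outputs.append(u)
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
    return np.array(outputs)
```

The closed-loop rollout in `predict` is exactly the setting in which the warmup sensitivity of Section IV manifests: once the model's own output is fed back, any mismatch is free to compound.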
IV. CRITICAL DEPENDENCE OF STANDARD
RC ON WARMUP TIME
Though standard RC is extremely powerful, it is known
to demand large warmup periods (Nwarmup) in certain
problems in order to be stable [18]. In principle, this
could create a dilemma for the problem of basin predic-
tion, as long warmup trajectories from the real system
will generally be unavailable for initial conditions unseen
during training. And even if such data were available, the
problem could be rendered moot if the required warmup
exceeds the transient period of the given initial condition
[43]. Here, we systematically test RC’s sensitivity to the
warmup time using the magnetic pendulum system.
Our aim is to test standard RC under the most fa-
vorable conditions. Accordingly, we will train each RC
model on a single initial condition x0= (x0, y0,0,0)T
of the magnetic pendulum, and ask it to reproduce only
the trajectory from that initial condition. Likewise, be-
fore training, we systematically optimize the RC hyper-
parameters for that initial condition via Bayesian opti-
mization, seeking to minimize an objective function that
combines both training and validation error. For details
of this process, we refer the reader to Appendix S2.
In initial tests of our optimization procedure, we
found it largely insensitive to the reservoir connectivity
(q), with equally good training/validation performance
achievable across a range of q from 0.01 to 1. We likewise
found little impact of the regularization coefficient
FIG. 2. Forecastability transition of standard RC. The
initial condition used for both training and prediction was
(x_0, y_0) = (1.3, 0.75). The optimized RC hyperparameters
for this initial condition are listed in Table S2. We train an
ensemble of 100 different RC models with these hyperparameters,
then simulate each from the training initial condition
using varying numbers of warmup points. In (a), the red
line/bands denote the median/interquartile range of the
resulting prediction NRMSE as a function of warmup time. The
blue curves show the total mechanical energy E of the training
trajectory at the same times. We see a sharp drop in model
error at t_warmup ≈ 6, only shortly before E falls below the
height of the potential barriers (U*, dashed line) separating
the three wells of the magnetic pendulum. In (b), we overlay
the x and y dynamics of the real system for comparison. This
confirms that the "critical" warmup time in (a) aligns closely
with the end of the transient. The arrows in (a) denote the
warmup times used for the example simulations in Fig. 3.
over several orders of magnitude, with the optimizer
frequently pinning λ at the supplied lower bound of 10^{-8}.
Thus, in the interest of more fully exploring the most
important hyperparameters, we fix q = 0.03 and λ = 10^{-8}.
We then optimize the remaining five continuous hyperparameters
(ρ, s_x, s_v, s_b, α) over the ranges specified in
Table S1.
Throughout this section, we set Δt = 0.02, which is
smaller than the characteristic timescales of the magnetic
pendulum. We train each RC model on N_train = 4000
data points of the real system starting from the given
initial condition, which when paired with the chosen Δt
encompass both the transient dynamics and convergence
to one of the attractors. We fix the reservoir size at
N_r = 300; as we show in the Supplemental Material,
larger reservoirs do not alter our results.
Figure 2 shows the performance of an ensemble of RC
realizations with optimized hyperparameters for the initial
condition (x_0, y_0) = (1.2, 0.75). Specifically, we
show the normalized root-mean-square error (NRMSE,
see Appendix C) between the real and RC-predicted
trajectory as a function of warmup time (t_warmup =
N_warmup · Δt). In Fig. 2(a) we observe a sharp transition
around t_warmup = 6. Before this point, we consistently
have NRMSE = O(1), meaning the RC error is
comparable to the scale of the real trajectory. But after
the transition, the error is always quite small (NRMSE ≪ 1).

FIG. 3. Sensitivity of standard RC performance to
warmup time. We show example simulations from one RC
realization in Fig. 2 with two different warmup times (dashed
lines): t_warmup = 6.1 (left) and t_warmup = 5.0 (right).
The initial condition and optimized RC hyperparameters are
the same as in Fig. 2.
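The paper's exact NRMSE definition appears in its Appendix C, which is not part of this excerpt. For orientation only, one common variant normalizes the root-mean-square error by the standard deviation of the true trajectory:

```python
import numpy as np

def nrmse(pred, true):
    """RMSE over all components and times, normalized by the standard
    deviation of the true trajectory (a generic stand-in; the paper's
    precise normalization may differ)."""
    pred, true = np.asarray(pred), np.asarray(true)
    return np.sqrt(np.mean((pred - true) ** 2)) / np.std(true)
```

Under this convention, NRMSE = O(1) means the prediction error is comparable to the natural scale of the signal, matching the interpretation in the text.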
We can gain physical insight about this "forecastability
transition" by analyzing the total mechanical energy of
the training trajectory:

E = \frac{1}{2}\left(\dot{x}^2 + \dot{y}^2\right) + U(x, y).   (9)

Here U(x, y) is the potential corresponding to Eqs. (1)-(2),
where we set U = 0 at the minima corresponding to the
three attractors. Strikingly, the critical warmup time
occurs only shortly before the energy drops below a critical
value U*, defined as the height of the potential barriers
between the three wells [Fig. 2(a)]. By this time, the system
is unambiguously "trapped" near a specific magnet,
making only damped oscillations thereafter [Fig. 2(b)].
This suggests that even highly optimized RC models will
fail to reproduce convergence to the correct attractor unless
they have already been guided there by data from the
real system.
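The energy diagnostic of Eq. (9) is easy to reproduce. Below is a sketch of U and E consistent with the forces in Eqs. (1)-(2); unlike the paper, we keep the raw potential rather than shifting its minima to zero, so only energy differences are meaningful here:

```python
import numpy as np

# Magnet positions and parameters as in Section II.
MAGNETS = np.array([[1 / np.sqrt(3), 0.0],
                    [-1 / (2 * np.sqrt(3)), 0.5],
                    [-1 / (2 * np.sqrt(3)), -0.5]])
OMEGA0, H = 0.5, 0.2

def potential(x, y):
    """Potential whose negative gradient gives the position-dependent forces
    in Eqs. (1)-(2): a harmonic term plus one -1/D well per magnet.
    (No additive shift applied, unlike the U = 0 convention in the text.)"""
    U = 0.5 * OMEGA0**2 * (x**2 + y**2)
    for mx, my in MAGNETS:
        U -= 1.0 / np.sqrt((mx - x)**2 + (my - y)**2 + H**2)
    return U

def energy(state):
    """Total mechanical energy, Eq. (9); state = (x, y, xdot, ydot)."""
    x, y, xd, yd = state
    return 0.5 * (xd**2 + yd**2) + potential(x, y)
```

Evaluating `energy` along a simulated trajectory and comparing it against the barrier height reproduces the blue curves and the U* threshold of Fig. 2(a).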
We illustrate this further in Fig. 3, showing example
predictions from one RC realization considered above un-
der two different warmup times: one above the critical
value in Fig. 2, and one below. Indeed, with sufficient
warmup (left), the RC trajectory is a near-perfect match
to the real one, both before and after the warmup period.
But if the warmup time is even slightly less than the crit-
ical value (right), the model quickly diverges once the au-
tonomous prediction begins. In this case, the model fails
to reproduce convergence to any fixed-point attractor,
let alone the correct one, instead oscillating wildly.
This pattern holds when we repeat our experiment
for other initial conditions, re-optimizing hyperparameters
and re-training an ensemble of RC models for each
(Figs. S1-S4). In all cases, we see the same sharp drop
in RC prediction error at a particular warmup time
(Figs. S1, S3). Without at least this much warmup time,
the models fail to capture the real dynamics even
qualitatively, often converging to an unphysical state with
non-zero final velocity (Figs. S2, S4). Though there exist
initial conditions that require shorter warmups, such
as (x_0, y_0) = (1.0, 0.5), this is only because those initial
conditions have shorter transients. Indeed, there are
other initial conditions, such as (x_0, y_0) = (1.75, 1.6),
that have longer transients and demand commensurately
larger warmup times (Figs. S3, S4). In no case have we
observed the RC dynamics staying faithful to the real
system unless the warmup is comparable to the transient
period.
Note that the breakdown of RC with insufficient
warmup time cannot be attributed to an insufficiently
complex model vis-à-vis the only hyperparameter we
have not optimized: the reservoir size (N_r). Indeed, we
have repeated our experiment with reservoirs twice as
large (N_r = 600). Even with optimized values of the
other hyperparameters, we still see a sharp transition in
the NRMSE at a warmup time comparable to the transient
time (Fig. S5).
In sum, we have shown that standard RC is unsuitable
for basin prediction in this representative multistable sys-
tem. Specifically, RC models can only reliably reproduce
convergence to the correct attractor when they have been
guided to its vicinity. This is true even with the bene-
fit of highly tuned hyperparameters (Appendix S2), and
validation on only initial conditions seen during training.
For the remainder of the paper, we instead turn to
Next Generation Reservoir Computing (NGRC). Though
it is known that every NGRC model implicitly defines
the connectivity matrix and other parameters of a stan-
dard RC model [41,42], there is no guarantee that the
two architectures would perform similarly in practice. In
particular, NGRC is known to demand substantially less
warmup time [41], potentially avoiding the “catch-22”
identified here for standard RC. Can this cutting-edge
framework succeed in learning the magnetic pendulum
and other paradigmatic multistable systems?
V. IMPLEMENTATION OF NGRC
We implement the NGRC framework following
Refs. [41, 43]. In NGRC, the update rule for the discrete
dynamics is taken as

x_{t+1} = x_t + W \cdot g_t,   (10)

where g_t is an m-dimensional feature vector, calculated
from the current state and k − 1 past states, namely

g_t = g(x_t, x_{t-1}, \ldots, x_{t-k+1}).   (11)

Here, k ≥ 1 is a hyperparameter that governs the amount
of memory in the NGRC model, and W is an n × m
matrix of trainable weights.
We elaborate on the functional form of the feature
embedding g below. But in general, the features can
be divided into three groups: (i) one constant (bias)
feature; (ii) m_lin = nk linear features, corresponding to
the components of {x_t, x_{t-1}, ..., x_{t-k+1}}; and finally
(iii) m_nonlin nonlinear features, each a nonlinear
transformation of the linear features. The total number of
features is thus m = 1 + m_lin + m_nonlin.
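As a concrete example of this three-group structure, the sketch below builds a feature vector from a bias term, the nk linear terms, and (as one generic choice) all quadratic monomials of the linear terms. This monomial library is merely illustrative: the paper's central point is that the nonlinear part must match the system's true nonlinearity, which for the magnetic pendulum is not polynomial.

```python
import numpy as np
from itertools import combinations_with_replacement

def ngrc_features(past_states):
    """Feature vector g_t from Eq. (11).

    `past_states` is the list [x_t, x_{t-1}, ..., x_{t-k+1}] of n-vectors.
    Features: one bias term, the n*k linear terms, and all quadratic
    monomials of the linear terms (a generic nonlinear library)."""
    lin = np.concatenate(past_states)                      # m_lin = n*k terms
    quad = [lin[i] * lin[j]
            for i, j in combinations_with_replacement(range(len(lin)), 2)]
    return np.concatenate(([1.0], lin, quad))              # m = 1 + m_lin + m_nonlin
```

For n = 4 and k = 2 (as one might use for the pendulum), this gives m = 1 + 8 + 36 = 45 features.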
Training. Per Eq. (10), training an NGRC model
amounts to finding values for the weights W that give
the best fit for the discrete update rule

y_t = W \cdot g_t,   (12)

where y_t = x_{t+1} − x_t. Accordingly, we calculate pairs
of inputs (g_t) and next-step targets (y_t) over N_traj ≥ 1
training trajectories from the real system (4), each of
length N_train + k. We then solve for the values of W that
best fit Eq. (12) in the least-squares sense via regularized
Ridge regression, namely

W = Y G^T \left(G G^T + \lambda I\right)^{-1}.   (13)

Here Y (G) is a matrix whose columns are the y_t (g_t).
The regularization coefficient λ plays the same role as in
standard RC [cf. Eq. (8)].
Prediction. To simulate a trained NGRC model
from a given initial condition x_0, we first integrate
the true dynamics (4) forward in time to obtain the
additional k − 1 states needed to perform the first
discrete update according to Eqs. (10)-(11). This is
the warmup period for the NGRC model. Thereafter,
we iterate Eqs. (10)-(11) as an autonomous dynamical
system, with each output becoming part of the model's
input at the next time step. Thus, in contrast to
training, the model receives no data from the real system
during prediction except the k − 1 "warm-up" states.
There is a clear parallel between NGRC [41, 42] and
statistical forecasting methods [55] such as nonlinear
vector autoregression (NVAR). However, as noted in
Ref. [56], the feature vectors of a typical NGRC model
usually have far more terms than NVAR methods, as the
latter were designed with interpretability in mind. It is
the use of a library of many candidate features—in addition
to other details like the typical training methods