Catch-22s of Reservoir Computing
Yuanzhao Zhang1and Sean P. Cornelius2
1Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
2Department of Physics, Toronto Metropolitan University, Toronto, ON, M5B 2K3, Canada
Reservoir Computing (RC) is a simple and efficient model-free framework for forecasting the
behavior of nonlinear dynamical systems from data. Here, we show that there exist commonly-
studied systems for which leading RC frameworks struggle to learn the dynamics unless key in-
formation about the underlying system is already known. We focus on the important problem of
basin prediction—determining which attractor a system will converge to from its initial conditions.
First, we show that the predictions of standard RC models (echo state networks) depend critically
on warm-up time, requiring a warm-up trajectory containing almost the entire transient in order
to identify the correct attractor. Accordingly, we turn to Next-Generation Reservoir Computing
(NGRC), an attractive variant of RC that requires negligible warm-up time. By incorporating the
exact nonlinearities in the original equations, we show that NGRC can accurately reconstruct intri-
cate and high-dimensional basins of attraction, even with sparse training data (e.g., a single transient
trajectory). Yet, a tiny uncertainty in the exact nonlinearity can render prediction accuracy no bet-
ter than chance. Our results highlight the challenges faced by data-driven methods in learning
the dynamics of multistable systems and suggest potential avenues to make these approaches more
robust.
I. INTRODUCTION
Reservoir Computing (RC) [1-12] is a machine learning
framework for time-series prediction based on recurrent
neural networks. Because only the output layer needs to
be modified, RC is extremely efficient to train. Despite
its simplicity, recent studies have shown that RC can be
extremely powerful when it comes to learning unknown
dynamical systems from data [13]. Specifically, RC has
been used to reconstruct attractors [14,15], calculate
Lyapunov exponents [16], infer bifurcation diagrams [17],
and even predict the basins of unseen attractors [18,19].
These advances open the possibility of using RC to improve
climate modeling [20], create digital twins [21], anticipate
synchronization [22, 23], predict tipping points [24, 25],
and infer network connections [26].
Since the landmark paper demonstrating RC’s ability
to predict spatiotemporally chaotic systems from data
[13], there has been a flurry of efforts to understand the
success as well as the limitations of RC [27-36]. As a
result, more sophisticated architectures have been devel-
oped to extend the capability of the original framework,
such as hybrid [37], parallel [38,39], and symmetry-aware
[40] RC schemes.
One particularly promising variant of RC was proposed
in 2021 and named Next Generation Reservoir Comput-
ing (NGRC) [41]. There, instead of having a nonlinear
reservoir and a linear output layer, one has a linear reser-
voir and a nonlinear output layer [42]. These differences,
though subtle, confer several advantages. First, NGRC
requires no random matrices and thus has far fewer
hyperparameters that need to be optimized. Moreover,
each NGRC prediction needs exceedingly few data points
to initiate (as opposed to thousands of data points in
standard RC), which is especially useful when predicting
the basins of attraction of multistable dynamical systems
[43].
Understanding the basin structure is of fundamental
importance for dynamical systems with multiple attrac-
tors. Such systems include neural networks [44,45], gene
regulatory networks [46,47], differentiating cells [48,49],
and power grids [50,51]. Basins of attraction provide
a mapping from initial conditions to attractors and, in
the face of noise or perturbations, tell us the robustness
of each stable state. Despite their importance, basins
have not been well studied from a machine learning per-
spective, with most methods for data-driven modeling of
dynamical systems currently focusing on systems with a
single attractor.
In this Article, we show that the success of standard
RC in predicting the dynamics of multistable systems
can depend critically on having access to long initializa-
tion trajectories, while the performance of NGRC can
be extremely sensitive to the choice of readout nonlin-
earity. It has been observed that, for each new initial
condition, a standard RC model needs to be “warmed
up” with thousands of data points before it can start
making predictions [43]. In practice, such data will not
exist for most initial conditions. Even when they do ex-
ist, we demonstrate that the warm-up time series would
often have already approached the attractor, rendering
predictions unnecessary [52]. In contrast, NGRC can eas-
ily reproduce highly intermingled and high-dimensional
basins with minimal warm-up, provided the exact non-
linearity in the underlying equations is known. However,
a 1% uncertainty on that nonlinearity can already make
the NGRC basin predictions barely outperform random
guesses. Given this extreme sensitivity, even if one had
partial (but imprecise) knowledge of the underlying sys-
tem, a hybrid scheme combining NGRC and such knowl-
edge would still struggle in making reliable predictions.
The rest of the paper is organized as follows. In
Section II, we introduce the first model system under
study—the magnetic pendulum, which is representative
arXiv:2210.10211v3 [cs.LG] 25 Sep 2023
of the difficulties of basin prediction in real nonlinear sys-
tems. In Sections III-IV, we apply standard RC to this
system, showing that accurate predictions rely heavily
on the length of the warm-up trajectory. We thus turn
to Next Generation Reservoir Computing, giving a brief
overview of its implementation in Section V. We present
our main results in Section VI, where we characterize the
effect of readout nonlinearity on NGRC’s ability to pre-
dict the basins of the magnetic pendulum. We further
support our findings using coupled Kuramoto oscillators
in Section VII, which can have a large number of coex-
isting high-dimensional basins. Finally, we discuss the
implications of our results and suggest avenues for future
research in Section VIII.
II. THE MAGNETIC PENDULUM
For concreteness, we focus on the magnetic pendu-
lum [53] as a representative model. It is mechanistically
simple—being low-dimensional and generated by simple
physical laws—and yet captures all characteristics of the
basin prediction problem in general: the system is multi-
stable and predicting which attractor a given initial con-
dition will go to is nontrivial.
The system consists of an iron bob suspended by a
massless rod above three identical magnets, located at
the vertices of an equilateral triangle in the (x, y) plane
(Fig. 1). The bob moves under the influence of gravity,
drag due to air friction, and the attractive forces of the
magnets. For simplicity, we treat the magnets as mag-
netic point charges and assume that the length of the
pendulum rod is much greater than the distance between
the magnets, allowing us to describe the dynamics using
a small-angle approximation.
The resulting dimensionless equations of motion for the
pendulum bob are
\ddot{x} = -\omega_0^2 x - a\dot{x} + \sum_{i=1}^{3} \frac{\tilde{x}_i - x}{D(\tilde{x}_i, \tilde{y}_i)^3},   (1)

\ddot{y} = -\omega_0^2 y - a\dot{y} + \sum_{i=1}^{3} \frac{\tilde{y}_i - y}{D(\tilde{x}_i, \tilde{y}_i)^3},   (2)
where (x̃_i, ỹ_i) are the coordinates of the ith magnet, ω_0 is
the pendulum's natural frequency, and a is the damping
coefficient. Here, D(x̃, ỹ) denotes the distance between
the bob and a given point (x̃, ỹ) in the magnets' plane:

D(\tilde{x}, \tilde{y}) = \sqrt{(\tilde{x} - x)^2 + (\tilde{y} - y)^2 + h^2},   (3)

where h is the bob's height above the plane. The system's
four-dimensional state is thus x = (x, y, ẋ, ẏ)^T.
We take the (x, y) coordinates of the magnets to be
(1/√3, 0), (−1/(2√3), 1/2), and (−1/(2√3), −1/2). Unless stated
otherwise, we set ω_0 = 0.5, a = 0.2, and h = 0.2 in our
simulations. These values are representative of all cases
for which the magnetic pendulum has exactly three stable
fixed points, corresponding to the bob being at rest and
pointed toward one of the three magnets.

FIG. 1. Magnetic pendulum with three fixed-point
attractors and the corresponding basins of attraction.
(Left) Illustration of the magnetic pendulum system. Three
magnets are placed on a flat surface, each drawn in the color
we use to denote the corresponding basin of attraction. The
hollow circle indicates the (x, y) coordinates of the pendulum
bob, which together with the velocity (ẋ, ẏ) fully specify the
system's state. (Right) Basins of attraction for the region of
initial conditions under study, namely states of zero initial
velocity with −1.5 ≤ x_0, y_0 ≤ 1.5.
Previous studies have largely focused on chaotic dy-
namics as a stress test of RC’s capabilities [2,6,8,9,
13,16,17,25,41]. Here we take a different approach.
With non-zero damping, the magnetic pendulum dynam-
ics is autonomous and dissipative, meaning all trajecto-
ries must eventually converge to a fixed point. Except
on a set of initial conditions of measure zero, this will be
one of the three stable fixed points identified earlier. Yet
predicting which attractor a given initial condition will
go to can be far from straightforward, with the pendu-
lum wandering in an erratic transient before eventually
settling to one of the three magnets [53]. This manifests
as complicated basins of attraction with a “pseudo” (fat)
fractal structure (Fig. 1). We can control the “fractal-
ness” of the basins by, for example, varying the height
of the pendulum h. This generates basins with tunable
complexity to test the performance of (NG)RC.
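The model above is straightforward to simulate directly. The following sketch (our own minimal implementation, not the authors' code; function names and the integration horizon are our choices) integrates Eqs. (1)-(3) with fixed-step RK4 and labels the basin of an initial condition by the magnet nearest the final state:

```python
import numpy as np

# Magnet (x, y) positions at the vertices of an equilateral triangle (see text).
MAGNETS = np.array([[1 / np.sqrt(3), 0.0],
                    [-1 / (2 * np.sqrt(3)), 0.5],
                    [-1 / (2 * np.sqrt(3)), -0.5]])
OMEGA0, A, H = 0.5, 0.2, 0.2  # natural frequency, damping, bob height

def pendulum_rhs(state):
    """Right-hand side of Eqs. (1)-(2); state = (x, y, xdot, ydot)."""
    x, y, xd, yd = state
    ax = -OMEGA0**2 * x - A * xd
    ay = -OMEGA0**2 * y - A * yd
    for mx, my in MAGNETS:
        D3 = ((mx - x)**2 + (my - y)**2 + H**2) ** 1.5  # D(x~_i, y~_i)^3, Eq. (3)
        ax += (mx - x) / D3
        ay += (my - y) / D3
    return np.array([xd, yd, ax, ay])

def integrate(state, dt=0.01, T=100.0):
    """Fixed-step RK4 integration of the pendulum dynamics."""
    for _ in range(int(T / dt)):
        k1 = pendulum_rhs(state)
        k2 = pendulum_rhs(state + 0.5 * dt * k1)
        k3 = pendulum_rhs(state + 0.5 * dt * k2)
        k4 = pendulum_rhs(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return state

def basin_label(x0, y0):
    """Index of the magnet that the trajectory from (x0, y0, 0, 0) settles near."""
    final = integrate(np.array([x0, y0, 0.0, 0.0]))
    return int(np.argmin(np.linalg.norm(MAGNETS - final[:2], axis=1)))
```

Sweeping `basin_label` over a grid of (x_0, y_0) with zero initial velocity reproduces basin maps like the right panel of Fig. 1.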
III. IMPLEMENTATION OF STANDARD RC
Consider a dynamical system whose n-dimensional
state x obeys a set of n autonomous differential equations
of the form

\dot{x} = f(x).   (4)
In general, the goal of reservoir computing is to approxi-
mate the flow of Eq. (4) in discrete time by a map of the
form
x_{t+1} = F(x_t).   (5)
Here, the index t runs over a set of discrete times separated
by Δt time units of the real system, where Δt is a
timescale hyperparameter generally chosen to be smaller
than the characteristic timescale(s) of Eq. (4).
In standard RC, one views the state of the real system
as a linear readout from an auxiliary reservoir system,
whose state is an N_r-dimensional vector r_t. Specifically,

x_t = W \cdot r_t,   (6)

where W is an n × N_r matrix of trainable output
weights. The reservoir system is generally much higher-dimensional
(N_r ≫ n), and its dynamics obey

r_{t+1} = (1 - \alpha) r_t + \alpha f(W_r \cdot r_t + W_{in} \cdot u_t + b).   (7)

Here W_r is the N_r × N_r reservoir matrix, W_in is the
N_r × n input matrix, and b is an N_r-dimensional bias
vector. The input u_t is an n-dimensional vector that
represents either a state of the real system (u_t = x_t)
during training or the model's own output (u_t = W · r_t)
during prediction. The nonlinear activation function f is
applied elementwise, where we adopt the standard choice
of f(·) = tanh(·). Finally, 0 < α ≤ 1 is the so-called leaky
coefficient, which controls the inertia of the reservoir
dynamics.
In general, only the output matrix W is trained, with
W_r, W_in, and b generated randomly from appropriate
ensembles. We follow best practices [54] and previous
studies in generating these latter components, specifically:

- W_r is the weighted adjacency matrix of a directed
Erdős–Rényi graph on N_r nodes. The link probability
is 0 < q ≤ 1, and we allow for self-loops.
We first draw the link weights uniformly and independently
from [−1, 1], and then normalize them
so that W_r has a specified spectral radius ρ > 0.
Here, q and ρ are hyperparameters.

- W_in is a dense matrix, whose entries are initially
drawn uniformly and independently from [−1, 1].
In the magnetic pendulum, the state x_t (and hence
the input term u_t in Eq. (7)) is of the form
(x, y, ẋ, ẏ)^T. To allow for different characteristic
scales of the position vs. velocity dynamics, we scale
the first two columns of W_in by s_x and the last two
columns by s_v, where s_x, s_v > 0 are scale hyperparameters.

- b has its entries drawn uniformly and independently
from [−s_b, s_b], where s_b > 0 is a scale hyperparameter.
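As a concrete illustration, the random components above could be generated as follows. This is a minimal sketch with placeholder default values (not the optimized hyperparameters from the paper's Table S2), and the function name is ours:

```python
import numpy as np

def make_reservoir(Nr=300, n=4, q=0.03, rho=0.9, sx=1.0, sv=1.0, sb=1.0, seed=0):
    """Draw W_r, W_in, and b as described in the text."""
    rng = np.random.default_rng(seed)
    # Directed Erdos-Renyi adjacency (self-loops allowed), with surviving
    # link weights drawn uniformly from [-1, 1].
    Wr = rng.uniform(-1, 1, size=(Nr, Nr)) * (rng.random((Nr, Nr)) < q)
    # Rescale so the spectral radius of W_r equals rho.
    Wr *= rho / np.max(np.abs(np.linalg.eigvals(Wr)))
    # Dense input matrix; position columns scaled by sx, velocity columns by sv.
    Win = rng.uniform(-1, 1, size=(Nr, n))
    Win[:, :2] *= sx
    Win[:, 2:] *= sv
    # Bias vector with entries in [-sb, sb].
    b = rng.uniform(-sb, sb, size=Nr)
    return Wr, Win, b
```

Note that drawing a dense weight matrix and masking it is wasteful for very sparse reservoirs, but keeps the sketch short.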
Training. To train an RC model from a given initial
condition x_0, we first integrate the real dynamics (4) to
obtain N_train additional states {x_t}, t = 1, ..., N_train. We then
iterate the reservoir dynamics (7) N_train times from
r_0 = 0, using the training data as inputs (u_t = x_t). This
produces a corresponding sequence of reservoir states,
{r_t}, t = 1, ..., N_train. Finally, we solve for the output weights
W that render Eq. (6) the best fit to the training data
using Ridge Regression with Tikhonov regularization:

W = X R^T \left(R R^T + \lambda I\right)^{-1}.   (8)

Here, X (R) is a matrix whose columns are the x_t (r_t)
for t = 1, ..., N_train, I is the identity matrix, and λ > 0 is
a regularization coefficient that prevents ill-conditioning
of the weights, which can be symptomatic of overfitting
the data.
Prediction. To simulate a trained RC model from a
given initial condition x_0, we first integrate the true
dynamics (4) forward in time to obtain a total of N_warmup ≥ 0
states {x_t}, t = 1, ..., N_warmup. During the first N_warmup
iterations of the discrete dynamics (7), the input term comes
from the real trajectory, i.e., u_t = x_t. Thereafter, we
replace the input with the model's own output at the
previous iteration (u_t = W · r_t). This creates a closed-loop
system from Eq. (7), which we iterate without further
input from the real system.
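The training and prediction phases above can be sketched as follows. This is our own schematic implementation of Eqs. (6)-(8) (rows, rather than columns, index time samples here, so the regression formula is transposed accordingly); it is not the authors' code:

```python
import numpy as np

def drive(r0, inputs, Wr, Win, b, alpha):
    """Iterate the leaky reservoir update, Eq. (7), over a sequence of inputs."""
    r, states = r0, []
    for u in inputs:
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
        states.append(r)
    return np.array(states)

def train_readout(X, R, lam=1e-8):
    """Ridge-regression output weights, Eq. (8).
    X: (Ntrain, n) real states; R: (Ntrain, Nr) reservoir states (rows = samples)."""
    return X.T @ R @ np.linalg.inv(R.T @ R + lam * np.eye(R.shape[1]))

def predict(W, warmup, Nsteps, Wr, Win, b, alpha):
    """Open-loop warmup on real states, then closed-loop rollout (u_t = W r_t)."""
    r = np.zeros(Wr.shape[0])
    for u in warmup:                       # warmup phase: u_t = x_t
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
    outputs = []
    for _ in range(Nsteps):                # autonomous phase: u_t = W r_t
        u = W @ r
        outputs.append(u)
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
    return np.array(outputs)
```

The closed-loop rollout in `predict` is exactly the setting in which the warmup sensitivity of Section IV manifests: once the model's own output is fed back, any mismatch is free to compound.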
IV. CRITICAL DEPENDENCE OF STANDARD
RC ON WARMUP TIME
Though standard RC is extremely powerful, it is known
to demand large warmup periods (Nwarmup) in certain
problems in order to be stable [18]. In principle, this
could create a dilemma for the problem of basin predic-
tion, as long warmup trajectories from the real system
will generally be unavailable for initial conditions unseen
during training. And even if such data were available, the
problem could be rendered moot if the required warmup
exceeds the transient period of the given initial condition
[43]. Here, we systematically test RC’s sensitivity to the
warmup time using the magnetic pendulum system.
Our aim is to test standard RC under the most fa-
vorable conditions. Accordingly, we will train each RC
model on a single initial condition x0= (x0, y0,0,0)T
of the magnetic pendulum, and ask it to reproduce only
the trajectory from that initial condition. Likewise, be-
fore training, we systematically optimize the RC hyper-
parameters for that initial condition via Bayesian opti-
mization, seeking to minimize an objective function that
combines both training and validation error. For details
of this process, we refer the reader to Appendix S2.
In initial tests of our optimization procedure, we
found it largely insensitive to the reservoir connectivity
(q), with equally good training/validation performance
achievable across a range of q from 0.01 to 1. We likewise
found little impact of the regularization coefficient
FIG. 2. Forecastability transition of standard RC. The
initial condition used for both training and prediction was
(x_0, y_0) = (1.3, 0.75). The optimized RC hyperparameters
for this initial condition are listed in Table S2. We train an
ensemble of 100 different RC models with these hyperparameters,
then simulate each from the training initial condition
using varying numbers of warmup points. In (a), the red
line/bands denote the median/interquartile range of the
resulting prediction NRMSE as a function of warmup time. The
blue curves show the total mechanical energy E of the training
trajectory at the same times. We see a sharp drop in model
error at t_warmup ≈ 6, only shortly before E falls below the
height of the potential barriers (U*, dashed line) separating
the three wells of the magnetic pendulum. In (b), we overlay
the x and y dynamics of the real system for comparison. This
confirms that the "critical" warmup time in (a) aligns closely
with the end of the transient. The arrows in (a) denote the
warmup times used for the example simulations in Fig. 3.
over several orders of magnitude, with the optimizer
frequently pinning λ at the supplied lower bound of 10^{-8}.
Thus, in the interest of more fully exploring the most
important hyperparameters, we fix q = 0.03 and λ = 10^{-8}.
We then optimize the remaining five continuous hyperparameters
(ρ, s_x, s_v, s_b, α) over the ranges specified in
Table S1.
Throughout this section, we set Δt = 0.02, which is
smaller than the characteristic timescales of the magnetic
pendulum. We train each RC model on N_train = 4000
data points of the real system starting from the given
initial condition, which when paired with the chosen Δt
encompass both the transient dynamics and convergence
to one of the attractors. We fix the reservoir size at
N_r = 300; as we show in the Supplemental Material,
larger reservoirs do not alter our results.
Figure 2 shows the performance of an ensemble of RC
realizations with optimized hyperparameters for the initial
condition (x_0, y_0) = (1.2, 0.75). Specifically, we
show the normalized root-mean-square error (NRMSE,
see Appendix C) between the real and RC-predicted
trajectory as a function of warmup time (t_warmup =
N_warmup · Δt). In Fig. 2(a) we observe a sharp transition
around t_warmup = 6. Before this point, we consistently
have NRMSE = O(1), meaning the RC error is
comparable to the scale of the real trajectory. But after
the transition, the error is always quite small (NRMSE ≪ 1).

FIG. 3. Sensitivity of standard RC performance to
warmup time. We show example simulations from one RC
realization in Fig. 2 with two different warmup times (dashed
lines): t_warmup = 6.1 (left) and t_warmup = 5.0 (right).
The initial condition and optimized RC hyperparameters are
the same as in Fig. 2.
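The paper's exact NRMSE definition appears in its Appendix C, which is not part of this excerpt. For orientation only, one common variant normalizes the root-mean-square error by the standard deviation of the true trajectory:

```python
import numpy as np

def nrmse(pred, true):
    """RMSE over all components and times, normalized by the standard
    deviation of the true trajectory (a generic stand-in; the paper's
    precise normalization may differ)."""
    pred, true = np.asarray(pred), np.asarray(true)
    return np.sqrt(np.mean((pred - true) ** 2)) / np.std(true)
```

Under this convention, NRMSE = O(1) means the prediction error is comparable to the natural scale of the signal, matching the interpretation in the text.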
We can gain physical insight about this "forecastability
transition" by analyzing the total mechanical energy of
the training trajectory:

E = \frac{1}{2}\left(\dot{x}^2 + \dot{y}^2\right) + U(x, y).   (9)

Here U(x, y) is the potential corresponding to Eqs. (1)-(2),
where we set U = 0 at the minima corresponding to the
three attractors. Strikingly, the critical warmup time
occurs only shortly before the energy drops below a critical
value U*, defined as the height of the potential barriers
between the three wells [Fig. 2(a)]. By this time, the system
is unambiguously "trapped" near a specific magnet,
making only damped oscillations thereafter [Fig. 2(b)].
This suggests that even highly optimized RC models will
fail to reproduce convergence to the correct attractor unless
they have already been guided there by data from the
real system.
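The energy diagnostic of Eq. (9) is easy to reproduce. Below is a sketch of U and E consistent with the forces in Eqs. (1)-(2); unlike the paper, we keep the raw potential rather than shifting its minima to zero, so only energy differences are meaningful here:

```python
import numpy as np

# Magnet positions and parameters as in Section II.
MAGNETS = np.array([[1 / np.sqrt(3), 0.0],
                    [-1 / (2 * np.sqrt(3)), 0.5],
                    [-1 / (2 * np.sqrt(3)), -0.5]])
OMEGA0, H = 0.5, 0.2

def potential(x, y):
    """Potential whose negative gradient gives the position-dependent forces
    in Eqs. (1)-(2): a harmonic term plus one -1/D well per magnet.
    (No additive shift applied, unlike the U = 0 convention in the text.)"""
    U = 0.5 * OMEGA0**2 * (x**2 + y**2)
    for mx, my in MAGNETS:
        U -= 1.0 / np.sqrt((mx - x)**2 + (my - y)**2 + H**2)
    return U

def energy(state):
    """Total mechanical energy, Eq. (9); state = (x, y, xdot, ydot)."""
    x, y, xd, yd = state
    return 0.5 * (xd**2 + yd**2) + potential(x, y)
```

Evaluating `energy` along a simulated trajectory and comparing it against the barrier height reproduces the blue curves and the U* threshold of Fig. 2(a).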
We illustrate this further in Fig. 3, showing example
predictions from one RC realization considered above un-
der two different warmup times: one above the critical
value in Fig. 2, and one below. Indeed, with sufficient
warmup (left), the RC trajectory is a near-perfect match
to the real one, both before and after the warmup period.
But if the warmup time is even slightly less than the crit-
ical value (right), the model quickly diverges once the au-
tonomous prediction begins. In this case, the model fails
to reproduce convergence to any fixed-point attractor,
let alone the correct one, instead oscillating wildly.
This pattern holds when we repeat our experiment
for other initial conditions, re-optimizing hyperparameters
and re-training an ensemble of RC models for each
(Figs. S1-S4). In all cases, we see the same sharp drop
in RC prediction error at a particular warmup time
(Figs. S1, S3). Without at least this much warmup time,
the models fail to capture the real dynamics even
qualitatively, often converging to an unphysical state with
non-zero final velocity (Figs. S2, S4). Though there exist
initial conditions that require shorter warmups, such
as (x_0, y_0) = (1.0, 0.5), this is only because those initial
conditions have shorter transients. Indeed, there are
other initial conditions, such as (x_0, y_0) = (1.75, 1.6),
that have longer transients and demand commensurately
larger warmup times (Figs. S3, S4). In no case have we
observed the RC dynamics staying faithful to the real
system unless the warmup is comparable to the transient
period.
Note that the breakdown of RC with insufficient
warmup time cannot be attributed to an insufficiently
complex model vis-à-vis the only hyperparameter we
have not optimized: the reservoir size (N_r). Indeed, we
have repeated our experiment with reservoirs twice as
large (N_r = 600). Even with optimized values of the
other hyperparameters, we still see a sharp transition in
the NRMSE at a warmup time comparable to the transient
time (Fig. S5).
In sum, we have shown that standard RC is unsuitable
for basin prediction in this representative multistable sys-
tem. Specifically, RC models can only reliably reproduce
convergence to the correct attractor when they have been
guided to its vicinity. This is true even with the bene-
fit of highly tuned hyperparameters (Appendix S2), and
validation on only initial conditions seen during training.
For the remainder of the paper, we instead turn to
Next Generation Reservoir Computing (NGRC). Though
it is known that every NGRC model implicitly defines
the connectivity matrix and other parameters of a stan-
dard RC model [41,42], there is no guarantee that the
two architectures would perform similarly in practice. In
particular, NGRC is known to demand substantially less
warmup time [41], potentially avoiding the “catch-22”
identified here for standard RC. Can this cutting-edge
framework succeed in learning the magnetic pendulum
and other paradigmatic multistable systems?
V. IMPLEMENTATION OF NGRC
We implement the NGRC framework following
Refs. [41, 43]. In NGRC, the update rule for the discrete
dynamics is taken as

x_{t+1} = x_t + W \cdot g_t,   (10)

where g_t is an m-dimensional feature vector, calculated
from the current state and k − 1 past states, namely

g_t = g(x_t, x_{t-1}, \ldots, x_{t-k+1}).   (11)

Here, k ≥ 1 is a hyperparameter that governs the amount
of memory in the NGRC model, and W is an n × m
matrix of trainable weights.
We elaborate on the functional form of the feature
embedding g below. But in general, the features can
be divided into three groups: (i) one constant (bias)
feature; (ii) m_lin = nk linear features, corresponding to
the components of {x_t, x_{t-1}, ..., x_{t-k+1}}; and finally
(iii) m_nonlin nonlinear features, each a nonlinear
transformation of the linear features. The total number of
features is thus m = 1 + m_lin + m_nonlin.
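As a concrete example of this three-group structure, the sketch below builds a feature vector from a bias term, the nk linear terms, and (as one generic choice) all quadratic monomials of the linear terms. This monomial library is merely illustrative: the paper's central point is that the nonlinear part must match the system's true nonlinearity, which for the magnetic pendulum is not polynomial.

```python
import numpy as np
from itertools import combinations_with_replacement

def ngrc_features(past_states):
    """Feature vector g_t from Eq. (11).

    `past_states` is the list [x_t, x_{t-1}, ..., x_{t-k+1}] of n-vectors.
    Features: one bias term, the n*k linear terms, and all quadratic
    monomials of the linear terms (a generic nonlinear library)."""
    lin = np.concatenate(past_states)                      # m_lin = n*k terms
    quad = [lin[i] * lin[j]
            for i, j in combinations_with_replacement(range(len(lin)), 2)]
    return np.concatenate(([1.0], lin, quad))              # m = 1 + m_lin + m_nonlin
```

For n = 4 and k = 2 (as one might use for the pendulum), this gives m = 1 + 8 + 36 = 45 features.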
Training. Per Eq. (10), training an NGRC model
amounts to finding values for the weights W that give
the best fit for the discrete update rule

y_t = W \cdot g_t,   (12)

where y_t = x_{t+1} − x_t. Accordingly, we calculate pairs
of inputs (g_t) and next-step targets (y_t) over N_traj ≥ 1
training trajectories from the real system (4), each of
length N_train + k. We then solve for the values of W that
best fit Eq. (12) in the least-squares sense via regularized
Ridge regression, namely

W = Y G^T \left(G G^T + \lambda I\right)^{-1}.   (13)

Here Y (G) is a matrix whose columns are the y_t (g_t).
The regularization coefficient λ plays the same role as in
standard RC [cf. Eq. (8)].
Prediction. To simulate a trained NGRC model
from a given initial condition x_0, we first integrate
the true dynamics (4) forward in time to obtain the
additional k − 1 states needed to perform the first
discrete update according to Eqs. (10)-(11). This is
the warmup period for the NGRC model. Thereafter,
we iterate Eqs. (10)-(11) as an autonomous dynamical
system, with each output becoming part of the model's
input at the next time step. Thus, in contrast to
training, the model receives no data from the real system
during prediction except the k − 1 "warm-up" states.
There is a clear parallel between NGRC [41, 42] and
statistical forecasting methods [55] such as nonlinear
vector autoregression (NVAR). However, as noted in
Ref. [56], the feature vectors of a typical NGRC model
usually have far more terms than NVAR methods, as the
latter were designed with interpretability in mind. It is
the use of a library of many candidate features—in addition
to other details like the typical training methods