In general, the goal of reservoir computing is to approximate the flow of Eq. (4) in discrete time by a map of the form
$$\mathbf{x}_{t+1} = F(\mathbf{x}_t). \qquad (5)$$
Here, the index $t$ runs over a set of discrete times separated by $\Delta t$ time units of the real system, where $\Delta t$ is a timescale hyperparameter generally chosen to be smaller than the characteristic timescale(s) of Eq. (4).
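To make the discretization concrete, the states $\mathbf{x}_t$ can be obtained by integrating the continuous flow and recording the state every $\Delta t$ time units. The following sketch (in Python with NumPy; the RK4 substepping and the toy linear flow are our own illustrative choices, not details from the text) shows this sampling:

```python
import numpy as np

def sample_flow(rhs, x0, dt, n_steps, substeps=100):
    """Integrate dx/dt = rhs(x) with RK4, recording a state every dt
    time units. The returned sequence plays the role of {x_t} in Eq. (5)."""
    h = dt / substeps
    x = np.asarray(x0, dtype=float)
    states = [x]
    for _ in range(n_steps):
        for _ in range(substeps):
            k1 = rhs(x)
            k2 = rhs(x + 0.5 * h * k1)
            k3 = rhs(x + 0.5 * h * k2)
            k4 = rhs(x + h * k3)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        states.append(x)
    return np.array(states)

# Illustration on a simple linear flow dx/dt = -x (not the magnetic pendulum):
traj = sample_flow(lambda x: -x, [1.0], dt=0.1, n_steps=5)
```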
In standard RC, one views the state of the real system as a linear readout from an auxiliary reservoir system, whose state is an $N_r$-dimensional vector $\mathbf{r}_t$. Specifically:
$$\mathbf{x}_t = W \cdot \mathbf{r}_t, \qquad (6)$$
where $W$ is an $n \times N_r$ matrix of trainable output weights. The reservoir system is generally much higher-dimensional ($N_r \gg n$), and its dynamics obey
$$\mathbf{r}_{t+1} = (1-\alpha)\,\mathbf{r}_t + \alpha f\!\left(W_r \cdot \mathbf{r}_t + W_{\mathrm{in}} \cdot \mathbf{u}_t + \mathbf{b}\right). \qquad (7)$$
Here $W_r$ is the $N_r \times N_r$ reservoir matrix, $W_{\mathrm{in}}$ is the $N_r \times n$ input matrix, and $\mathbf{b}$ is an $N_r$-dimensional bias vector. The input $\mathbf{u}_t$ is an $n$-dimensional vector that represents either a state of the real system ($\mathbf{u}_t = \mathbf{x}_t$) during training or the model's own output ($\mathbf{u}_t = W \cdot \mathbf{r}_t$) during prediction. The nonlinear activation function $f$ is applied elementwise, where we adopt the standard choice of $f(\cdot) = \tanh(\cdot)$. Finally, $0 < \alpha \le 1$ is the so-called leaky coefficient, which controls the inertia of the reservoir dynamics.
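A single iteration of Eq. (7) is a one-liner in practice. The sketch below (Python with NumPy; the toy dimensions and random weights are purely illustrative) implements the leaky-tanh update:

```python
import numpy as np

def reservoir_step(r, u, Wr, Win, b, alpha):
    """One iteration of Eq. (7): leaky reservoir update with tanh activation."""
    return (1.0 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)

# Toy sizes for illustration only: Nr = 5 reservoir nodes, n = 2 input dims.
rng = np.random.default_rng(0)
Nr, n = 5, 2
r = np.zeros(Nr)                       # reservoir state r_t
u = rng.uniform(-1, 1, n)              # input u_t
Wr = rng.uniform(-1, 1, (Nr, Nr))      # reservoir matrix
Win = rng.uniform(-1, 1, (Nr, n))      # input matrix
b = rng.uniform(-0.5, 0.5, Nr)         # bias vector
r_next = reservoir_step(r, u, Wr, Win, b, alpha=0.5)
```

Since $\tanh$ is bounded by 1, each update keeps the new state within a convex combination of the old state and $[-1,1]^{N_r}$.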
In general, only the output matrix $W$ is trained, with $W_r$, $W_{\mathrm{in}}$, and $\mathbf{b}$ generated randomly from appropriate ensembles. We follow best practices [54] and previous studies in generating these latter components, specifically:
• $W_r$ is the weighted adjacency matrix of a directed Erdős–Rényi graph on $N_r$ nodes. The link probability is $0 < q \le 1$, and we allow for self-loops. We first draw the link weights uniformly and independently from $[-1, 1]$, and then normalize them so that $W_r$ has a specified spectral radius $\rho > 0$. Here, $q$ and $\rho$ are hyperparameters.
• $W_{\mathrm{in}}$ is a dense matrix, whose entries are initially drawn uniformly and independently from $[-1, 1]$. In the magnetic pendulum, the state $\mathbf{x}_t$ (and hence the input term $\mathbf{u}_t$ in Eq. (7)) is of the form $(x, y, \dot{x}, \dot{y})^T$. To allow for different characteristic scales of the position vs. velocity dynamics, we scale the first two columns of $W_{\mathrm{in}}$ by $s_x$ and the last two columns by $s_v$, where $s_x, s_v > 0$ are scale hyperparameters.
• $\mathbf{b}$ has its entries drawn uniformly and independently from $[-s_b, s_b]$, where $s_b > 0$ is a scale hyperparameter.
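The three recipes above can be sketched together as follows (Python with NumPy; the function name, the seed, and the example hyperparameter values are our own illustrative assumptions):

```python
import numpy as np

def make_reservoir_params(Nr, n, q, rho, sx, sv, sb, seed=0):
    """Randomly generate Wr, Win, b per the ensembles described above.
    Assumes the 4D pendulum input layout (x, y, xdot, ydot): the first two
    input columns are scaled by sx, the last two by sv."""
    rng = np.random.default_rng(seed)
    # Wr: directed Erdos-Renyi adjacency (self-loops allowed) with uniform
    # weights, rescaled to the target spectral radius rho.
    mask = rng.random((Nr, Nr)) < q
    Wr = rng.uniform(-1.0, 1.0, (Nr, Nr)) * mask
    radius = np.max(np.abs(np.linalg.eigvals(Wr)))
    Wr *= rho / radius
    # Win: dense, with position vs. velocity columns scaled separately.
    Win = rng.uniform(-1.0, 1.0, (Nr, n))
    Win[:, :2] *= sx
    Win[:, 2:] *= sv
    # b: uniform bias in [-sb, sb].
    b = rng.uniform(-sb, sb, Nr)
    return Wr, Win, b

# Example hyperparameter values (illustrative, not the paper's optimized ones):
Wr, Win, b = make_reservoir_params(Nr=50, n=4, q=0.1, rho=0.9,
                                   sx=1.0, sv=0.1, sb=0.5)
```

Because eigenvalues scale linearly with the matrix, multiplying $W_r$ by $\rho/\text{radius}$ sets its spectral radius to exactly $\rho$.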
Training. To train an RC model from a given initial condition $\mathbf{x}_0$, we first integrate the real dynamics (4) to obtain $N_{\mathrm{train}}$ additional states $\{\mathbf{x}_t\}_{t=1,\ldots,N_{\mathrm{train}}}$. We then iterate the reservoir dynamics (7) $N_{\mathrm{train}}$ times from $\mathbf{r}_0 = \mathbf{0}$, using the training data as inputs ($\mathbf{u}_t = \mathbf{x}_t$). This produces a corresponding sequence of reservoir states, $\{\mathbf{r}_t\}_{t=1,\ldots,N_{\mathrm{train}}}$. Finally, we solve for the output weights $W$ that render Eq. (6) the best fit to the training data via ridge regression, i.e., least squares with Tikhonov regularization:
$$W = X R^T \left( R R^T + \lambda I \right)^{-1}. \qquad (8)$$
Here, $X$ (resp. $R$) is a matrix whose columns are the $\mathbf{x}_t$ (resp. $\mathbf{r}_t$) for $t = 1, \ldots, N_{\mathrm{train}}$, $I$ is the identity matrix, and $\lambda > 0$ is a regularization coefficient that prevents ill-conditioning of the weights, which can be symptomatic of overfitting the data.
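Eq. (8) translates directly into a few lines of linear algebra. The sketch below (Python with NumPy; the synthetic sanity check with an exact linear readout is our own illustrative setup) implements the ridge solution:

```python
import numpy as np

def fit_output_weights(X, R, lam):
    """Solve Eq. (8): W = X R^T (R R^T + lam * I)^(-1).
    X is n x Ntrain (target states as columns); R is Nr x Ntrain."""
    Nr = R.shape[0]
    return X @ R.T @ np.linalg.inv(R @ R.T + lam * np.eye(Nr))

# Sanity check on synthetic data where an exact linear readout exists:
rng = np.random.default_rng(1)
R = rng.standard_normal((20, 200))   # 20 reservoir dims, 200 training steps
W_true = rng.standard_normal((4, 20))
X = W_true @ R                       # targets generated by a known readout
W = fit_output_weights(X, R, lam=1e-8)
```

In practice, solving the regularized normal equations with `np.linalg.solve` (or a Cholesky factorization) is preferable to forming the explicit inverse, but the inverse form mirrors Eq. (8) most directly.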
Prediction. To simulate a trained RC model from a given initial condition $\mathbf{x}_0$, we first integrate the true dynamics (4) forward in time to obtain a total of $N_{\mathrm{warmup}} \ge 0$ states $\{\mathbf{x}_t\}_{t=1,\ldots,N_{\mathrm{warmup}}}$. During the first $N_{\mathrm{warmup}}$ iterations of the discrete dynamics (7), the input term comes from the real trajectory, i.e., $\mathbf{u}_t = \mathbf{x}_t$. Thereafter, we replace the input with the model's own output at the previous iteration ($\mathbf{u}_t = W \cdot \mathbf{r}_t$). This creates a closed-loop system from Eq. (7), which we iterate without further input from the real system.
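The warmup-then-closed-loop scheme can be sketched as follows (Python with NumPy; the random untrained weights in the example are purely illustrative, so the outputs carry no physical meaning):

```python
import numpy as np

def predict(W, Wr, Win, b, alpha, warmup_inputs, n_predict):
    """Open-loop warmup (u_t = real state), then closed-loop rollout
    (u_t = W @ r_t), following the prediction scheme described above."""
    r = np.zeros(Wr.shape[0])
    for u in warmup_inputs:            # warmup: inputs from the real trajectory
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
    outputs = []
    for _ in range(n_predict):         # closed loop: feed back model output
        u = W @ r
        outputs.append(u)
        r = (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
    return np.array(outputs)

# Toy run with random (untrained) weights, for shape illustration only:
rng = np.random.default_rng(2)
Nr, n = 10, 4
Wr = rng.uniform(-1, 1, (Nr, Nr)) * 0.1
Win = rng.uniform(-1, 1, (Nr, n))
b = rng.uniform(-0.1, 0.1, Nr)
W = rng.uniform(-1, 1, (n, Nr)) * 0.1
preds = predict(W, Wr, Win, b, 0.5, rng.uniform(-1, 1, (5, n)), 3)
```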
IV. CRITICAL DEPENDENCE OF STANDARD
RC ON WARMUP TIME
Though standard RC is extremely powerful, it is known to demand large warmup periods ($N_{\mathrm{warmup}}$) in certain problems in order to be stable [18]. In principle, this could create a dilemma for the problem of basin prediction, as long warmup trajectories from the real system will generally be unavailable for initial conditions unseen during training. And even if such data were available, the problem could be rendered moot if the required warmup exceeds the transient period of the given initial condition [43]. Here, we systematically test RC's sensitivity to the warmup time using the magnetic pendulum system.
Our aim is to test standard RC under the most favorable conditions. Accordingly, we will train each RC model on a single initial condition $\mathbf{x}_0 = (x_0, y_0, 0, 0)^T$ of the magnetic pendulum, and ask it to reproduce only the trajectory from that initial condition. Likewise, before training, we systematically optimize the RC hyperparameters for that initial condition via Bayesian optimization, seeking to minimize an objective function that combines both training and validation error. For details of this process, we refer the reader to Appendix S2.
In initial tests of our optimization procedure, we found it largely insensitive to the reservoir connectivity ($q$), with equally good training/validation performances achievable across a range of $q$ from 0.01 to 1. We likewise found little impact of the regularization coefficient