
evaluation and training dynamics (Le et al., 2015; Laurent and von Brecht, 2016; Miller and Hardt, 2018). Roughly speaking, because an RNN applies the same hidden-layer transformation to its state over and over again, the final output can quickly explode or vanish, depending on whether the spectral norm of its Jacobian is greater than or smaller than one, respectively. Similar issues arise during backpropagation and hinder the learning process (Allen-Zhu et al., 2018).
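As a minimal illustration of this explode-or-vanish effect (a sketch of our own, not taken from the paper; the input-free linear update and the scaled orthogonal weight matrix are simplifying assumptions), the norm of the hidden state shrinks or grows geometrically depending on whether the spectral norm of the recurrent matrix is below or above one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 64, 50  # illustrative width and number of iterations

def scaled_orthogonal(n, target):
    """Random orthogonal matrix rescaled so that its spectral norm equals `target`."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return target * q

h0 = rng.standard_normal(n)
for target in (0.9, 1.1):  # spectral norm below vs. above one
    W = scaled_orthogonal(n, target)
    h = h0.copy()
    for _ in range(steps):
        h = W @ h  # input-free linear recurrence: h_{t+1} = W h_t
    # Because W is a scaled orthogonal matrix, ||h_t|| = target**t * ||h_0|| exactly.
    print(f"spectral norm {target}: ||h_{steps}|| = {np.linalg.norm(h):.2e}")
```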
Beyond these implementation hurdles, recurrent architectures pose significant theoretical challenges. Basic questions include how to properly initialize RNNs, what their expressive power (also known as representation capability) is, and why they converge or diverge; all of these require further investigation. In this paper, we take a closer look at randomly initialized RNNs:
Can we get a better understanding of the behavior of RNNs at initialization using dynamical systems?
We draw on the extensive dynamical systems literature—which has long asked similar questions
about the topological behavior of iterated compositions of functions—to study the properties of RNNs
with standard random initializations. We prove that under common initialization strategies, e.g., He or Xavier (He et al., 2015, 2016), RNNs can produce dynamics characterized by chaos, even in innocuous settings and even in the absence of external input. Most importantly, chaos arises with constant probability, independent of the network’s width. Our theoretical findings explain
empirically observed behavior of RNNs from prior works, and are also validated in our experiments.¹
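For reference, the following short sketch (ours; the Gaussian variance conventions are the standard He and Xavier/Glorot ones, and the chosen widths are arbitrary) shows how the hidden-to-hidden weight matrix of a width-n RNN is drawn under these two schemes:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(n):
    # He et al. (2015): zero-mean Gaussian with variance 2 / fan_in.
    return rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))

def xavier_init(n):
    # Xavier/Glorot: zero-mean Gaussian with variance 2 / (fan_in + fan_out) = 1 / n here.
    return rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))

for n in (4, 64, 512):
    for name, init in (("He", he_init), ("Xavier", xavier_init)):
        W = init(n)
        print(f"{name:6s} width n={n:4d}   spectral norm ≈ {np.linalg.norm(W, ord=2):.2f}")
# A standard random-matrix fact: under both schemes the spectral norm of W concentrates
# around a width-independent constant as n grows, so width alone does not change the
# scale of the recurrent map.
```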
More broadly, our work builds on recent works that aim at understanding neural networks through the
lens of dynamical systems; for example, Chatziafratis et al. (2020b,a) use Sharkovsky’s theorem from
discrete dynamical systems to provide depth-width tradeoffs for the representation capabilities of
neural networks, and Sanford and Chatziafratis (2022) further give more fine-grained lower bounds
based on the notion of “chaotic itineraries”.
1.1 Two Motivating Behaviors of RNNs
Before stating our main result, we illustrate two concrete behaviors of RNNs that inspired our work.
The first example demonstrates that randomly initialized RNNs can give rise to what is perhaps most commonly perceived as “chaos”, while the second highlights a qualitatively different behavior of RNNs compared to FNNs. Our main result unifies the conclusions drawn from these two examples.
Scrambling Trajectories at Initialization
Prior works have empirically demonstrated that RNNs
can behave chaotically when their weights and biases are chosen according to a certain scheme.
For example, Laurent and von Brecht (2016) consider a simple 4-dimensional RNN with specific
parameters in the absence of input data. They plot the trajectories of two nearby points $x, y$ with $\|x - y\| \leq 10^{-7}$ as they are propagated through many iterations of the RNN. They observe that the
long-term behavior (e.g., after 200 iterations) of the trajectories is highly sensitive to the initial states,
because the distance between them may become small, then large again, and so on. We ask:
Are RNNs (provably) chaotic even under standard heuristics for random initialization?
We answer this question in the affirmative, both experimentally and theoretically. Answering it is valuable for multiple reasons. First, it informs us about the behavior of most RNNs, since training starts from a random setting of the parameters. Second, proving that a system is chaotic is qualitatively much stronger than merely knowing that its gradients explode; this will become
evident below, where we describe the phenomenon of scrambling from dynamical systems. Finally,
understanding why and how often an RNN is chaotic can lead to even better methods for initialization.
To begin, we empirically verify the above statement by examining randomly initialized RNNs.
Figure 1 demonstrates that the trajectories of different points may be close together during some timesteps and far apart at later timesteps, or vice versa. This phenomenon, which will be rigorously established in later sections, is called scrambling (Li and Yorke, 1975) and emerges as a direct consequence of the existence of higher-order fixed points (called periodic points) of the continuous
map defined by the random RNN.
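Recall that a pair of points $x, y$ is scrambled in the sense of Li and Yorke (1975) if $\limsup_{t\to\infty} \|f^t(x) - f^t(y)\| > 0$ while $\liminf_{t\to\infty} \|f^t(x) - f^t(y)\| = 0$; the two trajectories repeatedly separate and approach each other without ever converging. The sketch below (ours, not the exact setup behind Figure 1; the width, tanh nonlinearity, He-style weights, and iteration count are illustrative assumptions) tracks the distance between two trajectories of a randomly initialized, input-free RNN started about $10^{-8}$ apart. Whether a given draw exhibits this behavior depends on the sampled weights, in line with chaos arising only with constant probability.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 4, 200  # illustrative width and horizon, not the exact setup of Figure 1

W = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))  # He-style recurrent weights
b = rng.normal(0.0, 1.0, size=n)                    # random biases

def step(h):
    # One input-free RNN update: h_{t+1} = tanh(W h_t + b).
    return np.tanh(W @ h + b)

x = rng.standard_normal(n)
y = x + 1e-8 * rng.standard_normal(n)  # a nearby starting state

dists = []
for _ in range(steps):
    x, y = step(x), step(y)
    dists.append(np.linalg.norm(x - y))

# In a chaotic draw, the distance can grow to order one, shrink again later, and so on,
# rather than settling down; print a coarse view of how it evolves.
print(["%.1e" % d for d in dists[::20]])
```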
¹ Our code is made publicly available here: https://github.com/steliostavroulakis/Chaos_RNNs/blob/main/Depth_2_RNNs_and_Chaos_Period_3_Probability_RNNs.ipynb