
Our “static” approximation theorems provide quantitative approximation guarantees for several “neural operators” used in practice, especially in numerical Partial Differential Equations (PDEs), e.g., [61], and in the inverse-problem literature, e.g., [2,18,3,19,28]. In the static case, the same argument also applies to the general qualitative (rate-free) approximation theorems of [97,12,72].
We now describe in more detail the different areas to which the present paper contributes.
Our contribution in the Approximation Theory of Neural Operators. Our results provide the first set of quantitative approximation guarantees for generalized dynamical systems evolving on general infinite-dimensional spaces. By refining the memorizing hypernetwork argument of [1], together with our general solution to the static universal approximation problem in the class of Hölder functions², we are able to confirm a well-known folklore result from the dynamical-systems literature: namely, that increasing a sequential neural operator's latent space's dimension by
a positive integer Q, and our neural network's depth³ by $\tilde{O}(T^{-Q}\log(T^{-Q}))$ and width by $\tilde{O}(Q\,T^{-Q})$, implies that we may approximate O(T) more time-steps in the future with the same prescribed approximation error.
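Schematically, writing d for the latent dimension and L, W for the depth and width needed to maintain a prescribed error over T time-steps (the symbols d, L, and W are introduced here only for illustration), the trade-off takes the form
$$
d \;\mapsto\; d + Q, \qquad L \;\mapsto\; L + \Delta_L(T,Q), \qquad W \;\mapsto\; W + \Delta_W(T,Q) \quad\Longrightarrow\quad T \;\mapsto\; T + O(T),
$$
where $\Delta_L(T,Q)$ and $\Delta_W(T,Q)$ denote the depth and width increments quoted above.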
To the best of our knowledge, our dynamic result is the only quantitative universal approximation theorem guaranteeing that a recurrent neural network model can approximate any suitably regular infinite-dimensional non-linear dynamical system. Likewise, our static result is, to the best of our knowledge, the only general infinite-dimensional guarantee showing that a neural operator enjoys favourable approximation rates when the target map is smooth enough.
Our contribution in the Approximation Theory of RNNs. In the finite-dimensional context, CNOs become strict sub-structures of full RNNs in which the internal parameters are updated/generated via an auxiliary hypernetwork. Noticing this structural inclusion, our results rigorously support the folklore that RNNs may be more suitable than feedforward neural networks (FFNNs, henceforth) when approximating causal maps; see Section 5. This is because our theory yields expression rates for RNN approximations of causal maps between finite-dimensional spaces which are more efficient than the comparable rates currently available for FFNNs.
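To make the structural inclusion concrete, the following minimal sketch shows a recurrent cell whose weight matrices are produced at each time-step by an auxiliary hypernetwork. It only illustrates the general mechanism referenced above, not the architecture analyzed in this paper; all identifiers (e.g., HyperRNNCell) are illustrative and do not appear in the paper.

```python
# Minimal sketch: a recurrent cell whose parameters are generated per time-step
# by an auxiliary hypernetwork (illustrative only, not the paper's architecture).
import torch
import torch.nn as nn


class HyperRNNCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, hyper_dim: int = 16):
        super().__init__()
        self.input_dim, self.hidden_dim = input_dim, hidden_dim
        # The hypernetwork maps the time index to the cell's weights and bias.
        n_params = hidden_dim * (input_dim + hidden_dim + 1)
        self.hyper = nn.Sequential(
            nn.Linear(1, hyper_dim), nn.ReLU(), nn.Linear(hyper_dim, n_params)
        )

    def forward(self, x_t: torch.Tensor, h_t: torch.Tensor, t: int) -> torch.Tensor:
        # Generate the step-t parameters from the time index.
        theta = self.hyper(torch.tensor([[float(t)]])).squeeze(0)
        W_x, W_h, b = torch.split(
            theta,
            [self.hidden_dim * self.input_dim,
             self.hidden_dim * self.hidden_dim,
             self.hidden_dim],
        )
        W_x = W_x.view(self.hidden_dim, self.input_dim)
        W_h = W_h.view(self.hidden_dim, self.hidden_dim)
        # Standard recurrent update, but with hypernetwork-generated weights.
        return torch.tanh(x_t @ W_x.T + h_t @ W_h.T + b)


if __name__ == "__main__":
    cell = HyperRNNCell(input_dim=4, hidden_dim=8)
    h = torch.zeros(1, 8)
    xs = torch.randn(10, 1, 4)  # a length-10 input sequence
    for t, x_t in enumerate(xs):
        h = cell(x_t, h, t)
    print(h.shape)  # torch.Size([1, 8])
```

Replacing the hypernetwork by directly learned, time-independent weight matrices recovers a standard recurrent cell, which is the sense in which such hypernetwork-driven models sit inside the full RNN class.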
Technical contributions: Our results apply to sequences of non-linear operators between any “good linear” metric spaces. By a “good linear” metric space we mean any Fréchet space admitting a Schauder basis. This includes many natural examples (e.g., the sequence space $\mathbb{R}^{\mathbb{N}}$ with its usual metric) outside the scope of the Banach and Hilbert⁴ spaces carrying Schauder bases and of the Euclidean setting, all of which are completely subsumed by our assumptions. In other words, we treat the most general tractable linear setting in which one can hope to obtain quantitative universal approximation theorems.
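For concreteness, one standard choice of metric on the sequence space $\mathbb{R}^{\mathbb{N}}$ (generating the product topology) is
$$
d(x,y) \;=\; \sum_{k=1}^{\infty} 2^{-k}\,\frac{|x_k - y_k|}{1 + |x_k - y_k|}, \qquad x=(x_k)_{k\ge 1},\; y=(y_k)_{k\ge 1}.
$$
Under this metric, $\mathbb{R}^{\mathbb{N}}$ is a complete, metrizable, locally convex space, i.e. a Fréchet space, whose coordinate sequences $(e_k)_{k\ge 1}$ form a Schauder basis, while its topology is not induced by any norm; hence it falls outside the Banach and Hilbert settings.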
Organization of our paper. This research project answers theoretical deep learning questions by combining tools from approximation theory, functional analysis, and stochastic analysis. We therefore provide a concise exposition of the relevant tools from each of these areas in our “preliminaries” Section 2.
Section 3 contains our quantitative universal approximation theorems. In the static case, we derive expression rates for the static component of our model, namely the neural filters, which depend on the regularity of the target operator being approximated, ranging from Hölder trace-class to smooth trace-class, as well as on the usual quantities⁵. Our main approximation theorem in the dynamic case additionally encodes the target causal map's memory decay rate.
Section 4.2 applies our main results to derive approximation guarantees for the solution operators of a broad range of SDEs with stochastic coefficients, possibly having jumps (“stochastic discontinuities”) at times on a pre-specified time-grid, and with initial random noise. Section 5 examines the implications of our approximation rates for RNNs in the finite-dimensional setting, where we find that RNNs are strictly more efficient than FFNNs when approximating causal maps. Section 6 concludes. Finally, Appendix A contains the background material required for the derivations of our main results, which are relegated to Appendix B, and Appendix D contains auxiliary background material on Fréchet spaces and generalized inverses.
1.1 Notation
For the reader's convenience, we collect and define here the notation used in the rest of the paper, or indicate the exact point where a symbol first appears:
² By universality here, we mean that every α-Hölder function can be approximated by our “static model”, for any 0 < α ≤ 1. NB: when all spaces are finite-dimensional, this implies the classical notion of universal approximation, formulated in [54], since compactly supported smooth functions are 1-Hölder (i.e., Lipschitz) and these are dense in the space of continuous functions between two Euclidean spaces equipped with the topology of uniform convergence on compact sets.
³ We use $\tilde{O}$ to omit terms depending logarithmically on Q and T.
⁴ Note that every separable Hilbert space carries an orthonormal Schauder basis; hence, for the reader interested in Hilbert input and output spaces, these conditions are automatically satisfied.
⁵ Such as the compact set's diameter.