The problem of proving that specific deep network architectures are universal approximators of broad classes of functions has long been studied, with much progress in recent years. Beginning with [21] and [42], shallow networks formed with ReLU or sigmoid activation functions were shown to be universal approximators of continuous functions on compact subsets of $\mathbb{R}^n$. Recently an effort has emerged to extend the existing work to more specialized problems where other properties, for example monotonicity [43] or stability of inversion [71], are desired in addition to universality.
In parallel with these developments, manifold learning has emerged as a vibrant subfield of machine learning [6, 7, 17, 28, 51, 54, 72, 74, 76, 83, 86, 87]. The field is guided by the manifold hypothesis, the mantra that "high dimensional data are usually clustered around a low-dimensional manifold" [84]. The guiding principle is that a useful network needn't (and often shouldn't!) operate on all possible values in data space. Instead, it is better to adopt the ansatz that one should manipulate data that lies on or near a low-dimensional manifold in data space. The manifold hypothesis has helped guide network design in numerous applications, for example in classification (see e.g. [52, 64, 72, 80, 90]), where data belonging to a fixed label is conceived of as lying on a common manifold, as well as in both generative and encoding tasks (see e.g. [12, 14, 20, 30, 49, 66, 67, 75, 78] and [19, 22, 24, 46]), where the manifold hypothesis is used as an "existence proof" of a low-dimensional parameterization of the data of interest. In the context of inverse problems, the manifold hypothesis can be interpreted as a statement that forward operators map a low-dimensional space to the high-dimensional space of all possible measurements [1, 2, 3, 5, 44, 48, 50, 65, 79, 88]. The hypothesis has also been used in Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) architectures for solving inverse problems [2, 3, 4, 5, 31, 44, 50, 77, 79].
A natural question at the intersection of universality efforts and manifold learning is the following. What kinds of architectures are universal approximators of maps between manifolds? In particular, which networks are able to learn functions on manifolds if the manifolds are not known ahead of time? Some existing methods that operate on the level of manifolds use tools such as persistent homology to learn homology groups of data [15, 38, 39, 40, 41, 59]. In this work we look to learn mappings that are locally inverse stable. This condition is not generally true for homotopies between manifolds, and so we are unable to use many tools from topological data analysis (TDA). Other approaches that do not rely on TDA are able to learn mappings between submanifolds, but only when the manifolds to be learned have simple topology. That is, they only apply to manifolds that are given by a single chart and so cannot learn mappings that change topology [12, 71, 48]. Because of this latter limitation, they apply only to problems where the starting and target manifolds are different embeddings of the same manifold. In the context of generative models, this means that one has to get the topology of the starting space "exactly right" in order to learn a pushforward mapping that has inverse stability.
To combine the best of these two approaches, in this paper we look for networks that are universal approximators of mappings between manifolds that are local diffeomorphisms but globally complex. A practical gain of such an approach can be seen in the generative context: one no longer has to get the topology of the starting space "exactly right" to ensure that a suitable stable forward map exists. We also present a result that establishes mathematical parallels between problems in invariant network design and cryogenic electron microscopy (cryo-EM), where learning a mapping that changes topology arises naturally.
1.1 Network Description
For families of functions $\mathcal{H}$ and $\mathcal{G}$ with compatible domain and range we use the shorthand notation $\mathcal{H} \circ \mathcal{G} := \{h \circ g : h \in \mathcal{H},\, g \in \mathcal{G}\}$ to denote their pair-wise composition. We introduce extension-projection networks, which are of the form
$$T \circ p \circ E, \quad \text{where } T \in \mathcal{T},\ E \in \mathcal{E}, \qquad (1)$$
where $\mathcal{T} \subset C(\mathbb{R}^{n_L}, \mathbb{R}^{n_L})$ is a family of homeomorphisms $T : \mathbb{R}^{n_L} \to \mathbb{R}^{n_L}$, $p$ is a fixed projector, and $\mathcal{E}$ is a family of networks of the form
$$\mathcal{E} := \mathcal{T}_L^{n_L} \circ \mathcal{R}_L^{n_{L-1}, n_L} \circ \cdots \circ \mathcal{T}_1^{n_1} \circ \mathcal{R}_1^{n_0, n_1} \circ \mathcal{T}_0^{n_0},$$
where the $\mathcal{R}_\ell^{n_{\ell-1}, n_\ell} \subset C(\mathbb{R}^{n_{\ell-1}}, \mathbb{R}^{n_\ell})$ are families of injective maps, the $\mathcal{T}_\ell^{n_\ell} \subset C(\mathbb{R}^{n_\ell}, \mathbb{R}^{n_\ell})$ are families of homeomorphisms, $L \in \mathbb{N}$, $n_0 = n$, $n_L = m$, and $n_\ell \geq n_{\ell-1}$ for $\ell = 1, \dots, L$.
Examples of specific choices of $\mathcal{R}$ in the definition of $\mathcal{E}$ include zero padding, multiplication by a full-rank matrix, injective ReLU layers, or injective ReLU networks. Choices of the networks $\mathcal{T}$ include coupling flows [23, 24, 47] or autoregressive flows [45, 43]. For an extended discussion of these choices, please see [71, Section 2].
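To make the structure in (1) concrete, the following is a minimal NumPy sketch of one extension-projection network. It assumes, purely for illustration, zero padding for the injective layers $\mathcal{R}_\ell$, a toy additive coupling map standing in for the homeomorphisms $\mathcal{T}_\ell$ and the outer $T$, and a coordinate projector for $p$; the function names and layer choices below are ours and are not the constructions analyzed in this paper.

```python
# Minimal sketch of an extension-projection network T ∘ p ∘ E from Eq. (1).
# Layer choices (zero padding, additive coupling, coordinate projector) are
# illustrative assumptions only.
import numpy as np

def R_zero_pad(x, n_out):
    """Injective layer R_ell : R^{n_{ell-1}} -> R^{n_ell} (n_out >= n_in) via zero padding."""
    n_in = x.shape[-1]
    assert n_out >= n_in
    pad = np.zeros(x.shape[:-1] + (n_out - n_in,))
    return np.concatenate([x, pad], axis=-1)

def T_additive_coupling(x):
    """Toy homeomorphism T_ell : R^k -> R^k, an additive coupling map.
    (x1, x2) -> (x1, x2 + t(x1)) is invertible: subtract t(x1) to undo it."""
    k = x.shape[-1] // 2
    x1, x2 = x[..., :k], x[..., k:]
    t = np.tanh(x1 @ np.ones((k, x.shape[-1] - k)))  # any continuous t(x1) works
    return np.concatenate([x1, x2 + t], axis=-1)

def p_project(y, rank):
    """Fixed linear projector p on R^{n_L}: keep the first `rank` coordinates,
    zero the rest, so p ∘ p = p."""
    out = np.zeros_like(y)
    out[..., :rank] = y[..., :rank]
    return out

def extension_projection_net(x, widths, proj_rank):
    """One network T ∘ p ∘ E with E = T_L ∘ R_L ∘ ... ∘ T_1 ∘ R_1 ∘ T_0."""
    h = T_additive_coupling(x)            # T_0 on R^{n_0}
    for n_next in widths:                 # widths = (n_1, ..., n_L), non-decreasing
        h = R_zero_pad(h, n_next)         # injective R_ell
        h = T_additive_coupling(h)        # homeomorphism T_ell
    h = p_project(h, proj_rank)           # fixed projector p
    return T_additive_coupling(h)         # outer homeomorphism T

x = np.random.randn(5, 2)                 # batch of 5 points, n_0 = n = 2
y = extension_projection_net(x, widths=[4, 6], proj_rank=3)
print(y.shape)                            # (5, 6): n_L = m = 6
```

In this sketch the projector is the only step that is not injective, so it is the one place where the composite map can collapse parts of the data, while the coupling maps and zero-padding layers remain locally invertible.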