Deep Invertible Approximation of Topologically
Rich Maps between Manifolds
A Preprint
Michael Puthawala, Matti Lassas, Ivan Dokmanić, Pekka Pankka, Maarten V. de Hoop
October 4, 2022
Abstract
How can we design neural networks that allow for stable universal approximation of maps
between topologically interesting manifolds? The answer is with a coordinate projection.
Neural networks based on topological data analysis (TDA) use tools such as persistent
homology to learn topological signatures of data and stabilize training but may not be uni-
versal approximators or have stable inverses. Other architectures universally approximate
data distributions on submanifolds but only when the latter are given by a single chart, mak-
ing them unable to learn maps that change topology. By exploiting the topological parallels
between locally bilipschitz maps, covering spaces, and local homeomorphisms, and by using
universal approximation arguments from machine learning, we find that a novel network of
the form $T \circ p \circ E$, where $E$ is an injective network, $p$ a fixed coordinate projection, and $T$
a bijective network, is a universal approximator of local diffeomorphisms between compact
smooth submanifolds embedded in $\mathbb{R}^n$. We emphasize the case when the target map changes
topology. Further, we find that by constraining the projection p, multivalued inversions of
our networks can be computed without sacrificing universality. As an application, we show
that learning a group invariant function with unknown group action naturally reduces to
the question of learning local diffeomorphisms for finite groups. Our theory permits us to
recover orbits of the group action. We also outline possible extensions of our architecture
to address molecular imaging of molecules with symmetries. Finally, our analysis informs
the choice of topologically expressive starting spaces in generative problems.
1 Introduction
Topology is key in machine learning applications, from generative modeling, classification and autoencoding
to applications in physics such as gauge field theory and occurrence of topological excitations. Here we
describe a neural network architecture which is a universal approximator of locally stable maps between
topological manifolds. In contrast to classical universal architectures, like the multilayer perceptron (MLP),
the architecture studied in this work is built with forward and inverse stability in mind. We emphasize that
our network applies to the case when the topology of the manifolds is not known a priori.
Department of Mathematics and Statistics, South Dakota State University, Chicoine Hall, Box 2225, Brookings,
SD 57007 michael.puthawala@sdstate.edu
Department of Mathematics and Statistics, University of Helsinki, FI-00014 Helsinki, Finland
matti.lassas@helsinki.fi
Department of Mathematics and Computer Science, University of Basel, Peterspl. 1, 4001 Basel, Switzerland
ivan.dokmanic@unibas.ch
Department of Mathematics and Statistics, University of Helsinki, FI-00014 Helsinki, Finland
pekka.pankka@helsinki.fi
Computational and Applied Mathematics and Earth Science, Rice University, Houston, TX 77005, USA
mdehoop@rice.edu
arXiv:2210.00577v1 [cs.LG] 2 Oct 2022
Proving that specific deep network architectures are universal approximators of broad classes of functions
has long been studied, with much progress in recent years. Beginning with [21] and [42], shallow networks
formed with ReLU or sigmoid activation functions were shown to be universal approximators of continuous
functions on compact subsets in Rn. Recently an effort has emerged to extend the existing work to more
specialized problems where other properties, for example monotonicity [43] or stability of inversion [71], are
desired in addition to universality.
In parallel with these developments, manifold learning has emerged as a vibrant subfield of machine learning.
Manifold learning is guided by the manifold hypothesis, the mantra that “high dimensional data are usually
clustered around a low-dimensional manifold” [84]. This in turn begat the subfield of manifold learning
[6, 7, 17, 28, 51, 54, 72, 74, 76, 83, 86, 87]. The guiding principle is that a useful network needn’t (and
often shouldn’t!) operate on all possible values in data space. Instead, it is better to use the ansatz that
one should manipulate data that lies on or near a low-dimensional manifold in data space. The manifold
hypothesis has helped guide network design in numerous applications, for example in classification (see
e.g. [52, 64, 72, 80, 90]) where data belonging to a fixed label is conceived of as being on a common
manifold, as well as both generative and encoding tasks, (see e.g. [12, 14, 20, 30, 49, 66, 67, 75, 78]
and [19, 22, 24, 46]) where the manifold hypothesis is used as an “existence proof” of a low-dimensional
parameterization of the data of interest. In the context of inverse problems, the manifold hypothesis can
be interpreted as a statement that forward operators map low-dimensional space to the high-dimensional
space of all possible measurements [1, 2, 3, 5, 44, 48, 50, 65, 79, 88]. The hypothesis has also been used
in Variational Autoencoder (VAE) and Generative Adversarial Networks (GAN) architectures for solving
inverse problems [2, 3, 4, 5, 31, 44, 50, 77, 79].
A natural question at the intersection of universality efforts and manifold learning is the following. What
kinds of architecture are universal approximators of maps between manifolds? In particular, which networks
are able to learn functions on manifolds if the manifolds are not known ahead of time? Some existing methods
that operate on the level of manifolds use tools such as persistent homology to learn homology groups of
data [15, 38, 39, 40, 41, 59]. In this work we look to learn mappings that are locally inverse stable. This
condition does not generally hold for homotopies between manifolds, and so we are unable to use many of the
tools of TDA. Other approaches not utilizing TDA exist and are able to learn mappings between submanifolds,
but only when the manifolds to be learned have simple topology. That is, they only apply to manifolds
that are given by a single chart and so can’t learn mappings that change topology [12, 71, 48]. Because of
this latter limitation they apply only to problems where the starting and target manifold are
different embeddings of the same manifold. In the context of generative models, this means that one has
to get the topology of the starting space “exactly right” in order to learn a pushforward mapping that has
inverse stability.
In order to combine the best of these two approaches, in this paper we seek universal approximators
of mappings between manifolds that are local diffeomorphisms but globally complex.
A practical gain of such an approach can be seen in the generative context. One no longer has to get the
topology of the starting space ‘exactly right’ to ensure that a suitable stable forward map exists. We also
present a result that establishes mathematical parallels between problems in invariant network design and
cryogenic electron microscopy (cryo-EM) where learning a mapping that changes topology arises naturally.
1.1 Network Description
For families of functions $\mathcal{H}$ and $\mathcal{G}$ with compatible domain and range we use the shorthand notation
$\mathcal{H} \circ \mathcal{G} := \{h \circ g : h \in \mathcal{H},\ g \in \mathcal{G}\}$ to denote their pair-wise composition. We introduce extension-projection networks, which are of the form

$T \circ p \circ E$, where $T \in \mathcal{T}$, $E \in \mathcal{E}$,     (1)

and $\mathcal{T} \subset C(\mathbb{R}^{n_\ell}, \mathbb{R}^{n_\ell})$ is a family of homeomorphisms $T: \mathbb{R}^{n_\ell} \to \mathbb{R}^{n_\ell}$, $p$ is a fixed projector, and $\mathcal{E}$ is a family
of networks of the form $E := T_L^{n_L} \circ R_L^{n_{L-1}, n_L} \circ \cdots \circ T_1^{n_1} \circ R_1^{n_0, n_1} \circ T_0^{n_0}$, where $R_\ell^{n_{\ell-1}, n_\ell} \in C(\mathbb{R}^{n_{\ell-1}}, \mathbb{R}^{n_\ell})$ are
injective, $T_\ell^{n_\ell} \in C(\mathbb{R}^{n_\ell}, \mathbb{R}^{n_\ell})$ are homeomorphisms, $L \in \mathbb{N}$, $n_0 = n$, $n_L = m$, and $n_\ell \ge n_{\ell-1}$ for $\ell = 1, \ldots, L$.
Examples of specific choices of $R$ in the definition of $\mathcal{E}$ include zero padding, multiplication by a full-rank
matrix, injective ReLU layers, or injective ReLU networks. Choices of the networks $T$ include coupling flows
[23, 24, 47] or autoregressive flows [45, 43]. For an extended discussion of these choices, please see [71,
Section 2].
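To make the structure in (1) concrete, the following is a minimal runnable sketch (an illustration only, not the implementation studied in this work) in which the injective layers $R$ are zero padding, the $T_\ell$ are additive coupling layers, and $p$ keeps the leading coordinates; all layer sizes and weights below are arbitrary placeholders.

```python
# Minimal sketch of the extension-projection structure T ∘ p ∘ E from Eq. (1).
# E stacks injective expansive layers R (zero padding) with invertible layers
# (additive coupling flows), p is a fixed coordinate projection, and T is one
# more invertible layer. Weights are random placeholders; in practice learned.
import numpy as np

rng = np.random.default_rng(0)

def coupling(x, W, b):
    """Additive coupling layer on R^d (d even): invertible by construction."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    return np.concatenate([x1, x2 + np.tanh(x1 @ W + b)], axis=-1)

def zero_pad(x, n_out):
    """Injective layer R^{n_in} -> R^{n_out}, n_out >= n_in."""
    pad = n_out - x.shape[-1]
    return np.concatenate([x, np.zeros(x.shape[:-1] + (pad,))], axis=-1)

def coord_proj(x, n_out):
    """Fixed coordinate projection: keep the first n_out coordinates."""
    return x[..., :n_out]

n, n_hidden, m = 2, 6, 4
W1, b1 = rng.normal(size=(n_hidden // 2, n_hidden // 2)), rng.normal(size=n_hidden // 2)
W2, b2 = rng.normal(size=(m // 2, m // 2)), rng.normal(size=m // 2)

def network(x):
    z = zero_pad(x, n_hidden)          # R: injective expansion
    z = coupling(z, W1, b1)            # T_1: invertible
    z = coord_proj(z, m)               # p: fixed coordinate projection
    return coupling(z, W2, b2)         # T: final bijective layer

x = rng.normal(size=(5, n))
print(network(x).shape)                # (5, 4)
```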
1.2 Comparison to Prior Work
In this subsection, we describe how this work is related to prior work in Topological Data Analysis (TDA),
simplicial flow networks and group invariant/equivariant networks.
1.2.1 Topological Data Analysis
For a general survey of topological machine learning methods we refer to [35]. Many approaches that
stem from TDA use information gained directly from data, for example a persistence diagram [16], and
use this data as either a regularizer (in the form of a loss function) or as a prior for architecture design
[15, 38, 39, 40, 41, 59]. These works are closest to ours in terms of design, but fundamentally look to answer
different questions than the ones studied in this work. TDA tools that use homology groups are sensitive
enough to detect if two manifolds have the same homotopy type, but not sensitive enough to determine if
they are homeomorphic.Two manifolds that are homeomorphic always have the same homotopy type, but
the converse is not true. The converse direction follows by comparing the unit interval to a single point.
Replacing the need for homeomorphisms with the need for homotopies means that TDA based approaches
are unable to enjoy the theoretical guarantees of our work, in particular inverse stability and universality.
TDA based works generally do not prove universality of their networks nor do they guarantee that network
inversion is stable.
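As a minimal sketch of the kind of topological signature such methods consume (assuming the third-party ripser package, which is not a dependency of this work), the following computes a persistence diagram of a noisy circle sample; the diagram certifies homological information such as the single dominant 1-cycle, but cannot certify that two spaces are homeomorphic, which is the stronger property targeted here.

```python
# Toy persistence-diagram computation with the third-party "ripser" package
# (pip install ripser); illustrative only, not part of the architecture studied
# in this paper.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
cloud = circle + 0.05 * rng.normal(size=circle.shape)   # noisy sample of S^1

dgms = ripser(cloud, maxdim=1)["dgms"]   # persistence diagrams for H_0 and H_1
h1 = dgms[1]
persistence = h1[:, 1] - h1[:, 0]
print("longest H_1 bar:", persistence.max())   # one dominant bar: a single 1-cycle

# Persistent homology detects homotopy-type information (here: one component,
# one loop); it does not distinguish homotopy-equivalent spaces that fail to
# be homeomorphic.
```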
1.2.2 Simplicial Flow Networks
There is much work designing and developing the theory for networks that learn maps between graphs
and simplicial complexes [25, 69]. If two manifolds are triangulable, then a function between them can be
learned as a function between their triangulations with simplicial networks. The work presented here does not
require access to a triangulation to be a universal approximator. Although all manifolds that we consider
are triangulable, a fact which is important for our proof, the triangulation is not actually necessary for the
statement of our final result, Theorem 3. Simplicial networks have the advantage that they do not require any
dimensionality restrictions, while our results do.
1.2.3 Group Invariant/Equivariant networks
Finally, we describe a connection between group invariant networks and the work presented here. Partially
motivated by the success of convolutional networks applied to image data, there has recently been much
development and analysis of group invariant and group equivariant networks [11, 89, 63, 10, 70, 58, 55]. For
a group $\Sigma$ with action $g_\sigma: \mathcal{M}_1 \to \mathcal{M}_1$ defined over $\mathcal{M}_1$, a function $f: \mathcal{M}_1 \to \mathcal{M}_2$ is called $\Sigma$-invariant if
$f(g_\sigma x) = f(x)$ for all $x \in \mathcal{M}_1$, $\sigma \in \Sigma$.
If the symmetry group Σ is known then we can design a network that enforces and exploits this symmetry
by architecture choice [10] or by averaging over the group action [70]. Conceptually, we may consider
both similar insofar as they both approximate an $f$ of the form $f = f_{\mathcal{M}_1/\Sigma} \circ \pi_\Sigma: \mathcal{M}_1 \to \mathcal{M}_2$, where
$\pi_\Sigma: \mathcal{M}_1 \to \mathcal{M}_1/\Sigma$ projects $\mathcal{M}_1$ onto the orbits of $\mathcal{M}_1$ under $g_\Sigma$.
Learning functions $f: \mathcal{M}_1 \to \mathcal{M}_2$ that are invariant w.r.t. some group action $g_\Sigma$ on $\mathcal{M}_1$ is closely related to
the idea of learning local homeomorphisms between $\mathcal{M}_1$ and $\mathcal{M}_2$. If $\Sigma$ is finite with group action $g_\Sigma$ that
is $d$-to-one and smooth enough then, as presented in Lemma 2, $g_\Sigma$ is a local diffeomorphism. Thus it can
be approximated using the networks studied here. Further, because our networks are built with inversion
in mind we can recover orbits of $\mathcal{M}_1$ under the action of $\Sigma$, see Corollary 5. In this way, our network is a
universal solver for the ‘finite blind invariance’ problem. To this point, we don’t intend to offer our network
as a drop-in replacement to existing Σ invariant networks. Instead it offers a different perspective on a
closely related problem.
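As a toy illustration of the factorization $f = f_{\mathcal{M}_1/\Sigma} \circ \pi_\Sigma$ and of orbit recovery, consider $\Sigma = \mathbb{Z}/2$ acting on the circle by the antipodal map; the sketch below is an example of ours (the particular orbit map and function names are hypothetical), not the general construction of the paper.

```python
# Minimal sketch of f = f_{M1/Sigma} ∘ pi_Sigma for a finite group:
# Sigma = Z/2 acts on S^1 ⊂ R^2 by x ↦ -x. The orbit map can be realized as
# the 2-to-1 local diffeomorphism z ↦ z^2, i.e. (x, y) ↦ (x^2 - y^2, 2xy).
import numpy as np

def pi_sigma(p):
    """Orbit map S^1 -> S^1 realizing the quotient by the antipodal action."""
    x, y = p
    return np.array([x * x - y * y, 2.0 * x * y])

def f_quotient(q):
    """Any function defined on the quotient; here a toy scalar function."""
    return q[0] + 0.5 * q[1] ** 2

def f(p):
    """The induced Sigma-invariant function f = f_quotient ∘ pi_sigma."""
    return f_quotient(pi_sigma(p))

theta = 0.7
p = np.array([np.cos(theta), np.sin(theta)])
assert np.isclose(f(p), f(-p))         # invariance under the group action

# Orbit recovery: the two preimages of pi_sigma(p) are exactly the orbit {p, -p}.
q = pi_sigma(p)
phi = 0.5 * np.arctan2(q[1], q[0])     # one square root of the angle
preimages = [np.array([np.cos(phi), np.sin(phi)]),
             -np.array([np.cos(phi), np.sin(phi)])]
assert any(np.allclose(p, r) for r in preimages)
```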
1.3 Our Contribution
In this work we show that extension-projection networks of the form (1) are universal approximators of
local diffeomorphisms between smooth compact manifolds. The problem has two parts. In the first, we
show that by extending the analysis of [71], we can universally approximate any embedding of a smooth,
compact (topologically complex) n-manifold. In the second, we show that we can approximate mappings that
globally change topology between manifolds, but locally are diffeomorphisms. The latter part is the more
difficult and novel, and so is the main focus of this work. By using topological arguments, in particular
the parallels between local diffeomorphisms and covering maps, we find that local diffeomorphisms (locally
one-to-one, globally many-to-one) can be lifted to diffeomorphisms (globally one-to-one) in a sufficiently
high dimensional space. Further, this lifting can always be done so that the inverse lifting (a projection) is
a coordinate projection. This is the main content of Theorem 2. The projection is simple enough that it
may be inverted. By approximating the lifting, projection, and one final embedding with existing networks,
we find that the architecture end-to-end is a universal approximator of local diffeomorphisms and maintains
the novel inversion property, Theorem 3.
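The following toy computation illustrates the lifting phenomenon for the simplest non-trivial covering map, the 2-to-1 cover of the circle; it is only an illustration of the idea, not the construction used in the proof of Theorem 2.

```python
# Lifting a local diffeomorphism to an injective map so that a fixed coordinate
# projection recovers it. Here g: S^1 -> S^1, g(cos t, sin t) = (cos 2t, sin 2t)
# is 2-to-1; the hand-written lift E appends the source point as extra
# coordinates, making E injective, and g = p ∘ E with p a coordinate projection.
import numpy as np

def g(t):
    """The 2-to-1 covering map of the circle, written on the parameter t."""
    return np.array([np.cos(2 * t), np.sin(2 * t)])

def E(t):
    """Injective lift into R^4: the image of g plus the source point itself."""
    return np.concatenate([g(t), [np.cos(t), np.sin(t)]])

def p(z):
    """Fixed coordinate projection R^4 -> R^2."""
    return z[:2]

t1, t2 = 0.3, 0.3 + np.pi              # distinct points of S^1 with g(t1) == g(t2)
assert np.allclose(g(t1), g(t2))       # g is not injective
assert not np.allclose(E(t1), E(t2))   # the lift separates the two preimages
assert np.allclose(p(E(t1)), g(t1))    # g = p ∘ E on the circle
```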
We also consider applications of extension-projection networks in the design of group-invariant networks, the
choice of starting spaces in generative problems, and describe a promising connection to the problem of
cryogenic electron microscopy (cryo-EM) when the sample to be imaged possesses an unknown symmetry.
2 Theoretical development
The manifolds that we consider here are smooth, compact, $n$-dimensional, and embedded in $\mathbb{R}^m$ for some
$m > n$. In particular, we pose the question of approximating a surjection $f: \mathcal{M}_1 \to \mathcal{M}_2$ with a network of
the form (1). Before presenting our results, we point readers to Appendix A for a definition of terms used in
this manuscript.
2.1 Bistable approximations and local homeomorphisms
We wish to introduce architectures that are universal approximators and have properties which are necessary
or desirable in practice. In particular, we consider approximations that locally have an inverse that is stable.
We call this mode of approximation a bistable approximation. To define this concept, we recall that when
$\mathcal{M} \subset \mathbb{R}^m$ is a $C^\infty$-smooth submanifold, the reach of $\mathcal{M}$, denoted $\mathrm{Reach}(\mathcal{M})$, is the supremum of all $r > 0$
such that for all $x$ in the $r$-neighborhood $U_r \subset \mathbb{R}^m$ of $\mathcal{M}$ there is a unique nearest point $y \in \mathcal{M}$. We
denote the nearest point by $y = P_{\mathcal{M}}(x)$. We note that for $0 < r < \mathrm{Reach}(\mathcal{M})$, the map $P_{\mathcal{M}}: U_r \to \mathcal{M}$ is
$C^\infty$-smooth [27]. For a point $x \in U_r$ the pair $(y, v)$ of the nearest point $y = P_{\mathcal{M}}(x)$ and the normal vector
$v = x - P_{\mathcal{M}}(x) \in N_y\mathcal{M}$ of $\mathcal{M}$ at $y$ form tubular coordinates of the point $x$.
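As a concrete instance of these definitions (a toy example of ours, not part of the construction below), for the unit circle $\mathcal{M} \subset \mathbb{R}^2$ one has $\mathrm{Reach}(\mathcal{M}) = 1$ and $P_{\mathcal{M}}(x) = x/\|x\|$; the sketch evaluates the tubular coordinates numerically.

```python
# Toy illustration of reach, the nearest-point projection P_M, and tubular
# coordinates for M = unit circle in R^2, where Reach(M) = 1 and
# P_M(x) = x / ||x|| for x != 0.
import numpy as np

def P_M(x):
    """Nearest-point projection onto the unit circle (defined for x != 0)."""
    return x / np.linalg.norm(x)

x = np.array([0.9, 0.5])                 # a point within distance 1 of M
y = P_M(x)                               # nearest point on M
v = x - y                                # normal vector at y
assert np.linalg.norm(v) < 1.0           # x lies in a tubular neighborhood U_r, r < Reach(M)
assert np.isclose(np.dot(v, np.array([-y[1], y[0]])), 0.0)  # v is normal to T_yM
assert np.allclose(y + v, x)             # (y, v) are tubular coordinates of x
```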
Definition 1 (Bistable approximation). Let $\mathcal{F} \subset \operatorname{locdif}^1(\mathcal{M}_1, \mathbb{R}^{m_2})$, and $g \in C(\mathcal{M}_1, \mathcal{M}_2)$ where $g$ is surjective,
for $n$-dimensional submanifolds $\mathcal{M}_1 \subset \mathbb{R}^{m_1}$ and $\mathcal{M}_2 \subset \mathbb{R}^{m_2}$. We say that $\mathcal{F}$ has a bistable uniform
approximator of $g$ if there is an $M > 0$ so that for any $0 < \epsilon < \mathrm{Reach}(\mathcal{M}_2)$ there is an $f \in \mathcal{F}$ so that for
all $x \in \mathcal{M}_1$ the following hold:

$\|f(x) - g(x)\|_{\mathbb{R}^{m_2}} < \epsilon$, $\quad \|\nabla f(x)\|_{T_x\mathcal{M}_1 \times \mathbb{R}^{m_2}} \le M$, $\quad \|(\nabla(P_{\mathcal{M}_2} \circ f)(x))^{-1}\|_{T_{P_{\mathcal{M}_2}(f(x))}\mathcal{M}_2 \times T_x\mathcal{M}_1} \le M$.     (2)

For a family of functions $\mathcal{G}$, we say that $\mathcal{F}$ is a bistable uniform approximator of $\mathcal{G}$ if it has a bistable
uniform approximator for each $g \in \mathcal{G}$. We call a sequence $(f_n)_{n=1}^{\infty} \subset \mathcal{F}$ a bistable uniform approximating
sequence if there is an $M > 0$ and $(\epsilon_n)_{n=1}^{\infty}$ such that $\lim_{n\to\infty} \epsilon_n = 0$ and for all $x \in \mathcal{M}_1$ and $n \in \mathbb{Z}_+$ we have
$\|(\nabla(P_{\mathcal{M}_2} \circ f_n)(x))^{-1}\|_{T_{P_{\mathcal{M}_2}(f_n(x))}\mathcal{M}_2 \times T_x\mathcal{M}_1} \le M$, $\|\nabla f_n(x)\|_{T_x\mathcal{M}_1 \times \mathbb{R}^{m_2}} \le M$, and $\|f_n(x) - g(x)\|_{\mathbb{R}^{m_2}} < \epsilon_n$.
A neural network architecture $\mathcal{F}$ undergoing training to approximate a function $g$ can be formalized as a
bistable uniform approximating sequence $(f_n)_{n=1}^{\infty}$, where $f_n \to g$ in the limit. In this formalization, we let
$f_1$ be the network after being trained for one epoch, $f_2$ be the network trained for two epochs, etc. The
three terms in Eqn. (2) have a natural interpretation in this formalism.
Requiring $\|f(x) - g(x)\|_{\mathbb{R}^{m_2}} < \epsilon$ forces an approximating sequence to be a uniform approximator on compact
sets and justifies saying that $f$ approximates $g$.

The $\|\nabla f(x)\|_{T_x\mathcal{M}_1 \times \mathbb{R}^{m_2}} \le M$ term requires that good approximations to $g$ are stable, and in particular
penalizes convergence to discontinuous or ‘kinked’ functions. If, for a given value of the weights, a network
is Lipschitz (as is the case for deep feed-forward networks with ReLU or sigmoid activation functions), then
uniform convergence of $f_n$ to a $g$ with unbounded gradient implies that $\|\nabla f_n(x)\|$ diverges as $n \to \infty$. If the
weights of a neural network diverge as the network is trained, this is generally thought to be undesirable.
Thus requiring $\|\nabla f_n(x)\|$ to be bounded permits only ‘good’ convergence.
The $\|(\nabla(P_{\mathcal{M}_2} \circ f)(x))^{-1}\|_{T_{P_{\mathcal{M}_2}(f(x))}\mathcal{M}_2 \times T_x\mathcal{M}_1}$ term enforces a locally stable inverse of an approximation and
has a meaning in the context of Bayesian Uncertainty Quantification (Bayesian UQ). In Bayesian UQ, the
uncertainty associated with an approximation is evaluated by computing a change of variables term. To
estimate this term, it is necessary to compute, or suitably approximate, the inverse gradient. Thus, by
requiring $\|(\nabla(P_{\mathcal{M}_2} \circ f)(x))^{-1}\|_{T_{P_{\mathcal{M}_2}(f(x))}\mathcal{M}_2 \times T_x\mathcal{M}_1}$ to be bounded, we require that this change of variables
has determinant bounded away from zero, and so the model admits Bayesian UQ that doesn’t deteriorate
in the limit.
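As a sanity check of Definition 1 (a toy example, not taken from the constructions or experiments below), the following sketch evaluates the three quantities of Eqn. (2) numerically for the unit circle, the 2-to-1 covering map as $g$, and a hand-made perturbation $f$.

```python
# Toy numerical check of the three quantities in Eq. (2) for M1 = M2 = unit
# circle in R^2, g the 2-to-1 covering map, and f a small perturbation of g.
# Tangent spaces are 1-dimensional, so operator norms reduce to magnitudes of
# derivatives with respect to arc length on M1.
import numpy as np

delta = 0.02

def g(t):    # target local diffeomorphism, lands exactly on M2
    return np.stack([np.cos(2 * t), np.sin(2 * t)], axis=-1)

def f(t):    # a nearby approximation, slightly off M2
    phase = 2 * t + delta * np.sin(t)
    return (1 + delta) * np.stack([np.cos(phase), np.sin(phase)], axis=-1)

def P_M2(y):  # nearest-point projection onto the unit circle
    return y / np.linalg.norm(y, axis=-1, keepdims=True)

t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
h = 1e-5

err = np.linalg.norm(f(t) - g(t), axis=-1).max()                 # ||f - g||
df = np.linalg.norm((f(t + h) - f(t - h)) / (2 * h), axis=-1)    # ||∇f|| on T_xM1
dq = np.linalg.norm((P_M2(f(t + h)) - P_M2(f(t - h))) / (2 * h), axis=-1)
inv = 1.0 / dq                                                   # ||(∇(P_M2 ∘ f))^{-1}||

print(f"sup ||f-g|| = {err:.3f}, sup ||grad f|| = {df.max():.3f}, "
      f"sup ||(grad(P∘f))^-1|| = {inv.max():.3f}")
# All three quantities remain bounded for this f, so the pair (f, g) satisfies
# the constraints of Eqn. (2) at tolerance epsilon = sup ||f - g||.
```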
Figure 1: In the space $\mathbb{R}^5$ there is a genus three surface $\Sigma_3$ and in the space $\mathbb{R}^3$ there is a genus two surface
$\Sigma_2$ such that the projection $p_3: \mathbb{R}^5 \to \mathbb{R}^3$ is a covering map (and thus a local diffeomorphism) $p_3: \Sigma_3 \to \Sigma_2$.
These surfaces can be obtained by taking curves that are deformed to surfaces by replacing all points of the
curves by circles. The figure gives an idea of the construction of such curves. In the figure we have a deformation
$h: \mathbb{R}^3 \to \mathbb{R}^3$ that maps a curve $S_3 \subset \mathbb{R}^3$ to a self-intersecting curve $h(S_3) \subset \mathbb{R}^3$ and a projection $p: \mathbb{R}^3 \to \mathbb{R}^2$
that maps $h(S_3)$ to a curve $S_2 \subset \mathbb{R}^2$. We add a fourth dimension (time) to the space $\mathbb{R}^3$ that contains $S_3$,
and modify $h$ in the time variable to obtain a diffeomorphism $\tilde{h}: \mathbb{R}^4 \to \mathbb{R}^4$ so that $\tilde{S}_3 = \tilde{h}(S_3 \times \{0\})$ is a
non-self-intersecting curve in $\mathbb{R}^4$ whose projection to the 3-dimensional space $\mathbb{R}^3$ is $h(S_3)$. We add one more
dimension to all Euclidean spaces and obtain a diffeomorphism $\tilde{H}: \mathbb{R}^{4+1} \to \mathbb{R}^{4+1}$, $\tilde{H}(x, t) = (\tilde{h}(x), t)$, and
the projection $p_3: \mathbb{R}^{4+1} \to \mathbb{R}^{2+1}$ such that $p_3 \circ \tilde{H}: \tilde{S}_3 \times \{0\} \to S_2 \times \{0\}$. Finally, we replace all points of
the curves by circles (analogously to the Penrose diagrams in general relativity) to deform the curves $S_2 \times \{0\}$
and $\tilde{H}(\tilde{S}_3)$ into surfaces $\Sigma_2 \subset \mathbb{R}^{2+1}$ and $\Sigma_3 \subset \mathbb{R}^{4+1}$ having genus 2 and 3, respectively.
For a family $\mathcal{F} \subset \operatorname{locdif}^1(\mathcal{M}_1, \mathbb{R}^{m_2})$, the notation $\overline{\mathcal{F}}$ denotes the closure of $\mathcal{F}$ under bistable uniform approximation.
That is, $\overline{\mathcal{F}} = \{g \in C(\mathcal{M}_1, \mathcal{M}_2) : \mathcal{F} \text{ has a bistable uniform approximator for } g\}$.
With the definition given above we present our first result, which describes what kinds of functions admit
bistable uniform approximators by any function class.
Lemma 1 (Closure of Bistable Uniform Approximation). If $\mathcal{M}_1$ is compact then limit points of bistable
uniform approximating sequences are locally bilipschitz.
The proof of Lemma 1 is given in Appendix B.2 and implies the following.
Corollary 1 (Bistable Uniform Approximations are Local Homeomorphisms). If $\mathcal{M}_1$ is compact, then
$\overline{\operatorname{locdif}^1(\mathcal{M}_1, \mathcal{M}_2)} \subset \operatorname{lochom}(\mathcal{M}_1, \mathcal{M}_2) \cap W^{1,2}(\mathcal{M}_1, \mathcal{M}_2)$. The set $W^{1,2}(\mathcal{M}_1, \mathcal{M}_2)$ refers to functions that are
weakly differentiable with $L^2(\mathcal{M}_1, \mathcal{M}_2)$ derivative.
The proof of Corollary 1 is given in Appendix B.3. The fact that $\overline{\operatorname{locdif}^1(\mathcal{M}_1, \mathbb{R}^{m_2})} \subset
\operatorname{lochom}(\mathcal{M}_1, \mathcal{M}_2)$ is key to the subsequent developments of this paper. Before moving onto those
developments, we give an example of manifolds to illustrate the difference between $\operatorname{hom}(\mathcal{M}_1, \mathcal{M}_2)$ and
$\operatorname{lochom}(\mathcal{M}_1, \mathcal{M}_2)$, a distinction which is key to understanding the contribution of our paper.
Example 1 (Examples of Homeomorphic and Locally Homeomorphic Manifolds). Homeomorphisms preserve
homotopy type, so a surface of genus 3, $S_3$, and a surface of genus 2, $S_2$, are not homeomorphic. Nevertheless,
there is e.g. a two-sheeted covering of $S_2$ by $S_3$, and so there is a local homeomorphism from $S_3$ to $S_2$. A
visualization of this covering is given in Figure 1. The covering is done in two steps; a local
homeomorphism from $S_3$ to $S_2$ is given explicitly in Appendix B.2.
Lemma 1 and Corollary 1 show that limit points of bistable uniform approximating sequences are
always at least local homeomorphisms, and are in general weakly differentiable. A natural question is if
they are always local diffeomorphisms. They are not. An example of a nonsmooth function that can be
approximated bistably is given in Appendix B.4.
Any $C^1$ diffeomorphism can be uniformly approximated by $C^\infty$ diffeomorphisms [36, Theorem 2.7], hence
uniform approximation results true of $C^1$ diffeomorphisms are automatically true of $C^\infty$ ones as well.