The problem of proving that specific deep network architectures are universal approximators of broad classes of functions has long been studied, with much progress in recent years. Beginning with [21] and [42], shallow networks formed with ReLU or sigmoid activation functions were shown to be universal approximators of continuous functions on compact subsets of $\mathbb{R}^n$. Recently an effort has emerged to extend the existing work to more specialized problems where other properties, for example monotonicity [43] or stability of inversion [71], are desired in addition to universality.
In parallel with these developments, manifold learning has emerged as a vibrant subfield of machine learning [6, 7, 17, 28, 51, 54, 72, 74, 76, 83, 86, 87]. The field is guided by the manifold hypothesis, the mantra that "high dimensional data are usually clustered around a low-dimensional manifold" [84]. The guiding principle is that a useful network needn't (and often shouldn't!) operate on all possible values in data space. Instead, it is better to adopt the ansatz that one should manipulate data that lies on or near a low-dimensional manifold in data space. The manifold hypothesis has helped guide network design in numerous applications, for example in classification (see e.g. [52, 64, 72, 80, 90]), where data belonging to a fixed label is conceived of as lying on a common manifold, as well as in both generative and encoding tasks (see e.g. [12, 14, 20, 30, 49, 66, 67, 75, 78] and [19, 22, 24, 46]), where the manifold hypothesis is used as an "existence proof" of a low-dimensional parameterization of the data of interest. In the context of inverse problems, the manifold hypothesis can be interpreted as a statement that forward operators map a low-dimensional space to the high-dimensional space of all possible measurements [1, 2, 3, 5, 44, 48, 50, 65, 79, 88]. The hypothesis has also been used in Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) architectures for solving inverse problems [2, 3, 4, 5, 31, 44, 50, 77, 79].
A natural question at the intersection of universality efforts and manifold learning is the following. What kinds of architectures are universal approximators of maps between manifolds? In particular, which networks are able to learn functions on manifolds if the manifolds are not known ahead of time? Some existing methods that operate on the level of manifolds use tools such as persistent homology to learn homology groups of data [15, 38, 39, 40, 41, 59]. In this work we look to learn mappings that are locally inverse stable. This condition is not generally true for homotopies between manifolds, and so we are unable to use many tools from topological data analysis (TDA). Other approaches that do not rely on TDA are able to learn mappings between submanifolds, but only when the manifolds to be learned have simple topology. That is, they only apply to manifolds that are given by a single chart and so cannot learn mappings that change topology [12, 71, 48]. Because of this latter limitation, they apply only to problems where the starting and target manifolds are different embeddings of the same manifold. In the context of generative models, this means that one has to get the topology of the starting space "exactly right" in order to learn a pushforward mapping that has inverse stability.
To combine the best of these two approaches, in this paper we look for networks that are universal approximators of mappings between manifolds that are local diffeomorphisms but globally complex. A practical gain of such an approach can be seen in the generative context: one no longer has to get the topology of the starting space "exactly right" to ensure that a suitable stable forward map exists. We also present a result that establishes mathematical parallels between problems in invariant network design and cryogenic electron microscopy (cryo-EM), where learning a mapping that changes topology arises naturally.
1.1 Network Description
For families of functions $\mathcal{H}$ and $\mathcal{G}$ with compatible domain and range we use the shorthand notation $\mathcal{H} \circ \mathcal{G} := \{h \circ g : h \in \mathcal{H},\, g \in \mathcal{G}\}$ to denote their pair-wise composition. We introduce extension-projection networks, which are of the form
$$T \circ p \circ E, \quad \text{where } T \in \mathcal{T},\ E \in \mathcal{E}, \qquad (1)$$
where $\mathcal{T} \subset C(\mathbb{R}^{n_L}, \mathbb{R}^{n_L})$ is a family of homeomorphisms $T : \mathbb{R}^{n_L} \to \mathbb{R}^{n_L}$, $p$ is a fixed projector, and $\mathcal{E}$ is a family of networks of the form
$$\mathcal{E} := \mathcal{T}_L^{n_L} \circ \mathcal{R}_L^{n_{L-1}, n_L} \circ \cdots \circ \mathcal{T}_1^{n_1} \circ \mathcal{R}_1^{n_0, n_1} \circ \mathcal{T}_0^{n_0},$$
where the $\mathcal{R}_\ell^{n_{\ell-1}, n_\ell} \subset C(\mathbb{R}^{n_{\ell-1}}, \mathbb{R}^{n_\ell})$ are families of injective maps, the $\mathcal{T}_\ell^{n_\ell} \subset C(\mathbb{R}^{n_\ell}, \mathbb{R}^{n_\ell})$ are families of homeomorphisms, $L \in \mathbb{N}$, $n_0 = n$, $n_L = m$, and $n_\ell \geq n_{\ell-1}$ for $\ell = 1, \dots, L$.
Examples of specific choices of $\mathcal{R}$ in the definition of $\mathcal{E}$ include zero padding, multiplication by a full-rank matrix, injective ReLU layers, or injective ReLU networks. Choices of the networks $\mathcal{T}$ include coupling flows [23, 24, 47] or autoregressive flows [45, 43]. For an extended discussion of these choices, please see [71, Section 2].
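To make the structure in (1) concrete, the following is a minimal NumPy sketch of one extension-projection network. It assumes, purely for illustration, zero padding for the injective layers $\mathcal{R}_\ell$, a toy additive coupling map standing in for the homeomorphisms $\mathcal{T}_\ell$ and the outer $T$, and a coordinate projector for $p$; the function names and layer choices below are ours and are not the constructions analyzed in this paper.

```python
# Minimal sketch of an extension-projection network T ∘ p ∘ E from Eq. (1).
# Layer choices (zero padding, additive coupling, coordinate projector) are
# illustrative assumptions only.
import numpy as np

def R_zero_pad(x, n_out):
    """Injective layer R_ell : R^{n_{ell-1}} -> R^{n_ell} (n_out >= n_in) via zero padding."""
    n_in = x.shape[-1]
    assert n_out >= n_in
    pad = np.zeros(x.shape[:-1] + (n_out - n_in,))
    return np.concatenate([x, pad], axis=-1)

def T_additive_coupling(x):
    """Toy homeomorphism T_ell : R^k -> R^k, an additive coupling map.
    (x1, x2) -> (x1, x2 + t(x1)) is invertible: subtract t(x1) to undo it."""
    k = x.shape[-1] // 2
    x1, x2 = x[..., :k], x[..., k:]
    t = np.tanh(x1 @ np.ones((k, x.shape[-1] - k)))  # any continuous t(x1) works
    return np.concatenate([x1, x2 + t], axis=-1)

def p_project(y, rank):
    """Fixed linear projector p on R^{n_L}: keep the first `rank` coordinates,
    zero the rest, so p ∘ p = p."""
    out = np.zeros_like(y)
    out[..., :rank] = y[..., :rank]
    return out

def extension_projection_net(x, widths, proj_rank):
    """One network T ∘ p ∘ E with E = T_L ∘ R_L ∘ ... ∘ T_1 ∘ R_1 ∘ T_0."""
    h = T_additive_coupling(x)            # T_0 on R^{n_0}
    for n_next in widths:                 # widths = (n_1, ..., n_L), non-decreasing
        h = R_zero_pad(h, n_next)         # injective R_ell
        h = T_additive_coupling(h)        # homeomorphism T_ell
    h = p_project(h, proj_rank)           # fixed projector p
    return T_additive_coupling(h)         # outer homeomorphism T

x = np.random.randn(5, 2)                 # batch of 5 points, n_0 = n = 2
y = extension_projection_net(x, widths=[4, 6], proj_rank=3)
print(y.shape)                            # (5, 6): n_L = m = 6
```

In this sketch the projector is the only step that is not injective, so it is the one place where the composite map can collapse parts of the data, while the coupling maps and zero-padding layers remain locally invertible.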