Topological Continual Learning
with Wasserstein Distance and Barycenter
Tananun Songdechakraiwut1, Xiaoshuang Yin2, and Barry D. Van Veen1
1University of Wisconsin–Madison
2Google
Abstract
Continual learning in neural networks suffers from a phenomenon called
catastrophic forgetting, in which a network quickly forgets what was learned in
a previous task. The human brain, however, is able to continually learn new
tasks and accumulate knowledge throughout life. Neuroscience findings suggest
that continual learning success in the human brain is potentially associated with
its modular structure and memory consolidation mechanisms. In this paper
we propose a novel topological regularization that penalizes cycle structure
in a neural network during training using principled theory from persistent
homology and optimal transport. The penalty encourages the network to learn
modular structure during training. The penalization is based on the closed-form
expressions of the Wasserstein distance and barycenter for the topological
features of a 1-skeleton representation for the network. Our topological continual
learning method combines the proposed regularization with a tiny episodic
memory to mitigate forgetting. We demonstrate that our method is effective in
both shallow and deep network architectures for multiple image classification
datasets.
1 Introduction
Neural networks can be trained to achieve impressive performance on a variety of
learning tasks. However, when an already trained network is further trained on a
new task, a phenomenon called catastrophic forgetting [McCloskey and Cohen, 1989]
occurs, in which previously learned tasks are quickly forgotten with additional training.
The human brain, however, is able to continually learn new tasks and accumulate
knowledge throughout life without significant loss of previously learned skills. The
biological mechanisms behind this trait are not yet fully understood. Neuroscience
findings suggest that the principle of modularity [Hart and Giszter, 2010] may play
an important role. Modular structures [Simon, 1962] are aggregates of modules
(components) that perform specific functions without perturbing one another. Human
brains have been characterized by modular structures during learning [Finc et al.,
2020]. Such structures are hypothesized to reduce the interdependence of components,
enhance robustness, and facilitate learning [Bassett et al., 2011]. Another important
learning mechanism is the hippocampal and neocortical interaction responsible for
memory consolidation [McGaugh, 2000]. In particular, the hippocampal system
encodes recent experiences that later are replayed multiple times before consolidation
as episodic memory in the neocortex [Klinzing et al., 2019]. The interplay between
the modularity principle and memory consolidation, among other mechanisms, is
potentially associated with continual learning success.
Persistent homology [Barannikov, 1994, Edelsbrunner et al., 2000, Wasserman,
2018] has emerged as a tool for understanding, characterizing and quantifying the
topology of brain networks. For example, it has been used to evaluate biomarkers
of the neural basis of consciousness [Songdechakraiwut et al., 2022] and the impact
of twin genetics [Songdechakraiwut et al., 2021] by interpreting brain networks as
1-skeletons [Munkres, 1996] of a simplicial complex. The topology of a 1-skeleton
is completely characterized by connected components and cycles [Songdechakraiwut
et al., 2022]. Brain networks naturally organize into modules or connected components
[Bullmore and Sporns, 2009, Honey et al., 2007], while cycle structure is ubiquitous
and is often interpreted in terms of information propagation, redundancy and feedback
loops [Kwon and Cho, 2007, Ozbudak et al., 2005, Venkatesh et al., 2004, Weiner
et al., 2002].
In this paper we show that persistent homology can be used to improve performance
of neural networks in continual learning tasks. In particular, we interpret a network
as a 1-skeleton and propose a novel topological regularization of the 1-skeleton's cycle
structure to avoid catastrophic forgetting of previously learned tasks. Regularizing the
cycle structure allows the network to explicitly learn its complement, i.e., the modular
structure, through gradient optimization. Our approach is made computationally
efficient by use of the closed-form expressions for the Wasserstein barycenter and
the gradient of the Wasserstein distance between network cycle structures presented in
[Songdechakraiwut et al., 2021, 2022]. Figure 1 illustrates that our proposed approach
employs topological regularization with a tiny episodic memory to mitigate forgetting
and facilitate learning. We evaluate our approach using image classification across
multiple data sets and show that it generally improves classification performance
compared to competing approaches in the challenging case of both shallow and deep
networks of limited width when faced with many learning tasks.

Figure 1: Schematic illustrating our topological continual learning approach. A tiny
episodic memory replays past examples from previously learned tasks. The cycle
topology of a subset of layers (shaded) is regularized based on the Wasserstein distance
between the network and the barycenter of previously learned networks to improve the
generalization of the learning and memory consolidation processes.
The paper is organized as follows. Efficient computation of persistent homology for
neural networks is given in Section 2, and Section 3 presents our topology-regularized
continual learning strategy. In Section 4, we compare our methods to multiple
baseline algorithms using four image classification datasets. Section 5 provides a brief
discussion of the relationship between the proposed approach and previously reported
methods for continual learning.
2 Efficient Computation of Topology for Network Graphs
2.1 Graph Filtration
Represent a neural network as an undirected weighted graph $G = (V, W)$ with a set
of nodes $V$ and a set of edge weights $W = \{w_{i,j}\}$. The number of nodes and weights
are denoted by $|V|$ and $|W|$, respectively. Create a binary graph $G_\epsilon$ with the identical
node set $V$ by thresholding the edge weights so that an edge between nodes $i$ and $j$
exists if $w_{i,j} > \epsilon$. The binary graph is a simplicial complex consisting of only nodes
and edges known as a 1-skeleton [Munkres, 1996]. As $\epsilon$ increases, more and more
edges are removed from the network $G$, resulting in a nested sequence of 1-skeletons:

$$G_{\epsilon_0} \supseteq G_{\epsilon_1} \supseteq \cdots \supseteq G_{\epsilon_k}, \qquad (1)$$

where $\epsilon_0 \le \epsilon_1 \le \cdots \le \epsilon_k$ are called filtration values. This sequence of 1-skeletons is
called a graph filtration [Lee et al., 2012]. Figure 2 illustrates the graph filtration of a
toy neural network.

Figure 2: Illustration of graph filtration. A toy neural network $G$ showing its
maximum spanning tree (MST) as edges denoted by dark lines and a subnetwork
with non-MST edges shown as dashed lines. As the filtration value increases, the
number of connected components $\beta_0$ monotonically increases while the number of
cycles $\beta_1$ monotonically decreases. Connected components are born at the MST edge
weights $e_2, e_4, e_5, e_6$ while cycles die at the non-MST edge weights $e_1, e_3$.
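To make the filtration concrete, the sketch below (not the authors' implementation; it assumes the networkx library and a toy weighted graph of our own choosing) thresholds the edge weights at a few filtration values and counts the Betti numbers of each resulting 1-skeleton, using $\beta_0$ = number of connected components and $\beta_1 = |E| - |V| + \beta_0$ for a graph:

```python
# Minimal sketch of the graph filtration in Section 2.1 (not the authors' code).
# For each filtration value eps, keep only edges with w_ij > eps and count the
# Betti numbers of the resulting 1-skeleton:
#   beta0 = number of connected components, beta1 = |E| - |V| + beta0.
import networkx as nx

def betti_numbers(graph):
    """Betti numbers (beta0, beta1) of a 1-skeleton, i.e., an ordinary graph."""
    beta0 = nx.number_connected_components(graph)
    beta1 = graph.number_of_edges() - graph.number_of_nodes() + beta0
    return beta0, beta1

def graph_filtration(G, filtration_values):
    """Betti numbers of the thresholded graph G_eps at each filtration value."""
    results = []
    for eps in filtration_values:
        G_eps = nx.Graph()
        G_eps.add_nodes_from(G.nodes)  # identical node set V
        G_eps.add_edges_from((i, j) for i, j, w in G.edges(data="weight") if w > eps)
        results.append(betti_numbers(G_eps))
    return results

# Toy weighted network (weights chosen for illustration only).
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.9), (1, 2, 0.7), (0, 2, 0.3), (2, 3, 0.5)])
print(graph_filtration(G, [0.0, 0.4, 0.8]))
# [(1, 1), (1, 0), (3, 0)]: beta0 increases and beta1 decreases as eps grows.
```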
2.2 Birth and Death Decomposition
The only non-trivial topological features in a 1-skeleton are connected components
(0-dimensional topological features) and cycles (1-dimensional topological features).
There are no higher-dimensional topological features in the 1-skeleton, in contrast to
clique complexes [Otter et al., 2017] and Rips complexes [Ghrist, 2008, Zomorodian,
2010]. Persistent homology keeps track of the birth and death of topological features
over filtration values $\epsilon$. If a topological feature is born at a filtration value $b_l$ and
persists up to a filtration value $d_l$, then this feature is represented as a two-dimensional
persistence point $(b_l, d_l)$ in a plane. The set of all points $\{(b_l, d_l)\}_l$ is called a persistence
diagram [Edelsbrunner and Harer, 2008] or, equivalently, a persistence barcode [Ghrist,
2008]. The use of the 1-skeleton simplifies the persistence barcodes to one-dimensional
descriptors [Songdechakraiwut et al., 2021, 2022]. Specifically, the representation of
the connected components can be simplified to a collection of birth values $B(G) = \{b_l\}$
and that of cycles to a collection of death values $D(G) = \{d_l\}$. In addition, neural
networks of the same architecture have a birth set $B$ and a death set $D$ of the same
cardinality, namely $|V| - 1$ and $|W| - (|V| - 1)$, respectively. This result completely resolves
the problem of point mismatch in persistence barcodes for same-architecture neural
networks. The example network of Fig. 2 has $W = \{e_i\}_{i=1}^{6}$, $B(G) = \{e_2, e_4, e_5, e_6\}$
and $D(G) = \{e_1, e_3\}$. $B(G)$ and $D(G)$ can be identified very efficiently by computing
the maximum spanning tree (MST) of the network [Lee et al., 2012] in $O(n \log n)$
operations, where $n$ is the number of edges in the network. Supplementary description
of the decomposition is provided in Appendix A.
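A possible implementation of this decomposition, again a sketch assuming networkx and illustrative edge weights rather than the authors' code, computes the maximum spanning tree and splits the edge weights into the birth set (MST edges) and the death set (non-MST edges):

```python
# Minimal sketch of the birth/death decomposition in Section 2.2 (not the
# authors' code). Birth set B(G): weights of the maximum spanning tree (MST)
# edges; death set D(G): the remaining non-MST edge weights, so that
# |B(G)| = |V| - 1 and |D(G)| = |W| - (|V| - 1).
import networkx as nx

def birth_death_decomposition(G):
    """Return (B(G), D(G)) as sorted lists of edge weights; assumes distinct weights."""
    mst = set(frozenset((i, j)) for i, j, _ in
              nx.maximum_spanning_edges(G, algorithm="kruskal", data=True))
    births, deaths = [], []
    for i, j, w in G.edges(data="weight"):
        (births if frozenset((i, j)) in mst else deaths).append(w)
    return sorted(births), sorted(deaths)

# Toy graph with 5 nodes and 6 edges (illustrative weights, not those of Fig. 2):
# 4 MST edges give the births, the 2 remaining edges close cycles and give deaths.
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.6), (1, 2, 0.5), (2, 3, 0.4),
                           (3, 4, 0.3), (0, 2, 0.2), (2, 4, 0.1)])
B, D = birth_death_decomposition(G)
print(B)  # [0.3, 0.4, 0.5, 0.6]
print(D)  # [0.1, 0.2]
```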
2.3 Closed Form Wasserstein Distance and Gradient
The birth set of connected components and the death set of cycles are theoretically
shown to completely characterize the topology of a 1-skeleton network representation
as described in Section 2.2. The Wasserstein distance between 1-skeleton network
representations has a closed-form expression [Songdechakraiwut et al., 2022]. Here
we only consider the Wasserstein distance for cycle structure, which depends solely
on the death sets. Let $G, H$ be two given networks based on the same architecture.
Their (squared) 2-Wasserstein distance for cycles is defined as the optimal matching
cost between $D(G)$ and $D(H)$:

$$W^2_{\text{cycle}}(G, H) = \min_{\phi} \sum_{d_l \in D(G)} \big(d_l - \phi(d_l)\big)^2, \qquad (2)$$

where $\phi$ is a bijection from $D(G)$ to $D(H)$. The Wasserstein distance given in
(2) has a closed-form expression that allows for very efficient computation as follows
[Songdechakraiwut et al., 2022]:

$$W^2_{\text{cycle}}(G, H) = \sum_{d_l \in D(G)} \big(d_l - \phi^*(d_l)\big)^2, \qquad (3)$$

where the optimal matching $\phi^*$ maps the $l$-th smallest death value in $D(G)$ to the $l$-th smallest death
value in $D(H)$ for all $l$.
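Because the optimal bijection simply matches order statistics, the closed form reduces to sorting both death sets and summing squared differences. The following is a minimal NumPy sketch (our own, not the authors' code; the function name is hypothetical):

```python
# Minimal sketch of the closed-form squared 2-Wasserstein distance for cycles
# in Eq. (3) (not the authors' code): sort both death sets and sum squared
# differences between matched order statistics.
import numpy as np

def wasserstein_cycle_sq(deaths_G, deaths_H):
    dG = np.sort(np.asarray(deaths_G, dtype=float))
    dH = np.sort(np.asarray(deaths_H, dtype=float))
    assert dG.shape == dH.shape, "same-architecture networks have equal-size death sets"
    return float(np.sum((dG - dH) ** 2))

print(wasserstein_cycle_sq([0.1, 0.2], [0.15, 0.05]))  # (0.1-0.05)^2 + (0.2-0.15)^2 ≈ 0.005
```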
In addition, the gradient of the Wasserstein distance for cycles, $\nabla_G W^2_{\text{cycle}}(G, H)$,
with respect to edge weights $w_{i,j} \in W$ is given as a gradient matrix whose $(i, j)$-th entry
is [Songdechakraiwut et al., 2022]

$$\frac{\partial W^2_{\text{cycle}}(G, H)}{\partial w_{i,j}} =
\begin{cases}
0 & \text{if } w_{i,j} \in B(G); \\
2\big(w_{i,j} - \phi^*(w_{i,j})\big) & \text{if } w_{i,j} \in D(G).
\end{cases} \qquad (4)$$
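The gradient in (4) can likewise be assembled directly from the matched, sorted death sets. The sketch below is an illustrative NumPy version (function and argument names are our own; it assumes distinct death values and a dense adjacency layout rather than whatever representation the authors use):

```python
# Minimal sketch of the gradient in Eq. (4) (not the authors' code). For each
# edge weight w_ij: the partial derivative is 0 when w_ij is a birth value
# (MST edge) and 2 * (w_ij - phi*(w_ij)) when w_ij is a death value, where
# phi* pairs the l-th smallest death value of G with the l-th smallest of H.
# Assumes distinct death values and returns a dense |V| x |V| matrix.
import numpy as np

def wasserstein_cycle_grad(edges, deaths_G, deaths_H, num_nodes):
    """edges: iterable of (i, j, w_ij); deaths_G/H: death sets of equal size."""
    order = np.argsort(deaths_G)                     # indices of G's deaths, ascending
    target = np.sort(np.asarray(deaths_H, float))    # H's deaths, ascending
    phi_star = {deaths_G[idx]: target[rank] for rank, idx in enumerate(order)}
    grad = np.zeros((num_nodes, num_nodes))
    for i, j, w in edges:
        if w in phi_star:                            # death value: non-MST edge of G
            grad[i, j] = grad[j, i] = 2.0 * (w - phi_star[w])
        # birth values (MST edges) contribute zero gradient
    return grad

# Reusing the toy death set from the decomposition sketch above, D(G) = [0.1, 0.2]:
edges = [(0, 2, 0.2), (2, 4, 0.1)]                   # only the non-MST edges, for brevity
print(wasserstein_cycle_grad(edges, [0.1, 0.2], [0.15, 0.05], num_nodes=5))
```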