In this paper, we address a more realistic scenario, called Task-Free Continual Learning (TFCL) [3], where task identities are not available and the model can only access a small batch of samples at a given time. Most existing CL methods that require task labels can be adapted to TFCL by removing the task information dependency. For instance, memory-based approaches can store a few past samples from the data stream at each training time and replay them during later training steps [8, 12]. However, such an approach requires carefully designing the sample selection criterion in order to avoid memory overload. The key challenge for memory-based approaches is the negative backward transfer caused by stored samples that interfere with the model’s updating on incoming samples [6]. This issue can be relieved by dynamic expansion models (DEM), in which previously learnt samples are preserved in frozen components and do not interfere with the learning of probabilistic representations of new data [24, 38]. However, these approaches do not provide any theoretical guarantees, and there are no studies analysing the trade-off between the model’s generalization and its complexity under TFCL.
Recent attempts have provided theoretical analyses of CL from different perspectives, including risk bounds [46, 51], NP-hardness [17], the Teacher-Student framework [23, 58] and game theory [37]. However, all these approaches rely on strong assumptions, such as knowing the task identities, which are not available in TFCL. This inspires us to bridge the gap between the underlying theory and the algorithm implementation for TFCL. We propose a theoretical classification framework, which provides new insights into the analysis of forgetting behaviour and guidance for designing algorithms that address catastrophic forgetting. The primary motivation behind the proposed theoretical framework is that forgetting can be formulated as a generalization error in domain adaptation theory. Based on this analysis, we extend domain adaptation theory [29] to derive time-dependent generalization risk bounds, explicitly explaining the forgetting process at each training step.
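As a rough illustration of this connection (a generic discrepancy-based bound from domain adaptation theory, not the exact result derived in this paper), let $P$ denote the distribution the model is currently updated on, $Q$ the distribution of previously seen data, and $\mathcal{H}$ the hypothesis space. Bounds of this family typically take the form
\[
\mathcal{L}_{Q}(h) \;\le\; \mathcal{L}_{P}(h) \;+\; \mathrm{disc}_{\mathcal{H}}(P, Q) \;+\; \lambda,
\qquad
\mathrm{disc}_{\mathcal{H}}(P, Q) \;=\; \max_{h, h' \in \mathcal{H}} \big| \mathcal{L}_{P}(h, h') - \mathcal{L}_{Q}(h, h') \big|,
\]
where $\lambda$ accounts for the error of an ideal joint hypothesis. Read in this way, the risk on past data, i.e. forgetting, grows with the discrepancy between the data currently used for updating and the data learnt before, which is the dependence that our time-dependent risk bounds make explicit.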
Inspired by the theory, we devise the Online Discrepancy Distance Learning (ODDL) method, which introduces a new expansion mechanism based on discrepancy distance estimation for implementing TFCL. The proposed expansion mechanism detects data distribution shifts by evaluating the variance of the discrepancy distance during training (sketched after the contribution list below), enabling a trade-off between the model’s generalization and its complexity. We also propose a new sample selection approach based on a discrepancy criterion, which guides the memory to store samples that are diverse with respect to the already learnt knowledge, further improving performance. Our contributions are:
• This paper is the first research study to propose a new theoretical framework for TFCL, which provides new insights into the forgetting behaviour of the model in classification tasks.
• Inspired by the theoretical analysis, we develop a novel dynamic expansion approach, which ensures a compact model architecture while achieving optimal performance.
• We propose a new sample selection approach that selects appropriate data samples for the memory buffer, further improving performance.
• The proposed method achieves state-of-the-art results on TFCL benchmarks.
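Below is a conceptual Python sketch (illustrative only; the probe architecture, feature dimensionality, number of probe updates and the expansion threshold are assumptions, and this is not the implementation evaluated in this paper) of how the discrepancy-based expansion signal described above could be computed: a small binary probe is trained to separate features of samples already held by the current component from features of the incoming batch, its separability serves as a proxy for the discrepancy distance, and a sharp rise of this proxy relative to its recent history triggers expansion.

import torch
import torch.nn as nn

class DiscrepancyProbe(nn.Module):
    """Binary 'domain' classifier; its separability proxies the discrepancy distance."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

def discrepancy_proxy(probe, optimizer, memory_feats, batch_feats, steps: int = 5) -> float:
    """Briefly train the probe to separate memory features (label 0) from
    incoming-batch features (label 1); return a shift proxy in [0, 1]."""
    labels = torch.cat([torch.zeros(len(memory_feats), 1), torch.ones(len(batch_feats), 1)])
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        logits = torch.cat([probe(memory_feats), probe(batch_feats)])
        bce(logits, labels).backward()
        optimizer.step()
    with torch.no_grad():
        preds = (torch.cat([probe(memory_feats), probe(batch_feats)]) > 0).float()
        accuracy = (preds == labels).float().mean().item()
    # accuracy near 0.5: the two sets are indistinguishable (no distribution shift);
    # accuracy near 1.0: they are easily separated (large shift).
    return max(0.0, 2.0 * accuracy - 1.0)

def should_expand(history, estimate: float, window: int = 10, threshold: float = 0.3) -> bool:
    """Signal expansion when the current estimate rises sharply above its recent
    history (a crude stand-in for the variance-based criterion described above)."""
    history.append(estimate)
    recent = history[-window:]
    return estimate - sum(recent) / len(recent) > threshold

In this sketch, the current component would be frozen and a new one instantiated whenever should_expand returns True; the same proxy could also be used to rank candidate samples for the memory buffer, preferring those that are most distinguishable from what has already been learnt.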
2 Related works
Continual learning defines a learning paradigm which aims to learn a sequence of tasks without forgetting. Catastrophic forgetting is a major challenge in continual learning. One of the most popular approaches to relieve forgetting is to impose a regularization loss within the optimization procedure [7, 11, 13, 16, 19, 25, 26, 31, 34, 35, 40, 41, 57], where the network’s parameters that are important for past tasks are penalized when being updated. Another category of approaches for continual learning focuses on the memory system, which usually employs a small memory buffer [1, 5, 6, 28, 36, 44, 59] to store a few past data samples, or trains a generator to provide replay samples when learning new tasks [38, 43, 46, 47, 52, 53, 58]. However, these approaches usually rely on knowing the task information, which is not applicable in TFCL.
Task-free continual learning is a special scenario in CL where a model can only see one or very few samples at each training step, without having any task labels. Using a small memory buffer to store past samples has shown benefits for TFCL and was first investigated in [3, 54, 56]. This memory replay approach was then extended by employing Generative Replay Mechanisms (GRMs) for training both a Variational Autoencoder (VAE) [15] and a classifier, where a new retrieval mechanism, called Maximal Interfered Retrieval (MIR) [2], is used to select specific data samples. Gradient Sample Selection (GSS) [1] is another sample selection approach, which formulates sample selection as a constrained optimization problem. More recently, a Learner-Evaluator framework