Task-Free Continual Learning via
Online Discrepancy Distance Learning
Fei Ye and Adrian G. Bors
Department of Computer Science
University of York
York, YO10 5GH, UK
{fy689,adrian.bors}@york.ac.uk
Abstract
Learning from non-stationary data streams, also called Task-Free Continual Learn-
ing (TFCL) remains challenging due to the absence of explicit task information.
Although recently some methods have been proposed for TFCL, they lack theo-
retical guarantees. Moreover, forgetting analysis during TFCL was not studied
theoretically before. This paper develops a new theoretical analysis framework
which provides generalization bounds based on the discrepancy distance between
the visited samples and the entire information made available for training the model.
This analysis gives new insights into the forgetting behaviour in classification tasks.
Inspired by this theoretical model, we propose a new approach enabled by the
dynamic component expansion mechanism for a mixture model, namely the Online
Discrepancy Distance Learning (ODDL). ODDL estimates the discrepancy be-
tween the probabilistic representation of the current memory buffer and the already
accumulated knowledge and uses it as the expansion signal to ensure a compact
network architecture with optimal performance. We then propose a new sample se-
lection approach that selectively stores the most relevant samples into the memory
buffer through the discrepancy-based measure, further improving the performance.
We perform several TFCL experiments with the proposed methodology, which
demonstrate that the proposed approach achieves state-of-the-art performance.
1 Introduction
Continual learning (CL), and its extension to lifelong learning, represents one of the most desired
functions in an artificial intelligence system: the capability of learning new concepts
while preserving the knowledge of past experiences [32]. Such an ability can be used in many
real-time applications such as robotics, health investigative systems, autonomous vehicles [20], or for
guiding agents exploring artificial (meta) universes, which requires adapting to a changing environment.
Unfortunately, modern deep learning models suffer from degraded performance on past data
after learning novel knowledge, a phenomenon called catastrophic forgetting [13].
A popular attempt to alleviate forgetting in CL is to employ a small memory buffer that preserves
a few past samples and replays them when training on a new task [1, 6]. However, when there
are restrictions on the available memory capacity, memory-based approaches suffer from
degraded performance on past tasks, especially when aiming to learn an unbounded number of tasks.
Recently, the Dynamic Expansion Model (DEM) [51] has shown promising results in CL, aiming to
guarantee optimal performance by preserving the previously learnt knowledge in the parameters
of frozen components trained on past data, while adding a new component when learning a novel
task. However, such approaches require knowing where and when the knowledge associated with a
given task changes, which is not always available in a real environment.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.06579v1 [cs.CV] 12 Oct 2022
In this paper, we address a more realistic scenario, called Task-Free Continual Learning (TFCL) [3],
where task identities are not available and the model can only access a small batch of samples
at a given time. Most existing CL methods requiring the task label can be adapted to TFCL by
removing the task-information dependency. For instance, memory-based approaches can store a few
past samples from the data stream at each training time and replay them during later training steps
[8, 12]. However, such an approach requires carefully designing the sample selection criterion to
avoid memory overload. The key challenge for the memory-based approach is the negative backward
transfer caused by the stored samples that interfere with the model's updating on incoming samples
[6]. This issue can be alleviated by DEM, in which previously learnt samples are preserved in
frozen components and do not interfere with the learning of probabilistic representations of new
data [24, 38]. However, these approaches do not provide any theoretical guarantees, and there are no
studies analysing the trade-off between the model's generalization and its complexity under TFCL.
Recent attempts have provided theoretical analyses of CL from different perspectives, including
the risk bound [46, 51], NP-hardness [17], the Teacher-Student framework [23, 58] and game theory
[37]. However, all these approaches require strong assumptions, such as knowing the task identities,
which are not available in TFCL. This inspires us to bridge the gap between the underlying theory
and the algorithm implementation for TFCL. We propose a theoretical classification framework,
which provides new insights into the analysis of forgetting behaviour and guidance for designing
algorithms that address catastrophic forgetting. The primary motivation behind the proposed theoretical
framework is that forgetting can be formulated as a generalization error in domain adaptation theory. Based
on this analysis, we extend the domain adaptation theory of [29] to derive time-dependent generalization
risk bounds, explicitly explaining the forgetting process at each training step.
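For orientation, a generic discrepancy-based domain adaptation bound of the kind being extended here takes the following form (cf. [29]; the time-dependent bounds derived in Section 3 refine it, so this is a reminder of the standard result rather than the paper's own bound):

```latex
\mathcal{L}_{\mathcal{P}^T}(h, f_T)
  \;\le\;
\mathcal{L}_{\mathcal{P}^S}(h, f_S)
  \;+\; \operatorname{disc}_L\!\left(\mathcal{P}^{T,X}, \mathcal{P}^{S,X}\right)
  \;+\; \lambda ,
```

where $h$ is the learnt classifier, $f_T$ and $f_S$ are the target and source labelling functions, $\operatorname{disc}_L$ is the discrepancy distance between the marginal distributions, and $\lambda$ is a residual term depending on the best hypotheses in both domains.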
Inspired by the theory, we devise the Online Discrepancy Distance Learning (ODDL) method,
which introduces a new expansion mechanism based on discrepancy distance estimation for
implementing TFCL. The proposed expansion mechanism detects data distribution shift by
evaluating the variance of the discrepancy distance during training, enabling a trade-off
between the model's generalization and its complexity. We also propose a new sample
selection approach based on a discrepancy-based criterion, which guides the storage of samples that are
diverse with respect to the already learnt knowledge, further improving performance. Our contributions are:
• This paper is the first research study to propose a theoretical framework for TFCL, which
provides new insights into the forgetting behaviour of the model in classification tasks.
• Inspired by the theoretical analysis, we develop a novel dynamic expansion approach, which
ensures a compact model architecture with optimal performance.
• We propose a new sample selection approach that selects appropriate data samples for the memory
buffer, further improving performance.
• The proposed method achieves state-of-the-art results on TFCL benchmarks.
2 Related works
Continual learning defines a learning paradigm which aims to learn a sequence of tasks without
forgetting. Catastrophic forgetting is a major challenge in continual learning. One of the most popular
approaches to alleviating forgetting is to impose a regularization loss within the optimization procedure
[7, 11, 13, 16, 19, 25, 26, 31, 34, 35, 40, 41, 57], penalizing updates to those network parameters which
are important to past tasks. Another class of approaches to continual learning
focuses on the memory system, which usually employs a small memory buffer [1, 5, 6, 28, 36, 44, 59]
to store a few past data samples, or trains a generator to provide the replay samples when learning new tasks
[38, 43, 46, 47, 52, 53, 58]. However, these approaches usually rely on knowing the task information,
which is not applicable in TFCL.
Task-free continual learning is a special scenario of CL in which the model sees only one or very few
samples at each training step, without any task labels. Using a small memory buffer
to store past samples has shown benefits for TFCL and was first investigated in [3, 54, 56]. This
memory replay approach was then extended by employing Generative Replay Mechanisms (GRMs)
for training both a Variational Autoencoder (VAE) [15] and a classifier, where a new retrieval
mechanism, called Maximal Interfered Retrieval (MIR) [2], is used to select specific data samples.
The Gradient Sample Selection (GSS) [1] is another sample selection approach, which treats sample
selection as a constrained optimization problem. More recently, a Learner-Evaluator framework
called the Continual Prototype Evolution (CoPE) [8] was proposed for TFCL; it stores the same
number of samples for each class in the memory in order to ensure balanced replay. Another
direction for memory-based approaches is to edit the stored samples so as to increase the
loss in upcoming model updates, called Gradient-based Memory EDiting (GMED) [12], which
can be employed in existing CL models to further enhance their performance.
Dynamic expansion models aim to automatically increase the model's capacity to adapt to new tasks
by adding new hidden layers and units. One such approach, called Continual Unsupervised
Representation Learning (CURL) [38], dynamically builds new inference models when an
expansion criterion is met. However, since CURL still requires a GRM to alleviate forgetting, it can lead
to negative knowledge transfer when updating the network's parameters to adapt to a new task.
This issue is addressed in the Continual Neural Dirichlet Process Mixture (CN-DPM) model [24],
which uses Dirichlet processes to add new components while freezing all previously learnt
ones. However, the expansion criterion used by these approaches relies on the change in the
training loss at each step, which has no theoretical guarantees.
3 Theoretical analysis of TFCL
In this section, we first introduce the learning settings and notations, and then analyze the forgetting
behaviour of a single model as well as of a dynamic expansion model by deriving their Generalization
Bounds (GBs).
3.1 Preliminary
Let $\mathcal{X}$ be the input space and $\mathcal{Y}$ the output space, which is $\{-1, 1\}$ for binary classification and $\{1, 2, \dots, n_0\}$, $n_0 > 2$, for multi-class classification. Let $\mathcal{D}^T_i = \{\mathbf{x}^T_j, y^T_j\}_{j=1}^{N^T_i}$ and $\mathcal{D}^S_i = \{\mathbf{x}^S_j, y^S_j\}_{j=1}^{N^S_i}$ represent the testing and training sets for the $i$-th dataset, where $\mathbf{x}^T_j \in \mathcal{X}$ and $y^T_j \in \mathcal{Y}$ are an image and its associated ground-truth label, while $N^T_i$ and $N^S_i$ are the total numbers of samples in $\mathcal{D}^T_i$ and $\mathcal{D}^S_i$, respectively. In this paper, we mainly focus on task-free class-incremental learning, described as follows.
Definition 1. (Data stream.) For a given $t$-th training dataset $\mathcal{D}^S_t$ with $C^S_t$ data categories, let us consider a data stream $\mathcal{S}$ which consists of the samples $\mathcal{D}^S_{t,j}$ from each category, expressed by $\mathcal{S} = \bigcup_{j=1}^{C^S_t} \mathcal{D}^S_{t,j}$. Let $\mathcal{D}^T_{t,j}$ represent the set of samples drawn from the $j$-th category of $\mathcal{D}^T_t$. Let $\mathcal{P}^T_{t,j}$ and $\mathcal{P}^S_{t,j}$ represent the distributions of $\mathcal{D}^T_{t,j}$ and $\mathcal{D}^S_{t,j}$, respectively, and let $\mathcal{P}^{T,X}_{t,j}$ represent the marginal distribution over $\mathcal{X}$.
Definition 2. (Learning setting.) Let $\mathcal{T}_i$ represent the $i$-th training step. For a given data stream $\mathcal{S}$, we assume that there are a total of $n$ training steps, where each training step $\mathcal{T}_i$ is associated with a small batch of paired samples $\{X^b_i, Y^b_i\}$ drawn from $\mathcal{S}$, expressed by $\mathcal{S} = \bigcup_{i=1}^{n} \{X^b_i, Y^b_i\}$ with $\{X^b_i, Y^b_i\} \cap \{X^b_j, Y^b_j\} = \varnothing$ for $i \neq j$, and a model (classifier) can only access $\{X^b_i, Y^b_i\}$ at $\mathcal{T}_i$, while all previous batches are unavailable. After finishing all training steps, we evaluate the classification accuracy of the model on the testing set $\mathcal{D}^T_t$. In the following, we define the model and the memory buffer.
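The access pattern in Definition 2 can be sketched as a simple generator (a hypothetical helper for illustration, not code from the paper): the stream yields small disjoint batches, and the learner may only touch the current batch at each step.

```python
import numpy as np

def task_free_stream(X, y, batch_size=10, seed=0):
    """Yield small disjoint batches {X^b_i, Y^b_i} from a data stream S.

    Hypothetical sketch: at step T_i the learner may only access the
    current batch, and task identities are never revealed.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# A learner consumes the stream strictly once, batch by batch;
# earlier batches are unavailable at later steps.
X = np.arange(100, dtype=np.float32).reshape(50, 2)
y = np.arange(50) % 5
n_steps = sum(1 for _ in task_free_stream(X, y, batch_size=10))  # n = 5
```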
Definition 3. (Model and memory.) Let $h$ be a model implemented by a classifier, and $\mathcal{H} = \{h \,|\, h : \mathcal{X} \to \mathcal{Y}\}$ the space of classifiers. Let $\mathcal{M}_i$ be a memory buffer at $\mathcal{T}_i$. We assume that $\mathcal{M}_i$ randomly removes samples from the memory buffer while continually adding new data at each training step. Let $\mathcal{P}_{\mathcal{M}_i}$ represent the probabilistic representation of $\mathcal{M}_i$ and $|\mathcal{M}_i|$ its cardinality.
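A minimal sketch of the buffer behaviour assumed in Definition 3, with random eviction once the capacity is reached (the class name and capacity value are our own; the paper does not prescribe an implementation):

```python
import random

class RandomMemory:
    """Fixed-capacity buffer M_i that stores each incoming sample and
    evicts a uniformly random stored one once |M_i| reaches capacity."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.rng = random.Random(seed)

    def add(self, sample):
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            # random removal, as assumed in Definition 3
            self.buffer[self.rng.randrange(self.capacity)] = sample

    def __len__(self):
        return len(self.buffer)

# |M_i| never exceeds the capacity, regardless of stream length.
mem = RandomMemory(capacity=100)
for i in range(1000):
    mem.add(i)
```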
3.2 Measuring the distribution shift
In TFCL, the distance between the target domain (testing set) and the source domain (memory)
changes dynamically at each training step. We can use the discrepancy distance [29]
to measure this gap through the analysis of the model's risk.
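To make the definition concrete, the discrepancy distance $\operatorname{disc}(\mathcal{P}, \mathcal{Q}) = \sup_{h, h'} |\mathbb{E}_{\mathcal{P}}[L(h, h')] - \mathbb{E}_{\mathcal{Q}}[L(h, h')]|$ admits a simple plug-in estimate when the supremum is restricted to a finite candidate set and $L$ is the 0-1 loss. The sketch below is illustrative only and is not the paper's estimator:

```python
import numpy as np

def disagreement_matrix(preds):
    """preds: array of shape (K, N) with predictions of K candidate
    classifiers on N samples from one distribution.  Entry [a, b] is
    the mean 0-1 disagreement between classifiers a and b."""
    K = preds.shape[0]
    M = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            M[a, b] = np.mean(preds[a] != preds[b])
    return M

def empirical_discrepancy(preds_p, preds_q):
    """Plug-in estimate of disc(P, Q): the largest gap in pairwise
    disagreement between the two sample sets, over the candidate set."""
    return np.max(np.abs(disagreement_matrix(preds_p)
                         - disagreement_matrix(preds_q)))

# Toy example: two classifiers evaluated on samples from P and Q.
preds_p = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])  # disagree on 1 of 4
preds_q = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])  # fully agree
d = empirical_discrepancy(preds_p, preds_q)
```

When the two sample sets come from the same distribution this estimate is close to zero, and it grows as the memory and the incoming data drift apart, which is the signal the expansion mechanism in Section 4 monitors.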