In this paper, we address a more realistic scenario, called Task-Free Continual Learning (TFCL) [3], where task identities are not available and the model can only access a small batch of samples at a given time. Most existing CL methods that require task labels can be adapted to TFCL by removing the task information dependency. For instance, memory-based approaches can store a few past samples from the data stream at each training time and replay them during later training steps [8, 12]. However, such an approach requires carefully designing the sample selection criterion in order to avoid memory overload. The key challenge for memory-based approaches is the negative backward transfer caused by stored samples that interfere with the model’s updating on incoming samples [6]. This issue can be relieved by dynamic expansion models (DEM), in which previously learnt samples are preserved in frozen components and do not interfere with the learning of probabilistic representations of new data [24, 38]. However, these approaches do not provide any theoretical guarantees, and there are no studies analysing the trade-off between the model’s generalization and its complexity under TFCL.
Recent attempts have provided theoretical analyses of CL from different perspectives, including risk bounds [46, 51], NP-hardness [17], the Teacher-Student framework [23, 58] and game theory [37]. However, all these approaches rely on strong assumptions, such as knowing the task identities, which are not available in TFCL. This inspires us to bridge the gap between the underlying theory and the algorithm implementation for TFCL. We propose a theoretical classification framework, which provides new insights into the analysis of forgetting behaviour and guidance for designing algorithms that address catastrophic forgetting. The primary motivation behind the proposed theoretical framework is that forgetting can be formulated as a generalization error in domain adaptation theory. Based on this analysis, we extend domain adaptation theory [29] to derive time-dependent generalization risk bounds, explicitly explaining the forgetting process at each training step.
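As a rough illustration of this connection (a generic discrepancy-based bound from domain adaptation theory, not the exact result derived in this paper), let $P$ denote the distribution the model is currently updated on, $Q$ the distribution of previously seen data, and $\mathcal{H}$ the hypothesis space. Bounds of this family typically take the form
\[
\mathcal{L}_{Q}(h) \;\le\; \mathcal{L}_{P}(h) \;+\; \mathrm{disc}_{\mathcal{H}}(P, Q) \;+\; \lambda,
\qquad
\mathrm{disc}_{\mathcal{H}}(P, Q) \;=\; \max_{h, h' \in \mathcal{H}} \big| \mathcal{L}_{P}(h, h') - \mathcal{L}_{Q}(h, h') \big|,
\]
where $\lambda$ accounts for the error of an ideal joint hypothesis. Read in this way, the risk on past data, i.e. forgetting, grows with the discrepancy between the data currently used for updating and the data learnt before, which is the dependence that our time-dependent risk bounds make explicit.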
Inspired by the theory, we devise the Online Discrepancy Distance Learning (ODDL) method, which introduces a new expansion mechanism based on discrepancy distance estimation for implementing TFCL. The proposed expansion mechanism detects data distribution shifts by evaluating the variance of the discrepancy distance during training (sketched after the contribution list below), enabling a trade-off between the model’s generalization and its complexity. We also propose a new sample selection approach based on a discrepancy criterion, which guides the memory to store samples that are diverse with respect to the already learnt knowledge, further improving performance. Our contributions are:
• This paper is the first research study to propose a new theoretical framework for TFCL, which provides new insights into the forgetting behaviour of the model in classification tasks.
• Inspired by the theoretical analysis, we develop a novel dynamic expansion approach, which ensures a compact model architecture while achieving optimal performance.
• We propose a new sample selection approach that selects appropriate data samples for the memory buffer, further improving performance.
• The proposed method achieves state-of-the-art results on TFCL benchmarks.
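Below is a conceptual Python sketch (illustrative only; the probe architecture, feature dimensionality, number of probe updates and the expansion threshold are assumptions, and this is not the implementation evaluated in this paper) of how the discrepancy-based expansion signal described above could be computed: a small binary probe is trained to separate features of samples already held by the current component from features of the incoming batch, its separability serves as a proxy for the discrepancy distance, and a sharp rise of this proxy relative to its recent history triggers expansion.

import torch
import torch.nn as nn

class DiscrepancyProbe(nn.Module):
    """Binary 'domain' classifier; its separability proxies the discrepancy distance."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

def discrepancy_proxy(probe, optimizer, memory_feats, batch_feats, steps: int = 5) -> float:
    """Briefly train the probe to separate memory features (label 0) from
    incoming-batch features (label 1); return a shift proxy in [0, 1]."""
    labels = torch.cat([torch.zeros(len(memory_feats), 1), torch.ones(len(batch_feats), 1)])
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        logits = torch.cat([probe(memory_feats), probe(batch_feats)])
        bce(logits, labels).backward()
        optimizer.step()
    with torch.no_grad():
        preds = (torch.cat([probe(memory_feats), probe(batch_feats)]) > 0).float()
        accuracy = (preds == labels).float().mean().item()
    # accuracy near 0.5: the two sets are indistinguishable (no distribution shift);
    # accuracy near 1.0: they are easily separated (large shift).
    return max(0.0, 2.0 * accuracy - 1.0)

def should_expand(history, estimate: float, window: int = 10, threshold: float = 0.3) -> bool:
    """Signal expansion when the current estimate rises sharply above its recent
    history (a crude stand-in for the variance-based criterion described above)."""
    history.append(estimate)
    recent = history[-window:]
    return estimate - sum(recent) / len(recent) > threshold

In this sketch, the current component would be frozen and a new one instantiated whenever should_expand returns True; the same proxy could also be used to rank candidate samples for the memory buffer, preferring those that are most distinguishable from what has already been learnt.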
2 Related works
Continual learning defines a learning paradigm which aims to learn a sequence of tasks without forgetting. Catastrophic forgetting is a major challenge in continual learning. One of the most popular approaches to relieve forgetting is to impose a regularization loss within the optimization procedure [7, 11, 13, 16, 19, 25, 26, 31, 34, 35, 40, 41, 57], where the network’s parameters that are important for past tasks are penalized when being updated. Another category of approaches for continual learning focuses on the memory system, which usually employs a small memory buffer [1, 5, 6, 28, 36, 44, 59] to store a few past data samples, or trains a generator to provide replay samples when learning new tasks [38, 43, 46, 47, 52, 53, 58]. However, these approaches usually rely on knowing the task information, which is not applicable in TFCL.
Task-free continual learning is a special scenario in CL where a model can only see one or very few samples at each training step, without having any task labels. Using a small memory buffer to store past samples has shown benefits for TFCL and was first investigated in [3, 54, 56]. This memory replay approach was then extended by employing Generative Replay Mechanisms (GRMs) for training both a Variational Autoencoder (VAE) [15] and a classifier, where a new retrieval mechanism, called Maximal Interfered Retrieval (MIR) [2], is used to select specific data samples. Gradient Sample Selection (GSS) [1] is another sample selection approach, which formulates sample selection as a constrained optimization problem. More recently, a Learner-Evaluator framework