This paper discusses CL under a task continual learning (TCL) setting, i.e., one in which data arrives
sequentially in groups of tasks. Works under this scenario [17, 18] usually assume
that once a new task is presented, all of its data becomes readily available for batch (offline) training.
In this setting, a task is defined as an individual training phase with a new collection of data that
belongs to a new (previously unseen) group of classes or, more generally, a new domain. Further, TCL also
(implicitly) requires a task identifier during training. In practice, however, once the model has seen
enough tasks, a newly arriving batch of data becomes increasingly likely to belong to the same group
of classes or domain as a previously seen task. Importantly, most existing works on TCL fail to
acknowledge this possibility. Moreover, in general, the task definition or identifier may not be
available during training, e.g., the model does not have access to the task description due to (user)
privacy concerns. In such a case, which mostly concerns dynamic models, the system has to treat every task
as new, thus constantly learning new sets of parameters regardless of task similarity or overlap. This
clearly constitutes a suboptimal use of resources (predominantly memory), especially as the number
of tasks experienced by the CL system grows.
This study investigates the aforementioned scenario and seeks to build a memory-efficient CL system
which, though focused on image classification tasks, is general and in principle readily applicable
to other applications and data modalities. We provide a solution for
dynamic models to identify similar tasks when no task identifier is provided during the training phase.
To the best of our knowledge, the only other work that discusses learning a continual learning
system from mixed similar and dissimilar tasks is [19], which proposes a task similarity function to
identify previously seen similar tasks, but requires training a reference model every time a new
task becomes available. In contrast, we identify similar tasks without the need for
training a new model, by leveraging a task similarity metric, which in practice results in high task-similarity
identification accuracy. We also discuss memory usage under challenging scenarios with
longer, more realistic sequences of more than 20 tasks.
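To make the reuse decision concrete, the sketch below illustrates one way such a similarity check could be implemented without training a reference model: each task is summarized by a feature prototype (the mean embedding of its training data), and an incoming task is matched against stored prototypes via cosine similarity with a threshold. The prototype representation, the cosine metric, and the `threshold` value are illustrative assumptions, not necessarily the exact metric used in our framework.

```python
import numpy as np

def task_prototype(features):
    """Summarize a task by the mean of its feature embeddings (shape [n, d])."""
    return features.mean(axis=0)

def detect_similar_task(new_features, stored_prototypes, threshold=0.8):
    """Return the id of the most similar previous task if its cosine similarity
    exceeds `threshold`; otherwise return None (the task is treated as new)."""
    query = task_prototype(new_features)
    query = query / (np.linalg.norm(query) + 1e-12)
    best_id, best_sim = None, -1.0
    for task_id, proto in stored_prototypes.items():
        proto = proto / (np.linalg.norm(proto) + 1e-12)
        sim = float(query @ proto)
        if sim > best_sim:
            best_id, best_sim = task_id, sim
    return best_id if best_sim >= threshold else None
```

When an id is returned, the CL system can reuse (and, if needed, finetune) the task-specific parameters of that task; otherwise it instantiates a new set of parameters, so memory grows only when a genuinely new task arrives.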
To summarize, our contributions are listed below:
• We propose a new framework for an under-explored yet practical TCL setting in which we seek to learn a sequence of mixed similar and dissimilar tasks, while preventing (catastrophic) forgetting and repurposing task-specific parameters from a previously seen similar task, thus slowing down parameter expansion.
• The proposed TCL framework is characterized by a task similarity detection module that determines, without additional learning, whether the CL system can reuse the task-specific parameters learned for a previous task or needs to instantiate new ones.
• Our task similarity detection module shows remarkable performance on widely used computer vision benchmarks, such as CIFAR10 [20], CIFAR100 [20], and EMNIST [21], from which we create sequences of 10 to 100 tasks.
2 Related Work
TCL in Practical Scenarios
Task continual learning (TCL), an intuitive imitation of the human learning process, is one of the most studied scenarios in CL. Though TCL systems have
achieved impressive performance [22, 23], previous works have mainly focused on circumventing
the problems associated with CF. Historically, task sequences have been restricted to no more than 10
tasks and strong assumptions have been imposed so that all the tasks in the sequence are unique and
classes among tasks are disjoint [10, 24]. The authors of [25] rightly argue that currently discussed
CL settings are oversimplified, and more general and practical CL forms should be discussed to
advance the field. Only recently have solutions been proposed for longer sequences
and more practical CL scenarios. In particular, [19] proposed CAT, which learns from a sequence of
mixed similar and dissimilar tasks, thus enabling knowledge transfer between future and past tasks
detected to be similar. To characterize the tasks, a set of task-specific masks, i.e., binary matrices
indicating which parameters are important for a given task [22], are trained along with the other model
parameters. Specifically, these masks are activated and the parameters associated with them are finetuned
once the current task is identified as “similar” by a task similarity function, or otherwise held fixed
by the masking parameters to protect them from changing, hence preventing CF.
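As a rough illustration of this masking mechanism (a minimal sketch, not the exact implementation of [19] or [22]), the snippet below zeroes the gradients of parameters that earlier, dissimilar tasks marked as important, so those weights stay fixed while the current task is trained; the `masks` dictionary of binary tensors is an assumed bookkeeping structure.

```python
import torch

def protect_previous_tasks(model: torch.nn.Module, masks: dict) -> None:
    """Freeze parameters owned by earlier (dissimilar) tasks by masking gradients.
    `masks[name]` is a binary (0/1) tensor with the same shape as the parameter,
    where 1 marks entries important to a previous task."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in masks:
            # Zero the gradient entries protected by previous-task masks.
            param.grad.mul_(1.0 - masks[name].float())
```

Conversely, if the task similarity function flags the new task as similar to a previous one, the corresponding mask is activated and the associated parameters are finetuned rather than frozen.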
Alternatively, [15]
introduces a modular network, which is composed of neural modules that are potentially shared with
related tasks. Each task is optimized by selecting a set of modules that are either freshly trained on