
IS MULTI-TASK LEARNING AN UPPER BOUND FOR CONTINUAL LEARNING?
Zihao Wu1, Huy Tran1, Hamed Pirsiavash2, and Soheil Kolouri1
1: Department of Computer Science, Vanderbilt University, Nashville, TN,
2: Department of Computer Science, University of California, Davis, CA
ABSTRACT
Continual and multi-task learning are common machine learning approaches to learning from multiple tasks. Existing works in the literature often treat multi-task learning as a sensible performance upper bound for various continual learning algorithms. While this assumption is empirically verified on different continual learning benchmarks, it is not rigorously justified. Moreover, it is conceivable that, when learning from multiple tasks, a small subset of these tasks could behave as adversarial tasks and reduce the overall learning performance in a multi-task setting. In contrast, continual learning approaches can avoid the performance drop caused by such adversarial tasks and preserve their performance on the rest of the tasks, leading to better performance than that of a multi-task learner. This paper proposes a novel continual self-supervised learning setting, where each task corresponds to learning an invariant representation for a specific class of data augmentations. In this setting, we show that continual learning often beats multi-task learning on various benchmark datasets, including MNIST, CIFAR-10, and CIFAR-100.
Index Terms—Continual Learning, Multi-Task Learning, Self-Supervised Learning
1. INTRODUCTION
Modern Machine Learning (ML) is rapidly moving away from single-task experts towards foundational models that can generalize to multiple tasks. Multi-task and continual learning are the two commonly used paradigms in machine learning when learning from a multitude of tasks. Multi-task learning (MTL) assumes simultaneous access to independent and identically distributed (i.i.d.) samples from the joint distribution over all tasks and trains the ML model on this joint distribution. However, in many practical settings, e.g., autonomous driving, one deals with an input data stream, and joint training on the ever-growing data and its ever-changing distribution poses a major challenge to MTL. In contrast, continual learning (CL) assumes that the ML model can only access one task at a time and learns the tasks sequentially. We note that, appropriately, MTL is sometimes referred to as joint training, while CL is referred to as sequential/incremental training. A dire consequence of not having access to i.i.d. samples from the joint distribution in CL is the catastrophic forgetting phenomenon, which refers to the loss of performance on previous tasks while learning new ones.

This work was supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112190135.
The research community has recently proposed a plethora of approaches for overcoming catastrophic forgetting in the continual learning of deep neural networks. One can broadly categorize these methods into: 1) regularization-based approaches that penalize large changes to important parameters of previous tasks [1, 2, 3, 4, 5], 2) memory replay and rehearsal-based approaches [6, 7, 8, 9], and 3) architectural methods that rely on model expansion, parameter isolation, and masking [10, 11, 12, 13]. Others have studied brain-inspired mechanisms [14] that allow for continual learning in mammalian brains and how to leverage them for continual machine learning. Interestingly, in nearly all existing approaches, the common baselines are MTL (i.e., joint training) as an upper bound and naive sequential training, which leads to catastrophic forgetting, as a lower bound.
In this paper, we argue that MTL, while being a valuable baseline, is not necessarily an upper bound for CL. We observe that interference in learning is not unique to the CL framework and can also happen in MTL. For instance, adversarial examples/tasks, whether optimized or occurring naturally, could significantly reduce the performance of a multi-task learner during training. CL methods that overcome catastrophic forgetting, on the other hand, can avoid the performance drop caused by such adversarial examples/tasks in favor of preserving the performance on non-adversarial examples/tasks. Notably, one can argue that this effect does not happen in non-adversarial settings, as evidenced by the results on most CL benchmark problems indicating MTL as an upper bound for CL, and hence that it could be of limited interest to the community. In response to this criticism, we show that our observation is not unique to adversarial settings and can happen naturally in CL. We introduce the continual learning of augmentation-invariant representations as a continual self-supervised learning (SSL) problem. We show that, in this setting, CL often outperforms its MTL counterpart on various benchmark datasets, including MNIST, CIFAR-10, and CIFAR-100.
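To make the setting concrete, the following is a minimal sketch in PyTorch of continual SSL over augmentation families. It is only illustrative: the SimSiam-style objective, the small MLP encoder, the specific augmentation families, the helper simsiam_loss, and all hyperparameters are assumptions made for brevity, not the implementation used in this paper.

# Each "task" is one augmentation family; the encoder is trained to produce
# representations that are invariant to that family's augmentations.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical augmentation families, one per task.
task_augmentations = [
    transforms.RandomResizedCrop(32, scale=(0.5, 1.0)),  # task 1: random crops
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),          # task 2: color jitter
    transforms.RandomRotation(30),                        # task 3: rotations
]

# Tiny encoder and predictor heads (illustrative only).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                        nn.Linear(512, 128))
predictor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()),
                      lr=0.05)

def simsiam_loss(x1, x2):
    # Negative cosine similarity with a stop-gradient target (SimSiam-style).
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2

data = datasets.CIFAR10(root="./data", train=True, download=True,
                        transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=256, shuffle=True)

# Continual learning: the augmentation families (tasks) are visited sequentially.
# The MTL counterpart would instead mix all augmentation families within every
# batch rather than learning them one at a time.
for aug in task_augmentations:
    for x, _ in loader:
        loss = simsiam_loss(aug(x), aug(x))  # two independently augmented views
        opt.zero_grad()
        loss.backward()
        opt.step()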