CLAD: A realistic Continual Learning benchmark for Autonomous Driving
Eli Verwimp 1,a, Kuo Yang b, Sarah Parisot b, Lanqing Hong b, Steven McDonagh b, Eduardo Pérez-Pellitero b, Matthias De Lange a, Tinne Tuytelaars a
a PSI, ESAT, KU Leuven
b Huawei Noah’s Ark Lab
Abstract
In this paper we describe the design and the ideas motivating a new Continual Learning benchmark for Autonomous
Driving (CLAD), that focuses on the problems of object classification and object detection. The benchmark utilises
SODA10M, a recently released large-scale dataset that concerns autonomous driving related problems. First, we re-
view and discuss existing continual learning benchmarks, how they are related, and show that most are extreme cases
of continual learning. To this end, we survey the benchmarks used in continual learning papers at three highly ranked
computer vision conferences. Next, we introduce CLAD-C, an online classification benchmark realised through a
chronological data stream that poses both class and domain incremental challenges; and CLAD-D, a domain in-
cremental continual object detection benchmark. We examine the inherent difficulties and challenges posed by the
benchmark, through a survey of the techniques and methods used by the top-3 participants in a CLAD-challenge
workshop at ICCV 2021. We conclude with possible pathways to improve the current continual learning state of the
art, and which directions we deem promising for future research.
Keywords: Continual Learning, Classification, Object Detection, Challenge Report, Benchmark
1. Introduction
Today, if a team of engineers were given the task to
develop a new machine learning system for autonomous
driving, they would start by defining the situations a car
might encounter and should be able to handle. Data that
fits the problem description is gathered and annotated,
and a model is trained. Soon after deployment, reports
of failure cases come in, e.g. the system fails in snowy
weather, does not detect tricycles and misses passing
cars on the right-hand side. In good faith, the engineers
collect new data to include these situations, and retrain
the model from scratch. Yet soon after the second
version, reports come in of malfunctioning systems in
cities for which no training data was collected, and the
cycle starts again. Knowing how and where a model
is going to be used, and how it will fail, is a nearly
impossible task. While avoiding all possible failures is
unattainable, decreasing the cost of incorporating new
knowledge into a system is a much more feasible goal.
When trying to develop such systems, the assumption
is made that there exists a complete set of all the
object-label pairs in the world, which includes those
of interest. Observing these objects does not happen
randomly, but depends on the context in which they
are observed. For instance, the geo-location, the time
of the day (e.g. day vs. night) and the time period or
era (e.g. the 1970s vs. now) influence the probability with
which a certain type of vehicle is observed. Besides
appearance, context also influences the frequency
with which certain types of objects appear (e.g. fewer
pedestrians on a highway). This is also true for machine
learning datasets, where data is gathered within a
context that ultimately determines which objects of
the complete set are included in the machine learning
process.
Recognising that contexts are rarely constant, Contin-
ual Learning (CL) studies how we can enable neural
networks to include new knowledge from changing
contexts, at the lowest cost possible. Without continual
learning, solving failure cases and extending to new
domains requires retraining and tuning models from
scratch. Given the increasing sizes of contemporary
models, this has a significant energy and time cost [1].
A simple and straightforward idea is to optimise the
model on the new data only. In 1989, McCloskey et
al. [2] were among the first to observe that this tech-
nique, finetuning, leads to rapid performance decreases
on the old data. Today, this is referred to as catastrophic
forgetting, and represents the largest challenge for
continual learning: forgetting old knowledge cannot be
the consequence of incorporating new knowledge in a
system.
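As a minimal sketch of this naive approach (the function name and the data loaders below are placeholders of our own, not part of any released code), sequential finetuning in a PyTorch-style training loop looks as follows; applied context after context without any continual learning mechanism, it exhibits exactly this forgetting:

```python
import torch
import torch.nn as nn


def finetune(model, loader, epochs=1, lr=1e-3):
    """Optimise the model on the given data only (naive finetuning)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()
    return model


# Hypothetical usage: train on the original data, then only on the new data.
# Without a CL method, accuracy on the earlier context typically collapses.
# `model`, `old_loader` and `new_loader` are assumed to be defined elsewhere:
# for loader in [old_loader, new_loader]:
#     model = finetune(model, loader)
```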
With growing interest in continual learning, there
is an increasing need for rigorous benchmarks and
routes to reliably assess the progress that is being made.
The earliest benchmarks focused on classification
problems and split versions of popular datasets like
MNIST [3] and CIFAR10 [4]. In a typical split-dataset,
all available classes are artificially (and often randomly)
divided into different tasks (or contexts). Then, a model
is trained task-by-task, without access to past or future
data. The performance of a CL-algorithm is assessed
by the final accuracy on each task. While they are
useful for early methodological development, these
benchmarks pose an artificial challenge. Randomly
chosen distribution shifts are not assured to be aligned
with real-world context shifts, which are the result of
changing environments and model requirements. Addi-
tionally, datasets in image classification (e.g. CIFAR10)
are often designed and simplified to only have a single
object in the foreground, with a relatively small range
of object scales. In Section 2, we describe how these
benchmarks are only a small fraction of all possible
CL-problems, and how other settings might be more
realistic.
In this paper we describe the design and the ideas
motivating a new Continual Learning benchmark
for Autonomous Driving (CLAD). CLAD-C tests a
continual classification model on naturally occurring
context shifts along the time dimension, and CLAD-D
focuses on the more realistic problem of continual
object detection. Both settings have been introduced as
part of the ICCV SSLAD workshop² in October 2021,
for which we will present the top-3 submissions in
this work. We start with a discussion on the principles
we followed during the design of this challenge. We
continue with an introduction of naive baselines, which
highlight the difficulties and pitfalls of the proposed
benchmarks. Then we discuss the solutions proposed
by challenge participants and their results. Finally, we
conclude with further details and experiments on the
most promising submission ideas, discussion on future
research directions to improve on the benchmarks,
² https://sslad2021.github.io/pages/challenge.html
and continual learning in general. Our contributions
include:
• A review of current CL-benchmarks, their relations to one another, and their shortcomings.
• The introduction of two new CL-benchmarks, with the goal of moving closer to CL-scenarios encountered in real-world problems. For both benchmarks, code is made publicly available³.
• A review of the top performing methods on the proposed benchmarks, from the SSLAD ICCV ’21 challenge entries, highlighting concrete pathways to progress the current state of continual learning.
2. A Continual Learning Framework
Research interest in Continual Learning has increased
substantially in recent years, resulting in the introduc-
tion of a growing number of benchmarks and desider-
ata. In this section we review the tasks and datasets that
have been proposed and highlight those we deem most
popular in terms of adoption at top tier Computer Vision
focused conference venues.
2.1. Continual Tasks
Starting from the complete set S = {X, Y} of objects X and their ground-truth labels Y, a machine learning dataset is defined as a subset D ⊆ S, characterised by the probability distribution P(X, Y | C). The context C determines which object-label pairs have non-zero probability of being in the dataset. In i.i.d. training, this context stays constant, and therefore the probability with which samples are observed by the model also remains constant. In contrast, continual learning is characterised by a changing context c, which induces a shift in P(X, Y | C = c). Often, each of these context changes
is referred to as a task. While in theory all kinds of
context switches are a part of continual learning, only
two types of context variations are commonly used to
assess CL performance. In Class-incremental learning,
a new context causes P(X) to shift, which induces a
shift in P(Y), and typically supp(Y_n) ∩ supp(Y_m) = ∅ if context-IDs m ≠ n, meaning each class only occurs (has non-zero probability) during a single context. Domain-incremental learning considers shifts in P(X) that do not affect P(Y), thus only the environment in which a class
³ https://github.com/VerwimpEli/SSLAD_Track_3
is observed changes. In some settings, the model has ac-
cess to the context variable or context-ID during train-
ing and testing, which is referred to as task-incremental
learning. Finally, in this work we assume that P(Y|X) is not subject to change, i.e. the label y of a sample x never changes [5, 6]. See Figure 1 for an overview and
stratification of common benchmarks, including those
proposed here.
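As a minimal illustration of the two standard settings, the sketch below partitions a toy label set into disjoint contexts (class-incremental) and, alternatively, keeps all labels in every context while only the input domain changes (domain-incremental). The helper names and the example classes are ours, purely for illustration:

```python
import random


def class_incremental_split(labels, n_contexts, seed=0):
    """Assign each class to exactly one context, so that supp(Y_n) and
    supp(Y_m) are disjoint for m != n. Leftover classes (when the number
    of classes is not divisible by n_contexts) are simply dropped here."""
    rng = random.Random(seed)
    labels = list(labels)
    rng.shuffle(labels)
    size = len(labels) // n_contexts
    return [set(labels[i * size:(i + 1) * size]) for i in range(n_contexts)]


def domain_incremental_split(labels, domains):
    """Every context keeps the full label set; only the input
    distribution P(X), i.e. the domain, changes per context."""
    return [(domain, set(labels)) for domain in domains]


if __name__ == "__main__":
    classes = ["car", "truck", "tram", "pedestrian", "cyclist", "tricycle"]
    print(class_incremental_split(classes, n_contexts=3))
    print(domain_incremental_split(classes, domains=["day", "night"]))
```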
2.2. Contemporary CL for Computer Vision
In theory, continual learning is concerned with any
kind of distribution change in the training data. Given
the many possible choices, many different benchmarks
have been proposed and used. To understand which
ones are currently popular in computer vision, we sur-
veyed CVPR, ICCV and NeurIPS of 2021; three highly
ranked computer vision and machine learning confer-
ences. From their proceedings, we selected all pa-
pers with the words continual, lifelong, sequential, in-
cremental or forget in their titles. These keywords
were defined using a manually collected list of 50 pa-
pers concerning CL, spanning publication years 2017 to
2022, of which 98% had positive matches with our
keywords. After filtering for false positives, 60 rele-
vant papers remain. Of these works, 73% included at
least one classification problem, 10% a semantic seg-
mentation problem, 7% a generative problem, 3% an
object detection problem and 10% various other prob-
lems. In the papers that focused on classification, 188
experiments (excluding ablations, etc.) were conducted.
Of these, 90% used a non-continual dataset and randomly changed P(X) at discrete intervals, such that each class has non-zero probability in only a single context (strictly class-incremental). 8.5% changed P(X) without affecting P(Y) (strictly domain-incremental), and only 1.5% included more gradual context switches (see Section 2.3). See Figure 1 for the distribution of datasets
used. While the random and discrete context switches
in these benchmarks are only a small part of the space
of CL-problems, they are currently used to assess the
quality of almost all new CL-algorithms.
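For reference, the title filter described above can be written in a few lines of Python; the keyword list follows the text, while the collection of candidate titles is assumed to be available separately, and false positives are still removed manually afterwards:

```python
import re

# Survey keywords listed in the text above.
KEYWORDS = ("continual", "lifelong", "sequential", "incremental", "forget")
PATTERN = re.compile("|".join(KEYWORDS), flags=re.IGNORECASE)


def select_candidates(titles):
    """Return the titles that contain at least one survey keyword."""
    return [t for t in titles if PATTERN.search(t)]
```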
2.3. Towards more realistic benchmarks
While discrete and random distribution shifts are an
interesting tool to study continual learning, they are,
as previously discussed, not necessarily representative
of the changing context in which continual learning
systems can be used. In fact, as shown in Figure 1,
there is a whole continuum of context changes between
strict class and domain incremental learning. To
aid progress towards applicable continual learning,
Lomonaco et al. introduced CORe50 [7], and claim
that realistic CL-benchmarks should have access to
multiple views of the same object. This coincides with
a more gradually changing context of the data distribution P(X), an idea shared by [8, 9, 10], who introduce iCub World, Toys-200 and Stream-51, respectively.
Cossu et al. are critical of the lack of class repetition
in continual learning benchmarks, and claim that this
makes CL artificially difficult and unlike real-world
scenarios [11]. They consider it realistic that a class
has non-zero probability during multiple contexts,
albeit with different frequencies. Additional support for
this hypothesis can be found in related works [12, 7].
Repetition occurs naturally in benchmarks leveraging
temporal meta-data of images, an idea implemented
by the benchmarks Wanderlust [13], Clear [14] and
CLOC [15]. Using time of day as a context variable,
both the data and label distributions change gradually
and non-randomly, which comes closest to a real-world
setting; see Figure 1 for how they compare to more
traditional benchmarks. Regardless of context changes,
some benchmarks only allow for online learning, where
only a single pass over the data within a context is
allowed [16]. This is regarded as a realistic scenario in
other works [17, 15, 18, 19]. Finally, using the context
as an input variable, such that the model knows the
context of a sample (task-incremental learning), has
been critiqued as too restricting [16]. Despite these
critiques and proposals to work towards more natural
benchmarks, Section 2.2 indicated that they are seldom
adopted in papers proposing new methods.
Most of these benchmarks focus on classification
problems, as are most papers surveyed in Section 2.2.
Despite its prevalence, classification is likely not the
only scenario where CL will be applied in practice,
since it often requires having a single (centered) object
per image. Recent works [20, 21] started exploring
object detection and semantic segmentation in CL; two
problems that are more likely to practically benefit from
CL.
2.4. Evaluation of Continual Learning
Besides good benchmarks, metrics that accurately re-
flect the goals of continual learning are indispensable.
CL-methods are commonly evaluated using the average
accuracy of each task at the end of training and aver-
age backward transfer (BWT): the difference between
the accuracy of a task directly after it was trained and
after all tasks were trained [22]. These metrics are not
necessarily aligned with the CL-goal of including new
knowledge to an already working system. According
[Figure 1 (schematic): benchmarks arranged from class-incremental (distinct P(X), disjoint label sets y1–y6 per context) to domain-incremental (full label set, gradually changing P(X)). Depicted datasets include split CIFAR10, CIFAR100, MNIST, ImageNet and CUB200, Permuted and Rotated MNIST, CORe50 and CORe50-NI, Toys200, iCub, CLAD-C, Wanderlust, Clear, CLOC, plus ‘Others’ and ‘Sequences’.]
Figure 1: Overview of proposed and common benchmarks for continual learning in computer vision, with a focus on classification problems. We
depict how the data distributions X and the available labels Y change during training. The yellow circle areas are proportional to how many papers
published in ICCV ’21, CVPR ’21 and NeurIPS ’21 used each benchmark, see Section 2.2 for details. Black circles are recently proposed, more
realistic benchmarks that are currently less popular. ‘*’ indicates a ‘split’ version of the dataset. ‘Others’ refers to a set of benchmarks that were
only used in a single paper and ‘sequences’ refers to benchmarks where multiple datasets with the same labels are used. For more details and
explanation of dataset placements, see Appendix.
to [22], the best algorithm is one that has the highest
final accuracy; however, we note that the BWT metric can also benefit from low accuracy during training (i.e. this effect improves the metric). Instead, we
believe the best CL-algorithm is one that has high ac-
curacy, which can be improved by including new data,
without sudden performance drops. Ideally, this would
be measured continuously during training, and the area
below the accuracy curve used as a metric. Yet practi-
cally this has a high computational cost, and testing at
well-chosen discrete intervals is a reasonable approxi-
mation. For further discussions, see [23, 24, 12].
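As a minimal sketch (our own formulation, not the challenge’s official evaluation code), the quantities discussed here can be computed from an accuracy matrix acc, where acc[i, j] is the accuracy on task j after training on task i, plus a vector of accuracies measured at discrete points during training:

```python
import numpy as np


def final_average_accuracy(acc: np.ndarray) -> float:
    """Mean accuracy over all tasks, measured after training on the last task."""
    return float(acc[-1].mean())


def backward_transfer(acc: np.ndarray) -> float:
    """Average difference between a task's accuracy after the final task
    and its accuracy directly after it was trained, as described in the text."""
    n_tasks = acc.shape[0]
    return float(np.mean([acc[-1, i] - acc[i, i] for i in range(n_tasks - 1)]))


def average_anytime_accuracy(acc_over_time: np.ndarray) -> float:
    """Approximate the (normalised) area under the accuracy curve by
    averaging accuracies measured at well-chosen discrete points."""
    return float(acc_over_time.mean())
```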
3. Continual Learning Benchmark for Autonomous
Driving
In this section, we describe the design and de-
velopment of our continual learning benchmark, us-
ing SODA10M [25], an industry-scale dataset for au-
tonomous driving. Self-driving vehicles can change ur-
ban mobility significantly, but there are still challenges
to overcome. One such challenge is demonstrating the
versatility of AI-based automated driving systems to
cope with challenging and dynamic real-world scenar-
ios. Several corporations are now driving many thou-
sands of miles a day autonomously, creating streams of
sensor measurements that form a natural source of con-
tinual learning data. This problem setting leads to natu-
ral and gradually changing distribution shifts, an excel-
lent benchmark for CL-algorithms. Next, we introduce
the SODA10M dataset, and provide a detailed overview
of our two challenge benchmarks that utilise the avail-
able data.
3.1. SODA10M Dataset
Both tracks build on the SODA10M dataset [25],
which contains 10M unlabelled images and 20k labelled
images. Image data consists of dash-camera recorded
footage, obtained from vehicles driving through four
Chinese cities, with images recorded at 10 second in-
tervals. Ordering images chronologically largely en-
tails visual footage of a car exploring the city and its
neighbourhoods. The image label set has bounding
box annotations for 6 object classes and covers dif-
ferent ‘domains’ (cities, weather conditions, time of
day and road type); see Figure 2. See Figure 4 for some example images (arranged in tasks for CLAD-D, as will be discussed in Sec. 3.4). While self- and
semi-supervised learning are interesting research direc-
tions [26], we leave incorporating the unlabelled images
for future work, and make use of the labelled dataset
portion exclusively for our benchmark challenges.
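As an illustration of the chronological ordering mentioned above, a time-ordered stream could be assembled from image metadata as sketched below; the field names (timestamp, city, period) are assumptions of ours and may differ from the actual SODA10M annotation format:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageMeta:
    file_name: str
    timestamp: float   # acquisition time (assumed field)
    city: str          # one of the four recorded cities (assumed field)
    period: str        # e.g. "daytime" or "night" (assumed field)


def chronological_stream(images: List[ImageMeta]) -> List[ImageMeta]:
    """Order images by acquisition time, so that domain shifts such as
    day/night or city changes occur gradually and non-randomly."""
    return sorted(images, key=lambda im: im.timestamp)
```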
3.2. Challenge Subtracks
As our goal involves working towards more realistic,
real-world settings for continual learning, we approach
the design of the challenge benchmarks with this mind-
set. As referred to in Section 2.1, there are currently two main axes along which the real-world realism of a continual learning task increases: firstly, the problem formulation itself, and secondly, more realistic context shifts.
Ideally, we combine these aspects into a single com-
prehensive benchmark. Yet, given that continual object