is observed changes. In some settings, the model has ac-
cess to the context variable or context-ID during train-
ing and testing, which is referred to as task-incremental
learning. Finally, in this work we assume that P(Y|X)
is not subject to change, i.e., the label y of a sample x never changes [5,6]. See Figure 1 for an overview and
stratification of common benchmarks, including those
proposed here.
2.2. Contemporary CL for Computer Vision
In theory, continual learning is concerned with any
kind of distribution change in the training data. Given
the many possible choices, a wide variety of benchmarks has been proposed and used. To understand which ones are currently popular in computer vision, we surveyed the 2021 editions of CVPR, ICCV and NeurIPS, three highly ranked computer vision and machine learning conferences. From their proceedings, we selected all papers with the words continual, lifelong, sequential, incremental or forget in their titles. These keywords were chosen using a manually collected list of 50 CL papers, spanning publication years 2017-2022, of which 98% matched at least one keyword. After filtering for false positives, 60 relevant papers remained. Of these works, 73% included at
least one classification problem, 10% a semantic seg-
mentation problem, 7% a generative problem, 3% an
object detection problem and 10% various other prob-
lems. In the papers that focused on classification, 188 experiments (excluding ablations, etc.) were conducted. Of these, 90% used a non-continual dataset and randomly changed P(X) at discrete intervals, such that each class has non-zero probability in only a single context (strictly class-incremental). 8.5% changed P(X) without affecting P(Y) (strictly domain-incremental), and only 1.5% included more gradual context switches; see Section 2.3. See Figure 1 for the distribution of datasets
used. While the random and discrete context switches in these benchmarks cover only a small part of the space of CL-problems, they currently serve to assess the quality of almost all new CL-algorithms.
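To make this dominant protocol concrete, the following sketch (assuming integer class labels; function and variable names are illustrative) turns any static classification dataset into a strictly class-incremental benchmark by assigning each class to exactly one context:

    import numpy as np

    def class_incremental_split(labels, num_contexts, seed=0):
        # Randomly partition the classes over the contexts, so that
        # each class has non-zero probability in only one context.
        rng = np.random.default_rng(seed)
        classes = rng.permutation(np.unique(labels))
        groups = np.array_split(classes, num_contexts)
        # Per context, return the indices of the samples it contains.
        return [np.flatnonzero(np.isin(labels, g)) for g in groups]

For example, applied to CIFAR-10 labels with num_contexts=5, this yields five contexts of two classes each, with P(X) switching randomly and discretely at every context boundary.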
2.3. Towards more realistic benchmarks
While discrete and random distribution shifts are an
interesting tool to study continual learning, they are,
as previously discussed, not necessarily representative
of the changing context in which continual learning
systems can be used. In fact, as shown in Figure 1,
there is a whole continuum of context changes between strictly class-incremental and strictly domain-incremental learning. To aid progress towards applicable continual learning, Lomonaco et al. introduced CORe50 [7] and argue that realistic CL-benchmarks should provide access to multiple views of the same object. This corresponds to a more gradually changing data distribution P(X), an idea shared by [8,9,10], who introduce iCub World, Toys-200 and Stream-51, respectively.
Cossu et al. are critical of the lack of class repetition in continual learning benchmarks, arguing that it makes CL artificially difficult and unlike real-world scenarios [11]. They consider it realistic that a class has non-zero probability during multiple contexts, albeit with different frequencies. Additional support for this hypothesis can be found in related works [12,7].
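A minimal sketch of a stream with such repetition (the per-context class frequencies class_probs are a free choice here, not prescribed by [11]) could look like this:

    import numpy as np

    def sample_context(labels, class_probs, stream_length, seed=0):
        # Every class may occur in every context, but with a
        # context-specific frequency given by class_probs.
        rng = np.random.default_rng(seed)
        classes = np.unique(labels)
        drawn = rng.choice(classes, size=stream_length, p=class_probs)
        # For each drawn class, pick a random sample of that class.
        return np.array([rng.choice(np.flatnonzero(labels == c))
                         for c in drawn])

Calling this once per context with a different class_probs produces contexts in which all classes repeat, but with shifting frequencies.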
Repetition occurs naturally in benchmarks that leverage the temporal meta-data of images, an idea implemented by the benchmarks Wanderlust [13], Clear [14] and CLOC [15]. Using time of day as a context variable, both the data and label distributions change gradually and non-randomly, which comes closest to a real-world setting; see Figure 1 for how these benchmarks compare to more traditional ones.
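A minimal sketch of such a temporal benchmark, assuming an array of numeric capture timestamps, replays the data chronologically and uses time bins as contexts:

    import numpy as np

    def temporal_contexts(timestamps, bin_edges):
        # Replay the data in chronological order, so that P(X) and
        # P(Y) drift gradually and non-randomly over the stream.
        timestamps = np.asarray(timestamps)
        order = np.argsort(timestamps)
        bins = np.digitize(timestamps[order], bin_edges)
        return [order[bins == b] for b in np.unique(bins)]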
Regardless of context changes, some benchmarks allow only online learning, where a single pass over the data within a context is permitted [16]. This is regarded as a realistic scenario in other works [17,15,18,19].
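In code, this constraint simply amounts to a single pass over each context's stream (a sketch with a hypothetical update function standing in for the learner's training step):

    def train_online(model, context_stream, update):
        # Online CL: each sample (or mini-batch) is seen exactly once;
        # revisiting the context's data in a second epoch is not allowed.
        for x, y in context_stream:
            update(model, x, y)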
Finally, using the context as an input variable, such that the model knows the context of a sample (task-incremental learning), has been criticized as too restrictive [16]. Despite these
critiques and proposals to work towards more natural
benchmarks, the survey in Section 2.2 indicates that such benchmarks are seldom adopted in papers proposing new methods.
Most of these benchmarks focus on classification problems, as do most of the papers surveyed in Section 2.2. Despite its prevalence, classification is likely not the only scenario in which CL will be applied in practice, since it often requires a single (centered) object per image. Recent works [20,21] have started exploring object detection and semantic segmentation in CL, two problems that are more likely to benefit from CL in practice.
2.4. Evaluation of Continual Learning
Besides good benchmarks, metrics that accurately re-
flect the goals of continual learning are indispensable.
CL-methods are commonly evaluated using the average accuracy over all tasks at the end of training, and the average backward transfer (BWT): the difference between the accuracy on a task directly after it was trained and after all tasks have been trained [22].
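Writing R_{j,i} for the accuracy on task i after training on tasks 1 through j, with T tasks in total, one common formalization of these two metrics is

    \mathrm{ACC} = \frac{1}{T}\sum_{i=1}^{T} R_{T,i}, \qquad
    \mathrm{BWT} = \frac{1}{T-1}\sum_{i=1}^{T-1}\bigl(R_{T,i} - R_{i,i}\bigr)

so that a negative BWT indicates forgetting: training on later tasks degraded performance on earlier ones.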
These metrics are not necessarily aligned with the CL-goal of incorporating new knowledge into an already working system. According