
represents the class label of $x^{t*}_i$, and $Y^t$ is the set of classes in the $t$-th increment. Note that, unlike most continual learning setups, $Y^i \cap Y^j \neq \emptyset$, $\forall i \neq j$. After training on $D^t$, the model $M$ is tested on all the classes encountered so far, $Y^1, Y^2, \ldots, Y^t$. The main challenges of FoCAL are three-fold: (1) avoid catastrophic forgetting, (2) prevent overfitting on the few training samples, and (3) efficiently choose the most informative samples in each increment.
For FoCAL on the task of object classification, we consider the model $M$ (a CNN) as a composition of a feature extractor $f(\cdot;\theta)$ with parameters $\theta$ and a classification model with weights $W$. The feature extractor transforms the input images into a feature space $F \in \mathbb{R}^n$. The classification model maps the extracted features to an output vector, followed by a softmax function that produces multi-class probabilities. In this paper, we use a pre-trained feature extractor, so the parameters $\theta$ are fixed. Thus, we incrementally finetune the classification model on $D^1, D^2, \ldots$ and obtain parameters $W^1, W^2, \ldots$. In increment $t$, we expand the output layer by $|Y^t|$ neurons to incorporate the new classes. Note that this setup does not alleviate the three challenges of FoCAL mentioned above. The subsections below describe the main components of our framework (Figure 1) that transform this setup for FoCAL.
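To make this baseline setup concrete, the sketch below shows one way the output layer could be expanded by $|Y^t|$ neurons at the start of an increment while keeping the weights already learned for earlier classes. It is a minimal sketch assuming a PyTorch linear classification head; the helper name `expand_classifier` and the specific dimensions are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

def expand_classifier(old_fc: nn.Linear, num_new_classes: int) -> nn.Linear:
    """Widen a linear head by num_new_classes outputs, copying the old weights."""
    new_fc = nn.Linear(old_fc.in_features, old_fc.out_features + num_new_classes)
    with torch.no_grad():
        new_fc.weight[: old_fc.out_features] = old_fc.weight
        new_fc.bias[: old_fc.out_features] = old_fc.bias
    return new_fc

# Example: increment t introduces |Y^t| = 3 new classes on top of 10 known ones.
head = nn.Linear(512, 10)          # classifier weights W after increment t-1 (512-d features)
head = expand_classifier(head, 3)  # 13 output neurons; softmax is applied over all of them
```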
2.1 GMM Based Continual Learning (GBCL)
We aim to develop a model that not only helps the system learn continually but is also motivated by the newness of an object. To accomplish this, we must evaluate how different an incoming object is from previously learned object classes, ideally without any additional supervision. We therefore consider a clustering-based approach to represent the distribution of object classes. Unlike previous clustering-based approaches for continual learning [10, 11] that represent each object class as a mean feature vector (centroid), we estimate the distribution of each object class using a uniform Gaussian mixture model (GMM). We believe that representing each class's data as a GMM may capture the true distribution of the data better than assuming that the distribution is circular. We call our complete algorithm for continually learning GMMs of multiple object classes GMM based continual learning (GBCL).
Once the $k$ feature vectors ($D^t$) selected by the acquisition function (Section 2.2) as the most informative samples are labeled by the oracle in increment $t$, GBCL is applied to learn GMMs for the classes $Y^t$.
For each feature vector $x^{t*}_i$ in $D^t$ labeled as $y^t_i$, if $y^t_i$ is a new class never seen by the model before, we initialize a new Gaussian distribution $\mathcal{N}(x^t_i, O)$ for class $y$ with $x^t_i$ as the mean (centroid) and a zero matrix ($O$) as the covariance matrix.²
Otherwise, if $y^t_i$ is a known class, we find the probabilities $\mathcal{N}(x^t_i \mid c^y_1, \sigma^y_1), \ldots, \mathcal{N}(x^t_i \mid c^y_j, \sigma^y_j), \ldots, \mathcal{N}(x^t_i \mid c^y_{n_y}, \sigma^y_{n_y})$ of $x^t_i$ belonging to each of the previously learned Gaussian distributions for class $y$, where $n_y$ is the total number of mixture components in the GMM for class $y$, and $c^y_j$ and $\sigma^y_j$ represent the centroid and covariance matrix of the $j$th mixture component of class $y$, respectively. If the maximum of these probabilities is higher than a pre-defined probability threshold $P$, $x^t_i$ is used to update the parameters (centroid and covariance matrix) of the most probable distribution ($\mathcal{N}(c^y_j, \sigma^y_j)$) in class $y$. The updated centroid $\hat{c}^y_j$ is calculated as a weighted mean between the previous centroid $c^y_j$ and $x^t_i$:

$$\hat{c}^y_j = \frac{w^y_j \times c^y_j + x^t_i}{w^y_j + 1} \qquad (1)$$
where $w^y_j$ is the number of images already clustered in the $j$th (most probable) Gaussian distribution. The updated covariance matrix $\hat{\sigma}^y_j$ is calculated based on the procedure described in [23]:
$$\hat{\sigma}^y_j = \frac{w^y_j - 1}{w^y_j}\,\sigma^y_j + \frac{w^y_j - 1}{(w^y_j)^2}\,(x^t_i - \hat{c}^y_j)^T (x^t_i - \hat{c}^y_j) \qquad (2)$$
where $\sigma^y_j$ is the previous covariance matrix and $(x^t_i - \hat{c}^y_j)^T(x^t_i - \hat{c}^y_j)$ represents the covariance between $x^t_i$ and $\hat{c}^y_j$.
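The update step above can be summarized in a short sketch. This is a minimal reconstruction under stated assumptions: the per-class list of components, the helper name `gbcl_update`, and the small ridge added to the covariance before evaluating the densities (needed because new components start with a zero covariance matrix) are illustrative choices rather than details from the paper, and the below-threshold branch is left open because its description continues in the following sentence.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gbcl_update(gmms, x, y, P):
    """One GBCL step for a feature vector x labeled with class y.

    gmms maps each class label to a list of components, each a dict with the
    centroid 'c', covariance matrix 'sigma', and image count 'w'.
    P is the pre-defined probability threshold.
    """
    if y not in gmms:  # a class never seen before: one component with zero covariance
        gmms[y] = [{"c": x.copy(), "sigma": np.zeros((x.size, x.size)), "w": 1}]
        return

    # Probability of x under every previously learned component of class y
    # (the tiny ridge keeps the density well defined; it is an assumption, not from the paper).
    probs = [
        multivariate_normal.pdf(x, mean=comp["c"], cov=comp["sigma"] + 1e-6 * np.eye(x.size))
        for comp in gmms[y]
    ]
    j = int(np.argmax(probs))
    if probs[j] > P:  # update the most probable component using Eqs. (1) and (2)
        comp = gmms[y][j]
        w, c, sigma = comp["w"], comp["c"], comp["sigma"]
        c_hat = (w * c + x) / (w + 1)                                           # Eq. (1)
        diff = (x - c_hat).reshape(1, -1)
        sigma_hat = ((w - 1) / w) * sigma + ((w - 1) / w**2) * (diff.T @ diff)  # Eq. (2)
        comp.update(c=c_hat, sigma=sigma_hat, w=w + 1)
    else:
        # Below-threshold case: handled as described in the continuation of the text.
        pass
```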
If, on the other hand, the maximum probability among the calculated probabilities
² We do not describe mixing coefficients here, as they will always be $1/n$ for a uniform GMM, where $n$ is the number of mixture components.