Few-Shot Continual Active Learning by a Robot
Ali Ayub
University of Waterloo
Waterloo, ON N2L3G1, Canada
a9ayub@uwaterloo.ca
Carter Fendley
Capital One
New York, NY 10017, USA
ccf5164@psu.edu
Abstract
In this paper, we consider a challenging but realistic continual learning problem,
Few-Shot Continual Active Learning (FoCAL), where a CL agent is provided with
unlabeled data for a new or a previously learned task in each increment and has
only a limited labeling budget available. To address this, we build on the continual
learning and active learning literature and develop a framework that allows a CL
agent to continually learn new object classes from a few labeled training examples.
Our framework represents each object class using a uniform Gaussian mixture
model (GMM) and uses pseudo-rehearsal to mitigate catastrophic forgetting. The
framework also uses uncertainty measures on the Gaussian representations of
the previously learned classes to find the most informative samples to be labeled
in an increment. We evaluate our approach on the CORe-50 dataset and on a
real humanoid robot for the object classification task. The results show that our
approach not only produces state-of-the-art results on the dataset but also allows a
real robot to continually learn unseen objects in a real environment with limited
labeling supervision provided by its user.¹
1 Introduction
Continual learning (CL) [3, 4, 5, 6] has emerged as a popular area of research in recent years because
of its many real-world applications, such as domestic robots and autonomous cars. Most continual
machine learning models [7, 8, 9, 10], however, are developed for constrained task-based continual
learning setups, where a CL model continually learns a sequence of tasks, one at a time, with all
the data of the current task labeled and available in an increment. Real-world systems, particularly
autonomous robots, do not have the luxury of getting a large amount of labeled data for each task. In
contrast, robots operating in real-world environments mostly have to learn from supervision provided
by their users [11, 5, 12]. Human teachers, however, would be unwilling to answer a large number
of questions or label a large amount of data for the robot. It would therefore be useful for robots
to self-supervise their learning and ask their human teachers to label only the most informative training
samples from the environment in each increment. In this paper, we focus on this challenging problem,
termed Few-Shot Continual Active Learning (FoCAL).
One of the main problems faced by continual machine learning models is catastrophic forgetting,
in which the CL model forgets previously learned tasks when learning new knowledge. In recent
years, several works in CL have focused on mitigating the catastrophic forgetting problem [13, 3, 6].
Most of these works, however, are developed for the task-based continual learning setup, where the
model assumes that all the data for a task is available in an increment and is fully labeled. These
constraints are costly and limit the real-world application of CL models on robots. Active learning has
emerged as an area of research in recent years, where machine learning models choose the most
informative samples to be labeled from a large corpus of unlabeled data, thus reducing the labeling
effort [14, 15].
¹ Preliminary ideas [1, 2] related to this work were presented at workshops at RoMan 2020 and ICRA 2021.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.04137v2 [cs.LG] 12 Oct 2022
Figure 1: Our overall framework for FoCAL. In each increment $t$, the features extracted for unlabeled objects, $f(x^t; \theta)$, are passed through the acquisition function $a(x^t, M)$ to get the $k$ most informative samples $x^{t*}$, which are labeled by the oracle. The labeled feature vectors are used to update the GMM representations of the learned classes $Y^t$. Pseudo-rehearsal is used to replay old class data, and the classifier model $C(\cdot; W)$ is trained on the pseudo-samples of the old classes and the labeled feature vectors in the $t$-th increment.
Most active learning techniques use uncertainty sampling to request labels for the
most uncertain objects [15, 14, 16]. These techniques, however, do not learn continually and would
thus suffer from catastrophic forgetting. These issues with the development of "closed-world"
techniques for continual learning, active learning, and open-set recognition have been explored in
detail in [17].
In this paper, we consider FoCAL in the online continual learning scenario for the image classification
task. In this setup, a CL model (deployed on a robot) receives a small amount of unlabeled image data
of objects from the environment in each increment, where the objects can belong to classes previously
learned by the model or to new classes. The model is allowed to get a small number of object
samples labeled by the user. As the model continues to learn from new training samples,
it does not have access to the raw image data of previously learned objects. Overall, FoCAL
is a combination of multiple challenging problems in machine learning, mainly Few-Shot Class
Incremental Learning (FSCIL) [9, 10], active learning [14, 18], and online continual learning [19].
To solve FoCAL, we draw inspiration from the continual learning and active learning literature to
develop protocols that allow continual learning models to actively choose informative samples
in an increment. In particular, we take inspiration from the FSCIL literature to develop a new FoCAL
model, in which we learn and preserve the feature representations of previously learned object
classes by modeling them as Gaussian mixture models. To mitigate catastrophic forgetting, we use
pseudo-rehearsal [20] with samples generated from the Gaussian distributions of the old classes,
thus removing the need to store raw data for those classes. Further, to choose the most informative samples
from an unlabeled set, we use a combination of predictive entropy [21, 18] and viewpoint consistency
metrics [14, 16] on the GMM representations of the previously learned classes. We perform extensive
evaluations of our proposed approach on the CORe-50 dataset [22], and on a real humanoid robot
in an indoor environment. Our approach outperforms state-of-the-art (SOTA) continual learning
approaches for FoCAL on the CORe-50 dataset by significant margins. Further, our approach can
be integrated on a humanoid robot, allowing the robot to learn a large number of common
household objects over a long period of time with limited supervision provided by its user. Finally,
as part of this work, we also release the object dataset collected by our robot as a benchmark for
future FoCAL evaluations (available here: https://tinyurl.com/2vuwv8ye).
2 Few-Shot Continual Active Learning
We define the Few-Shot Continual Active Learning (FoCAL) problem as follows. Suppose that an AI
agent (e.g. a robot) gets a stream of unlabeled datasets $D^1_{pool}, D^2_{pool}, \ldots, D^t_{pool}, \ldots$ over $t$ increments,
where $D^t_{pool} = \{x^t_i\}_{i=1}^{|D^t_{pool}|}$. In each increment, a continual learning model $M$ with parameters $\Theta$
can only obtain a small number ($k^t < |D^t_{pool}|$) of samples to be labeled. Given the model $M$, an
acquisition function $a(x^t, M)$, where $x^t \in D^t_{pool}$, is used by the AI agent to find the most informative
samples to be labeled in an increment $t$: $x^{t*} = \mathrm{argmax}_{x^t \in D^t_{pool}}\, a(x^t, M)$. Therefore, in each increment
$t$, the model $M$ gets trained on a small subset of labeled data $D^t = \{(x^t_i, y^t_i)\}_{i=1}^{k^t}$, where $y^t_i \in Y^t$
represents the class label of $x^t_i$ and $Y^t$ is the set of classes in the $t$-th increment. Note that unlike
most continual learning setups, $Y^i \cap Y^j \neq \emptyset$ for $i \neq j$. After training on $D^t$, the model $M$ is tested
to recognize all the classes encountered so far, $Y^1, Y^2, \ldots, Y^t$. The main challenges of FoCAL are
three-fold: (1) avoiding catastrophic forgetting, (2) preventing overfitting on the few training samples, and (3)
efficiently choosing the most informative samples in each increment.
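To make the per-increment protocol concrete, the following is a minimal Python sketch of one FoCAL increment under the definition above; all names (`acquisition_fn`, `oracle`, `train_increment`) are hypothetical placeholders, not APIs from our implementation.

```python
def focal_increment(model, acquisition_fn, oracle, pool_t, k_t):
    """One FoCAL increment: acquire, label, then train on a small subset.

    pool_t : unlabeled dataset D^t_pool for increment t
    k_t    : labeling budget, k^t < |D^t_pool|
    """
    # select the k^t most informative samples via the acquisition function
    chosen = acquisition_fn(model, pool_t, k_t)
    # the oracle (e.g. the robot's user) labels only the chosen samples
    labeled = [(x, oracle.label(x)) for x in chosen]
    # train on D^t only; raw data from earlier increments is unavailable
    model.train_increment(labeled)
    # the model is then evaluated on all classes Y^1, ..., Y^t seen so far
    return model
```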
For FoCAL for the task of object classification, we consider the model $M$ (a CNN) as a composition
of a feature extractor $f(\cdot; \theta)$ with parameters $\theta$ and a classification model with weights $W$. The
feature extractor transforms the input images into a feature space $F \subset \mathbb{R}^n$. The classification model
takes the features generated by the feature extractor and produces an output vector, followed by a
softmax function to generate multi-class probabilities. In this paper, we use a pre-trained feature
extractor, so the parameters $\theta$ are fixed. Thus, we incrementally finetune the classification model
on $D^1, D^2, \ldots$ and get parameters $W^1, W^2, \ldots$. In an increment $t$, we expand the output layer by $|Y^t|$
neurons to incorporate the new classes. Note that this setup by itself does not address the three challenges of
FoCAL mentioned above. The subsections below describe the main components of our framework
(Figure 1) that transform this setup for FoCAL.
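As a concrete illustration of this composition, below is a minimal PyTorch sketch with a frozen pre-trained backbone and an output layer that grows by $|Y^t|$ neurons per increment. The ResNet-18 backbone and the weight-copying details are our own illustrative assumptions; the paper does not prescribe them here.

```python
import torch
import torch.nn as nn
from torchvision import models

class FoCALModel(nn.Module):
    """Frozen feature extractor f(.; theta) + expandable classifier W."""

    def __init__(self, n_features=512):
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        # drop the ImageNet head; keep the pre-trained feature extractor
        self.f = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.f.parameters():       # theta stays fixed
            p.requires_grad = False
        self.n_features = n_features
        self.W = nn.Linear(n_features, 0, bias=False)  # grows over increments

    def expand_output(self, n_new_classes):
        """Add |Y^t| output neurons, preserving previously trained weights."""
        old = self.W
        self.W = nn.Linear(self.n_features,
                           old.out_features + n_new_classes, bias=False)
        with torch.no_grad():
            self.W.weight[:old.out_features] = old.weight

    def forward(self, x):
        feats = torch.flatten(self.f(x), 1)   # features in F ⊂ R^n
        return self.W(feats)                  # softmax applied at prediction
```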
2.1 GMM Based Continual Learning (GBCL)
We aim to develop a model that not only supports continual learning but is also driven
by the newness of an object. To accomplish this, we must evaluate how different an incoming object
is from previously learned object classes, ideally without any additional supervision. We therefore
consider a clustering-based approach to represent the distribution of object classes. Unlike
previous clustering-based approaches for continual learning [10, 11] that represent the
object classes as mean feature vectors (centroids), we estimate the distribution of each object class
using a uniform Gaussian mixture model (GMM). We believe that representing each class's data as a
GMM may capture the true distribution of the data better than assuming that the distribution
is circular. We call our complete algorithm for continually learning GMMs of multiple object classes
GMM-based continual learning (GBCL).
Once the $k$ feature vectors ($D^t$) selected by the acquisition function (Section 2.2) as the most informative
samples are labeled by the oracle in increment $t$, GBCL is applied to learn GMMs for the classes $Y^t$.
For each $i$-th feature vector $x^t_i$ in $D^t$ labeled as $y^t_i$: if $y^t_i$ is a new class never seen by the model before,
we initialize a new Gaussian distribution $N(x^t_i, O)$ for class $y$ with $x^t_i$ as the mean (centroid) and a
zero matrix ($O$) as the covariance matrix.² Otherwise, if $y^t_i$ is a known class, we find the probabilities
$N(x^t_i \mid c^y_1, \sigma^y_1), \ldots, N(x^t_i \mid c^y_j, \sigma^y_j), \ldots, N(x^t_i \mid c^y_{n_y}, \sigma^y_{n_y})$ for $x^t_i$ to belong to all the previously learned
Gaussian distributions for class $y$, where $n_y$ is the total number of mixture components in the GMM
for class $y$, and $c^y_j$ and $\sigma^y_j$ represent the centroid and covariance matrix of the $j$-th mixture component
of class $y$, respectively. If the maximum probability among the calculated probabilities for all the
distributions is higher than a pre-defined probability threshold $P$, $x^t_i$ is used to update the parameters
(centroid and covariance matrix) of the most probable distribution ($N(c^y_j, \sigma^y_j)$) in class $y$. The
updated centroid $\hat{c}^y_j$ is calculated as a weighted mean between the previous centroid $c^y_j$ and $x^t_i$:

$$\hat{c}^y_j = \frac{w^y_j \times c^y_j + x^t_i}{w^y_j + 1} \quad (1)$$

where $w^y_j$ is the number of images already clustered in the $j$-th (most probable) Gaussian distribution.
The updated covariance matrix $\hat{\sigma}^y_j$ is calculated based on the procedure described in [23]:

$$\hat{\sigma}^y_j = \frac{w^y_j - 1}{w^y_j}\,\sigma^y_j + \frac{w^y_j - 1}{(w^y_j)^2}\,(x^t_i - \hat{c}^y_j)^T (x^t_i - \hat{c}^y_j) \quad (2)$$

where $\sigma^y_j$ is the previous covariance matrix and $(x^t_i - \hat{c}^y_j)^T (x^t_i - \hat{c}^y_j)$ represents the covariance
between $x^t_i$ and $\hat{c}^y_j$. If, on the other hand, the maximum probability among the calculated probabilities
for all the distributions is lower than $P$, a new Gaussian distribution $N(x^t_i, O)$ is created for class $y$
with $x^t_i$ as the centroid and $O$ as the covariance matrix.

² We do not describe mixing coefficients here, as they will always be $1/n$ for a uniform GMM, where $n$ is the number of mixture components.
The result of this process is a set of $N^t$ uniform GMMs with parameters $\phi_1, \phi_2, \ldots, \phi_{N^t}$ for the $N^t$
classes learned up to increment $t$. Note that instead of using the number of mixture components as a
hyperparameter, we use the probability threshold. This way, we can start with a simple distribution
model for each class (a single mixture component) and add more mixture components only when
new images of the class are too dissimilar from the previous mixture components and thus cannot
be modeled by the existing GMM. Therefore, the total number of mixture components can differ per
class, depending on the similarity between the images of that class. In Section 2.2, we use the same notion
of dissimilarity between an unlabeled image and a GMM to predict the most informative samples.
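To make the clustering procedure concrete, here is a minimal NumPy/SciPy sketch of the GBCL update for a single labeled feature vector. The class structure, the example threshold value, and the small variance floor `eps` (added so that zero-initialized covariances still define usable densities) are our own illustrative choices; the centroid and covariance updates follow Eqs. (1) and (2) as reconstructed above.

```python
import numpy as np
from scipy.stats import multivariate_normal

class GBCL:
    """Minimal sketch of GMM-Based Continual Learning (GBCL).

    Each class y is represented by a uniform GMM: parallel lists of
    centroids, covariance matrices, and per-component image counts.
    `prob_threshold` plays the role of P from Section 2.1.
    """

    def __init__(self, feature_dim, prob_threshold=0.1, eps=1e-4):
        self.d = feature_dim
        self.P = prob_threshold
        self.eps = eps  # keeps zero-initialized covariances usable
        self.gmms = {}  # y -> {"c": [...], "sigma": [...], "w": [...]}

    def update(self, x, y):
        """Update the GMM of class y with one labeled feature vector x."""
        if y not in self.gmms:  # new class: single component centered at x
            self.gmms[y] = {"c": [x.copy()],
                            "sigma": [np.zeros((self.d, self.d))],
                            "w": [1]}
            return
        g = self.gmms[y]
        probs = [multivariate_normal.pdf(
                     x, mean=c, cov=s + self.eps * np.eye(self.d))
                 for c, s in zip(g["c"], g["sigma"])]
        j = int(np.argmax(probs))
        if probs[j] >= self.P:  # merge x into the most probable component
            w, c, s = g["w"][j], g["c"][j], g["sigma"][j]
            c_new = (w * c + x) / (w + 1)                    # Eq. (1)
            diff = (x - c_new).reshape(-1, 1)
            s_new = ((w - 1) / w) * s + \
                    ((w - 1) / w**2) * (diff @ diff.T)       # Eq. (2)
            g["c"][j], g["sigma"][j], g["w"][j] = c_new, s_new, w + 1
        else:  # x is too dissimilar: start a new component at x
            g["c"].append(x.copy())
            g["sigma"].append(np.zeros((self.d, self.d)))
            g["w"].append(1)
```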
2.1.1 Pseudo-rehearsal and Classifier Training
To avoid catastrophic forgetting, we use pseudo-rehearsal [20] to replay the old classes when learning
from new data in increment $t$. For pseudo-rehearsal, we sample the Gaussian distributions in the
GMMs of all the previously learned classes to generate a set of pseudo-feature vectors. Note that
we also store the total number of images clustered in each Gaussian distribution of the classes ($w^y_j$)
during the GMM learning phase (Section 2.1). Therefore, we generate the same number of pseudo-feature
vectors as the original number of images for each class. After generating the pseudo-feature
vectors, the classifier model $C(\cdot; W)$ is trained using the labeled dataset $D^t$ in increment $t$, and the
pseudo-feature vectors of the previous classes.
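Continuing the GBCL sketch above, a minimal version of the pseudo-rehearsal step could look as follows; the function name and the reuse of the `eps` floor are our own choices.

```python
import numpy as np

def generate_pseudo_rehearsal_set(gbcl):
    """Sample every stored Gaussian to rebuild a feature-level replay set."""
    feats, labels = [], []
    for y, g in gbcl.gmms.items():
        for c, s, w in zip(g["c"], g["sigma"], g["w"]):
            cov = s + gbcl.eps * np.eye(gbcl.d)
            # draw as many pseudo-feature vectors as images were clustered
            feats.append(np.random.multivariate_normal(c, cov, size=w))
            labels.extend([y] * w)
    return np.concatenate(feats), np.array(labels)
```

The classifier $C(\cdot; W)$ can then be trained jointly on these pseudo-feature vectors and the labeled feature vectors from $D^t$.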
For classification of a test image $x$, it is first passed through the feature extractor $f(x; \theta)$ and then
through the classifier $C(f(x; \theta); W)$. The softmax function ($\sigma$) is then applied to the output to generate
class probabilities, and the class $y^*$ with the maximum probability is predicted as the label for the
test image: $y^* = \mathrm{argmax}_y\, \sigma(W^T f(x; \theta))$.
2.2 Active Learning using GMMs
We quantify the novelty of an object in terms of how uncertain the model is about the object.
Unlike most active learning setups [18, 14], in FoCAL the model does not have access to a training
set in each increment for training the model to predict uncertain object classes. Further, even if the
model were trained to predict unknown object classes in the first increment, it would catastrophically
forget the criterion of novelty as it continually learns new object classes in subsequent increments
(unknown classes in the first increment become known to the model in subsequent increments).
Therefore, we do not train our model for active learning, and instead use the GMM representations of
the previously learned object classes to predict the most uncertain objects.
Considering the FoCAL setup (as described in Section 2), in an increment $t$ the AI agent gets an
unlabeled dataset $D^t_{pool}$ and it must find the $k < |D^t_{pool}|$ most informative object samples from the
dataset to be labeled. To develop an acquisition function for this, we use a combination of two active
learning techniques applied to the GMM representations of the previously learned object classes.
First, we use the predictive entropy $H[y|x^t_i]$ of an object as the acquisition function [21]:

$$H[y|x^t_i] = -\sum_{y=1}^{N^{t-1}} p(y|x^t_i)\, \log p(y|x^t_i) \quad (3)$$
For an unlabeled data point $x^t_i \in D^t_{pool}$, we find the predictive probability of $x^t_i$ using the GMM
representations of the object classes learned in the previous increments. The predictive probability
$p(x^t_i|\phi_y)$ of $x^t_i$ belonging to the GMM of a class $y$ can be defined as:

$$p(x^t_i|\phi_y) = \frac{1}{n_y} \sum_{j=1}^{n_y} N(x^t_i \mid c^y_j, \sigma^y_j) \quad (4)$$
Intuitively, if a sample $x^t_i$ has already been learned by the AI agent, then its probability of belonging
to one of the previously learned class GMMs must be high, and thus the entropy for $x^t_i$ must be low.
Therefore, the top $k$ samples with the highest entropy can be chosen as the most informative samples.
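A minimal sketch of this entropy-based acquisition, again building on the GBCL class above: since Eq. (4) gives class-conditional likelihoods $p(x^t_i|\phi_y)$ rather than posteriors, normalizing them into class probabilities (and the small stabilizing constants) is our own assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def entropy_acquisition(gbcl, pool, k):
    """Return indices of the k unlabeled samples with highest entropy."""
    entropies = []
    for x in pool:
        # Eq. (4): uniform-GMM likelihood of x under each learned class
        likelihoods = np.array([
            np.mean([multivariate_normal.pdf(
                         x, mean=c, cov=s + gbcl.eps * np.eye(gbcl.d))
                     for c, s in zip(g["c"], g["sigma"])])
            for g in gbcl.gmms.values()])
        p = likelihoods / (likelihoods.sum() + 1e-12)    # assumed normalization
        entropies.append(-np.sum(p * np.log(p + 1e-12)))  # Eq. (3)
    return np.argsort(entropies)[-k:]   # top-k most uncertain samples
```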