Efficient Gaussian Process Model on Class-Imbalanced Datasets for Generalized Zero-Shot Learning

Changkun Ye
Australian National University & Data61 CSIRO
Canberra, ACT, Australia
Email: changkun.ye@anu.edu.au
Nick Barnes
Australian National University
Canberra, ACT, Australia
Email: nick.barnes@anu.edu.au
Lars Petersson and Russell Tsuchida
Data61 CSIRO
Canberra, ACT, Australia
Email: lars.petersson@data61.csiro.au
russell.tsuchida@data61.csiro.au
Abstract—Zero-Shot Learning (ZSL) models aim to classify
object classes that are not seen during the training process.
However, the problem of class imbalance is rarely discussed,
despite its presence in several ZSL datasets. In this paper, we
propose a Neural Network model that learns a latent feature
embedding and a Gaussian Process (GP) regression model that
predicts latent feature prototypes of unseen classes. A calibrated
classifier is then constructed for ZSL and Generalized ZSL tasks.
Our Neural Network model is trained efficiently with a simple
training strategy that mitigates the impact of class-imbalanced
training data. The model has an average training time of 5
minutes and can achieve state-of-the-art (SOTA) performance
on imbalanced ZSL benchmark datasets like AWA2, AWA1 and
APY, while having relatively good performance on the SUN and
CUB datasets.
I. INTRODUCTION
Zero-Shot Learning (ZSL) requires a model to be trained on
images that show examples from one set of classes, referred
to as seen classes, while being tested on images that show
examples from another set of classes, referred to as unseen
classes. During training, semantic information for both seen
classes and unseen classes is provided to help infer the
appearance of unseen classes.
Many previous works, such as [1], [2], [3], [4], [5], focus on
learning a mapping between image features depicting certain
classes and their corresponding semantic vectors. GFZSL [3]
proposed a model similar to Kernel Ridge Regression to
predict image features of unseen classes. GDAN [4] and f-
CLSWGAN [5] utilize generative models like GAN [6] and
VAE [7] to achieve the same objective.
On the basis of these approaches, recent papers further learn
a Neural Network (NN) projection from image feature space
to a latent embedding space, where inter-class features can
be better separated within each ZSL dataset [8], [9], [10],
[11], [12]. For example, in [9], image features are projected
to a latent space in order to “remove redundant information”.
FREE [10] adopts the same structure for “feature refinement”
purposes. CE-GZSL [12] also proposes a similar approach to
generate a “contrastive embedding” of image features.
Previous models, however, do not typically concern themselves with the class-imbalanced data distributions of ZSL datasets.

Fig. 1: We first train a latent embedding model for image features. The model is trained with a Class-Balanced Triplet loss, which separates inter-class features and is robust to class-imbalanced datasets. A Gaussian Process Regression model is then proposed to predict unseen class prototypes based on seen class prototypes and the semantic correlations between classes. Finally, our ZSL classifier is constructed from these prototypes.

In the real visual world, visual datasets usually exhibit an imbalanced data distribution among categories [13].
In supervised learning, the class imbalance problem can have
significant impact on the performance of classification models
[14], [15]. For the ZSL problem, the APY [16] dataset has
nearly 1/3 of samples belonging to the same class. AWA2
[17] has 1645 samples in one class and only 100 samples in
another. Clearly, the class imbalance problem is not negligible
when training a classification model on these datasets.
On the other hand, recent models usually have complicated
structures that require strong regularizers in order to prevent
overfitting on seen class samples. As a consequence, these
models usually have long training times and heavy GPU
memory usage. The average training time for DVBE [11] is
over 2 hours on each ZSL dataset. This fact motivates us to
search for alternative models that are simpler and less prone
to overfitting.
In this paper, we adopt the idea of projecting image features
in a latent embedding space via a Neural Network (NN) model.
We propose a class-balanced triplet loss that separates image
features in a latent embedding space for class-imbalanced
datasets. We also propose a Gaussian Process (GP) model to
learn a mapping between features and a semantic space. The
classical Gaussian Process (GP), when used in the setting of
regression, is robust to overfitting [18]. If training and testing
data come from the same distribution, a PAC-Bayesian Bound
[19] guarantees that the training error will be close to the
testing error.
Our experiments demonstrate that our model, though em-
ploying a simple design, can reach SOTA performance on the
class-imbalanced ZSL datasets AWA1, AWA2 and APY in the
Generalized ZSL setting.
The main contributions of our work are:
1) We propose a novel simple framework for ZSL, where
image features from a deep Neural Network are mapped
into a latent embedding space to generate latent pro-
totypes for each seen class by a novel triplet training
model. Then a Gaussian Process (GP) regression model
is trained via maximizing the marginal likelihood to
predict latent prototypes of unseen classes.
2) The mapping from image features to a latent space is
performed by our proposed triplet training model for ZSL, using a novel triplet loss that is robust to class-imbalanced ZSL datasets. Our experiments show
improved performance over the traditional triplet loss
on all ZSL datasets, including SOTA performance on
class-imbalanced datasets, specifically, AWA1, AWA2
and APY.
3) Given feature vectors extracted by a pre-trained ResNet,
our model has an average training time of 5 minutes on
all ZSL datasets, faster than several SOTA models that
have high accuracy.
II. RELATED WORKS
Traditional and Generalized ZSL: Early ZSL research
adopts a so-called Traditional ZSL setting [1], [20]. The
Traditional ZSL requires the model to train on images of
seen classes and semantic vectors of seen and unseen classes.
Test images are restricted to the unseen classes. However, in
practice, test images may also come from the seen classes
[17]. The Generalized ZSL setting was proposed to address
the problem of including both seen and unseen images in the
test set. According to Xian et al. [17], models that have good
performance in the Traditional ZSL setting may not work well
in the Generalized ZSL setting.
Prototypical Methods: Our classification model is related
to prototypical methods proposed in Zero-Shot and Few-
Shot learning [21], [22], [23]. In the prototypical methods,
a prototype is learned for each class to help classification. For
example, Snell et al. [21] propose a neural network to learn a
projection from semantic vectors to feature prototypes of each
class. Test samples are classified via Nearest Neighbor among
prototypes. While the classification process of our model is
similar to prototypical methods, our model uses a Gaussian
Process Regression instead of Neural Networks to predict
prototypes of unseen classes.
Inductive and Transductive ZSL: Similar to most ZSL
models, the model we propose is an inductive ZSL model.
Inductive ZSL requires that no feature information of unseen
classes is present during the training phase [17]. Models that
introduce unlabeled unseen images during the training phase
are called transductive ZSL models [24]. To ensure a fair comparison, results from such models are usually reported separately from inductive models, since additional information is introduced [3], [25], [26].
Triplet Loss: Many ZSL models have proposed a triplet loss in their framework to help separate samples from different classes. Chacheux et al. [8] proposed a variant of a triplet loss
in their model to learn feature prototypes for different classes.
Han et al. [9] adopt an improved version of the triplet loss
called “center loss” proposed in [27] that separates samples
in a latent space. Unlike these models, we observe that current triplet losses proposed for the ZSL problem may not perform well on class-imbalanced datasets like AWA2, AWA1 and APY. We propose an improved triplet loss training model to mitigate this problem.
Gaussian Process Regression: For the ZSL problem, Dolma et al. [28] proposed a model that performs a k-nearest neighbor search for test samples over the training samples and then performs a GP regression based on the search result. Mukherjee et al. [29] model image features and semantic vectors for each class with Gaussians, and learn a linear projection between
the two distributions. Our model is closest to Elhoseiny
et al. [30], where Gaussian Process Regression is used to
predict unseen class prototypes based on seen class prototypes.
However, they used a Gaussian Process directly without the
benefit of a learned network model for feature embedding, and
showed relatively poor results. Verma and Rai [3] proposed a
Kernel Ridge Regression (KRR) approach called GFZSL for
the traditional ZSL problem. Our experiment demonstrates that
our model outperforms GFZSL by a large margin.
III. PROPOSED APPROACH
We propose a hybrid model for the ZSL problem: a Latent Feature Embedding model, robust to class-imbalanced datasets, that separates inter-class features; a GP Regression model that predicts prototypes of unseen classes based on seen classes and semantic information; and a calibrated classifier that balances the trade-off between seen and unseen class accuracy.
A. Latent Feature Embedding Model
Model Structure We propose to learn a linear NN mapping
from image features to latent embeddings. We argue that for
the ZSL task, a linear projection with limited flexibility can
help prevent the model from overfitting on seen class training
samples. Following others [3], [28], we model feature vectors
from each class using the multivariate Gaussian distribution.
We exploit the fact that Gaussian random vectors are closed
under linear transformations.
Fig. 2: Structure of our proposed model. Feature vectors $f$ are projected to a latent embedding space $x$, which is trained using the proposed Class-Balanced Triplet Loss. A GP model is proposed to predict latent prototypes of unseen classes $\mu_U$, based on latent prototypes of seen classes $\mu_S$ and semantic vectors of seen and unseen classes $s_S, s_U$.
For each feature vector $f \in \mathbb{R}^{N_{feature}}$, the latent embedding $x \in \mathbb{R}^{N_{latent}}$ can be written as:

$$x = wf + b. \tag{1}$$

Here $w \in \mathbb{R}^{N_{latent} \times N_{feature}}$ is a weight parameter matrix and $b \in \mathbb{R}^{N_{latent}}$ is a bias parameter vector.
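By closure under affine maps, $f \sim \mathcal{N}(\mu_f, \Sigma_f)$ implies $x = wf + b \sim \mathcal{N}(w\mu_f + b,\, w\Sigma_f w^T)$, so per-class latent features stay Gaussian. A minimal PyTorch sketch of the mapping in Eq. (1) follows; the dimensions are illustrative assumptions (2048 matches the pre-trained ResNet features commonly used in ZSL, and the latent size is our own placeholder):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 2048 matches pre-trained ResNet-101 features
# commonly used in ZSL; the latent size 512 is an illustrative choice.
N_FEATURE, N_LATENT = 2048, 512

# Eq. (1): x = w f + b, a single linear layer with bias.
embed = nn.Linear(N_FEATURE, N_LATENT, bias=True)

f = torch.randn(64, N_FEATURE)  # a mini-batch of image features
x = embed(f)                    # latent embeddings, shape (64, 512)
```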
Triplet Loss Revisited Triplet loss is often used to separate
samples from different training classes in the dataset [31].
The standard triplet loss aims to decrease distances between
intra-class samples and increase distances between inter-class
samples.
In each iteration, a mini-batch is sampled uniformly from the training data as $\{x^{c_1}_1, x^{c_1}_2, \ldots, x^{c_1}_{n_{c_1}}, x^{c_2}_1, x^{c_2}_2, \ldots, x^{c_2}_{n_{c_2}}, \ldots, x^{c_L}_{n_{c_L}}\}$. Here $x^{c_i}_j$ denotes the $j$-th of a total of $n_{c_i} \in \mathbb{Z}^+$ samples belonging to training class $c_i \in C$, $i = 1, 2, \ldots, L$, in the mini-batch. The batch size is $N = \sum_j n_{c_j}$. Then all possible triplets $\{x^{c_i}_l, x^{c_i}_m, x^{c_j}_n\}$ are constructed within the given mini-batch. In each triplet, $x^{c_i}_l$ and $x^{c_i}_m$ are different samples from the same class $c_i$, with $l, m \in \{1, 2, \ldots, n_{c_i}\}$, and $x^{c_j}_n$ comes from a different class $c_j \neq c_i$, with $n \in \{1, 2, \ldots, n_{c_j}\}$. The triplet loss is written as:

$$L_T = \sum_{c_i, c_j} \sum_{l=1}^{n_{c_i}} \sum_{m=1}^{n_{c_i}-1} \sum_{n=1}^{n_{c_j}} \max\left(0,\; \Delta + (x^{c_i}_l - x^{c_i}_m)^2 - (x^{c_i}_l - x^{c_j}_n)^2\right). \tag{2}$$

Here $\sum_{c_i, c_j}$ denotes summation over all training class pairs $c_i, c_j \in C$ with $c_i \neq c_j$. The hyperparameter $\Delta \in \mathbb{R}^+$ is a positive threshold that balances the inter- and intra-class distances [8], [32].
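For reference, a minimal sketch of the standard triplet loss of Eq. (2) is given below; the brute-force enumeration of all triplets is deliberate, to make the cubic cost in the batch size visible. Function and variable names are our own, and $(\cdot)^2$ is read as the squared Euclidean distance:

```python
import torch

def standard_triplet_loss(x, labels, delta=1.0):
    """Eq. (2): enumerate every (anchor, positive, negative) triplet in
    the mini-batch and penalize max(0, delta + d(a, p) - d(a, n)),
    with d the squared Euclidean distance."""
    d2 = torch.cdist(x, x) ** 2                       # pairwise squared distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    loss = x.new_zeros(())
    n = x.shape[0]
    for a in range(n):                                # anchor
        for p in range(n):                            # positive: same class, different sample
            if p == a or not same[a, p]:
                continue
            for neg in range(n):                      # negative: different class
                if same[a, neg]:
                    continue
                loss = loss + torch.clamp(delta + d2[a, p] - d2[a, neg], min=0)
    return loss
```

The triple loop over the batch is what makes the standard loss expensive; the Class-Balanced Triplet loss below replaces the positive sample with a class mean and the sum over negatives with a min, shrinking the number of terms.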
The class imbalance problem is not considered in the original triplet loss. Moreover, models trained with a triplet loss usually require many iterations to converge, have expensive memory requirements, and exhibit high variance [32]. We thus propose a new Class-Balanced Triplet loss to mitigate these problems.
Class-Balanced Triplet Loss When training a model with a triplet loss, a straightforward approach to tackling the class imbalance problem is to sample class-balanced mini-batch data. The model will not be affected by the class imbalance problem if it is trained using class-balanced data.

In every iteration, unlike for the traditional triplet loss, we generate a class-balanced mini-batch by sampling $n_{CB} \in \mathbb{Z}^+$ data points from each one of the $L$ training classes as $\{x^{c_1}_1, x^{c_1}_2, \ldots, x^{c_1}_{n_{CB}}, x^{c_2}_1, x^{c_2}_2, \ldots, x^{c_2}_{n_{CB}}, \ldots, x^{c_L}_{n_{CB}}\}$. The batch size becomes $N = n_{CB} \times L$. In a supervised classification setting, similar approaches have been shown to be effective [14].
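A minimal sketch of such a class-balanced sampler is shown below. Sampling with replacement, so that classes with fewer than $n_{CB}$ samples remain usable, is our assumption; the paper does not specify this detail:

```python
import numpy as np

def class_balanced_batch(labels, n_cb, rng):
    """Draw n_cb indices from each training class, giving a batch of
    size n_cb * L that is balanced regardless of the class histogram.
    Sampling is with replacement so small classes remain usable
    (an assumption; the paper does not specify this detail)."""
    idx = [rng.choice(np.flatnonzero(labels == c), size=n_cb, replace=True)
           for c in np.unique(labels)]
    return np.concatenate(idx)

# Example: a heavily imbalanced label vector still yields a balanced batch.
rng = np.random.default_rng(0)
labels = np.array([0] * 1645 + [1] * 100 + [2] * 12)
batch = class_balanced_batch(labels, n_cb=8, rng=rng)  # 24 indices, 8 per class
```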
We then propose a modified triplet loss $L_{BT}$ to train the model on the mini-batch. For every mini-batch, the loss has the form:

$$L_{BT} = \sum_{c_i, c_j} \sum_{l=1}^{n_{CB}} \max\left(0,\; \Delta + (x^{c_i}_l - \bar{x}^{c_i})^2 - \min_n\, (x^{c_j}_n - \bar{x}^{c_i})^2\right). \tag{3}$$
The term $\bar{x}^{c_i} = \frac{1}{n_c} \sum_n x^{c_i}_n$ denotes the average of the samples from class $c_i$ in the mini-batch. Replacing the term $x^{c_i}_m$ in the original triplet loss with $\bar{x}^{c_i}$ can help reduce the variance of the loss during training, which is similar to the "center loss" [27]. However, unlike their method, we are not adding extra trainable parameters to the model. The $\min(\cdot)$ operation is performed over all samples $x^{c_j}_n$ in class $c_j$ in the mini-batch, which can efficiently reduce computational costs.
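Putting Eq. (3) together, one possible PyTorch sketch (names are ours, not from the paper):

```python
import torch

def class_balanced_triplet_loss(x, labels, delta=1.0):
    """Eq. (3) on a class-balanced mini-batch: pull each sample toward
    its in-batch class mean and push away the *closest* sample of every
    other class (the min over n in Eq. (3))."""
    classes = labels.unique()
    # In-batch class means \bar{x}^{c_i}.
    means = torch.stack([x[labels == c].mean(dim=0) for c in classes])
    loss = x.new_zeros(())
    for i, ci in enumerate(classes):
        pos = ((x[labels == ci] - means[i]) ** 2).sum(dim=1)       # (n_CB,)
        for j, cj in enumerate(classes):
            if i == j:
                continue
            neg = ((x[labels == cj] - means[i]) ** 2).sum(dim=1).min()
            loss = loss + torch.clamp(delta + pos - neg, min=0).sum()
    return loss
```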
With the help of the proposed triplet loss $L_{BT}$, our model can efficiently learn a latent embedding that separates samples from different classes and maintains good performance on imbalanced datasets.
B. Gaussian Process (GP) Regression Model
We propose a GP Regression model to predict prototypes of unseen classes, leveraging the generalization ability of GP models. Like Mukherjee et al. [29], we take the average of all latent features in each class, $\mu_{c_i} = \frac{1}{N_{c_i}} \sum x^{c_i}$, as the prototype for the corresponding class.
We also denote the semantic vector of each class by $s_{c_i} \in \mathbb{R}^{N_{semantic}}$. Given the semantic vectors $s_S = [s_{c_1}, \ldots, s_{c_L}]^T$ and feature prototypes $\mu_S = [\mu_{c_1}, \ldots, \mu_{c_L}]^T$ for seen classes $c_1, c_2, \ldots, c_L \in C_S$, along with semantic vectors $s_U = [s_{c_{L+1}}, \ldots, s_{c_{L+K}}]^T$ for unseen classes $c_{L+1}, c_{L+2}, \ldots, c_{L+K} \in C_U$, we can use the GPR model to regress prototypes $\mu_U = [\mu_{c_{L+1}}, \ldots, \mu_{c_{L+K}}]^T$ for the unseen classes $c \in C_U$:

$$\mu_U = f_{GP}(s_U \mid \theta) + \epsilon. \tag{4}$$

Here $f_{GP}(s \mid \theta)$ is the regression function, $\epsilon \sim \mathcal{N}(0, \sigma^2)$ denotes Gaussian random noise, and $\theta$ are the hyperparameters of the model. $\theta$ is trained given the seen class semantic vectors $s_S$ and the corresponding prototypes $\mu_S$.
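As an illustration, Eq. (4) can be sketched with scikit-learn's GP regressor; the RBF-plus-noise kernel here is an assumption, since this section does not state a kernel choice. Fitting maximizes the log marginal likelihood over $\theta$, matching the training procedure described above:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def predict_unseen_prototypes(s_seen, mu_seen, s_unseen):
    """Eq. (4): regress latent prototypes from semantic vectors.
    s_seen: (L, N_semantic), mu_seen: (L, N_latent), s_unseen: (K, N_semantic).
    The RBF + white-noise kernel is an illustrative assumption."""
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(s_seen, mu_seen)      # fits theta by maximizing the marginal likelihood
    return gp.predict(s_unseen)  # (K, N_latent) predicted unseen prototypes
```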