In this paper, we adopt the idea of projecting image features
in a latent embedding space via a Neural Network (NN) model.
We propose a class-balanced triplet loss that separates image
features in a latent embedding space for class-imbalanced
datasets. We also propose a Gaussian Process (GP) model to
learn a mapping between features and a semantic space. The
classical Gaussian Process (GP), when used in the setting of
regression, is robust to overfitting [18]. If training and testing
data come from the same distribution, a PAC-Bayesian Bound
[19] guarantees that the training error will be close to the
testing error.
Our experiments demonstrate that our model, though em-
ploying a simple design, can reach SOTA performance on the
class-imbalanced ZSL datasets AWA1, AWA2 and APY in the
Generalized ZSL setting.
The main contributions of our work are:
1) We propose a novel, simple framework for ZSL, where
image features from a deep Neural Network are mapped
into a latent embedding space to generate latent pro-
totypes for each seen class by a novel triplet training
model. A Gaussian Process (GP) regression model is
then trained by maximizing the marginal likelihood to
predict latent prototypes of unseen classes.
2) The mapping from image features to a latent space is
performed by our proposed triplet training model for
ZSL, using a novel triplet loss that is robust on
class-imbalanced ZSL datasets. Our experiments show
improved performance over the traditional triplet loss
on all ZSL datasets, including SOTA performance on
class-imbalanced datasets, specifically, AWA1, AWA2
and APY.
3) Given feature vectors extracted by a pre-trained ResNet,
our model has an average training time of 5 minutes on
all ZSL datasets, faster than several high-accuracy
SOTA models.
II. RELATED WORK
Traditional and Generalized ZSL: Early ZSL research
adopts a so-called Traditional ZSL setting [1], [20]. In the
Traditional ZSL setting, the model is trained on images of
seen classes and semantic vectors of both seen and unseen classes.
Test images are restricted to the unseen classes. However, in
practice, test images may also come from the seen classes
[17]. The Generalized ZSL setting was proposed to handle
test sets that contain images from both seen and unseen
classes. According to Xian et al. [17], models that have good
performance in the Traditional ZSL setting may not work well
in the Generalized ZSL setting.
Prototypical Methods. Our classification model is related
to prototypical methods proposed in Zero-Shot and Few-
Shot learning [21], [22], [23]. In prototypical methods,
a prototype is learned for each class to aid classification. For
example, Snell et al. [21] propose a neural network to learn a
projection from semantic vectors to feature prototypes of each
class. Test samples are classified via Nearest Neighbor among
prototypes. While the classification process of our model is
similar to prototypical methods, our model uses a Gaussian
Process Regression instead of Neural Networks to predict
prototypes of unseen classes.
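The prototype-based classification step can be sketched as a nearest-neighbor rule over latent prototypes. The function below is our own illustrative sketch; the function name, array shapes, and the choice of Euclidean distance are assumptions, not details taken from the paper:

```python
import numpy as np

def classify_by_prototype(features, prototypes):
    """Assign each sample to the class of its nearest prototype.

    features:   (n, d) array of embedded test samples
    prototypes: (c, d) array holding one latent prototype per class
    Returns an (n,) array of predicted class indices.
    """
    # Pairwise Euclidean distances between every sample and every prototype
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    # Nearest prototype wins
    return dists.argmin(axis=1)
```

In the Generalized ZSL setting, `prototypes` would stack the learned seen-class prototypes with the regression-predicted unseen-class prototypes.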
Inductive and Transductive ZSL: Similar to most ZSL
models, the model we propose is an inductive ZSL model.
Inductive ZSL requires that no feature information of unseen
classes is present during the training phase [17]. Models that
introduce unlabeled unseen images during the training phase
are called transductive ZSL models [24]. To ensure a fair
comparison, results from such models are usually reported
separately from those of inductive models, since additional
information is introduced [3], [25], [26].
Triplet Loss. Many ZSL models incorporate a triplet
loss in their framework to separate samples from different
classes. Chacheux et al. [8] proposed a variant of a triplet loss
in their model to learn feature prototypes for different classes.
Han et al. [9] adopt an improved version of the triplet loss
called “center loss” proposed in [27] that separates samples
in a latent space. In contrast to these models, we observe
that existing triplet losses proposed for the ZSL problem
may not perform well on class-imbalanced datasets such as
AWA1, AWA2, and APY. We propose an improved triplet
training model to mitigate this problem.
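To illustrate the kind of balancing involved, the sketch below reweights a standard triplet hinge loss by inverse class frequency, so that triplets anchored in rare classes contribute as much as those from frequent ones. This is a minimal sketch under our own assumptions, not the exact loss proposed in the paper:

```python
import numpy as np

def balanced_triplet_loss(anchor, positive, negative,
                          anchor_labels, class_counts, margin=1.0):
    """Triplet hinge loss reweighted by inverse class frequency.

    anchor, positive, negative: (n, d) arrays of embedded triplets
    anchor_labels: (n,) class index of each anchor
    class_counts:  (c,) number of training samples per class
    """
    class_counts = np.asarray(class_counts, dtype=float)
    anchor_labels = np.asarray(anchor_labels)
    # Standard triplet hinge term per triplet
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    per_triplet = np.maximum(0.0, d_pos - d_neg + margin)
    # Inverse-frequency weights, normalized to mean 1 so the loss
    # scale stays comparable to the unweighted version
    w = 1.0 / class_counts[anchor_labels]
    w = w * (len(w) / w.sum())
    return float((w * per_triplet).mean())
```

With uniform class counts the weights reduce to 1 and the loss coincides with the ordinary triplet loss.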
Gaussian Process Regression. For the ZSL problem,
Dolma et al. [28] proposed a model that performs k-nearest
neighbor search for test samples over training samples and per-
forms a GP regression based on the search result. Mukherjee
et al. [29] model image features and semantic vectors for each
class with Gaussians, and learn a linear projection between
the two distributions. Our model is closest to Elhoseiny
et al. [30], where Gaussian Process Regression is used to
predict unseen class prototypes based on seen class prototypes.
However, they apply the Gaussian Process directly, without
the benefit of a learned embedding network, and report
relatively weak results. Verma and Rai [3] proposed a
Kernel Ridge Regression (KRR) approach called GFZSL for
the Traditional ZSL problem. Our experiments demonstrate
that our model outperforms GFZSL by a large margin.
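The prototype-regression idea can be sketched with the standard GP posterior mean: given seen-class semantic vectors and their learned latent prototypes, predict unseen-class prototypes from the unseen semantic vectors. The RBF kernel, fixed hyperparameters, and function names below are illustrative assumptions; the actual model additionally fits kernel hyperparameters by maximizing the marginal likelihood, which this sketch omits:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_predict_prototypes(sem_seen, proto_seen, sem_unseen,
                          noise=1e-4, length_scale=1.0):
    """GP-regression posterior mean: predict unseen-class prototypes
    from seen-class (semantic vector, prototype) pairs."""
    K = rbf_kernel(sem_seen, sem_seen, length_scale) + noise * np.eye(len(sem_seen))
    K_star = rbf_kernel(sem_unseen, sem_seen, length_scale)
    # Posterior mean K_* K^{-1} Y, computed with a linear solve for stability
    return K_star @ np.linalg.solve(K, proto_seen)
```

Each output dimension of the prototype is regressed independently, which is what the matrix-valued solve above does implicitly.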
III. PROPOSED APPROACH
We propose a hybrid model for the ZSL problem: a Latent
Feature Embedding model to separate inter-class features that
is robust to class-imbalanced datasets, a GP Regression model
to predict prototypes of unseen classes based on seen classes
and semantic information, and a calibrated classifier to balance
the trade-off between seen and unseen class accuracy.
A. Latent Feature Embedding Model
Model Structure. We propose to learn a linear NN mapping
from image features to latent embeddings. We argue that for
the ZSL task, a linear projection with limited flexibility can
help prevent the model from overfitting on seen class training
samples. Following others [3], [28], we model feature vectors
from each class using the multivariate Gaussian distribution.
We exploit the fact that Gaussian random vectors are closed
under linear transformations.
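Concretely, closure under linear maps means that if a class's features satisfy x ~ N(mu, Sigma), the latent embedding z = Wx satisfies z ~ N(W mu, W Sigma W^T), so class-conditional Gaussians survive the projection. The snippet below is a quick numerical sanity check of this identity; the dimensions and sample count are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])          # class-conditional feature mean
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 0.1 * np.eye(3)        # a valid (positive-definite) covariance
W = rng.normal(size=(2, 3))              # linear embedding: 3-d features -> 2-d latent

# Sample x ~ N(mu, Sigma) and push the samples through the linear map
x = rng.multivariate_normal(mu, Sigma, size=200_000)
z = x @ W.T

# Empirical moments of z should match the closed-form W mu and W Sigma W^T
emp_mean = z.mean(axis=0)
emp_cov = np.cov(z, rowvar=False)
```

This is what lets the model reason about class-conditional Gaussians directly in the latent space after the linear projection.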