Adversarial Lagrangian Integrated Contrastive
Embedding for Limited Size Datasets
Amin Jalalia, Minho Leea,b
aKNU-LG Electronics Convergence Research Center, AI Institute of Technology, Kyungpook
National University, Daegu, 41566, South Korea
bGraduate School of Artificial Intelligence, Kyungpook National University, Daegu, 41566,
South Korea
Abstract
Certain datasets contain only a limited number of samples with highly varied styles
and complex structures. This study presents a novel adversarial Lagrangian
integrated contrastive embedding (ALICE) method for small-sized datasets.
First, the accuracy improvement and training convergence of the proposed pre-
trained adversarial transfer are shown on various subsets of datasets with few
samples. Second, a novel adversarial integrated contrastive model using vari-
ous augmentation techniques is investigated. The proposed structure considers
the input samples with different appearances and generates a superior repre-
sentation with adversarial transfer contrastive training. Finally, multi-objective augmented Lagrangian multipliers encourage low rank and sparsity in the presented adversarial contrastive embedding and adaptively estimate the coefficients of the regularizers toward their optimum values. The sparsity
constraint suppresses less representative elements in the feature space. The
low-rank constraint eliminates trivial and redundant components and enables
superior generalization. The performance of the proposed model is verified by
conducting ablation studies by using benchmark datasets for scenarios with
small data samples.
Keywords: Deep learning, adversarial transfer contrastive embedding, small and limited datasets, sparsity and low-rank constraints, augmented Lagrangian multipliers.

Email addresses: max.jalali@gmail.com, mholee@gmail.com (Minho Lee, corresponding author)
1. Introduction
Recent developments in deep learning can be attributed to the increased
availability of big data, improvements in hardware and software, and increased
speed of the training processes. However, deep neural networks still face challenges when trained on small datasets. Fine-tuning in small-sample scenarios is prone to overfitting because fitting a model with a large number of parameters to a small number of samples is an ill-posed problem. The limited amount of data samples
impedes the use of deep structures in a wide range of applications because
achieving good performance requires a large dataset (Jiang et al., 2022; Jalali
& Lee, 2020). Moreover, fine-tuning a model using a limited dataset
can cause overfitting in the downstream target task, particularly when a dis-
tributional data gap exists between the pre-trained model and the target task
(Aghajanyan et al., 2020; Jalali et al., 2017; Keisham et al., 2022). It has also been observed that, as the model size and training time increase, the performance on target tasks gradually saturates to a fixed level (Abnar et al., 2021), and in some cases the target-task performance is even at odds with that of the pre-trained
models. Chen et al. (2019) noted that when a sufficient number of training sam-
ples is available, the spectral components with tiny singular values disappear
during the fine-tuning process, implying that small singular values correspond to undesirable pre-trained knowledge and may result in negative transfer. In the
fine-tuning process with a supervised objective, self-tuning (Wang et al., 2021)
can be used to introduce a pseudo-group contrastive approach for evaluating the
intrinsic structure of the target domain. Contrastive learning (CL) can be em-
ployed to develop generalizable visual features with superior efficiency on target
tasks by fine-tuning a classifier on top of the model representations. Fan et al.
(2021) analyzed CL in terms of robustness improvement, demonstrating that
high-frequency contrastive visual components and the use of feature clustering
are advantageous to model robustness without compromising the accuracy.
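For reference, the following is a minimal sketch of a generic contrastive objective of the kind discussed above (an NT-Xent-style loss over two augmented views of a batch). It is not the formulation of any specific cited work, and the function and parameter names are placeholders.

```python
# Illustrative sketch only: a standard NT-Xent (SimCLR-style) contrastive loss.
# The names nt_xent and temperature are ours, not the notation of any cited work.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit-norm rows
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarities
    # The positive for sample i is its other augmented view (i + n, or i - n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage: loss = nt_xent(encoder(aug1(x)), encoder(aug2(x)))
```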
In this study, we propose a novel augmented Lagrangian tuning method for
an adversarial integrated contrastive embedding model. Adversarially trained
models usually have lower accuracy than those trained using the natural training
paradigm. However, they perform better when employed for transfer learning
to downstream tasks because they generate richer features. We attempt to
generate a rich feature representation by treating adversarial examples as additional data samples during training; the goal is not to increase the model's robustness against adversarial examples. The adversarial contrastive mechanism yields more transferable knowledge, which enables better generalization from small samples. The Lagrangian multipliers adaptively tune the sparsity, low-rank, and accuracy coefficients of the model.
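As an illustration of using adversarial examples purely as additional training samples (rather than for robustness), the following is a minimal FGSM-style sketch; the perturbation budget and helper names are assumptions, not the exact scheme used in this work.

```python
# Illustrative sketch only: generating adversarially perturbed inputs (FGSM-style)
# and appending them to the batch as extra training samples. The epsilon value
# and helper names are placeholders, not the paper's settings.
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, epsilon=4 / 255):
    """Return adversarially perturbed copies of x to enlarge the training batch."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)             # gradient w.r.t. the input
    return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

# Training step: the perturbed samples simply extend the batch.
# x_all = torch.cat([x, fgsm_examples(model, x, y)]); y_all = torch.cat([y, y])
# loss = F.cross_entropy(model(x_all), y_all)
```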
We summarize our main contributions as follows:
• We propose a novel adversarial Lagrangian integrated contrastive embedding called ALICE for small-sized datasets and show the effectiveness of the proposed model on benchmark datasets, namely, CIFAR100, CIFAR10, SVHN, Aircraft, Pets, and Nancho. We introduce the Lagrangian algorithm in the context of adversarial contrastive embedding (AdvCont) to adaptively estimate the coefficients of the constraints.
• We show that the proposed adversarial transfer representation benefits downstream target datasets by improving the accuracy and converging faster than the standard training process on different subsets of the datasets with fewer samples.
• We present a novel adversarial integrated contrastive model with various augmentation techniques that is fine-tuned with sparsity and low-rank regularization constraints. The sparsity constraint suppresses less representative elements in the feature space. The low-rank constraint eliminates trivial and redundant components and facilitates good generalization.
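The following minimal sketch illustrates the adaptive constraint weighting stated in the first contribution, assuming an augmented-Lagrangian-style scheme with an L1 (sparsity) measure and a nuclear-norm (low-rank) measure on an embedding matrix. The exact formulation is given in Section 3; all names, budgets, and update rules here are placeholders.

```python
# Illustrative sketch only (placeholder names and budgets, not the exact method
# of Section 3): an augmented-Lagrangian-style loss coupling the task loss with
# L1 (sparsity) and nuclear-norm (low-rank) constraints on the embedding Z, with
# adaptive multipliers instead of hand-tuned regularizer coefficients.
import torch

def constrained_loss(task_loss, Z, lam, rho=1.0, budgets=(0.05, 1.0)):
    """lam: non-negative multipliers of shape (2,); budgets: assumed constraint targets."""
    sparsity = Z.abs().mean()                                        # L1 measure
    low_rank = torch.linalg.matrix_norm(Z, ord='nuc') / Z.shape[0]   # nuclear-norm measure
    g = torch.stack([sparsity - budgets[0], low_rank - budgets[1]])  # violations, g <= 0 desired
    penalty = (lam * g).sum() + 0.5 * rho * (g.clamp(min=0) ** 2).sum()
    return task_loss + penalty, g

# Dual update after each epoch keeps the multipliers adaptive and non-negative:
# with torch.no_grad():
#     lam.add_(rho * g.detach()).clamp_(min=0)
```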
The remainder of this paper is organized as follows. Related works are
explained in Section 2. The proposed ALICE method is described in Section 3.
The experiments are detailed in Section 4. Finally, the conclusions are presented
in Section 5.
2. Related Works
In this section, related works regarding transfer learning on small datasets,
contrastive learning, and adversarial learning are explained.
2.1. Transfer learning on small datasets
Transfer learning is a technique for achieving high performance in a range
of tasks with limited training data. Varied styles, size variability, and a shortage of class samples (Jalali & Lee, 2019; Jalali et al., 2021) make it challenging to achieve good recognition on small-sized datasets. Instead of discovering
completely different representations, fine-tuning makes greater use of existing
internal representations (Li et al., 2020a). The model layers have different trans-
fer abilities. The first layers contain general features, middle layers consist of
semantic features, and final layers comprise task-specific features. Therefore,
different layers should be treated differently, depending on the knowledge to be retained, so that the model appropriately fits a target task. Specifically, the first- and mid-layer knowledge
should be retained, whereas the last-layer knowledge is adapted to the down-
stream target tasks. The L2 norm with starting point (L2-SP) method used an L2 regularization constraint to explicitly encourage the final target weights to remain similar to the pre-trained weights (Xuhong et al., 2018; Jalali et al., 2015). L2-SP demonstrated the effectiveness of establishing an explicit inductive bias towards the original model, suggesting an L2 penalty term for transfer learning with the pre-trained model functioning as the reference point.
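A minimal sketch of an L2-SP-style penalty is given below, assuming a PyTorch model whose backbone parameter names match the pre-trained state dictionary; the alpha and beta coefficients are placeholders rather than the values recommended by Xuhong et al. (2018).

```python
# Illustrative sketch of the L2-SP idea described above (our variable names):
# penalize deviation of the fine-tuned backbone weights from their pre-trained
# starting point, plus a plain L2 penalty on the freshly initialized head.
import torch

def l2_sp_penalty(model, pretrained_state, alpha=0.1, beta=0.01):
    sp, vanilla = 0.0, 0.0
    for name, p in model.named_parameters():
        if name in pretrained_state:                 # shared backbone weights
            sp += (p - pretrained_state[name].to(p.device)).pow(2).sum()
        else:                                        # new task-specific head
            vanilla += p.pow(2).sum()
    return alpha / 2 * sp + beta / 2 * vanilla

# total_loss = cross_entropy(model(x), y) + l2_sp_penalty(model, pretrained_state)
```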
Mix & Match (Zhan et al., 2018) employed proxy tasks that were able to generate
discriminative representations in the target domain tasks. They developed a
mixed process that sparsely selects and combines patches from the target do-
main to generate diversified features from local patch attributes. Subsequently,
a matching process constructed a class-wise linked graph, which helped derive
a triplet discriminative objective function to fine-tune the network. To prevent
negative transfer, the batch spectral shrinkage (BSS) method (Chen et al., 2019) penalized the smallest singular values to suppress non-transferable spectral components. BSS is a regularization method that suppresses and shrinks these non-transferable spectral elements for better fine-tuning.
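A minimal sketch of the spectral penalty behind BSS follows, assuming the penultimate-layer features of a mini-batch; k and eta are placeholder hyper-parameters, not the settings of Chen et al. (2019).

```python
# Illustrative sketch of a BSS-style penalty (our names): penalize the k smallest
# singular values of the batch feature matrix to shrink non-transferable
# spectral components during fine-tuning.
import torch

def bss_penalty(features: torch.Tensor, k: int = 1, eta: float = 1e-3) -> torch.Tensor:
    """features: (batch_size, feature_dim) output of the penultimate layer."""
    sigma = torch.linalg.svdvals(features)    # singular values in descending order
    return eta * (sigma[-k:] ** 2).sum()      # shrink the k smallest ones
```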
DELTA (Li et al., 2020b) presented an attention mechanism to select more discriminative features such that the
distance between the features of the pre-trained and target model is regular-
ized. DELTA attempted to preserve higher transferability from a source to a
target task. During the fine-tuning process, RIFLE (Re-initializing the fully
connected Layer) (Li et al., 2020a) allowed in-depth back-propagation in the
transfer learning context by regularly re-initializing the fully connected layers
with randomized weight values. RIFLE used an explicit algorithmic regular-
ization strategy to improve low-level feature learning and the accuracy of deep
transfer learning. Bi-tuning (Zhong et al., 2020) is a method for fine-tuning that
incorporates two heads into the backbone of pre-trained models. A projector
head with a categorical contrastive loss exploits the intrinsic structure of the data samples, and a classifier head with a contrastive loss function incorporates label information in a contrasting manner. HEAD2TOE (Evci
et al., 2022) investigated selecting valuable intermediate features from all levels
of a pre-trained structure. This approach achieved superior transfer performance when the target domain had low affinity with the source domain, that is, when the distribution shift was high and the source-target domain overlap was low.
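A minimal sketch of the intermediate-feature probing idea behind HEAD2TOE is shown below, assuming a PyTorch backbone and hypothetical layer names; the original method additionally scores and selects features (e.g., with group-lasso-style regularization), which is omitted here.

```python
# Illustrative sketch only (our names): collect activations from several layers
# with forward hooks, pool and concatenate them, and train a lightweight linear
# probe on top of the concatenated features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def collect_features(backbone: nn.Module, layer_names, x):
    """Run one forward pass and return pooled activations from the named layers."""
    feats, hooks = {}, []
    modules = dict(backbone.named_modules())

    def make_hook(name):
        def hook(module, inputs, output):
            # Average-pool spatial maps so every layer yields a (batch, channels) vector.
            if output.dim() == 4:
                output = F.adaptive_avg_pool2d(output, 1)
            feats[name] = output.flatten(1)
        return hook

    for name in layer_names:
        hooks.append(modules[name].register_forward_hook(make_hook(name)))
    with torch.no_grad():
        backbone(x)
    for h in hooks:
        h.remove()
    return torch.cat([feats[n] for n in layer_names], dim=1)

# Example: features = collect_features(resnet, ["layer2", "layer3", "layer4"], images)
# probe = nn.Linear(features.size(1), num_classes)  # trained on these features
```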
2.2. Contrastive learning
Compared with the standard training process, contrastive training resulted
in better performance on downstream classification tasks (Khosla et al., 2021).
The standard supervised pre-training transferred high-level feature representations, whereas its contrastive counterpart transferred low- and mid-level feature
representations. When the downstream task was different from the pre-trained
task, the standard pre-training approach had the risk of overfitting high-level