Adversarial Lagrangian Integrated Contrastive
Embedding for Limited Size Datasets
Amin Jalalia, Minho Leea,b
aKNU-LG Electronics Convergence Research Center, AI Institute of Technology, Kyungpook
National University, Daegu, 41566, South Korea
bGraduate School of Artificial Intelligence, Kyungpook National University, Daegu, 41566,
South Korea
Abstract
Certain datasets contain only a limited number of samples with highly varied styles
and complex structures. This study presents a novel adversarial Lagrangian
integrated contrastive embedding (ALICE) method for small-sized datasets.
First, the accuracy improvement and training convergence of the proposed pre-
trained adversarial transfer are shown on various subsets of datasets with few
samples. Second, a novel adversarial integrated contrastive model using vari-
ous augmentation techniques is investigated. The proposed structure considers
the input samples with different appearances and generates a superior repre-
sentation with adversarial transfer contrastive training. Finally, multi-objective augmented Lagrangian multipliers encourage low rank and sparsity in the presented adversarial contrastive embedding and adaptively estimate the coefficients of the regularizers toward their optimum values. The sparsity
constraint suppresses less representative elements in the feature space. The
low-rank constraint eliminates trivial and redundant components and enables
superior generalization. The performance of the proposed model is verified by
conducting ablation studies by using benchmark datasets for scenarios with
small data samples.
Keywords: Deep learning, adversarial transfer contrastive embedding, small and limited datasets, sparsity and low-rank constraints, augmented Lagrangian multipliers.

Email addresses: max.jalali@gmail.com, mholee@gmail.com (Minho Lee, corresponding author)
1. Introduction
Recent developments in deep learning can be attributed to the increased
availability of big data, improvements in hardware and software, and increased
speed of the training processes. However, deep neural networks still face challenges when trained on small datasets. Fine-tuning in small-sample scenarios is prone to overfitting because fitting a model with a large number of parameters to a small number of samples is an ill-posed problem. The limited amount of data samples
impedes the use of deep structures in a wide range of applications because
achieving good performance requires a large dataset (Jiang et al., 2022; Jalali
& Lee, 2020). Moreover, fine-tuning a model using a limited dataset
can cause overfitting in the downstream target task, particularly when a dis-
tributional data gap exists between the pre-trained model and the target task
(Aghajanyan et al., 2020; Jalali et al., 2017; Keisham et al., 2022). It has also been observed that, as the model size and training time increase, the performance on target tasks gradually saturates to a fixed level (Abnar et al., 2021), and in some cases the target-task performance is even at odds with that of the pre-trained
models. Chen et al. (2019) noted that when a sufficient number of training sam-
ples is available, the spectral components with tiny singular values disappear
during the fine-tuning process, implying that small singular values correspond to undesirable pre-trained knowledge and may result in negative transfer. In the
fine-tuning process with a supervised objective, self-tuning (Wang et al., 2021)
can be used to introduce a pseudo-group contrastive approach for evaluating the
intrinsic structure of the target domain. Contrastive learning (CL) can be em-
ployed to develop generalizable visual features with superior efficiency on target
tasks by fine-tuning a classifier on top of the model representations. Fan et al.
(2021) analyzed CL in terms of robustness improvement, demonstrating that
high-frequency contrastive visual components and the use of feature clustering
are advantageous to model robustness without compromising the accuracy.
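For reference, the following is a minimal sketch of a generic contrastive objective of the kind discussed above (an NT-Xent-style loss over two augmented views of a batch). It is not the formulation of any specific cited work, and the function and parameter names are placeholders.

```python
# Illustrative sketch only: a standard NT-Xent (SimCLR-style) contrastive loss.
# The names nt_xent and temperature are ours, not the notation of any cited work.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit-norm rows
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarities
    # The positive for sample i is its other augmented view (i + n, or i - n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage: loss = nt_xent(encoder(aug1(x)), encoder(aug2(x)))
```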
In this study, we propose a novel augmented Lagrangian tuning method for
an adversarial integrated contrastive embedding model. Adversarially trained
models usually have lower accuracy than those trained using the natural training
paradigm. However, they perform better when employed for transfer learning
to downstream tasks because they generate richer features. We attempt to
generate a rich feature representation by treating adversarial examples as additional data samples during training; the goal is not to increase the model's robustness against adversarial examples. The adversarial contrastive mechanism yields more transferable knowledge, which enables better generalization from small samples. The Lagrangian multipliers adaptively tune the sparsity, low-rank, and accuracy coefficients of the model.
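As an illustration of using adversarial examples purely as additional training samples (rather than for robustness), the following is a minimal FGSM-style sketch; the perturbation budget and helper names are assumptions, not the exact scheme used in this work.

```python
# Illustrative sketch only: generating adversarially perturbed inputs (FGSM-style)
# and appending them to the batch as extra training samples. The epsilon value
# and helper names are placeholders, not the paper's settings.
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, epsilon=4 / 255):
    """Return adversarially perturbed copies of x to enlarge the training batch."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)             # gradient w.r.t. the input
    return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

# Training step: the perturbed samples simply extend the batch.
# x_all = torch.cat([x, fgsm_examples(model, x, y)]); y_all = torch.cat([y, y])
# loss = F.cross_entropy(model(x_all), y_all)
```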
We summarize our main contributions as follows:
• We propose a novel adversarial Lagrangian integrated contrastive embedding called ALICE for small-sized datasets and show the effectiveness of the proposed model on benchmark datasets, namely, CIFAR100, CIFAR10, SVHN, Aircraft, Pets, and Nancho. We introduce the Lagrangian algorithm in the context of adversarial contrastive embedding (AdvCont) to adaptively estimate the coefficients of the constraints.
• We show that the proposed adversarial transfer representation benefits downstream target datasets by improving the accuracy and converging faster than the standard training process on different subsets of the datasets with fewer samples.
• We present a novel adversarial integrated contrastive model with various augmentation techniques that is fine-tuned with sparsity and low-rank regularization constraints. The sparsity constraint suppresses less representative elements in the feature space. The low-rank constraint eliminates trivial and redundant components and facilitates good generalization.
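The following minimal sketch illustrates the adaptive constraint weighting stated in the first contribution, assuming an augmented-Lagrangian-style scheme with an L1 (sparsity) measure and a nuclear-norm (low-rank) measure on an embedding matrix. The exact formulation is given in Section 3; all names, budgets, and update rules here are placeholders.

```python
# Illustrative sketch only (placeholder names and budgets, not the exact method
# of Section 3): an augmented-Lagrangian-style loss coupling the task loss with
# L1 (sparsity) and nuclear-norm (low-rank) constraints on the embedding Z, with
# adaptive multipliers instead of hand-tuned regularizer coefficients.
import torch

def constrained_loss(task_loss, Z, lam, rho=1.0, budgets=(0.05, 1.0)):
    """lam: non-negative multipliers of shape (2,); budgets: assumed constraint targets."""
    sparsity = Z.abs().mean()                                        # L1 measure
    low_rank = torch.linalg.matrix_norm(Z, ord='nuc') / Z.shape[0]   # nuclear-norm measure
    g = torch.stack([sparsity - budgets[0], low_rank - budgets[1]])  # violations, g <= 0 desired
    penalty = (lam * g).sum() + 0.5 * rho * (g.clamp(min=0) ** 2).sum()
    return task_loss + penalty, g

# Dual update after each epoch keeps the multipliers adaptive and non-negative:
# with torch.no_grad():
#     lam.add_(rho * g.detach()).clamp_(min=0)
```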
The remainder of this paper is organized as follows. Related works are
explained in Section 2. The proposed ALICE method is described in Section 3.
The experiments are detailed in Section 4. Finally, the conclusions are presented
in Section 5.
2. Related Works
In this section, related works regarding transfer learning on small datasets,
contrastive learning, and adversarial learning are explained.
2.1. Transfer learning on small datasets
Transfer learning is a technique for achieving high performance in a range
of tasks with limited training data. Varied styles, size variability, and a shortage of class samples (Jalali & Lee, 2019; Jalali et al., 2021) make it challenging to achieve good recognition on small-sized datasets. Instead of discovering
completely different representations, fine-tuning makes greater use of existing
internal representations (Li et al., 2020a). The model layers have different trans-
fer abilities. The first layers contain general features, middle layers consist of
semantic features, and final layers comprise task-specific features. Therefore,
different layers should be treated differently, depending on the knowledge to be retained, so that the model appropriately fits a target task. Specifically, the first- and mid-layer knowledge
should be retained, whereas the last-layer knowledge is adapted to the down-
stream target tasks. The L2 norm with starting point (L2-SP) method used an L2 regularization constraint to explicitly encourage the final target weights to remain similar to the pre-trained weights (Xuhong et al., 2018; Jalali et al., 2015). L2-SP demonstrated the effectiveness of establishing an explicit inductive bias towards the original model, suggesting an L2 penalty term for transfer learning with the pre-trained model functioning as the reference point.
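A minimal sketch of an L2-SP-style penalty is given below, assuming a PyTorch model whose backbone parameter names match the pre-trained state dictionary; the alpha and beta coefficients are placeholders rather than the values recommended by Xuhong et al. (2018).

```python
# Illustrative sketch of the L2-SP idea described above (our variable names):
# penalize deviation of the fine-tuned backbone weights from their pre-trained
# starting point, plus a plain L2 penalty on the freshly initialized head.
import torch

def l2_sp_penalty(model, pretrained_state, alpha=0.1, beta=0.01):
    sp, vanilla = 0.0, 0.0
    for name, p in model.named_parameters():
        if name in pretrained_state:                 # shared backbone weights
            sp += (p - pretrained_state[name].to(p.device)).pow(2).sum()
        else:                                        # new task-specific head
            vanilla += p.pow(2).sum()
    return alpha / 2 * sp + beta / 2 * vanilla

# total_loss = cross_entropy(model(x), y) + l2_sp_penalty(model, pretrained_state)
```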
Mix & Match (Zhan et al., 2018) employed proxy tasks that were able to generate
discriminative representations in the target domain tasks. They developed a
mixed process that sparsely selects and combines patches from the target do-
main to generate diversified features from local patch attributes. Subsequently,
a matching process constructed a class-wise linked graph, which helped derive
a triplet discriminative objective function to fine-tune the network. To prevent
negative transfer, the batch spectral shrinkage (BSS) method (Chen et al., 2019) penalized the smallest singular values to suppress non-transferable spectral components. BSS is a regularization method that suppresses and shrinks these non-transferable spectral elements for better fine-tuning.
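A minimal sketch of the spectral penalty behind BSS follows, assuming the penultimate-layer features of a mini-batch; k and eta are placeholder hyper-parameters, not the settings of Chen et al. (2019).

```python
# Illustrative sketch of a BSS-style penalty (our names): penalize the k smallest
# singular values of the batch feature matrix to shrink non-transferable
# spectral components during fine-tuning.
import torch

def bss_penalty(features: torch.Tensor, k: int = 1, eta: float = 1e-3) -> torch.Tensor:
    """features: (batch_size, feature_dim) output of the penultimate layer."""
    sigma = torch.linalg.svdvals(features)    # singular values in descending order
    return eta * (sigma[-k:] ** 2).sum()      # shrink the k smallest ones
```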
DELTA (Li et al., 2020b) presented an attention mechanism to select more discriminative features such that the
distance between the features of the pre-trained and target model is regular-
ized. DELTA attempted to preserve higher transferability from a source to a
target task. During the fine-tuning process, RIFLE (Re-initializing the fully
connected Layer) (Li et al., 2020a) allowed in-depth back-propagation in the
transfer learning context by regularly re-initializing the fully connected layers
with randomized weight values. RIFLE used an explicit algorithmic regular-
ization strategy to improve low-level feature learning and the accuracy of deep
transfer learning. Bi-tuning (Zhong et al., 2020) is a method for fine-tuning that
incorporates two heads into the backbone of pre-trained models. A projector
head with a categorical contrastive loss exploits the intrinsic structure of the data samples, and a classifier head with a contrastive loss function incorporates label information in a contrasting manner. HEAD2TOE (Evci
et al., 2022) investigated selecting valuable intermediate features from all levels
of a pre-trained structure. This approach achieved superior transfer performance when the target domain had low affinity with the source domain, that is, when the distribution shift was high and the source-target domain overlap was low.
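A minimal sketch of the intermediate-feature probing idea behind HEAD2TOE is shown below, assuming a PyTorch backbone and hypothetical layer names; the original method additionally scores and selects features (e.g., with group-lasso-style regularization), which is omitted here.

```python
# Illustrative sketch only (our names): collect activations from several layers
# with forward hooks, pool and concatenate them, and train a lightweight linear
# probe on top of the concatenated features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def collect_features(backbone: nn.Module, layer_names, x):
    """Run one forward pass and return pooled activations from the named layers."""
    feats, hooks = {}, []
    modules = dict(backbone.named_modules())

    def make_hook(name):
        def hook(module, inputs, output):
            # Average-pool spatial maps so every layer yields a (batch, channels) vector.
            if output.dim() == 4:
                output = F.adaptive_avg_pool2d(output, 1)
            feats[name] = output.flatten(1)
        return hook

    for name in layer_names:
        hooks.append(modules[name].register_forward_hook(make_hook(name)))
    with torch.no_grad():
        backbone(x)
    for h in hooks:
        h.remove()
    return torch.cat([feats[n] for n in layer_names], dim=1)

# Example: features = collect_features(resnet, ["layer2", "layer3", "layer4"], images)
# probe = nn.Linear(features.size(1), num_classes)  # trained on these features
```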
2.2. Contrastive learning
Compared with the standard training process, contrastive training resulted
in better performance on downstream classification tasks (Khosla et al., 2021).
The standard supervised pre-training transferred high-level feature representations, whereas its contrastive counterpart transferred low- and mid-level feature
representations. When the downstream task was different from the pre-trained
task, the standard pre-training approach had the risk of overfitting high-level