Adversarial Pretraining of Self-Supervised Deep Networks: Past,
Present and Future
GUO-JUN QI, Laboratory for Machine Perception and Learning, USA
MUBARAK SHAH, University of Central Florida, USA
In this paper, we review adversarial pretraining of self-supervised deep networks including both convolutional neural
networks and vision transformers. Unlike the adversarial training with access to labeled examples, adversarial pretraining
is complicated as it only has access to unlabeled examples. To incorporate adversaries into pretraining models on either
input or feature level, we find that existing approaches are largely categorized into two groups: memory-free instance-wise
attacks imposing worst-case perturbations on individual examples, and memory-based adversaries shared across examples
over iterations. In particular, we review several representative adversarial pretraining models based on Contrastive Learning
(CL) and Masked Image Modeling (MIM), two popular self-supervised pretraining methods in the literature. We
also review miscellaneous issues about computing overheads, input-/feature-level adversaries, as well as other adversarial
pretraining approaches beyond the above two groups. Finally, we discuss emerging trends and future directions about the
relations between adversarial and cooperative pretraining, unifying adversarial CL and MIM pretraining, and the trade-off
between accuracy and robustness in adversarial pretraining.
CCS Concepts: • Computing methodologies → Computer vision representations; Unsupervised learning.
Additional Key Words and Phrases: adversarial pretraining, contrastive learning, masked image modeling, memory-free vs.
memory-based adversaries, instance-wise perturbations
ACM Reference Format:
Guo-Jun Qi and Mubarak Shah. 2022. Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future. 1,
1 (October 2022), 21 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Adversarial pretraining aspires to learn deep networks in an unsupervised fashion, without access to labels. In contrast,
adversarial training in literature [11, 31, 48, 53, 55, 58, 65, 70, 77, 82] seeks to find worst-case adversarial examples
and use them to train neural networks robust to the corresponding attacks. While some findings [65, 70] revealed that adversarially trained networks can be robust to adversarial attacks or gain improved standard accuracy, there are limited reviews of adversarially pretrained networks that classify and evaluate existing approaches, assess their advantages and shortcomings, and chart future directions.
At the start, we want to clarify a common misunderstanding about the role of adversarial pretraining. The goal of an adversarial approach is not limited to learning robust representations against potential attacks. Instead, it is also employed to improve the generalization accuracy in downstream tasks, especially when the adversarial model attacks at the feature level [40, 43, 61, 62] rather than on raw inputs (e.g., image pixels) of individual instances (aka instance-wise attacks [39, 42, 45]).
Authors’ addresses: Guo-Jun Qi, guojunq@gmail.com, Laboratory for Machine Perception and Learning, 10940 NE 33RD PLACE, SUITE
202, Bellevue, Washington, USA, 98004; Mubarak Shah, University of Central Florida, 4328 Scorpius St., Orlando, Florida, USA, 32816,
shah@crcv.ucf.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2022 Association for Computing Machinery.
XXXX-XXXX/2022/10-ART $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
When not attacking the raw inputs, adversarial pretraining often cares about whether the learned features are generalizable to future problems, aiming to avoid learning trivial solutions that merely use low-level features to bypass a pretext task. For example, easy negatives in contrastive learning could result in features that are less discriminative in distinguishing between positive and negative samples for a query [40, 43]; in masked image modeling (MIM), the network may learn low-level features to reconstruct missing patches by simply exploiting the similarity between locally correlated patches [5, 64] if the MIM objective is not sufficiently hard. In these cases, it is beneficial to explore adversarial approaches to improve the generalizability of learned representations. In other words, learning more generalizable representations through adversarial pretraining is an equally important goal in the literature as learning robust representations against some presumptive attacks.
1.1 Instance-wise Perturbations from Adversarial Training to Pretraining
Let us begin by revisiting adversarial training. Formally, adversarial training (instead of pretraining) has access to labeled examples, and it maximizes an associated classification loss such as the cross entropy $\mathcal{L}_{ce}$ over a constrained perturbation $\boldsymbol{\delta}$ of magnitude $\varepsilon$ to find an adversarial example for an input $\mathbf{x}$ [53], i.e.,

$$\boldsymbol{\delta}^{\star} = \arg\max_{\|\boldsymbol{\delta}\|_{p} \leq \varepsilon} \mathcal{L}_{ce}(\mathbf{x}+\boldsymbol{\delta}, \mathbf{y}; \boldsymbol{\theta}), \qquad (1)$$

where the network weights $\boldsymbol{\theta}$ can be learned by

$$\boldsymbol{\theta}^{\star} = \arg\min_{\boldsymbol{\theta}} \mathbb{E}_{(\mathbf{x},\mathbf{y})\sim\mathcal{D}} \, \mathcal{L}_{ce}(\mathbf{x}+\boldsymbol{\delta}^{\star}, \mathbf{y}; \boldsymbol{\theta}), \qquad (2)$$

with the expectation taken over labeled examples $(\mathbf{x},\mathbf{y})$ sampled from a distribution $\mathcal{D}$. More detailed reviews of adversarial training of deep networks can be found in [2, 3].
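To make Eqs. (1)-(2) concrete, the inner maximization is commonly approximated by projected gradient descent (PGD) [53]. Below is a minimal PyTorch sketch of one adversarial training step under an ℓ∞ constraint; the model interface, step sizes, and budgets are illustrative assumptions rather than the setting of any particular paper.

```python
import torch
import torch.nn.functional as F

def pgd_perturbation(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate the inner maximization of Eq. (1) with PGD
    under the constraint ||delta||_inf <= eps."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascend the loss, then project back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One outer minimization step of Eq. (2) on a labeled batch (x, y)."""
    delta = pgd_perturbation(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Adversarial pretraining replaces the supervised loss above with an unsupervised pretext loss, as reviewed next.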
Instead, adversarial pretraining aligns with unsupervised representation learning without access to the label y in the above formulation [1, 14, 24, 30, 44, 49, 59, 60, 75, 83], and it has separate pretext and downstream tasks that are often more complex than those of adversarial training. First, there are many different ways to define a pretext task for learning a deep network without supervision. A variety of pretext tasks result in different approaches to performing meaningful adversarial pretraining, and some pretext tasks may make the pretrained representation vulnerable to potential attacks [34]. In this review, we will show that a straightforward extension of adversarial training is to add perturbations to unlabeled examples and solve a similar minimax problem for adversarial pretraining. Particularly, among popular pretext tasks are Contrastive Learning (CL) [14, 16, 38, 56, 75] and Masked Image Modeling (MIM) [7, 35, 76] for Convolutional Nets (ConvNets) [37, 47] and vision transformers [25, 69, 72], respectively.
We will review how adversarial perturbations can be generalized to these pretext tasks on either instance
or feature level. Particularly, we will review the related works from a novel perspective by grouping them
into memory-free and memory-based adversarial pretraining. We will show that the memory-free adversarial
approaches usually consist of instance-wise perturbations as in the adversarial training methods, where the
perturbation is constructed on raw inputs of individual examples. Pretraining a deep network is memory-free
in the sense that the perturbations and the associated adversarial examples are not kept over epochs. Many
adversarial approaches belong to this category, including [39, 42, 45].
1.2 Feature-level Adversarial Pretraining with Memory-based Adversaries
In contrast, feature-level adversarial pretraining abandons the construction of instance-wise perturbations, and instead uses a shared memory as the adversarial player. A natural choice of such adversaries is the memory bank widely used in contrastive learning [36], and thus most feature-level adversarial pretraining is memory-based. For example, AdCo [40] treats all examples in the memory bank (cf. [36]) as learnable negatives, and directly learns them by maximizing the contrastive loss. This results in hard negatives that are continuously updated to be mixed with their positive counterparts. Hence, the learned negatives are hard to distinguish from
positives, so more discriminative representations must be learned that generalize to downstream tasks. IFM [62] extends the idea of AdCo by considering implicit feature-level modifications to contrastive pairs subject to assigned budgets. These methods give rise to a family of feature-level adversarial learning approaches that generate hard negatives in a principled manner, establishing strong connections with the hard negative mining and sampling methods [43, 61] that we will also review in this survey.
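To illustrate the memory-based alternative, the sketch below performs an AdCo-style update in which the memory bank of negatives is itself the adversary and ascends the gradient of the contrastive loss; the tensor shapes, learning rate, and normalization are illustrative assumptions that simplify AdCo's actual implementation.

```python
import torch
import torch.nn.functional as F

def update_adversarial_negatives(z, z_pos, negatives, lr_neg=3.0, tau=0.12):
    """z, z_pos: (B, D) L2-normalized query/positive embeddings.
    negatives: (K, D) memory bank of negatives, treated as a learnable
    adversary that *maximizes* the contrastive loss."""
    z, z_pos = z.detach(), z_pos.detach()
    negatives = negatives.detach().requires_grad_(True)
    logits_pos = (z * z_pos).sum(dim=1, keepdim=True) / tau  # (B, 1)
    logits_neg = z @ negatives.t() / tau                     # (B, K)
    logits = torch.cat([logits_pos, logits_neg], dim=1)
    labels = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    loss = F.cross_entropy(logits, labels)  # contrastive loss
    grad = torch.autograd.grad(loss, negatives)[0]
    with torch.no_grad():
        # Gradient *ascent* pushes the negatives toward the queries,
        # making them harder; re-normalize onto the unit sphere.
        negatives = F.normalize(negatives + lr_neg * grad, dim=1)
    return negatives
```

In full training, the encoder minimizes the same loss in an alternating step, yielding the minimax game reviewed in Section 2.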
1.3 MIM-based Adversarial Pretraining
Moreover, Masked Image Modeling (MIM) [7, 35, 76] has attracted increasing attention for pretraining vision transformers. While most existing adversarial pretraining methods are built upon contrastive learning, this opens up unprecedented research opportunities to study adversarial pretraining of vision transformers in the MIM framework. We will review existing adversarial MIM-pretraining approaches for transformers [5, 64], and point out some natural extensions of adversarial perturbations to the MIM setting. Both instance-wise (e.g., FGSM-based) and feature-level adversarial MIM-pretraining approaches will be reviewed and discussed.
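As a hint of how instance-wise perturbations extend to MIM, a single FGSM step on the reconstruction loss might look as follows; the masked-autoencoder interface mae(x, mask), assumed here to return the reconstruction loss over masked patches, and the budget eps are hypothetical choices for illustration only.

```python
import torch

def fgsm_mim_loss(mae, x, mask, eps=4/255):
    """One FGSM perturbation for masked image modeling: perturb the
    image so that reconstructing its masked patches becomes harder,
    then return the adversarial MIM loss to minimize over the weights."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = mae(x_adv, mask)                      # reconstruction loss
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = (x_adv + eps * grad.sign()).clamp(0, 1).detach()
    return mae(x_adv, mask)
```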
1.4 Future Directions and Survey Structure
Finally, we will review some emerging trends and future directions on adversarial pretraining. In particular, we
will focus on three aspects of future directions.
Adversarial vs. cooperative pretraining.
We will review related works and discuss the connection between adversarial (maximizing the training loss) and cooperative (minimizing the training loss) pretraining. We will point out a direction to study when different parts of a pretrained network ought to be learned adversarially or cooperatively. In particular, depending on different modes of adversarial pretraining (instance-wise vs. memory-based, and input-level vs. feature-level), a positive query in contrastive learning can be treated as either an adversary in a hybrid model or a cooperator through a shared memory bank. We will show that it could be beneficial to combine these two modes through various pretraining approaches.
Unifying adversarial contrastive learning and masked image modeling.
Existing pretraining approaches [41, 84] have shown that combining contrastive learning and masked image modeling leads to more powerful representations of vision transformers with improved generalizability. The second direction worth studying is to explore an elegant way of designing adversaries that integrates both approaches. It is expected that such unified adversarial pretraining can make the learned representation more powerful and generalizable to downstream tasks.
Accuracy vs. robustness.
In adversarial training, the relationship between standard accuracy and model robustness to adversarial perturbations has been extensively studied. Existing works have demonstrated that network robustness may not always be indicative of improved accuracy [70]. The problem becomes more complicated for adversarial pretraining, since the pretext tasks used to pretrain a network are usually not related to the downstream objectives. Revealing the underlying connections between network robustness to pretext-task adversaries and generalization accuracy in downstream tasks should be a direction worth exploring from both theoretical and practical perspectives.
The remainder of this paper is organized as follows. In Section 2, we will review the contrastive pretraining
methods by grouping them into instance-wise memory-free and feature-level memory-based models. We will
discuss some miscellaneous issues regarding computing overheads and combined objectives for pretraining.
In Section 3, we will review MIM-based adversarial pretraining, followed by other related methods beyond
contrastive and MIM pretraining in Section 4. We will discuss the evaluation protocols and review existing results
in Section 5. Emerging trends and future directions will be discussed in Section 6, and we will conclude the paper
in Section 7.
Table 1. Dierent types of adversarial pre-training models discussed in the paper. For each model, we denote its model type
(memory-free vs. memory-based), base model(CL, MIM, or others), the formula for adversaries, pretraining adversaries on
input or feature levels (Input/Feature), and if using a pre-training objective combining adversarial and standard losses (in the
column “Comb. obj." ). The table compares the similarities and dierences across dierent models in forms of the formula for
adversaries.
Model Type Base Adversaries Input/Feature Comb. obj.
RoCL [45]
memory-free CL
𝑡(x) 𝑡(x) · · · FGSM
+𝜖sign(𝑡(x)L𝑐𝑜𝑛 (𝑡(x),{x𝑝𝑜𝑠 },{x𝑛𝑒𝑔 }))
or
𝑡(x) ← Π
S (𝑡(x),𝜖)
[𝑡(x) · · · PGD
+𝛼sign(𝑡(x)L𝑐𝑜𝑛 (𝑡(x),{x𝑝𝑜𝑠 },{x𝑛𝑒𝑔 }))]
input-level yes
CLAE [39]
ACL [42]
ADVCL [29] memory-free CL xx+𝜖sign(¯
xL𝑐𝑜𝑛 (𝑡(x), 𝑡 (x),¯
x,x))
by FGSM aacks.input-level yes
ARoCL [34] memory-free CL false negative removal input-level yes
AACL [34]
AdCo [40] memory-based CL z𝑛𝑒𝑔 z𝑛𝑒𝑔 +𝛼
𝜏E
xD 𝑝(z𝑛𝑒𝑔 |z)zfeature-level no
CaCo [71] memory-based CL z𝑝𝑜𝑠 z𝑝𝑜𝑠 +𝛼
𝜏[1𝑝(z𝑝𝑜𝑠 |z)]zfeature-level no
AdPE [5] memory-based MIM 𝜹Π
𝜹𝑞𝜖
[𝜹+𝛼𝜹· · · · PGD
L𝑀𝐼 𝑀 ({p[e𝑔(𝑥𝑖𝑥𝑗+𝛿𝑥),e𝑔(𝑦𝑖𝑦𝑗+𝛿𝑦) ] |𝑖M};𝜽)]
feature-level no
IFM [62] memory-free CL z𝑝𝑜𝑠 z𝑝𝑜𝑠 𝜖z
z𝑛𝑒𝑔 z𝑛𝑒𝑔 +𝜖zfeature-level yes
MoCHi [43] CL hard negative mixing feature-level yes
HCL [61] CL hard negative sampling feature-level yes
BYORL [32] memory-free BYOL 𝜹Π𝜹𝑝𝜖[𝜹+𝛼𝜹L𝑏𝑦𝑜𝑙 (𝑡(x) + 𝜹, 𝑡 (x))] input-level no
RUSH [57] memory-free CL randomized smoothing input-level no
ADIOS [64] memory-free MIM mask generating network M𝜙input-level no
Remarks:
1) Memory-free: instance-wise perturbations imposed on individual examples without carrying over iterations;
2) Memory-based: adversarial samples are stored in a shared memory to challenge the pretrained network;
3) Input-level: adversaries imposed on raw inputs of instances;
4) Feature-level: adversaries imposed on feature representations.
Table 2. A summary of source code links.

| Model | Link |
|---|---|
| RoCL [45] | https://github.com/Kim-Minseon/RoCL |
| CLAE [39] | https://github.com/chihhuiho/CLAE |
| ACL [42] | https://github.com/VITA-Group/Adversarial-Contrastive-Learning |
| AdCo [40] | https://github.com/maple-research-lab/AdCo |
| CaCo [71] | https://github.com/maple-research-lab/CaCo |
| IFM [62] | https://github.com/joshr17/IFM |
| MoCHi [43] | https://europe.naverlabs.com/research/computer-vision/mochi/ |
| HCL [61] | https://github.com/joshr17/HCL |
| ADIOS [64] | https://github.com/yugeten/adios |
| ADVCL [29] | https://github.com/LijieFan/AdvCL |
2 ADVERSARIAL CONTRASTIVE PRETRAINING
Contrastive learning has become the state-of-the-art method for pretraining a variety of deep networks ranging from convolutional networks [16, 36] to vision transformers [13]. Incorporating adversaries into contrastive learning for pretraining deep networks has also been intensively studied in the literature [39, 40, 42, 45].
In this section, we will review existing methods from a novel perspective by categorizing them into two large groups: memory-free methods with instance-wise perturbations, and memory-based methods that learn a shared memory of adversarial negatives at the feature level. Readers may take a glance at Table 1, which summarizes the different adversarial pretraining models we will discuss in this paper.
2.1 Background: Contrastive Learning
Contrastive learning seeks to pre-train an encoder network by maximizing the agreement of representations
between a pair of samples transformed from the same instance, while pushing apart the representations of those
transformed from dierent ones [75].
Formally, given an instance $\mathbf{x}$, it is randomly transformed into a pair of examples $t(\mathbf{x})$ and $t'(\mathbf{x})$ with two transformations $t$ and $t'$ drawn from a distribution $\mathcal{T}$. An encoder network $f_\theta$ and a following projector $g_\theta$ map the pair into two latent vectors $\mathbf{z} = g_\theta(f_\theta(t(\mathbf{x})))$ and $\mathbf{z}' = g_\theta(f_\theta(t'(\mathbf{x})))$, respectively. Then, contrastive learning is designed to minimize the following contrastive loss

$$\mathbb{E}_{\mathbf{x}\sim\mathcal{D}}\, \mathcal{L}_{con}\big(t(\mathbf{x}), \{\mathbf{x}_{pos}\}, \{\mathbf{x}_{neg}\}; \theta\big) \triangleq \mathbb{E}_{\mathbf{x}\sim\mathcal{D}} \left[ -\log \frac{\sum_{\mathbf{z}' \in \{\mathbf{z}_{pos}\}} \exp\big(\mathrm{sim}(\mathbf{z}, \mathbf{z}')/\tau\big)}{\sum_{\mathbf{z}'' \in \{\mathbf{z}_{pos}\} \cup \{\mathbf{z}_{neg}\}} \exp\big(\mathrm{sim}(\mathbf{z}, \mathbf{z}'')/\tau\big)} \right], \qquad (3)$$

where $\mathcal{D}$ is the data distribution, $\{\mathbf{x}_{pos}\}$ and $\{\mathbf{x}_{neg}\}$ are the sets of positive and negative examples for $t(\mathbf{x})$, and $\{\mathbf{z}_{pos}\}$ and $\{\mathbf{z}_{neg}\}$ are their representations. Usually, the positive set has a single example $t'(\mathbf{x})$ transformed from the same instance, and the negative set contains all examples transformed from different instances. Here $\mathrm{sim}$ is a similarity function, usually chosen as cosine similarity, and $\tau$ is the temperature of the loss.
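For concreteness, a minimal PyTorch sketch of the loss in Eq. (3) with a single positive per query is given below; the batch layout and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z, z_prime, z_neg, tau=0.2):
    """Eq. (3) with one positive per query.
    z, z_prime: (B, D) embeddings of the two views t(x) and t'(x).
    z_neg: (K, D) embeddings of negative examples."""
    z = F.normalize(z, dim=1)
    z_prime = F.normalize(z_prime, dim=1)
    z_neg = F.normalize(z_neg, dim=1)
    sim_pos = (z * z_prime).sum(dim=1, keepdim=True) / tau  # (B, 1)
    sim_neg = z @ z_neg.t() / tau                           # (B, K)
    logits = torch.cat([sim_pos, sim_neg], dim=1)
    # Cross entropy with the positive at index 0 equals the
    # negative log softmax ratio in Eq. (3).
    labels = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, labels)
```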
2.2 Memory-free Methods: Instance-wise Perturbation
The idea of an instance-wise attack against contrastive pre-training is to seek a worst-case perturbation $\boldsymbol{\delta}$ on the transformed example $t(\mathbf{x})$ by maximizing the above contrastive loss, yielding the most dissimilar positive sample $t(\mathbf{x}) + \boldsymbol{\delta}$. Then, the contrastive loss is minimized by maximizing the agreement between this adversarially perturbed sample and its positive counterpart.
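A minimal sketch of this instance-wise attack, in the spirit of RoCL and CLAE [39, 45], is given below; it reuses the contrastive_loss sketch above, and the encoder interface and PGD hyperparameters are illustrative assumptions.

```python
import torch

def contrastive_pgd_attack(encoder, x_t, x_t_prime, z_neg,
                           eps=8/255, alpha=2/255, steps=5):
    """PGD on the contrastive loss: perturb the view t(x) so that it
    becomes the most dissimilar positive for its counterpart t'(x)."""
    delta = torch.zeros_like(x_t).requires_grad_(True)
    z_prime = encoder(x_t_prime).detach()
    for _ in range(steps):
        loss = contrastive_loss(encoder(x_t + delta), z_prime, z_neg)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x_t + delta).detach()  # adversarial positive t(x) + delta
```

The encoder weights are then updated by minimizing the contrastive loss on these adversarial positives, completing the minimax game.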