Similarity of Neural Architectures
using Adversarial Attack Transferability
Jaehui Hwang1,2,† Dongyoon Han3 Byeongho Heo3 Song Park3
Sanghyuk Chun3,* Jong-Seok Lee1,2,*
1School of Integrated Technology, Yonsei University
2BK21 Graduate Program in Intelligent Semiconductor Technology, Yonsei University
3NAVER AI Lab
† Work done during an internship at NAVER AI Lab. * Corresponding authors.
Abstract. In recent years, many deep neural architectures have been developed for image classification. Whether they are similar or dissimilar, and which factors contribute to their (dis)similarities, remains an open question. To address this question, we aim to design a quantitative and scalable similarity measure between neural architectures. We propose Similarity by Attack Transferability (SAT), motivated by the observation that adversarial attack transferability carries information about input gradients and decision boundaries, which are widely used to understand model behaviors. We conduct a large-scale analysis of 69 state-of-the-art ImageNet classifiers using SAT to answer this question. In addition, we provide insights into ML applications that use multiple models, such as model ensembles and knowledge distillation. Our results show that using diverse neural architectures with distinct components can benefit such scenarios.
Keywords: Architecture Similarity · Adversarial Attack Transferability
1 Introduction
Fig. 1: t-SNE plot showing 10 clusters of 69 neural networks using our similarity function, SAT. (The figure legend labels the plotted models, e.g., ViT-B, DeiT-B, Swin-T, ConvNeXt-T, PiT-S, XCiT-T12, HaloNet-50, BoTNet-26, ResNet-50, NFNet-L0, RegNetY-32, ReXNet, ResNeSt-50, CSPResNet-50, and CSPDarkNet-53.)
The advances in deep neural network (DNN) architecture design have played a key role in their success, by making the learning process easier (e.g., normalization [3,52,116] or skip connections [42]), enforcing human inductive biases [60], or increasing model capability with the self-attention mechanism [106]. With different architectural components embodying different design principles and elements, a large number of neural architectures have been proposed. They achieve different accuracies, but several studies have pointed out that their predictions are not significantly different [35,71,72].
Given this, can we say that recently developed DNN models with different architectural components are similar or even the same? The answer is no, because model predictions are not the only characteristic by which to compare models. Existing studies have found differences by focusing on different features, such as layer-by-layer network components [58,82], a high-level understanding through loss surface visualization [28], input gradients [89,91], and decision boundaries [90]. These efforts have improved our understanding of model similarity; however, the comparison methods from previous studies are insufficient for comprehensive studies because they do not satisfy two criteria that a practical metric should meet: (1) providing a quantitative similarity score and (2) being compatible with different base architectures (e.g., CNNs and Transformers). Recently, Tramèr et al. [103] and Somepalli et al. [90] suggested quantitative similarity metrics based on measuring differences in decision boundaries. However, these methods have limitations due to non-tractable decision boundaries and limited computations, as shown in Sec. 3.
We propose a quantitative similarity that is scalable and easily applicable to diverse architectures, named Similarity by Attack Transferability (SAT). We focus on adversarial attack transferability (AT), which indicates how well an adversarial perturbation generated for one architecture transfers to another. It has been widely studied that the vulnerability of DNNs depends on their architectural properties, i.e., how models capture features from inputs, such as the use of self-attention [33], the stem layer design [50], and the dependency on high- or low-frequency components of the input [4,57]. Thus, if two models are similar, the AT between them is high because they share similar vulnerabilities [84]. Furthermore, AT is a reliable proxy for comparing input gradients [70], decision boundaries [56], and loss landscapes [26], all of which are widely used frameworks for understanding model behavior and differences between models, and have been used to measure model similarity in previous works [6,18,28,64,89,90,91,94,103,113]; namely, SAT can capture various model properties.
We quantitatively measure pairwise SAT scores of 69 different ImageNet-trained neural architectures from [114]. We analyze which of the 13 architectural components that constitute neural architectures (e.g., normalization, activation, . . . ) most strongly affect model diversity. Furthermore, we examine relationships between SAT and practical applications, such as ensembling and distillation.
2 Related Work
Similarity between DNNs has been actively explored recently. Several studies focused on comparing intermediate features to understand the behavior of DNNs. Raghu et al. [82] observed differences between layers, training methods, and architectures (e.g., CNN and ViT) based on layer-by-layer comparison [58]. Some studies have focused on loss landscapes by visualizing the loss of models in parameter space [28,64,78]. Although these methods enable visual inspection, they do not support quantitative measurement. In contrast, our goal is to provide a quantitative similarity through SAT.
Another line of research has focused on prediction-based statistics, e.g., comparing wrong and correct predictions [34,35,61,86]. However, as recent complex DNNs approach near-perfect accuracy, focusing only on prediction values can be misleading; Meding et al. [72] observed that recent DNNs show highly similar predictions. In this case, prediction-based methods are no longer informative. Meanwhile, our SAT provides meaningful findings for 69 recent NNs.
The input gradient is another popular framework for understanding model behavior by observing how a model's prediction changes under local pixel changes [6,88,89,91,94]. If two models are similar, their input gradients will also be similar. These methods are computationally efficient and require no additional training; they can provide a visual understanding of a given input. However, input gradients are inherently noisy; thus, these methods need additional pre-processing, such as smoothing, for stable computation [18]. Also, these methods usually measure how well the input gradient matches the actual foreground, i.e., they need ground-truth foreground masks to compute such scores. In contrast, SAT needs no additional pre-processing or mask annotations.
Comparing decision boundaries provides a high-level understanding of how models behave differently under input changes and how models extract features from complicated data dimensions. Recent works [103,113] suggested measuring similarity by comparing distances between predictions and decision boundaries. Meanwhile, Somepalli et al. [90] analyzed models by comparing their decision boundaries on the on-manifold plane constructed by three random images. However, these approaches suffer from inaccurate approximation, non-tractable decision boundaries, and finite pairs of inputs and predictions.
Finally, the different behaviors of CNNs and Transformers have been studied in specific settings, such as robustness [5,74], layer-by-layer comparison [78,82], or the decision-making process [53]. Our work aims to quantify the similarity between general NNs, rather than focusing on limited groups of architectures.
3 Similarity by Attack Transferability (SAT)
Here, we propose a quantitative similarity between two architectures using adversarial attack transferability, which indicates whether an adversarial sample crafted for one model can fool another model. Adversarial attacks have effectively exposed the vulnerabilities of DNNs through input gradients [36,70,95]. Interestingly, these vulnerabilities have been observed to be intricately linked to architectural properties. For example, Fu et al. [33] demonstrated the effect of attention modules on the attack success rate. Hwang et al. [50] showed that the stem layer structure causes models to have different adversarially vulnerable points in the input space; e.g., video models have periodically vulnerable frames, such as every fourth frame. Namely, an adversarial sample for a model highly depends on the inherent architectural properties of that model.
Another perspective emphasized the dissimilar dependencies on high-frequency and low-frequency components between CNN-based and Transformer-based models, which show different vulnerabilities to different adversarial attacks
[4,57]. Different architectural choices act as different frequency filters (e.g., self-attention acts as a low-pass filter, while convolution acts as a high-pass filter) [78]; thus, we can expect that different architectural component choices will affect model vulnerability, e.g., vulnerability to high-frequency perturbations. If we can measure how the adversarial vulnerabilities of two models differ, we can also measure how dissimilar the networks are.
To measure how model vulnerabilities differ, we employ adversarial attack transferability (AT), which indicates whether an adversarial sample crafted for one model can fool another model. If two models are more similar, their AT is higher [26,65,84]. Conversely, because adversarial attacks target vulnerable points that vary with the architectural components of DNNs [33,49,50,57], if two models are dissimilar, the AT between them is lower. Furthermore, attack transferability is a good approximation for measuring differences in input gradients [70], decision boundaries [56], and loss landscapes [26], which are widely used techniques for understanding model behavior and similarity between models, as discussed in the related work section. While previous approaches are limited by non-quantitative analyses, inherently noisy properties, or computational costs, adversarial transferability provides a quantitative measure with low variance and low computational cost.
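For reference, the following is a minimal PyTorch sketch of an L-infinity PGD-style attack in the spirit of [70], used here only to illustrate how the adversarial samples behind AT can be generated; the epsilon budget, step size, number of steps, and the assumption of inputs in [0, 1] are illustrative choices, not our exact experimental settings.

import torch

def pgd_attack(model, x, y, eps=4/255, alpha=1/255, steps=10):
    # L_inf PGD: craft adversarial examples for (x, y) against `model`.
    model.eval()
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()           # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back to the eps-ball
        x_adv = x_adv.clamp(0, 1)                              # keep a valid image range
    return x_adv.detach()

Any sufficiently strong attack could play this role; the next section only assumes that the adversary reliably fools the source model.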
We propose a new similarity function that utilizes attack transferability, named Similarity by Attack Transferability (SAT), providing a reliable, easy-to-conduct, and scalable method for measuring the similarity between neural architectures. Formally, we generate adversarial samples $x_A$ and $x_B$ of models $A$ and $B$ for a given input $x$. Then, we measure the accuracy of model $A$ on the adversarial samples generated for model $B$ (denoted $\mathrm{acc}_{BA}$). If $A$ and $B$ are the same, then $\mathrm{acc}_{BA}$ will be zero if the adversary can fool model $B$ perfectly. On the other hand, if the input gradients of $A$ and $B$ differ significantly, the performance drop will be negligible because the adversarial sample remains close to the original image (i.e., $\|x - x_B\| \leq \varepsilon$). Let $X_{AB}$ be the set of inputs that both $A$ and $B$ predict correctly, $y$ the ground-truth label, and $\mathbb{I}(\cdot)$ the indicator function. We measure SAT between two different models by:

$$\mathrm{SAT}(A, B) = \log \max\Big(\varepsilon_s,\ 100 \times \frac{1}{2|X_{AB}|} \sum_{x \in X_{AB}} \big\{\mathbb{I}(A(x_B) \neq y) + \mathbb{I}(B(x_A) \neq y)\big\}\Big), \qquad (1)$$
where $\varepsilon_s$ is a small scalar value. If $A = B$ and we have an oracle adversary, then $\mathrm{SAT}(A, A) = \log 100$. In practice, a strong adversary (e.g., PGD [70] or AutoAttack [23]) can easily achieve nearly zero accuracy if a model is not trained with an adversarial attack-aware strategy [22,70]. Meanwhile, if the adversarial attacks on $A$ are not transferable to $B$ and vice versa, then $\mathrm{SAT}(A, B) = \log \varepsilon_s$.
Ideally, we aim to define a similarity $d$ between two models with the following properties: (1) $n = \arg\min_m d(n, m)$, (2) $d(n, m) = d(m, n)$, and (3) $d(n, m) > d(n, n)$ if $n \neq m$. If the adversary is perfect, then $\mathrm{acc}_{AA}$ will be zero, and it will be the minimum because accuracy is non-negative. "$\mathrm{acc}_{AB} + \mathrm{acc}_{BA}$" is symmetric, and thereby SAT is symmetric. Finally, SAT satisfies $d(n, m) \geq d(n, n)$ if $n \neq m$, which is a weaker condition than (3).
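To make Eq. (1) concrete, below is a minimal sketch of how SAT could be computed in PyTorch. The `attack` argument stands for any strong attack (e.g., the PGD sketch above, or AutoAttack [23]); the batching, the lack of device handling, and the helper names `correct` and `sat` are illustrative assumptions rather than our released implementation.

import math
import torch

@torch.no_grad()
def correct(model, x, y):
    # Boolean mask of samples the model classifies correctly.
    return model(x).argmax(dim=1) == y

def sat(model_a, model_b, loader, attack, eps_s=1e-3):
    # Similarity by Attack Transferability between two models, following Eq. (1).
    fooled, total = 0, 0
    for x, y in loader:
        # Restrict to X_AB: inputs that both models predict correctly.
        mask = correct(model_a, x, y) & correct(model_b, x, y)
        if mask.sum() == 0:
            continue
        x, y = x[mask], y[mask]
        x_a = attack(model_a, x, y)  # adversarial samples crafted against A
        x_b = attack(model_b, x, y)  # adversarial samples crafted against B
        fooled += (~correct(model_a, x_b, y)).sum().item()  # I(A(x_B) != y)
        fooled += (~correct(model_b, x_a, y)).sum().item()  # I(B(x_A) != y)
        total += x.size(0)
    assert total > 0, "no inputs were correctly classified by both models"
    transfer_rate = 100.0 * fooled / (2 * total)
    return math.log(max(eps_s, transfer_rate))

A call like sat(model_a, model_b, val_loader, pgd_attack) would then return a value in $[\log \varepsilon_s, \log 100]$, with higher values indicating more similar architectures.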
Fig. 2: How does SAT work? A conceptual figure for understanding SAT through the lens of decision boundaries. Each line denotes the decision boundary of a binary classification model, and each dot denotes an individual prediction for a given input. (The figure marks the decision boundaries and predictions of models A and B, adversarial attack directions, and non-transferred adversarial samples.)
Comparison with other methods. Here, we compare SAT with prediction-based measurements [34,35,61,86] and with similarity measurements that compare decision boundaries (Tramèr et al. [103] and Somepalli et al. [90]). We first define two binary classifiers $f$ and $g$ and their predicted values $f_p(x)$ and $g_p(x)$ for an input $x$ (see Fig. 2). $f$ classifies $x$ as positive if $f_p(x) > f_d(x)$, where $f_d(x)$ is the decision boundary of $f$. We aim to measure the difference between decision boundaries, namely $\int_x |f_d(x) - g_d(x)|\,dx$, to measure differences between models. However, DNNs have non-tractable decision boundary functions; thus, $f_d$ and $g_d$ are not tractable. Furthermore, the space of $x$ is too large to compute this explicitly. Instead, we may assume that we only have finitely many, sparsely sampled $x$.
In this scenario, we can choose from three strategies. First, we can count the number of samples whose predicted labels differ for a given $x$, as in prediction-based measurements or Somepalli et al. [90]. As we assumed sparsity of $x$, this approach cannot measure the area of the uncovered $x$ domain; hence, its approximation will be inaccurate (purple box in Fig. 2), or it needs too many perturbations to search the uncovered $x$. In Appendix A.1, we empirically show that Somepalli et al. [90] suffers from high variance even with a large number of samples, while SAT shows low variance with a small number of samples.
Second, we can measure the minimum distance between $f_p(x)$ and $f_d$, as in Tramèr et al. [103]. This only measures the distance to the model's own closest decision boundary without considering the other model. The yellow box of Fig. 2 shows that if the two predictions are similar at $x$, this would compute an approximation of $|f_d(x) - g_d(x)|$ for $x$. However, if the two predictions differ, it will compute a wrong approximation. Moreover, in practice, searching for $\epsilon$ is unstable and expensive.
Lastly, we can count the number of non-transferred adversarial samples (red box in Fig. 2), which is our method, SAT. If we have an oracle attack method that moves a point exactly beyond the decision boundary, SAT measures an approximation of $\min(|f_d(x) - g_d(x)|, \epsilon)$ for each $x$. Namely, SAT can measure whether two decision boundaries differ by more than $\epsilon$ for each $x$. If we assume that the difference between decision boundaries is not significantly large and $\epsilon$ is properly chosen, SAT computes an approximation of the decision boundary difference. We also compare SAT and other methods from the viewpoint of stability and practical usability in Sec. 5.1 and Appendix A.