
[4,57]. Different architectural choices behave as different frequency filters (e.g., self-attention acts as a low-pass filter, while convolution acts as a high-pass filter) [78]; thus, we can expect that different choices of architectural components will affect model vulnerability, e.g., vulnerability to high-frequency perturbations. If we can measure how the adversarial vulnerabilities of models differ, we can also measure how dissimilar the networks are.
To measure how model vulnerabilities differ, we employ adversarial attack transferability (AT), which indicates whether an adversarial sample crafted on one model can fool another model. The more similar two models are, the higher their AT [26,65,84]. Conversely, because adversarial attacks target vulnerable points that vary with the architectural components of DNNs [33,49,50,57], the AT between two dissimilar models is lower. Furthermore, attack transferability can serve as a good approximation for measuring differences in input gradients [70], decision boundaries [56], and loss landscapes [26], which are widely used techniques for understanding model behavior and the similarity between models, as discussed in the related work section. While previous approaches are limited by non-quantitative analyses, inherent noise, and high computational costs, adversarial transferability provides a quantitative measure with low variance and low computational cost.
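
As a concrete illustration, the sketch below estimates one direction of attack transferability, $\mathrm{acc}_{B \to A}$: craft adversarial samples on a source model with an $L_\infty$ PGD attack and evaluate a target model on them. This is a minimal PyTorch sketch, not our evaluation code; the helper names (`pgd_linf`, `acc_transfer`) and the inputs `model_a`, `model_b`, and `loader` are illustrative assumptions, and any sufficiently strong attack could be substituted.

```python
# Illustrative sketch (not the paper's released code): one direction of
# attack transferability, acc_{source -> target}, with a hand-rolled L-inf PGD.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-inf PGD: ascend the cross-entropy loss, then project back into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # stay within ||x_adv - x||_inf <= eps
            x_adv = x_adv.clamp(0, 1)                 # keep a valid pixel range
    return x_adv.detach()

@torch.no_grad()
def accuracy(model, x, y):
    """Top-1 accuracy (in %) of `model` on the batch (x, y)."""
    return (model(x).argmax(dim=1) == y).float().mean().item() * 100

def acc_transfer(source, target, loader, device="cuda"):
    """acc_{source -> target}: accuracy of `target` on adversarial samples crafted on `source`."""
    batch_accs = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_linf(source, x, y)  # the attack only sees the source model
        batch_accs.append(accuracy(target, x_adv, y))
    return sum(batch_accs) / len(batch_accs)

# Example: acc_{B -> A}; a low value suggests A and B share vulnerable directions.
# acc_b_to_a = acc_transfer(model_b, model_a, loader)
```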
We propose a new similarity function that utilizes attack transferability, named Similarity by Attack Transferability (SAT), providing a reliable, easy-to-conduct, and scalable method for measuring the similarity between neural architectures. Formally, we generate adversarial samples $x_A$ and $x_B$ of models $A$ and $B$ for a given input $x$. Then, we measure the accuracy of model $A$ on the adversarial samples crafted for model $B$ (denoted $\mathrm{acc}_{B \to A}$). If $A$ and $B$ are the same, then $\mathrm{acc}_{B \to A}$ will be zero, provided the adversary can fool model $B$ perfectly. On the other hand, if the input gradients of $A$ and $B$ differ significantly, the performance drop will be negligible because the adversarial sample remains close to the original image (i.e., $\|x - x_B\| \le \varepsilon$). Let $X_{AB}$ be the set of inputs on which both $A$ and $B$ predict correctly, $y$ be the ground-truth label, and $\mathbb{I}(\cdot)$ be the indicator function. We measure SAT between two different models by:
\[
\mathrm{SAT}(A, B) = \log \max\Bigl(\varepsilon_s,\; 100 \times \frac{1}{2\,|X_{AB}|} \sum_{x \in X_{AB}} \bigl\{ \mathbb{I}\bigl(A(x_B) \neq y\bigr) + \mathbb{I}\bigl(B(x_A) \neq y\bigr) \bigr\} \Bigr), \tag{1}
\]
where $\varepsilon_s$ is a small scalar value. If $A = B$ and we have an oracle adversary, then $\mathrm{SAT}(A, A) = \log 100$. In practice, a strong adversary (e.g., PGD [70] or AutoAttack [23]) can easily achieve nearly zero accuracy if a model is not trained with an adversarial-attack-aware strategy [22,70]. Meanwhile, if the adversarial attacks on $A$ are not transferable to $B$ and vice versa, then $\mathrm{SAT}(A, B) = \log \varepsilon_s$.
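
Putting Eq. (1) together end to end, the hedged sketch below computes SAT from the cross-fooling rates on $X_{AB}$. It reuses the illustrative `pgd_linf` helper from the earlier sketch; `model_a`, `model_b`, `loader`, and the default `eps_s` are assumptions, and the PGD attack can be swapped for AutoAttack or any other strong adversary.

```python
import math
import torch

def sat(model_a, model_b, loader, eps_s=1e-3, device="cuda"):
    """Sketch of Eq. (1): SAT(A, B) from the cross-fooling rates on X_AB."""
    fooled, n_ab = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():  # X_AB: keep only inputs both models classify correctly
            keep = (model_a(x).argmax(1) == y) & (model_b(x).argmax(1) == y)
        x, y = x[keep], y[keep]
        if x.size(0) == 0:
            continue
        x_a = pgd_linf(model_a, x, y)  # adversarial samples crafted on A
        x_b = pgd_linf(model_b, x, y)  # adversarial samples crafted on B
        with torch.no_grad():
            fooled += (model_a(x_b).argmax(1) != y).sum().item()  # I(A(x_B) != y)
            fooled += (model_b(x_a).argmax(1) != y).sum().item()  # I(B(x_A) != y)
        n_ab += x.size(0)
    rate = 100.0 * fooled / (2 * n_ab)  # average fooling rate in [0, 100]
    return math.log(max(eps_s, rate))   # log max(eps_s, .) as in Eq. (1)

# SAT is symmetric by construction: sat(model_a, model_b, loader) and
# sat(model_b, model_a, loader) agree up to attack randomness.
```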
Ideally, we aim to define a similarity $d$ between two models with the following properties: (1) $n = \arg\min_m d(n, m)$, (2) $d(n, m) = d(m, n)$, and (3) $d(n, m) > d(n, n)$ if $n \neq m$. If the adversary is perfect, then $\mathrm{acc}_{A \to A}$ will be zero, which is the minimum because accuracy is non-negative. “$\mathrm{acc}_{A \to B} + \mathrm{acc}_{B \to A}$” is symmetric, and thus SAT is symmetric. Finally, SAT satisfies $d(n, m) \ge d(n, n)$ if $n \neq m$, which is a weaker condition than (3).
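
For concreteness, writing $d(A, B) := \mathrm{acc}_{A \to B} + \mathrm{acc}_{B \to A}$ for the transferred-accuracy term used in the argument above (an illustrative shorthand, not a new definition), these checks read:
\[
d(A, A) = \mathrm{acc}_{A \to A} + \mathrm{acc}_{A \to A} = 0 \;\le\; \mathrm{acc}_{A \to B} + \mathrm{acc}_{B \to A} = d(A, B),
\qquad
d(A, B) = \mathrm{acc}_{A \to B} + \mathrm{acc}_{B \to A} = \mathrm{acc}_{B \to A} + \mathrm{acc}_{A \to B} = d(B, A),
\]
where the first chain assumes a perfect adversary ($\mathrm{acc}_{A \to A} = 0$) and uses the non-negativity of accuracy.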