learning [54,73], and transfer learning [64,94]. Other researchers relaxed the attack assumptions and improved the attacks, including discussions of white-box versus black-box access [59] and of additional metrics (e.g., ROC curves and the true positive rate at a low false positive rate) that measure attack performance more accurately [7,34,48,77,82]. We select [7,60,62] as our baseline methods.
Attribute inference attacks, another significant category of privacy attacks, attempt to reveal a specific sensitive attribute of a data sample by analyzing the posteriors of a victim model trained on the victim dataset. Early research launched the attacks by generating input samples with different values of the sensitive attribute and observing the victim model's output [21,83]; however, these methods only work on structured data. Later research improved the attacks by exploiting the victim model's internal representations [52,65] and attributed their feasibility to the overlearning behavior of deep learning models [65]. Attributes can also be inferred through a relaxed notion of the attack [91], model explanations [18], label-only settings [51], or imputation analysis [33]. As we aim to infer attributes from visual data, we select [52,65] as baseline methods.
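To make the representation-based setting concrete, the following sketch trains an auxiliary attribute classifier on features extracted from the victim model; the feature extractor, feature dimension, attribute cardinality, and training details are illustrative placeholders rather than the exact configurations of [52,65].

```python
import torch
import torch.nn as nn

# Sketch of a representation-based attribute inference attack: the adversary
# trains an auxiliary classifier that maps an internal representation of the
# victim model to a sensitive attribute of the input sample.

feature_dim, num_attr_values = 512, 2  # assumed sizes, not taken from the baselines

# Placeholder standing in for the victim model's penultimate-layer features.
victim_features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))

attribute_classifier = nn.Sequential(  # the adversary's inference model
    nn.Linear(feature_dim, 128), nn.ReLU(), nn.Linear(128, num_attr_values)
)
optimizer = torch.optim.Adam(attribute_classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def attack_train_step(x_aux, attr_labels):
    """One step on auxiliary data whose sensitive attribute the adversary knows."""
    with torch.no_grad():  # the victim model is only queried, never updated
        reps = victim_features(x_aux)
    loss = criterion(attribute_classifier(reps), attr_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random stand-in data.
attack_train_step(torch.randn(32, 3, 32, 32), torch.randint(0, num_attr_values, (32,)))
```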
Gradient inversion attacks primarily aim to reconstruct the training samples of local clients in federated learning. Using the gradients shared with the server, adversaries reconstruct training samples via gradient matching. DLG [93] and its variant iDLG [92] were early attacks that employed an optimization-based technique to reconstruct training samples. Later research such as Inverting Gradients [22] and GradInversion [84] improved attack performance by incorporating regularizations into the optimization process. APRIL [49] and GradViT [24] further extended the attacks to extract sensitive information from Transformers. The use of Generative Adversarial Networks (GANs) in some gradient inversion attacks [42] can have a significant impact on the reconstructed results, making it difficult to isolate the influence of other factors on privacy leakage. Therefore, we use a conventional gradient inversion attack [22] that does not involve GANs.
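For intuition, a minimal gradient-matching loop in the spirit of DLG [93] and Inverting Gradients [22] could look like the sketch below. The cosine matching loss, total-variation prior, optimizer, and hyperparameters are illustrative assumptions rather than the exact baseline configuration, and the label is assumed to be known to the adversary (e.g., recovered as in iDLG [92]).

```python
import torch
import torch.nn.functional as F

def gradient_inversion(model, observed_grads, label, input_shape,
                       steps=2000, lr=0.1, tv_weight=1e-4):
    """Reconstruct a training sample from shared gradients by gradient matching.

    `observed_grads` is the list of parameter gradients shared by the client;
    `label` is a LongTensor of shape (1,). All settings here are illustrative.
    """
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([dummy_x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(dummy_x), label)
        dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)

        # Match the dummy gradients to the observed ones (cosine distance),
        # with a total-variation prior acting as an image regularizer.
        match = sum(1 - F.cosine_similarity(dg.flatten(), og.flatten(), dim=0)
                    for dg, og in zip(dummy_grads, observed_grads))
        tv = (dummy_x[..., 1:, :] - dummy_x[..., :-1, :]).abs().mean() + \
             (dummy_x[..., :, 1:] - dummy_x[..., :, :-1]).abs().mean()
        (match + tv_weight * tv).backward()
        optimizer.step()

    return dummy_x.detach()
```

The reconstruction returned by such a loop is then typically compared against the original training sample with image-similarity metrics.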
There have been several evaluations and reviews of these privacy attacks against deep learning models [27,31,43,44,66,88,90]. In contrast, we aim to evaluate model architectures by leveraging these privacy attacks. To sum up, we use conventional privacy attacks [7,22,52,60,62,65] as the baseline attacks in our analysis, because these attack methods have inspired many follow-up works and are suitable for evaluation across various models and datasets.
3 Methodology of Evaluating the Impact of the Model Architecture on Privacy
In this section, we present our approach to assessing the impact of model architectures on privacy leakage. To organize our study in a thorough and logical manner, we aim to answer the following research questions sequentially:
• RQ1: How do we analyze the privacy leakage of model architectures?
• RQ2: Which CNN and Transformer architectures should we choose to evaluate these attacks?
• RQ3: Which performance aspects should we focus on when evaluating the privacy attacks on model architectures?
• RQ4: How should we investigate which designs in model architectures contribute to privacy leakage?
In this work, we focus on classification or feature-representation models such as CNNs and Transformers, which are the targets of the investigated privacy attacks. A newer line of generative AI models, such as generative adversarial networks (GANs) and diffusion models, is vulnerable to different privacy attacks and is thus out of the scope of this paper. We believe our evaluation methodology can shed light on model privacy from the perspective of model architectures.
3.1 Privacy Threat Models
To answer the first research question (RQ1), we choose three prominent privacy attack methods: membership inference attacks, attribute inference attacks, and gradient inversion attacks.
3.1.1 Membership Inference Attacks
Network-Based Attacks. Initiating a network-based membership inference attack [60,62] requires three models: the victim model $V$ (the target), the shadow model $S$ (which mimics the behavior of the victim model), and the attack model $A$ (a binary classifier that decides whether a sample is a member or a non-member of the victim's training data). The following paragraphs explain how the attack works.
The first step is the attack preparation. Since the adversary has only black-box access to the victim model $V$, they can only query the model and record prediction results. To launch a membership inference attack, the adversary needs to create a shadow model $S$, which behaves similarly to the victim model $V$. This involves collecting a shadow dataset $D_S$, usually from the same data distribution as the victim dataset $D_V$. The shadow dataset $D_S$ is then divided into two subsets: $D_S^{train}$ for training and $D_S^{test}$ for testing.
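A minimal sketch of this preparation step is shown below, with synthetic stand-ins for the shadow dataset $D_S$ and the shadow model $S$; the architecture and training schedule are arbitrary assumptions, since the adversary only needs $S$ to roughly imitate the behavior of $V$.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split, DataLoader

# Synthetic stand-in for D_S; in practice it is drawn from (roughly) the same
# distribution as the victim dataset D_V.
shadow_dataset = TensorDataset(torch.randn(2000, 3, 32, 32),
                               torch.randint(0, 10, (2000,)))
d_s_train, d_s_test = random_split(shadow_dataset, [1000, 1000])

shadow_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder for S
optimizer = torch.optim.SGD(shadow_model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Train S on D_S^train only. D_S^train then plays the role of the "member"
# data and D_S^test the "non-member" data when the attack model is trained next.
for epoch in range(5):  # illustrative schedule
    for x, y in DataLoader(d_s_train, batch_size=128, shuffle=True):
        optimizer.zero_grad()
        criterion(shadow_model(x), y).backward()
        optimizer.step()
```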
Once the preparation is complete, the adversary trains the attack model. The shadow model $S$ and shadow dataset $D_S$ are used to train the attack model $A$. Each prediction result of a data sample from the shadow dataset $D_S$ is a vector of confidence scores for each class, which is concatenated with a binary label indicating whether the prediction is correct or not. The resulting vector, denoted as $P_S^i$, is collected for all $n$ samples, forming the input set $P_S = \{P_S^i, i = 1, \dots, n\}$ for the