How Does a Deep Learning Model Architecture Impact Its Privacy?
A Comprehensive Study of Privacy Attacks on CNNs and Transformers
Guangsheng Zhang1, Bo Liu1, Huan Tian1, Tianqing Zhu1, Ming Ding2, Wanlei Zhou3
1University of Technology Sydney  2Data61, Australia  3City University of Macau
Abstract
As a booming research area in the past decade, deep learning technologies have been driven by big data collected and processed on an unprecedented scale. However, privacy concerns arise due to the potential leakage of sensitive information from the training data. Recent research has revealed that deep learning models are vulnerable to various privacy attacks, including membership inference attacks, attribute inference attacks, and gradient inversion attacks. Notably, the efficacy of these attacks varies from model to model. In this paper, we answer a fundamental question: Does model architecture affect model privacy? By investigating representative model architectures from convolutional neural networks (CNNs) to Transformers, we demonstrate that Transformers generally exhibit higher vulnerability to privacy attacks than CNNs. Additionally, we identify the micro design of activation layers, stem layers, and LN layers as major factors contributing to the resilience of CNNs against privacy attacks, while the presence of attention modules is another main factor that exacerbates the privacy vulnerability of Transformers. Our discovery reveals valuable insights for deep learning models to defend against privacy attacks and inspires the research community to develop privacy-friendly model architectures.
1 Introduction
Deep learning has been gaining massive attention over the past several years. Training deep learning models requires collecting and processing user data, which raises significant privacy concerns. The data gathered during the training phase often contains sensitive information that malicious parties can access or retrieve. Various privacy attacks targeting deep learning models have demonstrated this vulnerability extensively. One prominent type of attack is membership inference, which focuses on determining whether a specific data sample belongs to the training data [60,62]. Another attack is attribute inference, which aims to uncover implicit attributes learned by the model beyond the intended target attribute [52,65]. Additionally, gradient inversion attacks pose a significant threat by attempting to reconstruct the information of the training data from the gradients of the model [20,22]. These attacks empower adversaries to exploit deep learning models for extracting sensitive data.
Prior research has established that overfitting is one of the primary causes of privacy leakage in deep learning models [10,27,43,62]. In general, overfitting occurs when models excessively learn specific details from the training data, which can lead to inadvertent privacy breaches. Surprisingly, we discover that even when models exhibit comparable levels of overfitting, the effectiveness of attacks varies across different models. This observation raises intriguing questions as to why certain deep learning models are more susceptible to privacy attacks than others, a puzzle that researchers have not fully comprehended. Consequently, we conjecture that other factors beyond overfitting might also contribute to the increased vulnerability of some deep learning models to privacy attacks. Though existing literature has explored model robustness and explainability [4,57], the privacy leakage of model architectures remains underexplored. Therefore, we are motivated to address this critical gap by answering the following question: How does a model's architecture affect its privacy preservation capability?
In this paper, we approach this question by comprehensively analyzing different deep learning models under various state-of-the-art privacy attacks. Our investigation focuses on two widely adopted deep learning model architectures: convolutional neural networks (CNNs) and Transformers. CNN-based models have been dominant in computer vision, thanks to their sliding-window strategy, which extracts local information from images effectively. Transformers, initially introduced in natural language processing (NLP), have gained popularity in computer vision by capturing large receptive fields through attention mechanisms, achieving accuracy comparable to CNNs. The tremendous achievements and wide usage of these two model architectures provide an excellent opportunity for a comparative analysis of model privacy risks. Through our investigation, we make an intriguing discovery: Transformers, in general, exhibit higher vulnerability to mainstream privacy attacks than CNNs.
While Transformers and CNNs differ in many design aspects, we investigate whether certain key modules in the model architecture have a major impact on privacy risks. To this end, we evaluate the privacy leakage of several major modules in a Transformer architecture by sending only selected gradients to the gradient inversion attacks, and we discover that attention modules cause significant privacy leakage. Moreover, we start with a popular CNN-based model, ResNet-50 [26], and gradually morph the model to incorporate the key designs of Transformers, which leads us to the structure of ConvNeXt [46]. We evaluate the privacy leakage throughout this process and identify several key components that have a significant impact on privacy risks: (1) the design of the activation layers; (2) the design of the stem layers; (3) the design of the LN (layer normalization) layers. We further conduct ablation studies to verify our discoveries and propose solutions to mitigate the privacy risks.
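To make the module-wise evaluation above concrete, the following minimal PyTorch sketch keeps only the gradients of parameters whose names contain a chosen keyword (e.g., "attn" for a Transformer's attention blocks) and zeroes out the rest before they are handed to a gradient inversion attack. The helper name and the keyword-matching scheme are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

def select_module_gradients(model: nn.Module, keyword: str):
    """Return per-parameter gradients where only parameters whose names contain
    `keyword` keep their true gradient; all other entries are zero tensors."""
    selected = []
    for name, param in model.named_parameters():
        if param.grad is not None and keyword in name:
            selected.append(param.grad.detach().clone())
        else:
            selected.append(torch.zeros_like(param))
    return selected

# Toy usage on a small stand-in model (a real study would use e.g. Swin and keyword "attn").
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
nn.CrossEntropyLoss()(model(x), y).backward()
grads = select_module_gradients(model, "2")  # keep only the last layer's gradients here
```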
In summary, our contributions are as follows:
• For the first time, we investigate the impact of model architectures and micro designs on privacy risks.
• We evaluate the privacy vulnerabilities of two widely adopted model architectures, i.e., CNNs and Transformers, using three prominent privacy attack methods: (1) membership inference attacks, (2) attribute inference attacks, and (3) gradient inversion attacks. Our analysis reveals that Transformers exhibit higher vulnerability to these privacy attacks than CNNs.
• We identify three key factors, (1) the design of activation layers, (2) the design of stem layers, and (3) the design of LN layers, that significantly contribute to the enhanced resilience of CNNs compared with Transformers. We also discover that the presence of attention modules in Transformers could make them more susceptible to privacy attacks.
• We propose solutions to mitigate the vulnerabilities of model architectures: modifying model components and adding perturbations as defense mechanisms.
2 Related Work
2.1 CNNs and Vision Transformers
Convolutional Neural Networks (CNNs) are a type of neural network that employs convolutional layers to extract features from input data. In contrast to fully connected networks, CNNs use convolutional kernels that connect small local regions of the input to neurons for feature extraction, reducing the number of model parameters and enabling the recognition of local features. Various techniques are employed to construct a CNN model, including padding, pooling, dilated convolution, group convolution, and more.
The concept of convolutional neural networks (CNNs) dates back to the 1980s [38]. However, it was the invention of AlexNet [37] that made CNNs the most prominent models in computer vision. Subsequent research improved the accuracy and efficiency of these models [63,68]. ResNet [26] addressed the challenge of training deep networks using skip connections. Other notable networks include Inception [69], MobileNet [30], ResNeXt [79], EfficientNet [70], RegNet [56], and ConvNeXt [46].
Vision Transformers, originating from natural language processing, divide the input image into multiple patches, forming a one-dimensional sequence of token embeddings. Their exceptional performance can be attributed to the multi-head self-attention modules [74]. The attention mechanism has significantly contributed to the advancement of natural language processing [5,14,81], subsequently leading to the introduction of Transformers to the field of computer vision as Vision Transformers (ViT) [16]. Research has shown that ViTs can surpass CNNs in various downstream tasks [16,67]. Later Transformer models have introduced numerous improvements over ViT, such as Tokens-to-Token ViT [85], Swin Transformers [45], DeiT [71], MViT [40], and DaViT [15].
Numerous studies have compared CNNs and Transformers from the perspectives of robustness [4,55,75] and explainability [57]. However, our research diverges from previous works by concentrating on the privacy leakage inherent in both CNNs and Transformers.
2.2 Privacy Attacks on Deep Learning Models
A primary concern in deep learning privacy is that the model may reveal sensitive information from the training dataset. An adversary can exploit various approaches to compromise privacy: predicting whether a particular sample is in the model's training dataset via membership inference attacks, disclosing implicit attributes of data samples via attribute inference attacks, or even recovering private data samples used to train a neural network through gradient inversion attacks.
Membership inference attacks were initially introduced in [62], where an attack model was employed to distinguish member samples from non-member samples of the training data. To execute these attacks, shadow models were trained to mimic the behavior of victim models [60,62], and their prediction results were gathered to train the attack model. Usually, the confidence scores or losses were utilized [62], but more recent work (label-only attacks) successfully launched attacks using only prediction labels [11,41]. The attacks could also be executed by querying the shadow model and thresholding a designed metric [66]. Some researchers expanded the attacks into new domains, including generative models [9,25], semantic segmentation [28,87], federated learning [54,73], and transfer learning [64,94]. Other researchers relaxed the attack assumptions and improved the attacks, including discussions of white-box/black-box access [59] and additional metrics (e.g., ROC curves and the true positive rate at a low false positive rate) to measure attack performance more accurately [7,34,48,77,82]. We select [7,60,62] as our baseline methods.
Attribute inference attacks, another significant category of privacy attacks, attempt to reveal a specific sensitive attribute of a data sample by analyzing the posteriors of a victim model trained on the victim dataset. Some early research launched the attacks by generating input samples with different sensitive attributes and observing the victim model's output [21,83]. However, these methods only worked on structured data. Later research improved the attacks using victim model representations [52,65] and attributed the feasibility of the attacks to the overlearning behavior of deep learning models [65]. Attributes could also be inferred through a relaxed notion [91], model explanations [18], label-only settings [51], or imputation analysis [33]. As we aim to infer attributes from visual data, we select [52,65] as baseline methods.
Gradient inversion attacks primarily aim to reconstruct training samples held by local clients in federated learning. Using the gradients shared with the server, adversaries can execute the attacks by reconstructing the training samples via gradient matching. DLG [93] and its variant iDLG [92] were early attacks that employed an optimization-based technique to reconstruct the training samples. Later research such as Inverting Gradients [22] and GradInversion [84] improved the attack performance by incorporating regularizations into the optimization process. APRIL [49] and GradViT [24] further developed the attack methods to extract sensitive information from Transformers. The use of Generative Adversarial Networks (GANs) in some gradient inversion attack methods [42] can have a significant impact on the reconstructed results, making it difficult to isolate the influence of other factors on privacy leakage. Therefore, we use a conventional gradient inversion attack method [22] that does not involve GANs.
There have been several evaluations and reviews of these privacy attacks against deep learning models [27,31,43,44,66,88,90]. In contrast, we leverage these privacy attacks to evaluate model architectures. To sum up, we utilize conventional privacy attacks [7,22,52,60,62,65] as the baseline attacks in our analysis, because these attack methods have inspired many follow-up research works and are suitable for evaluation on various models and datasets.
3 Methodology of Evaluating the Impact of the Model Architecture on Privacy
In this section, we present our approach to assessing the impact of model architectures on privacy leakage. To organize our study in a thorough and logical manner, we aim to answer the following research questions sequentially:
• RQ1: How do we analyze privacy leakage in model architectures?
• RQ2: Which CNN and Transformer architectures should we choose to evaluate these attacks?
• RQ3: Which performance aspects should we focus on when evaluating privacy attacks on model architectures?
• RQ4: How should we investigate which designs in model architectures contribute to privacy leakage?
In this work, we focus on classification and feature-representation models such as CNNs and Transformers, which are the targets of the investigated privacy attacks. A newer line of generative AI models, such as generative adversarial networks (GANs) and diffusion models, is vulnerable to different privacy attacks and is thus out of the scope of this paper. We believe our evaluation methodology can shed light on model privacy from the perspective of model architectures.
3.1 Privacy Threat Models
To answer the first research question (RQ1), we choose three
prominent privacy attack methods: membership inference
attacks, attribute inference attacks, and gradient inversion at-
tacks.
3.1.1 Membership Inference Attacks
Network-Based Attacks. Initiating a network-based membership inference attack [60,62] requires three models: the victim model V (the target), the shadow model S (a model that mimics the behavior of the victim model), and the attack model A (a classifier that decides whether a sample belongs to the member or non-member data). The following paragraphs explain how the attack works.
The first step is the attack preparation. Since the adversary has only black-box access to the victim model V, they can only query the model and record prediction results. To launch a membership inference attack, the adversary needs to create a shadow model S, which behaves similarly to the victim model V. This involves collecting a shadow dataset D_S, usually from the same data distribution as the victim dataset D_V. The shadow dataset D_S is then divided into two subsets: D_S^train for training and D_S^test for testing.
Once the preparation is complete, the adversary trains the attack model. The shadow model S and the shadow dataset D_S are used to train the attack model A. Each prediction result of a data sample from the shadow dataset D_S is a vector of confidence scores over the classes, which is concatenated with a binary label indicating whether the prediction is correct or not. The resulting vector, denoted as P_S^i, is collected for all n samples, forming the input set P_S = {P_S^i, i = 1, ..., n} for the attack model A. Since A is a binary classifier, a three-layer MLP (multi-layer perceptron) is employed as the attack model.
At last, the adversary launches the attack model inference. The adversary queries the victim model V with the victim dataset D_V and records the prediction results, which are used as the input for the attack model A. The attack model then predicts whether a data sample is a member or non-member data sample.
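For illustration, the sketch below shows how the attack inputs P_S (softmax confidence scores concatenated with a correctness flag) can be built and what a three-layer MLP attack model looks like in PyTorch. The loader format and the names AttackMLP and build_attack_inputs are assumptions made for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttackMLP(nn.Module):
    """Three-layer MLP that classifies a prediction vector as member / non-member."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes + 1, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, p):
        return self.net(p)

@torch.no_grad()
def build_attack_inputs(model, loader):
    """Query a shadow (or victim) model and collect P_i = [softmax scores, correct?]
    together with the membership label of each sample."""
    feats, membership = [], []
    for x, y, is_member in loader:  # assumed loader yields (image, label, member flag)
        probs = torch.softmax(model(x), dim=1)
        correct = (probs.argmax(dim=1) == y).float().unsqueeze(1)
        feats.append(torch.cat([probs, correct], dim=1))
        membership.append(is_member)
    return torch.cat(feats), torch.cat(membership)
```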
Likelihood-Based Attacks. The Likelihood Ratio Attack (LiRA) [7] is a state-of-the-art attack method that employs both model posteriors and their likelihoods based on shadow models. In contrast to attacks relying on a single shadow model, LiRA requires the adversary to train multiple shadow models S = {S_1, ..., S_n}, such that a target sample (from the victim dataset D_V) is included in the training data of half of the models in S and excluded from the other half. The adversary then queries the shadow models with the target sample and calculates the logits for each model. Using these logits, the adversary estimates probability density functions and computes the likelihood ratio of the target sample, which indicates its membership status.
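A simplified per-sample version of this likelihood-ratio test is sketched below in NumPy/SciPy. It assumes the adversary has already collected the shadow models' confidences on the target sample's true class, split into IN models (trained with the sample) and OUT models (trained without it); the full attack in [7] adds refinements such as query augmentation and a global-variance variant.

```python
import numpy as np
from scipy.stats import norm

def lira_score(target_conf, in_confs, out_confs, eps=1e-6):
    """Likelihood ratio for one target sample: larger values suggest membership."""
    def scaled_logit(p):
        p = np.clip(np.asarray(p, dtype=np.float64), eps, 1 - eps)
        return np.log(p / (1 - p))  # logit scaling stabilizes the confidence distribution

    t = scaled_logit(target_conf)
    mu_in, std_in = scaled_logit(in_confs).mean(), scaled_logit(in_confs).std() + eps
    mu_out, std_out = scaled_logit(out_confs).mean(), scaled_logit(out_confs).std() + eps

    # Gaussian likelihoods of the observed confidence under the IN / OUT hypotheses.
    return norm.pdf(t, mu_in, std_in) / (norm.pdf(t, mu_out, std_out) + eps)

# Example: victim confidence on the target sample vs. shadow-model statistics.
print(lira_score(0.97, in_confs=[0.95, 0.99, 0.96, 0.98], out_confs=[0.60, 0.72, 0.55, 0.68]))
```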
There are other kinds of membership inference attacks, including metric-based attacks and label-only attacks [11,41,66]. Instead of using a neural network as the attack model, metric-based attacks [66] separate member data from non-member data using a chosen metric and threshold. Label-only attacks [11,41] relax the assumptions of the threat model by leveraging only prediction labels as the input of the attack model. Our study focuses on two types of membership inference attacks: network-based and likelihood-based attacks. We chose these two types because the network-based attack is commonly used as a baseline in many research papers, making it a conventional attack to consider, while the likelihood-based attack is a more recent state-of-the-art attack that has demonstrated high effectiveness, making it an important attack to evaluate as well. By considering these two types of attacks, we can effectively represent the performance of membership inference attacks against various victim models and gain insights into the privacy risks associated with different machine learning models.
3.1.2 Attribute Inference Attacks
The goal of attribute inference attacks [52,65] is to extract sensitive attributes from a victim model, which may inadvertently reveal information about the training data. For instance, if the victim model is trained to classify whether a person has a beard, an adversary may infer the person's race from the model's learned representation.
At the attack preparation stage, the victim model V is trained on the victim dataset D_V, which is split into two subsets, D_V^train and D_V^test, for training and testing.
The second step is the attack model training. To train the attack model A, the adversary uses an auxiliary dataset D_A^train, which includes pairs of a representation h and its attribute a, i.e., (h, a) ∈ D_A.
At last, the adversary launches the attack. The adversary takes a data sample's representation h as the input and uses the attack model A to infer the attribute.
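As a minimal sketch of this pipeline, the PyTorch snippet below trains a small attack classifier on (h, a) pairs from the auxiliary dataset; the representation dimension, the number of attribute values, and the batch format are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def train_attribute_attack(pairs, rep_dim: int, num_attr_values: int,
                           epochs: int = 10, lr: float = 1e-3):
    """Train an attack classifier mapping a representation h to a sensitive attribute a.
    `pairs` is assumed to be an iterable of (h, a) mini-batches drawn from D_A."""
    attack = nn.Sequential(
        nn.Linear(rep_dim, 128), nn.ReLU(),
        nn.Linear(128, num_attr_values),
    )
    opt = torch.optim.Adam(attack.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for h, a in pairs:
            opt.zero_grad()
            loss_fn(attack(h), a).backward()
            opt.step()
    return attack

# At inference time the adversary computes attack(h).argmax(dim=1)
# for a victim sample's representation h.
```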
3.1.3 Gradient Inversion Attacks
Launching a gradient inversion attack [22,92,93] involves solving an optimization problem that minimizes the difference between the gradients computed on a reconstructed input and the original model gradients. The optimization runs for a certain number of iterations, after which the input data sample can be reconstructed.
The adversary operates within a federated learning scenario. In the attack preparation stage, the adversary sits at the central server, aggregating model gradients to build a centralized model. Since the adversary has access to the communication channels used during the federated learning process, they can retrieve the model gradients and prepare to extract sensitive information about the training samples. This allows the adversary to launch attacks against the federated learning system.
In the gradient reconstruction step, the aggregated model gradients are denoted as ∇_θ L_θ(x, y), where θ denotes the model parameters, x and y are the original input image and its ground-truth label at a local client, and L represents the cost function of the model. To initiate the reconstruction process, the adversary generates a dummy image x' and minimizes the gradient-matching objective: argmin_{x'} ||∇_θ L_θ(x', y) − ∇_θ L_θ(x, y)||^2. The dummy image x' is thereby reconstructed to resemble x closely.
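The optimization can be implemented directly in PyTorch. The sketch below follows the plain L2 gradient-matching objective written above (attacks such as [22,84] add refinements like cosine similarity and total-variation priors), and it assumes the label y is known or has been recovered separately.

```python
import torch

def gradient_inversion(model, target_grads, y, img_shape, steps=2000, lr=0.1):
    """Reconstruct a dummy image x' whose gradients match the observed gradients."""
    x_dummy = torch.randn(1, *img_shape, requires_grad=True)
    opt = torch.optim.Adam([x_dummy], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        model.zero_grad()
        task_loss = loss_fn(model(x_dummy), y)
        dummy_grads = torch.autograd.grad(task_loss, model.parameters(), create_graph=True)
        # || grad(x', y) - grad(x, y) ||^2, summed over all parameter tensors
        grad_dist = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, target_grads))
        grad_dist.backward()
        opt.step()
    return x_dummy.detach()
```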
3.2 CNNs vs Transformers
To answer the second research question (RQ2), we investigate the privacy of two mainstream architectures: CNNs and Transformers. We carefully select several popular CNNs and Transformers and analyze their privacy leakage under the attacks. For CNNs, we choose ResNets [26] as baseline models, which are known for their residual blocks and are widely used in various computer vision tasks. We specifically select ResNet-50 (23.52 million parameters) and ResNet-101 (42.51 million parameters) to represent CNN architectures in our analysis. Regarding Transformers, we focus on Swin Transformers [45], which have gained attention for their innovative design incorporating attention modules and shifted-window mechanisms. We analyze Swin-T (27.51 million parameters) and Swin-S (48.80 million parameters) as representatives of Transformer architectures.
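The quoted parameter counts can be roughly reproduced with standard reference implementations. The sketch below assumes torchvision for the ResNets, timm for the Swin models, and a 10-class classification head, which appears to match the figures above; exact counts depend on the dataset and head size.

```python
import timm
import torchvision.models as tvm

models = {
    "ResNet-50": tvm.resnet50(num_classes=10),
    "ResNet-101": tvm.resnet101(num_classes=10),
    "Swin-T": timm.create_model("swin_tiny_patch4_window7_224", num_classes=10),
    "Swin-S": timm.create_model("swin_small_patch4_window7_224", num_classes=10),
}
for name, model in models.items():
    millions = sum(p.numel() for p in model.parameters()) / 1e6
    print(f"{name}: {millions:.2f}M parameters")
```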
To ensure fair comparisons, we group the four models by parameter size, evaluating models with similar parameter counts together. This approach allows us to compare models that exhibit comparable task performance.