learning [54,73], and transfer learning [64,94]. Other researchers relaxed the attack assumptions and improved the attacks, including discussions of white-box versus black-box access [59] and of additional metrics (e.g., ROC curves and the true positive rate at a low false positive rate) that measure attack performance more accurately [7,34,48,77,82]. We select [7,60,62] as our baseline methods.
Attribute inference attacks, another significant category of privacy attacks, attempt to reveal a specific sensitive attribute of a data sample by analyzing the posteriors of a victim model trained on the victim dataset. Early research launched the attacks by generating input samples with different values of the sensitive attribute and observing the victim model's output [21,83]; however, these methods only work on structured data. Later research improved the attacks by exploiting the victim model's internal representations [52,65] and attributed their feasibility to the overlearning behavior of deep learning models [65]. Attributes can also be inferred through a relaxed notion of the attack [91], model explanations [18], label-only settings [51], or imputation analysis [33]. As we aim to infer attributes from visual data, we select [52,65] as baseline methods.
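To make the representation-based setting concrete, the following sketch trains an auxiliary attribute classifier on features extracted from the victim model; the feature extractor, feature dimension, attribute cardinality, and training details are illustrative placeholders rather than the exact configurations of [52,65].

```python
import torch
import torch.nn as nn

# Sketch of a representation-based attribute inference attack: the adversary
# trains an auxiliary classifier that maps an internal representation of the
# victim model to a sensitive attribute of the input sample.

feature_dim, num_attr_values = 512, 2  # assumed sizes, not taken from the baselines

# Placeholder standing in for the victim model's penultimate-layer features.
victim_features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))

attribute_classifier = nn.Sequential(  # the adversary's inference model
    nn.Linear(feature_dim, 128), nn.ReLU(), nn.Linear(128, num_attr_values)
)
optimizer = torch.optim.Adam(attribute_classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def attack_train_step(x_aux, attr_labels):
    """One step on auxiliary data whose sensitive attribute the adversary knows."""
    with torch.no_grad():  # the victim model is only queried, never updated
        reps = victim_features(x_aux)
    loss = criterion(attribute_classifier(reps), attr_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random stand-in data.
attack_train_step(torch.randn(32, 3, 32, 32), torch.randint(0, num_attr_values, (32,)))
```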
Gradient inversion attacks primarily aim to reconstruct the training samples of local clients in federated learning. Using the gradients shared with the server, adversaries reconstruct training samples via gradient matching. DLG [93] and its variant iDLG [92] were early attacks that employed an optimization-based technique to reconstruct training samples. Later research such as Inverting Gradients [22] and GradInversion [84] improved attack performance by incorporating regularizations into the optimization process. APRIL [49] and GradViT [24] further extended the attacks to extract sensitive information from Transformers. The use of Generative Adversarial Networks (GANs) in some gradient inversion attacks [42] can have a significant impact on the reconstructed results, making it difficult to isolate the influence of other factors on privacy leakage. Therefore, we use a conventional gradient inversion attack [22] that does not involve GANs.
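For intuition, a minimal gradient-matching loop in the spirit of DLG [93] and Inverting Gradients [22] could look like the sketch below. The cosine matching loss, total-variation prior, optimizer, and hyperparameters are illustrative assumptions rather than the exact baseline configuration, and the label is assumed to be known to the adversary (e.g., recovered as in iDLG [92]).

```python
import torch
import torch.nn.functional as F

def gradient_inversion(model, observed_grads, label, input_shape,
                       steps=2000, lr=0.1, tv_weight=1e-4):
    """Reconstruct a training sample from shared gradients by gradient matching.

    `observed_grads` is the list of parameter gradients shared by the client;
    `label` is a LongTensor of shape (1,). All settings here are illustrative.
    """
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([dummy_x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(dummy_x), label)
        dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)

        # Match the dummy gradients to the observed ones (cosine distance),
        # with a total-variation prior acting as an image regularizer.
        match = sum(1 - F.cosine_similarity(dg.flatten(), og.flatten(), dim=0)
                    for dg, og in zip(dummy_grads, observed_grads))
        tv = (dummy_x[..., 1:, :] - dummy_x[..., :-1, :]).abs().mean() + \
             (dummy_x[..., :, 1:] - dummy_x[..., :, :-1]).abs().mean()
        (match + tv_weight * tv).backward()
        optimizer.step()

    return dummy_x.detach()
```

The reconstruction returned by such a loop is then typically compared against the original training sample with image-similarity metrics.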
There have been several evaluations and reviews of these privacy attacks against deep learning models [27,31,43,44,66,88,90]. In contrast, we aim to evaluate model architectures by leveraging these privacy attacks. To sum up, we use conventional privacy attacks [7,22,52,60,62,65] as the baseline attacks in our analysis, because these attack methods have inspired many follow-up works and are suitable for evaluation across various models and datasets.
3 Methodology of Evaluating the Impact of the Model Architecture on Privacy
In this section, we present our approach to assessing the impact of model architectures on privacy leakage. To organize our study in a thorough and logical manner, we aim to answer the following research questions sequentially:
• RQ1: How do we analyze the privacy leakage of model architectures?
• RQ2: Which CNN and Transformer architectures should we choose to evaluate these attacks?
• RQ3: Which performance aspects should we focus on when evaluating the privacy attacks on model architectures?
• RQ4: How should we investigate which designs in model architectures contribute to privacy leakage?
In this work, we focus on classification or feature-representation models such as CNNs and Transformers, which are the targets of the investigated privacy attacks. A newer line of generative AI models, such as generative adversarial networks (GANs) and diffusion models, is vulnerable to different privacy attacks and is thus out of the scope of this paper. We believe our evaluation methodology can shed light on model privacy from the perspective of model architectures.
3.1 Privacy Threat Models
To answer the first research question (RQ1), we choose three prominent privacy attack methods: membership inference attacks, attribute inference attacks, and gradient inversion attacks.
3.1.1 Membership Inference Attacks
Network-Based Attacks. Initiating a network-based membership inference attack [60,62] requires three models: the victim model $V$ (the target), the shadow model $S$ (which mimics the behavior of the victim model), and the attack model $A$ (a binary classifier that decides whether a sample is a member or a non-member of the victim's training data). The following paragraphs explain how the attack works.
The first step is the attack preparation. Since the adversary has only black-box access to the victim model $V$, they can only query the model and record prediction results. To launch a membership inference attack, the adversary needs to create a shadow model $S$, which behaves similarly to the victim model $V$. This involves collecting a shadow dataset $D_S$, usually from the same data distribution as the victim dataset $D_V$. The shadow dataset $D_S$ is then divided into two subsets: $D_S^{train}$ for training and $D_S^{test}$ for testing.
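A minimal sketch of this preparation step is shown below, with synthetic stand-ins for the shadow dataset $D_S$ and the shadow model $S$; the architecture and training schedule are arbitrary assumptions, since the adversary only needs $S$ to roughly imitate the behavior of $V$.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split, DataLoader

# Synthetic stand-in for D_S; in practice it is drawn from (roughly) the same
# distribution as the victim dataset D_V.
shadow_dataset = TensorDataset(torch.randn(2000, 3, 32, 32),
                               torch.randint(0, 10, (2000,)))
d_s_train, d_s_test = random_split(shadow_dataset, [1000, 1000])

shadow_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder for S
optimizer = torch.optim.SGD(shadow_model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Train S on D_S^train only. D_S^train then plays the role of the "member"
# data and D_S^test the "non-member" data when the attack model is trained next.
for epoch in range(5):  # illustrative schedule
    for x, y in DataLoader(d_s_train, batch_size=128, shuffle=True):
        optimizer.zero_grad()
        criterion(shadow_model(x), y).backward()
        optimizer.step()
```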
Once the preparation is complete, the adversary trains the attack model. The shadow model $S$ and shadow dataset $D_S$ are used to train the attack model $A$. Each prediction result of a data sample from the shadow dataset $D_S$ is a vector of confidence scores for each class, which is concatenated with a binary label indicating whether the prediction is correct or not. The resulting vector, denoted as $P_S^i$, is collected for all $n$ samples, forming the input set $P_S = \{P_S^i, i = 1, \dots, n\}$ for the