we conduct a comprehensive ablation study from different
perspectives that can guide developers and researchers to be
alert to vulnerabilities in text-to-image generation models.
Our main contributions are as follows:
• We pioneer the study of the privacy risks of text-to-image generation models from the perspective of membership inference.
• We consider three attack intuitions and, based on them, design four attack methodologies.
• We conduct an extensive evaluation on two mainstream text-to-image generation models. The results show the effectiveness and generalizability of the proposed attacks, indicating that such membership leakage poses a much more severe threat than previously shown. We further perform a comprehensive ablation study from different angles to analyze the factors that may affect attack performance, which is expected to provide instructive warnings to model developers.
2 Background
2.1 Text-to-image Generation Models
Text-to-image generation aims to unlock innovative applications across various areas of life, including painting, design, and multimedia content creation. It generates creative images that combine concepts, attributes, and styles from expressive text descriptions. Currently, text-to-image generation models follow one of two designs, namely diffusion-based modeling and sequence-to-sequence modeling.
Diffusion-based. Diffusion-based models directly take noise as the input of a de-noising network: starting from these random points, the network gradually de-noises them, conditioned on the textual description, until images matching the conditional information are generated. Building on the power of diffusion models in high-fidelity image synthesis, text-to-image generation has been significantly pushed forward by the recent efforts of GLIDE [20], LDM [26], DALL-E 2 [23], and Imagen [28].
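To make the conditioning loop concrete, the following is a minimal, simplified sketch rather than the exact procedure of any cited model; `denoiser` and `text_embedding` are hypothetical placeholders for the text-conditioned de-noising network and the encoded caption, and the update rule abstracts away the actual DDPM/DDIM noise schedule.

```python
import torch

@torch.no_grad()
def sample_text_conditioned(denoiser, text_embedding, steps=50, shape=(1, 3, 64, 64)):
    """Start from pure Gaussian noise and iteratively de-noise it,
    conditioning every step on the text embedding."""
    x = torch.randn(shape)  # random starting point
    for t in reversed(range(steps)):
        t_batch = torch.full((shape[0],), t)
        # The (hypothetical) denoiser predicts the noise in x at step t, given the text condition.
        noise_pred = denoiser(x, t_batch, text_embedding)
        # Simplified update; real samplers (DDPM/DDIM) use a learned variance schedule.
        x = x - noise_pred / steps
    return x
```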
Sequence-to-sequence. The main idea of this design is to turn images into discrete image tokens via transformer-based image tokenizers (e.g., dVAE [25]) and to employ sequence-to-sequence architectures to learn the relationship between textual input and visual output from a large collection of text-image pairs. Representative works of sequence-to-sequence modeling are Parti [36], DALL-E [24], and CogView [7].
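As an illustration only, the sketch below shows the token-level view of this design under simplifying assumptions: `seq2seq_model` and `image_tokenizer` are hypothetical stand-ins for the autoregressive model and the tokenizer decoder, and greedy decoding replaces the sampling strategies used in practice.

```python
import torch

def generate_image_tokens(seq2seq_model, image_tokenizer, text_tokens, max_image_tokens=256):
    """Autoregressively predict discrete image tokens from text tokens,
    then let the image tokenizer's decoder map them back to pixels."""
    image_tokens: list[int] = []
    for _ in range(max_image_tokens):
        # The model scores the next image token given the caption and the tokens generated so far.
        logits = seq2seq_model(text_tokens, torch.tensor(image_tokens, dtype=torch.long))
        next_token = int(logits[-1].argmax())  # greedy decoding for simplicity
        image_tokens.append(next_token)
    # The tokenizer decoder (e.g., a dVAE-style decoder) reconstructs the image from the tokens.
    return image_tokenizer.decode(torch.tensor(image_tokens, dtype=torch.long))
```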
In this work, we adopt LDM and DALL-E mini as our
target models, representing diffusion-based and sequence-to-
sequence models, respectively.
2.2 Membership Inference Attacks
Membership inference attacks (MIAs), which aim to infer whether a specific data sample was involved in a target model’s training phase (i.e., whether it is a member or non-member), are considered an approach to investigating privacy leakage and detecting illegal data abuse. The basic idea is to exploit the behavioral difference of the target model on members and non-members. Depending on the characteristics of the target model, this behavioral difference can be constructed in different ways. For instance, in the classification domain, the behavioral difference exploited by most prior works [9,11,21,29,32,35] is that the target model assigns higher confidence scores to members than to non-members. More recent works [5,16] attack in a more realistic scenario where the adversary has access only to the predicted labels of the target model; here, the behavioral difference is that the perturbation needed to change a member’s predicted label is larger than that needed for a non-member. In the image generation domain, where models accept a random latent code as input and output images, Chen et al. and Hilprecht et al. propose customized attacks that estimate the probability that a query sample can be generated by the generator, the behavioral difference being that this probability is greater for members than for non-members [3,10].
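For intuition, a toy confidence-threshold attack in the classification setting can be sketched as follows; `target_model.predict_proba`, the threshold value, and the argument names are hypothetical and chosen only for illustration, not taken from any cited attack.

```python
def score_based_mia(target_model, sample, true_label, threshold=0.9):
    """Toy score-based membership inference: infer 'member' when the
    target model's confidence on the true label exceeds a threshold."""
    probs = target_model.predict_proba(sample)  # hypothetical black-box query returning class probabilities
    confidence = probs[true_label]
    return confidence >= threshold  # True -> inferred as a training member
```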
Unfortunately, these popular and well-explored attack methodologies from the classification and image generation domains cannot be trivially extended to the text-to-image generation domain, because a text-to-image generation model accepts text as input and outputs images, a setting fundamentally different from those of previous works. Hence, it is difficult for existing attack methods to evaluate whether text-to-image generation models are truly vulnerable to membership inference, which prompts the need to investigate new attack methods designed specifically for text-to-image generation models.
3 Problem Statement
In this section, we formulate text-to-image generation and our threat model.
3.1 Text-to-image Generation
As aforementioned, we focus on membership leakage in the text-to-image generation domain. A text-to-image generation model M maps a text caption t to the corresponding image x. To construct such a model M, one needs to collect a huge number of data pairs (t, x) as the training set D. The model is then optimized by minimizing a predefined loss function.
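Concretely, and assuming a generic per-pair loss ℓ (a placeholder, since no specific loss is fixed here), the training objective can be sketched as

\[
\theta^{*} \;=\; \arg\min_{\theta}\; \mathbb{E}_{(t,\,x)\in D}\; \ell\big(M_{\theta}(t),\, x\big),
\]

where θ denotes the parameters of M.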
3.2 Threat Model
Adversary’s goal. The goal of the adversary is to infer whether the user’s image x is used to train a target text-to-image generation model Mtarget.
Adversary’s knowledge. Typically, the training datasets are composed of a huge number of data pairs, i.e., text captions t and their corresponding images x. Here, we assume that the adversary queries only a candidate image x, without its corresponding text caption, to infer membership, which is more realistic and broadly applicable. We assume the adversary has only black-box access to the target model Mtarget, which is the most difficult and realistic scenario. Besides, we assume that the adversary holds a very small subset D^member_sub_target drawn from the member training data of the target model, as well as a small set