Membership Inference Attacks Against Text-to-image Generation Models
Yixin Wu¹  Ning Yu²  Zheng Li¹  Michael Backes¹  Yang Zhang¹
¹CISPA Helmholtz Center for Information Security  ²Salesforce Research
Abstract
Text-to-image generation models have recently attracted unprecedented attention as they unlock imaginative applications in all areas of life. However, developing such models requires huge amounts of data that might contain privacy-sensitive information, e.g., face identity. While privacy risks have been extensively demonstrated in the image classification and GAN generation domains, privacy risks in the text-to-image generation domain are largely unexplored. In this paper, we perform the first privacy analysis of text-to-image generation models through the lens of membership inference. Specifically, we propose three key intuitions about membership information and design four attack methodologies accordingly. We conduct comprehensive evaluations on two mainstream text-to-image generation models, covering sequence-to-sequence modeling and diffusion-based modeling. The empirical results show that all of the proposed attacks achieve significant performance, in some cases even close to an accuracy of 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks. We further conduct an extensive ablation study to analyze the factors that may affect the attack performance, which can guide developers and researchers to be alert to vulnerabilities in text-to-image generation models. All these findings indicate that our proposed attacks pose a realistic privacy threat to text-to-image generation models.
1 Introduction
With its power to unlock limitless imaginative content creation, text-to-image generation has become one of the most noteworthy topics in the computer vision field and has been advanced significantly in recent years by a series of designs such as sequence-to-sequence based models (e.g., Parti [36]) and diffusion-based models (e.g., DALL-E 2 [23] and Imagen [28]). Along with the extremely rapid development of this topic, the demand for data is growing quickly. For instance, the Parti/DALL-E 2/Imagen models are trained on 6.6B/650M/860M image-text pairs, respectively. A stark reality is that such large amounts of training data collected by model builders often contain inherently privacy-sensitive information, such as facial identity, and have raised community concerns. Under the terms of the GDPR (https://gdpr-info.eu/), the LAION organization is calling on people to determine whether their private information exists in the publicly released LAION datasets and is providing support for the removal of their private data (https://laion.ai/gdpr/). Unfortunately, model builders are unlikely to disclose their training data due to the huge effort and resources they have put into collecting it [1,14,27,34,37].
Various recent studies have shown that machine learning (ML) models are vulnerable to privacy attacks against their training data, and a major attack in this area is membership inference: an adversary aims to infer whether a data sample is part of the training dataset of the target ML model. The privacy leakage caused by this attack raises serious issues, as the training data is intellectual property and contains sensitive information. In addition, data owners can also use it to audit whether their data was collected by model builders without authorization under GDPR terms, i.e., to gain better control over their data.
Existing membership inference attacks have been demonstrated to be a realistic threat to different types of tasks, such as classification [5,8,9,11,15,16,21,29,32,35] and GAN generation [3,10]. Unfortunately, the peculiarities of text-to-image generation do not allow us to trivially extend the understanding of membership leakage from the well-explored classification and GAN generation domains to the text-to-image domain. Hence, these realities motivate us to focus on membership leakage in text-to-image generation models.
Contribution.
In this work, we take the first step towards studying membership leakage in text-to-image generation models, where an adversary aims to infer whether a given image was used to train a target text-to-image generation model. In particular, we focus on the most difficult and realistic scenario where no additional information about the target model is available to the adversary other than the output images. Based on the characteristics of text-to-image generation models, we consider three key intuitions and design four attack methods accordingly. We conduct comprehensive experiments on two representative types of text-to-image generation models, i.e., sequence-to-sequence and diffusion-based. Extensive empirical results show that all of our proposed attack methodologies achieve remarkable performance, which convincingly demonstrates that membership leakage is a severe threat to text-to-image generation models. Furthermore, to investigate which factors affect the attack performance and to what extent,
we conduct a comprehensive ablation study from different
perspectives that can guide developers and researchers to be
alert to vulnerabilities in text-to-image generation models.
Our main contributions are as follows:
• We pioneer the study of the privacy risks of text-to-image generation models from the perspective of membership inference.
• We consider three key intuitions and design four attack methodologies that exploit different intuitions.
• We conduct an extensive evaluation on two mainstream text-to-image generation models, and the results show the effectiveness and generalizability of the proposed attacks, indicating that such membership leakage poses a much more severe threat than shown in existing work. We further perform a comprehensive ablation study from different angles to analyze the factors that may affect the attack performance, which is expected to provide instructive warnings to model inventors.
2 Background
2.1 Text-to-image Generation Models
Text-to-image generation aims to unlock innovative applications covering various areas of life, including painting, design, and multimedia content creation. It generates creative images that combine concepts, attributes, and styles from expressive text descriptions. Currently, text-to-image generation models can be divided into two different designs, namely diffusion-based modeling and sequence-to-sequence modeling.
Diffusion-based.
Diffusion-based models directly leverage noise as the input of the de-noising network, starting from these random points and gradually de-noising them conditioned on textual descriptions until images matching the conditional information are generated. Building on the power of diffusion models in high-fidelity image synthesis, text-to-image generation has been pushed forward significantly by the recent efforts of GLIDE [20], LDM [26], DALL-E 2 [23], and Imagen [28].
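To make the conditional de-noising process concrete, the following is a minimal, illustrative sketch of a DDPM-style text-conditioned sampling loop; the denoiser network, the text_embedding, and the linear noise schedule are placeholder assumptions, not the actual GLIDE/LDM/DALL-E 2/Imagen implementations.

```python
import torch

def sample_text_to_image(denoiser, text_embedding, steps=50, shape=(1, 3, 64, 64)):
    """Toy text-conditioned reverse diffusion loop (illustrative sketch only).

    denoiser(x_t, t, cond) is assumed to predict the noise that was added at step t.
    """
    betas = torch.linspace(1e-4, 0.02, steps)        # toy linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                            # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps_hat = denoiser(x, torch.tensor([t]), text_embedding)  # predicted noise
        # DDPM posterior mean: remove the predicted noise component at step t.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)    # re-inject noise
    return x  # image sample conditioned on the text embedding
```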
Sequence-to-sequence.
The main idea of this design is to turn images into discrete image tokens by leveraging transformer-based image tokenizers (e.g., dVAE [25]) and to employ sequence-to-sequence architectures to learn the relationship between textual input and visual output from a large collection of text-image pairs. Representative works of sequence-to-sequence modeling are Parti [36], DALL-E [24], and CogView [7].
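As an illustration of this design (not the Parti/DALL-E/CogView architecture), a minimal sketch of a decoder that autoregressively predicts discrete image tokens conditioned on text tokens is given below; the vocabulary sizes and model dimensions are arbitrary placeholders, and the image tokens are assumed to come from an external image tokenizer.

```python
import torch
import torch.nn as nn

class ToySeq2SeqTextToImage(nn.Module):
    """Toy sequence-to-sequence text-to-image model over discrete image tokens."""

    def __init__(self, text_vocab=10000, image_vocab=8192, d_model=256):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.image_emb = nn.Embedding(image_vocab, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.to_logits = nn.Linear(d_model, image_vocab)

    def forward(self, text_tokens, image_tokens):
        # Text tokens serve as the conditioning "memory"; image tokens (produced by
        # an external image tokenizer such as a dVAE) are predicted left-to-right.
        memory = self.text_emb(text_tokens)
        tgt = self.image_emb(image_tokens)
        L = tgt.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        # Training minimizes cross-entropy between these logits and the next image token.
        return self.to_logits(hidden)
```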
In this work, we adopt LDM and DALL-E mini as our target models, representing diffusion-based and sequence-to-sequence models, respectively.
2.2 Membership Inference Attacks
Membership inference attacks (MIAs), which aim to infer whether a specific data sample was involved in a target model's training phase (i.e., whether it is a member or a non-member), are considered an approach to investigating privacy leakage and detecting illegal data abuse. The basic idea is to exploit the behavioral difference of the target model on members and non-members. Depending on the characteristics of the target model, the behavioral differences can be constructed in different ways. For instance, in the classification domain, the behavioral difference exploited in most prior works [9,11,21,29,32,35] is the higher confidence score of members over non-members. More recent works [5,16] attack in a more realistic scenario where the adversary has access only to the predicted labels of the target model. Here, the behavioral difference is that the perturbations required to change the predicted labels of members are larger than those of non-members. In the image generation domain, where models accept a random latent code as input and then output images, Chen et al. and Hilprecht et al. propose customized attack methods that estimate the probability that the query sample can be generated by the generator, where the behavioral difference is that this probability is greater for members than for non-members [3,10].
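For reference, a minimal sketch of the classic confidence-score test from the classification domain is shown below; the threshold tau and the sklearn-style predict_proba interface are assumptions for illustration, not details of the cited attacks.

```python
import numpy as np

def confidence_mia(target_model, x, true_label, tau=0.9):
    """Score-based membership test: members tend to receive higher confidence
    on their true label than non-members (x: a single sample as a 2-D row array)."""
    probs = target_model.predict_proba(x)      # assumed sklearn-style probability output
    return bool(probs[0, true_label] >= tau)   # True -> inferred member
```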
Unfortunately, these popular and well-explored attack methodologies in the classification and image generation domains cannot be trivially extended to the text-to-image generation domain, because a text-to-image generation model accepts text as input and then outputs images, which is totally different from the settings of previous works. Hence, it is difficult for existing attack methods to evaluate whether text-to-image generation models are truly vulnerable to membership inference, which prompts the need to investigate new attack methods specifically for text-to-image generation models.
3 Problem Statement
In this section, we formulate the text-to-image generation task and the threat model.
3.1 Text-to-image Generation
As aforementioned, we focus on membership leakage in the text-to-image generation domain. A text-to-image generation model M maps a text caption t to the corresponding image x. To construct a text-to-image generation model M, one needs to collect a huge number of data pairs (t, x) to construct the training set D. The model is then optimized by minimizing a predefined loss function.
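In the notation above, the training procedure can be summarized by the following schematic objective, where L stands for the model-specific loss (e.g., a de-noising objective for diffusion-based models or token-level cross-entropy for sequence-to-sequence models):

```latex
\theta^{*} \;=\; \operatorname*{arg\,min}_{\theta} \;
  \mathbb{E}_{(t,\,x) \in \mathcal{D}}
  \Big[\, \mathcal{L}\big( \mathcal{M}_{\theta}(t),\; x \big) \,\Big]
```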
3.2 Threat Model
Adversary's goal.
The goal of the adversary is to infer whether the user's image x is used to train a target text-to-image generation model Mtarget.
Adversary's knowledge.
Typically, the training datasets are composed of a huge number of data pairs, i.e., a text caption t and the corresponding image x. Here, we assume that the adversary only queries with a candidate image x, without its corresponding text caption, to infer the membership, which is more realistic and broadly applicable. We assume the adversary only has black-box access to the target model Mtarget, which is the most difficult and realistic scenario. Besides, we assume that the adversary has a very small subset of the member training data of the target model, D^member_sub_target, as well as a small set of local non-member data D^non_member_local.
[Figure 1: Overview of our attack pipeline. A query image is captioned by an image captioning tool (e.g., "A zebra standing under a tree in an enclosure"); the generated caption is fed to the target text-to-image generation model (sequence-to-sequence based or diffusion based); the generated image, its image embedding, and the caption embedding form the attack input used to train the attack model, which predicts member or non-member.]
The adversary then constructs an auxiliary dataset D_auxiliary = {x_m ∪ x_nm : x_m ∈ D^member_sub_target, x_nm ∈ D^non_member_local} that can be used to train the attack model A, i.e., a binary classifier. Note that the assumption of an auxiliary dataset also holds for previous works [18]. The second assumption stems from resource limitations, i.e., insufficient GPU resources: we are unable to leverage the shadow technique [15,17], i.e., training shadow models on billions of local image-caption pairs to mimic the target model Mtarget.
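To make this setup concrete, the following is a minimal sketch that labels attack features extracted from D^member_sub_target and D^non_member_local and fits a binary attack classifier; the logistic-regression model and the generic feature arrays are illustrative assumptions, not the exact attack models introduced later.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def build_attack_model(member_features, nonmember_features):
    """member_features / nonmember_features: per-sample attack feature arrays
    computed from the known member subset and the local non-member set."""
    X = np.vstack([member_features, nonmember_features])
    y = np.concatenate([np.ones(len(member_features)),        # 1 = member
                        np.zeros(len(nonmember_features))])    # 0 = non-member
    # Split the attack dataset in half for attack training and attack testing.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    attack = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("attack accuracy:", attack.score(X_te, y_te))
    return attack
```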
4 Methodology
In this section, we first present the design intuitions and then introduce our attack methodologies.
4.1 Intuitions
The key intuition of our work is a general observation about the overfitting nature of ML models. Concretely, given a query data pair (a text caption t and the corresponding image x), a text-to-image generation model accepts the text t as input and is optimized to generate an image x′ that is similar or identical to the original image x. This leads to the following three key intuitions about membership information:
Intuition I. The quality of the generated image x′ for a data pair (t, x) from the training set should be higher than that for a pair from the testing set.
Intuition II. The reconstruction error between the generated image x′ and the original image x should be smaller for pairs from the training set than for pairs from the testing set.
Intuition III. The generated image x′ should reflect the semantics of the textual caption t more faithfully for pairs from the training set than for pairs from the testing set.
Therefore, we focus on the distinguishability of membership by observing behavioral differences along these intuitions, i.e., members and non-members behave differently in the three aspects mentioned above.
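The sketch below shows one way to turn these intuitions into per-sample signals; the pixel-space mean squared error and the embedding cosine similarity are illustrative proxies, and embed_image, embed_text, and quality_score are hypothetical caller-supplied functions (e.g., a CLIP-style encoder and an off-the-shelf image-quality metric), not the exact features used by our attacks.

```python
import numpy as np

def membership_signals(x, x_gen, t, embed_image, embed_text, quality_score):
    """Proxies for Intuitions I-III for one query sample.

    x, x_gen: original and generated images as float arrays of the same shape.
    t: the (generated) text caption.
    embed_image / embed_text / quality_score: hypothetical caller-supplied functions.
    """
    # Intuition I: quality of the generated image (expected higher for members).
    quality = float(quality_score(x_gen))

    # Intuition II: reconstruction error between x_gen and x (expected lower for members).
    recon_error = float(np.mean((x_gen - x) ** 2))

    # Intuition III: semantic faithfulness of x_gen to caption t (expected higher for members).
    v_img, v_txt = embed_image(x_gen), embed_text(t)
    faithfulness = float(np.dot(v_img, v_txt) /
                         (np.linalg.norm(v_img) * np.linalg.norm(v_txt)))

    return np.array([quality, recon_error, faithfulness])
```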
4.2 Attack Methodologies
As the adversary only holds a query image x whose membership they aim to infer, the adversary initializes the attack by leveraging a third-party image captioning tool to generate a caption t for the given query image x, as illustrated in Figure 1. Then, they feed the generated caption t into the target text-to-image generation model Mtarget to obtain a generated image x′. In this way, we connect the query image and the generated image explicitly. In addition, we only need to query the target model once for each query image to get one corresponding generated image, which largely decreases the possibility of being detected by defense mechanisms.
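A minimal sketch of this per-query step is shown below; caption_model and target_model are placeholders for the third-party captioning tool and the black-box target model, respectively.

```python
def query_pipeline(x_query, caption_model, target_model):
    """One black-box query of the attack pipeline (Figure 1).

    caption_model(image) -> caption string (third-party image captioning tool).
    target_model(caption) -> generated image (black-box target model).
    """
    t = caption_model(x_query)   # generate a caption for the query image
    x_gen = target_model(t)      # single query to the target model
    return t, x_gen              # used to build the attack-model input
```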
Building on this attack pipeline, we design four different types of attacks by exploiting different intuitions, i.e., Attack-I/II/III based on Intuition-I/II/III, and Attack-IV using all intuitions, as illustrated in Table 1. In the end, the adversary constructs an attack dataset and trains the attack models. The attack dataset is split in half into an attack training dataset and an attack testing dataset. We provide the details of each attack method as follows.
Attack I.
This attack is based on Intuition-I, i.e., there is a discrepancy between members and non-members in terms of the quality of generated images. For simplicity, an adversary can differentiate between members and non-members by feeding the query image directly into the attack model, rather than measuring quality explicitly and then making the distinction. Such an attack method is based on the pixel-level discrepancy, hence