Membership Inference Attacks Against Text-to-image Generation Models
Yixin Wu¹  Ning Yu²  Zheng Li¹  Michael Backes¹  Yang Zhang¹
¹CISPA Helmholtz Center for Information Security  ²Salesforce Research
Abstract
Text-to-image generation models have recently attracted unprecedented attention as they unlock imaginative applications in all areas of life. However, developing such models requires huge amounts of data that might contain privacy-sensitive information, e.g., face identity. While privacy risks have been extensively demonstrated in the image classification and GAN generation domains, privacy risks in the text-to-image generation domain are largely unexplored. In this paper, we perform the first privacy analysis of text-to-image generation models through the lens of membership inference. Specifically, we propose three key intuitions about membership information and design four attack methodologies accordingly. We conduct comprehensive evaluations on two mainstream text-to-image generation models, covering sequence-to-sequence modeling and diffusion-based modeling. The empirical results show that all of the proposed attacks achieve significant performance, in some cases even close to an accuracy of 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks. We further conduct an extensive ablation study to analyze the factors that may affect the attack performance, which can guide developers and researchers to be alert to vulnerabilities in text-to-image generation models. All these findings indicate that our proposed attacks pose a realistic privacy threat to text-to-image generation models.
1 Introduction
With its power to unlock limitless imaginative content creation, text-to-image generation has become one of the most noteworthy topics in the computer vision field and has been advanced significantly in recent years by a series of designs such as sequence-to-sequence based models (e.g., Parti [36]) and diffusion-based models (e.g., DALL-E 2 [23] and Imagen [28]). Along with the extremely rapid development of this topic, the demand for data is growing quickly. For instance, the Parti/DALL-E 2/Imagen models are trained on 6.6B/650M/860M image-text pairs, respectively. A stark reality is that such large amounts of training data collected by model builders often contain inherently privacy-sensitive information, such as facial identity, and have raised community concerns. Under the terms of the GDPR (https://gdpr-info.eu/), the LAION organization is calling on people to determine whether their private information exists in the publicly released LAION datasets and is providing support for the removal of their private data (https://laion.ai/gdpr/). Unfortunately, model builders are unlikely to disclose their training data due to the huge effort and resources they have put into collecting it [1,14,27,34,37].
Various recent studies have shown that machine learning (ML) models are vulnerable to privacy attacks against their training data, and a major attack in this area is membership inference: an adversary aims to infer whether a data sample is part of the training dataset of the target ML model. The privacy leakage caused by this attack raises serious issues, as the training data is intellectual property and contains sensitive information. In addition, data owners can also use it to audit whether their data was collected by model builders without authorization under GDPR terms, i.e., to gain better control over their data.
Existing membership inference attacks have been demonstrated to be a realistic threat to different types of tasks, such as classification [5,8,9,11,15,16,21,29,32,35] and GAN generation [3,10]. Unfortunately, the peculiarities of text-to-image generation do not allow us to trivially extend the understanding of membership leakage from the well-explored classification and GAN generation domains to the text-to-image domain. Hence, these realities motivate us to focus on membership leakage in text-to-image generation models.
Contribution.
In this work, we take the first step towards studying membership leakage in text-to-image generation models, where an adversary aims to infer whether a given image was used to train a target text-to-image generation model. In particular, we focus on the most difficult and realistic scenario where no additional information about the target model is available to the adversary other than the output images. Based on the characteristics of text-to-image generation models, we consider three key intuitions and design four attack methods accordingly. We conduct comprehensive experiments on two representative types of text-to-image generation models, i.e., sequence-to-sequence and diffusion-based. Extensive empirical results show that all of our proposed attack methodologies achieve remarkable performance, which convincingly demonstrates that membership leakage is a severe threat to text-to-image generation models. Furthermore, to investigate which factors affect the attack performance and to what extent,
we conduct a comprehensive ablation study from different
perspectives that can guide developers and researchers to be
alert to vulnerabilities in text-to-image generation models.
Our main contributions are as follows:
• We pioneer the study of the privacy risks of text-to-image generation models from the perspective of membership inference.
• We consider three key intuitions and design four attack methodologies that exploit different intuitions.
• We conduct an extensive evaluation on two mainstream text-to-image generation models, and the results show the effectiveness and generalizability of the proposed attacks, indicating that such membership leakage poses a much more severe threat than shown in existing work. We further perform a comprehensive ablation study from different angles to analyze the factors that may affect the attack performance, which is expected to provide instructive warnings to model inventors.
2 Background
2.1 Text-to-image Generation Models
Text-to-image generation aims to unlock innovative applications covering various areas of life, including painting, design, and multimedia content creation. It generates creative images that combine concepts, attributes, and styles from expressive text descriptions. Currently, text-to-image generation models can be divided into two different designs, namely diffusion-based modeling and sequence-to-sequence modeling.
Diffusion-based.
Diffusion-based models directly leverage noise as the input of the de-noising network, starting from these random points and gradually de-noising them conditioned on textual descriptions until images matching the conditional information are generated. Building on the power of diffusion models in high-fidelity image synthesis, text-to-image generation has been pushed forward significantly by the recent efforts of GLIDE [20], LDM [26], DALL-E 2 [23], and Imagen [28].
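To make the conditional de-noising process concrete, the following is a minimal, illustrative sketch of a DDPM-style text-conditioned sampling loop; the denoiser network, the text_embedding, and the linear noise schedule are placeholder assumptions, not the actual GLIDE/LDM/DALL-E 2/Imagen implementations.

```python
import torch

def sample_text_to_image(denoiser, text_embedding, steps=50, shape=(1, 3, 64, 64)):
    """Toy text-conditioned reverse diffusion loop (illustrative sketch only).

    denoiser(x_t, t, cond) is assumed to predict the noise that was added at step t.
    """
    betas = torch.linspace(1e-4, 0.02, steps)        # toy linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                            # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps_hat = denoiser(x, torch.tensor([t]), text_embedding)  # predicted noise
        # DDPM posterior mean: remove the predicted noise component at step t.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)    # re-inject noise
    return x  # image sample conditioned on the text embedding
```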
Sequence-to-sequence.
The main idea of this design is to turn images into discrete image tokens by leveraging transformer-based image tokenizers (e.g., dVAE [25]) and to employ sequence-to-sequence architectures to learn the relationship between textual input and visual output from a large collection of text-image pairs. Representative works of sequence-to-sequence modeling are Parti [36], DALL-E [24], and CogView [7].
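As an illustration of this design (not the Parti/DALL-E/CogView architecture), a minimal sketch of a decoder that autoregressively predicts discrete image tokens conditioned on text tokens is given below; the vocabulary sizes and model dimensions are arbitrary placeholders, and the image tokens are assumed to come from an external image tokenizer.

```python
import torch
import torch.nn as nn

class ToySeq2SeqTextToImage(nn.Module):
    """Toy sequence-to-sequence text-to-image model over discrete image tokens."""

    def __init__(self, text_vocab=10000, image_vocab=8192, d_model=256):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.image_emb = nn.Embedding(image_vocab, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.to_logits = nn.Linear(d_model, image_vocab)

    def forward(self, text_tokens, image_tokens):
        # Text tokens serve as the conditioning "memory"; image tokens (produced by
        # an external image tokenizer such as a dVAE) are predicted left-to-right.
        memory = self.text_emb(text_tokens)
        tgt = self.image_emb(image_tokens)
        L = tgt.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        # Training minimizes cross-entropy between these logits and the next image token.
        return self.to_logits(hidden)
```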
In this work, we adopt LDM and DALL-E mini as our target models, representing diffusion-based and sequence-to-sequence models, respectively.
2.2 Membership Inference Attacks
Membership inference attacks (MIAs), which aim to infer whether a specific data sample was involved in a target model's training phase (i.e., whether it is a member or a non-member), are considered an approach to investigating privacy leakage and detecting illegal data abuse. The basic idea is to exploit the behavioral difference of the target model on members and non-members. Depending on the characteristics of the target model, the behavioral differences can be constructed in different ways. For instance, in the classification domain, the behavioral difference exploited in most prior works [9,11,21,29,32,35] is the higher confidence score of members over non-members. More recent works [5,16] attack in a more realistic scenario where the adversary has access only to the predicted labels of the target model. Here, the behavioral difference is that the perturbations required to change the predicted labels of members are larger than those of non-members. In the image generation domain, where models accept a random latent code as input and then output images, Chen et al. and Hilprecht et al. propose customized attack methods that estimate the probability that the query sample can be generated by the generator, where the behavioral difference is that this probability is greater for members than for non-members [3,10].
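For reference, a minimal sketch of the classic confidence-score test from the classification domain is shown below; the threshold tau and the sklearn-style predict_proba interface are assumptions for illustration, not details of the cited attacks.

```python
import numpy as np

def confidence_mia(target_model, x, true_label, tau=0.9):
    """Score-based membership test: members tend to receive higher confidence
    on their true label than non-members (x: a single sample as a 2-D row array)."""
    probs = target_model.predict_proba(x)      # assumed sklearn-style probability output
    return bool(probs[0, true_label] >= tau)   # True -> inferred member
```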
Unfortunately, these popular and well-explored attack methodologies in the classification and image generation domains cannot be trivially extended to the text-to-image generation domain, because a text-to-image generation model accepts text as input and then outputs images, which is totally different from the settings of previous works. Hence, it is difficult for existing attack methods to evaluate whether text-to-image generation models are truly vulnerable to membership inference, which prompts the need to investigate new attack methods specifically for text-to-image generation models.
3 Problem Statement
In this section, we formulate the text-to-image generation task and the threat model.
3.1 Text-to-image Generation
As aforementioned, we focus on membership leakage in the text-to-image generation domain. A text-to-image generation model M maps a text caption t to the corresponding image x. To construct a text-to-image generation model M, one needs to collect a huge number of data pairs (t, x) to construct the training set D. The model is then optimized by minimizing a predefined loss function.
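In the notation above, the training procedure can be summarized by the following schematic objective, where L stands for the model-specific loss (e.g., a de-noising objective for diffusion-based models or token-level cross-entropy for sequence-to-sequence models):

```latex
\theta^{*} \;=\; \operatorname*{arg\,min}_{\theta} \;
  \mathbb{E}_{(t,\,x) \in \mathcal{D}}
  \Big[\, \mathcal{L}\big( \mathcal{M}_{\theta}(t),\; x \big) \,\Big]
```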
3.2 Threat Model
Adversary's goal.
The goal of the adversary is to infer whether the user's image x is used to train a target text-to-image generation model Mtarget.
Adversary's knowledge.
Typically, the training datasets are composed of a huge number of data pairs, i.e., a text caption t and the corresponding image x. Here, we assume that the adversary only queries with a candidate image x, without its corresponding text caption, to infer the membership, which is more realistic and broadly applicable. We assume the adversary only has black-box access to the target model Mtarget, which is the most difficult and realistic scenario. Besides, we assume that the adversary has a very small subset of the member training data of the target model, D^member_sub_target, as well as a small set of local non-member data D^non_member_local.
[Figure 1: Overview of our attack pipeline. A query image is captioned by an image captioning tool (e.g., "A zebra standing under a tree in an enclosure"); the generated caption is fed to the target text-to-image generation model (sequence-to-sequence based or diffusion based); the generated image, its image embedding, and the caption embedding form the attack input used to train the attack model, which predicts member or non-member.]
The adversary then constructs an auxiliary dataset D_auxiliary = {x_m ∪ x_nm : x_m ∈ D^member_sub_target, x_nm ∈ D^non_member_local} that can be used to train the attack model A, i.e., a binary classifier. Note that the assumption of an auxiliary dataset also holds for previous works [18]. The second assumption stems from resource limitations, i.e., insufficient GPU resources: we are unable to leverage the shadow technique [15,17], i.e., training shadow models on billions of local image-caption pairs to mimic the target model Mtarget.
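To make this setup concrete, the following is a minimal sketch that labels attack features extracted from D^member_sub_target and D^non_member_local and fits a binary attack classifier; the logistic-regression model and the generic feature arrays are illustrative assumptions, not the exact attack models introduced later.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def build_attack_model(member_features, nonmember_features):
    """member_features / nonmember_features: per-sample attack feature arrays
    computed from the known member subset and the local non-member set."""
    X = np.vstack([member_features, nonmember_features])
    y = np.concatenate([np.ones(len(member_features)),        # 1 = member
                        np.zeros(len(nonmember_features))])    # 0 = non-member
    # Split the attack dataset in half for attack training and attack testing.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    attack = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("attack accuracy:", attack.score(X_te, y_te))
    return attack
```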
4 Methodology
In this section, we first present the design intuitions and then introduce our attack methodologies.
4.1 Intuitions
The key intuition of our work is a general observation about the overfitting nature of ML models. Concretely, given a query data pair (a text caption t and the corresponding image x), a text-to-image generation model accepts the text t as input and is optimized to generate an image x′ that is similar or identical to the original image x. This leads to the following three key intuitions about membership information:
Intuition I. The quality of the generated image x′ for a data pair (t, x) from the training set should be higher than that for a pair from the testing set.
Intuition II. The reconstruction error between the generated image x′ and the original image x should be smaller for pairs from the training set than for pairs from the testing set.
Intuition III. The generated image x′ should reflect the semantics of the textual caption t more faithfully for pairs from the training set than for pairs from the testing set.
Therefore, we focus on the distinguishability of membership by observing behavioral differences along these intuitions, i.e., members and non-members behave differently in the three aspects mentioned above.
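The sketch below shows one way to turn these intuitions into per-sample signals; the pixel-space mean squared error and the embedding cosine similarity are illustrative proxies, and embed_image, embed_text, and quality_score are hypothetical caller-supplied functions (e.g., a CLIP-style encoder and an off-the-shelf image-quality metric), not the exact features used by our attacks.

```python
import numpy as np

def membership_signals(x, x_gen, t, embed_image, embed_text, quality_score):
    """Proxies for Intuitions I-III for one query sample.

    x, x_gen: original and generated images as float arrays of the same shape.
    t: the (generated) text caption.
    embed_image / embed_text / quality_score: hypothetical caller-supplied functions.
    """
    # Intuition I: quality of the generated image (expected higher for members).
    quality = float(quality_score(x_gen))

    # Intuition II: reconstruction error between x_gen and x (expected lower for members).
    recon_error = float(np.mean((x_gen - x) ** 2))

    # Intuition III: semantic faithfulness of x_gen to caption t (expected higher for members).
    v_img, v_txt = embed_image(x_gen), embed_text(t)
    faithfulness = float(np.dot(v_img, v_txt) /
                         (np.linalg.norm(v_img) * np.linalg.norm(v_txt)))

    return np.array([quality, recon_error, faithfulness])
```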
4.2 Attack Methodologies
As the adversary only holds a query image x whose membership they aim to infer, the adversary initializes the attack by leveraging a third-party image captioning tool to generate a caption t for the given query image x, as illustrated in Figure 1. Then, they feed the generated caption t into the target text-to-image generation model Mtarget to obtain a generated image x′. In this way, we connect the query image and the generated image explicitly. In addition, we only need to query the target model once for each query image to get one corresponding generated image, which largely decreases the possibility of being detected by defense mechanisms.
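A minimal sketch of this per-query step is shown below; caption_model and target_model are placeholders for the third-party captioning tool and the black-box target model, respectively.

```python
def query_pipeline(x_query, caption_model, target_model):
    """One black-box query of the attack pipeline (Figure 1).

    caption_model(image) -> caption string (third-party image captioning tool).
    target_model(caption) -> generated image (black-box target model).
    """
    t = caption_model(x_query)   # generate a caption for the query image
    x_gen = target_model(t)      # single query to the target model
    return t, x_gen              # used to build the attack-model input
```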
Building on this attack pipeline, we design four different types of attacks by exploiting different intuitions, i.e., Attack-I/II/III based on Intuition-I/II/III, and Attack-IV using all intuitions, as illustrated in Table 1. In the end, the adversary constructs an attack dataset and trains the attack models. The attack dataset is split in half into an attack training dataset and an attack testing dataset. We provide the details of each attack method as follows.
Attack I.
This attack is based on Intuition-I, i.e., there is a discrepancy between members and non-members in terms of the quality of generated images. For simplicity, an adversary can differentiate between members and non-members by feeding the query image directly into the attack model, rather than measuring quality explicitly and then making the distinction. Such an attack method is based on the pixel-level discrepancy, hence