2.2 Backdoor Attacks
Backdoor attacks are emerging yet critical threats to the training of deep neural networks (DNNs), where the adversary intends to embed hidden backdoors into the trained models. The attacked models behave normally on benign samples, whereas their predictions are maliciously changed whenever the adversary-specified trigger pattern appears. Due to this property, backdoor attacks have also been used as watermarking techniques for model [27, 28, 29] and dataset [3, 4] ownership verification.
In general, existing backdoor attacks can be divided into three main categories based on the adversary's capacity: 1) poison-only attacks [30, 31, 32], 2) training-controlled attacks [33, 34, 35], and 3) model-modified attacks [36, 37, 38]. In this paper, we focus only on poison-only backdoor attacks, since they are the hardest to conduct yet have the most widespread threat scenarios; moreover, only these attacks can be used to protect open-sourced datasets [3, 4]. In particular, based on the label type, existing poison-only attacks can be further divided into two main sub-types, as follows:
Poison-only Backdoor Attacks with Poisoned Labels.
In these attacks, the re-assigned labels of poisoned samples differ from their ground-truth labels. For example, a cat-like poisoned image may be labeled as a dog in the poisoned dataset released by backdoor adversaries. This is currently the most widespread attack paradigm. To the best of our knowledge, BadNets [30] is the first and most representative attack with poisoned labels. Specifically, the BadNets adversary randomly selects certain benign samples from the original benign dataset and turns them into poisoned samples by adding a specific trigger pattern to the images and changing their labels to the pre-defined target label. The adversary then combines the generated poisoned samples with the remaining benign ones to form the poisoned dataset, which is released for training the attacked models. After that, Chen et al. [39] proposed the blended attack, which suggested that a poisoned image should be similar to its benign version to ensure stealthiness. Most recently, a more stealthy and effective attack (i.e., WaNet [32]) was proposed, which exploits image warping to design trigger patterns.
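To make the above procedure concrete, the following is a minimal sketch (not the cited authors' implementation) of how a poisoned-label dataset could be generated. The corner-patch trigger, poisoning rate, and blending ratio are illustrative assumptions rather than values taken from the cited attacks.

```python
import numpy as np

def make_poisoned_dataset(images, labels, target_label, poison_rate=0.1,
                          trigger_value=255, trigger_size=3, blend_alpha=None):
    """Hypothetical BadNets-/blended-style poisoning of an image dataset.

    images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,).
    A `poison_rate` fraction of samples receives the trigger and is relabeled
    to `target_label`; the rest of the dataset is left untouched.
    """
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    poison_idx = np.random.choice(len(images), n_poison, replace=False)

    for i in poison_idx:
        if blend_alpha is None:
            # BadNets-style: stamp a small solid patch into the bottom-right corner.
            images[i, -trigger_size:, -trigger_size:, :] = trigger_value
        else:
            # Blended-style: mix the whole image with a trigger pattern.
            trigger = np.full_like(images[i], trigger_value)
            images[i] = ((1 - blend_alpha) * images[i]
                         + blend_alpha * trigger).astype(np.uint8)
        labels[i] = target_label  # poisoned label differs from the ground truth

    return images, labels
```

The victim receives only the resulting (images, labels) pair, which looks like an ordinary training set from their perspective.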
Poison-only Backdoor Attacks with Clean Labels.
Turner et al. [31] proposed the first poison-only backdoor attack with clean labels (i.e., the label-consistent attack), where the target label is the same as the ground-truth label of all poisoned samples. They argued that attacks with poisoned labels are not stealthy enough even when the trigger pattern is invisible, since users can still identify the attack by examining the image-label relation once they catch the poisoned samples. However, this attack is far less effective when the dataset has many classes or a high image resolution (e.g., GTSRB and ImageNet) [40, 41, 5]. Most recently, a more effective attack (i.e., Sleeper Agent) was proposed, which generates trigger patterns by optimization [40]. Nevertheless, these attacks remain difficult, since the 'robust features' contained in the poisoned images hinder the learning of trigger patterns [5]. How to design effective attacks with clean labels still lags far behind and is worth further exploration.
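The key operational difference from the poisoned-label setting is the sample selection: only images whose ground-truth label already equals the target label receive the trigger, so the released image-label pairs remain consistent. A minimal sketch under the same illustrative assumptions as above is given below; the additional perturbation step that label-consistent attacks apply to suppress the 'robust features' of the selected images is omitted.

```python
import numpy as np

def make_clean_label_poisoned_dataset(images, labels, target_label,
                                      poison_rate=0.1, trigger_value=255,
                                      trigger_size=3):
    """Hypothetical clean-label poisoning: only samples that already belong to
    the target class are triggered, and no label is ever changed."""
    images, labels = images.copy(), labels.copy()
    target_idx = np.where(labels == target_label)[0]
    n_poison = min(int(len(images) * poison_rate), len(target_idx))
    poison_idx = np.random.choice(target_idx, n_poison, replace=False)

    for i in poison_idx:
        # Stamp the same corner patch as before; the labels stay untouched
        # because they already equal `target_label`.
        images[i, -trigger_size:, -trigger_size:, :] = trigger_value

    return images, labels
```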
Besides, to the best of our knowledge, all existing backdoor attacks are targeted, i.e., the predictions of poisoned samples are deterministic and known by the adversaries. How to design backdoor attacks in an untargeted manner, and what their positive applications might be, remains unexplored and worth further investigation.
3 Untargeted Backdoor Watermark (UBW)
3.1 Preliminaries
Threat Model.
In this paper, we focus on poison-only backdoor attacks as the backdoor watermarks in image classification. Specifically, the backdoor adversaries are only allowed to modify some benign samples, while having neither the information about nor the ability to modify other training components (e.g., training loss, training schedule, and model structure). The generated poisoned samples, together with the remaining unmodified benign ones, are released to victims, who train their DNNs on them. In particular, we only consider poison-only backdoor attacks instead of other types of methods (e.g., training-controlled or model-modified attacks) because the latter require additional adversary capacities and therefore cannot be used to protect open-sourced datasets [3, 4].
The Main Pipeline of Existing Targeted Backdoor Attacks.
Let $\mathcal{D}=\{(\boldsymbol{x}_i, y_i)\}_{i=1}^{N}$ denote the benign training set, where $\boldsymbol{x}_i \in \mathcal{X}=\{0, 1, \ldots, 255\}^{C \times W \times H}$ is the image, $y_i \in \mathcal{Y}=\{1, \ldots, K\}$ is its label, and $K$ is the number of classes. How to generate the poisoned dataset $\mathcal{D}_p$ is the cornerstone of poison-only backdoor attacks. To the best of our knowledge, almost all existing