Watermarking for Out-of-distribution Detection
Qizhou Wang1, Feng Liu2, Yonggang Zhang1, Jing Zhang3,
Chen Gong4,5, Tongliang Liu6, Bo Han1
1Department of Computer Science, Hong Kong Baptist University
2School of Mathematics and Statistics, The University of Melbourne
3School of Computer Science, The University of Sydney
4PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of MoE
5Jiangsu Key Lab of Image and Video Understanding for Social Security,
School of Computer Science and Engineering, Nanjing University of Science and Technology
6TML Lab, The University of Sydney
{csqzwang, csygzhang, bhanml}@comp.hkbu.edu.hk
fengliu.ml@gmail.com chen.gong@njust.edu.cn
{jing.zhang1, tongliang.liu}@sydney.edu.au
Abstract
Out-of-distribution (OOD) detection aims to identify OOD data based on represen-
tations extracted from well-trained deep models. However, existing methods largely
ignore the reprogramming property of deep models and thus may not fully unleash
their intrinsic strength: without modifying parameters of a well-trained deep model,
we can reprogram this model for a new purpose via data-level manipulation (e.g.,
adding a specific feature perturbation to the data). This property motivates us to
reprogram a classification model to excel at OOD detection (a new task), and thus
we propose a general methodology named watermarking in this paper. Specifically,
we learn a unified pattern that is superimposed onto features of original data, and
the model’s detection capability is largely boosted after watermarking. Extensive
experiments verify the effectiveness of watermarking, demonstrating the signifi-
cance of the reprogramming property of deep models in OOD detection. The code
is publicly available at: github.com/qizhouwang/watermarking.
1 Introduction
Deep learning systems in an open world often encounter out-of-distribution (OOD) inputs whose
label spaces are disjoint with that of training data, known as in-distribution (ID) data. For safety-
critical applications, deep models should make reliable predictions for ID data, meanwhile detecting
OOD data and avoiding making predictions for the detected ones. This leads to the OOD detection
task [1,2,3,4], which has attracted intensive attention in the real world.
Identifying OOD data remains non-trivial since deep models can be overconfident with them [5]. As a promising technique, classification-based OOD detection [6] relies on various scoring functions derived from classification models well trained with ID data (i.e., well-trained models), taking inputs with small scores as OOD cases. In general, the scoring functions can be defined by logit outputs [7, 8], gradients [9], and embedding features [1, 10]. Without interfering with the well-trained models or requiring extra computation, these methods exploit the inherent capability of models learned from only ID data. Such advantages can be critical in reality, where the cost of re-training is prohibitively high and the acquisition of true OOD data is very difficult [6].
Equal contributions.
Correspondence to Bo Han (bhanml@comp.hkbu.edu.hk) and Tongliang Liu (tongliang.liu@sydney.edu.au).
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.15198v1 [cs.LG] 27 Oct 2022
Although promising progress has been achieved, previous works largely ignore the reprogramming property [11] of deep models: a well-trained model can be repurposed for a new task by a proper transformation of its original inputs (e.g., a universal feature perturbation), without modifying any model parameter. For example, a model pre-trained on the ImageNet [12] dataset can be reprogrammed for classifying biomedical images [13]. This property indicates the possibility of adapting a well-trained model for effective OOD detection, motivating us to make the first attempt to investigate whether the reprogramming property of deep models can help address OOD detection, i.e., can we reprogram well-trained deep models for OOD detection (a new task)?
In this paper, we propose a novel method, watermarking, which reprograms a well-trained model by adding a watermark to original inputs, such that the model can detect OOD data well. The watermark is a static pattern with the same shape as the original inputs, added to test-time inputs (cf. Figure 1). The pre-defined scoring strategy (e.g., free energy scoring [8]) is expected to be enhanced, with an enlarged gap in OOD scores between watermarked ID and OOD data (cf. Figure 2).
Figure 1: Watermarking on CIFAR-10 [14] with free energy scoring [8]. The left figure is the learned watermark; the middle figure is an original input; the right figure is the watermarked result.
It is non-trivial to find a proper watermark due to our lack of knowledge about unseen OOD data in advance. To address this issue, we propose a learning framework for effective watermarking. The insight is to make a well-trained model produce high scores for watermarked ID inputs, while regularizing the watermark such that the model returns low scores when no ID pattern is perceived. In this case, the model yields a relatively high score for a watermarked ID input, while the score remains low for OOD data (cf. Figure 2): for a watermarked OOD input, the model encounters the watermark without seeing any ID pattern. In our realization, we adopt several representative scoring strategies, devising specified learning objectives and proposing a reliable optimization algorithm to learn an effective watermark.
To understand our watermarking, Figure 1 depicts the watermark learned on the CIFAR-10 [14] dataset with free energy scoring [8]. As we can see, the centre area of the learned watermark largely preserves the original input pattern, containing the semantic message that primarily guides detection. By contrast, the edge area of the original input is superimposed with the specific pattern of the watermark, which may encode knowledge hidden in the model that boosts OOD detection. Overall, watermarking preserves the meaningful pattern of original inputs in detection, with improved detection capability learned from the trained model and ID data.
Figure 2 demonstrates the effect of our learned watermark, using free energy scoring as an example. After watermarking, the scoring distributions are much more concentrated, and the gap between ID (i.e., CIFAR-10) and OOD (i.e., SVHN [15] and Texture [16] datasets) data is enlarged notably. We conduct extensive experiments on a wide range of OOD evaluation benchmarks, and the results verify the effectiveness of our proposal.
The success of watermarking is rooted in the following aspects: (1) a model well trained for classification has the potential to be reprogrammed for OOD detection, since the two are related tasks; (2) reprogramming has been widely studied, ranging from image classification to time-series analysis [12, 13], making our proposal general across various domains; and (3) OOD detection suffers from a lack of knowledge about real-world OOD distributions, and fortunately, with only low-dimensional data-level manipulation, watermarking can largely mitigate this issue of limited data. Overall, this data-level manipulation is orthogonal to existing methods, and thus provides a new road in OOD detection and can inspire more ways to design OOD detection methods in the future.
2 Related Works
To begin with, we briefly review related works in OOD detection and model reprogramming. Please refer to Appendix A for a detailed discussion.
[Figure 2: score-density plots for six OOD datasets (SVHN, LSUN-C, LSUN, iSUN, Places365, Texture), with panels (a) without watermarking and (b) with watermarking; the horizontal axes show the free energy score and the vertical axes show density.]
Figure 2: Experimental results before (a) / after (b) watermarking, with CIFAR-10 being the ID dataset and SVHN and Texture being the OOD datasets. Data with large (small) OOD scores should be taken as ID (OOD) data, and a larger gap between the scoring distributions of ID and OOD data ensures better detection performance. After watermarking, the gap between ID and OOD data is enlarged, demonstrating the improved capability of the original model in OOD detection. The horizontal axes are omitted for illustration; please refer to Figure 4 for a complete version.
OOD Detection
discerns ID and OOD data by their gaps regarding specified metrics/scores, and existing methods can be roughly divided into three categories [6]: classification-based methods, density-based methods, and distance-based methods. Specifically, classification-based methods [7, 9, 8, 10] use representations extracted from well-trained models for OOD scoring; distance-based methods [17, 18, 19] measure the distance of inputs from class centers in the embedding space; and density-based methods estimate input density with probabilistic models [1, 3, 20], identifying OOD data by their small likelihood values. Distance-based and density-based methods may suffer from computational complexity [1] and difficulty in optimization [21]. Therefore, more researchers focus on developing classification-based methods, which have made notable progress on benchmark datasets recently [9, 8].
Model Reprogramming
repurposes well-trained models for new tasks with only data-level manipulation [11], indicating that deep models are competent for different jobs without changing any model parameter. In previous works, the data-level manipulation typically refers to a static padding pattern (different from our proposal) learned for the target task, which is added to test-time data. The effectiveness of model reprogramming has been verified across image classification [11, 13] and time-series analysis [22, 23]. In this paper, we use the reprogramming property of deep models for effective OOD detection, which has been overlooked previously.
3 Preliminary
Let $\mathcal{X} \subseteq \mathbb{R}^d$ be the input space and $\mathcal{Y} = \{1, \ldots, c\}$ be the label space. We consider the ID distribution $\mathcal{D}^{\mathrm{ID}}_{X,Y}$ defined over $\mathcal{X} \times \mathcal{Y}$, the training sample $S_n = \{(x_i, y_i)\}_{i=1}^{n}$ of size $n$ independently drawn from $\mathcal{D}^{\mathrm{ID}}_{X,Y}$, and a classification model $f: \mathcal{X} \to \mathbb{R}^c$ (with logit outputs) well trained on $S_n$.
Based on the model $f(\cdot)$, the goal of classification-based OOD detection is to design a detection model $g: \mathcal{X} \to \{0, 1\}$ that can distinguish test-time inputs from the ID distribution $\mathcal{D}^{\mathrm{ID}}_{X}$ from those from the OOD distribution $\mathcal{D}^{\mathrm{OOD}}_{X}$. In general, $\mathcal{D}^{\mathrm{OOD}}_{X}$ is defined as an irrelevant distribution whose label set has no intersection with $\mathcal{Y}$, and thus should not be predicted by $f(\cdot)$. Overall, with $0$ denoting the OOD case and $1$ the ID case, the detection model $g(\cdot)$ is defined as
$$g(x; \tau) = \begin{cases} 1 & s(x; f) \geq \tau \\ 0 & s(x; f) < \tau \end{cases}, \qquad (1)$$
where $\tau \in \mathbb{R}$ is a threshold and $s: \mathcal{X} \to \mathbb{R}$ is the scoring function defined by $f(\cdot)$, whose parameters are fixed. Here, we focus on two representative methods in classification-based OOD detection, namely, softmax scoring and free energy scoring.
Softmax Scoring Function
[7] uses the maximum softmax prediction in OOD detection, with the scoring function $s_{\mathrm{SM}}(\cdot)$ given by
$$s_{\mathrm{SM}}(x; f) = \max_{k} \mathrm{softmax}_k f(x), \qquad (2)$$
where $\mathrm{softmax}_k(\cdot)$ denotes the $k$-th element of the softmax outputs. In general, with a large (small) $s_{\mathrm{SM}}(x; f)$, the detection model takes the input $x$ as an ID (OOD) case.
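As a sketch of Eq. (2), with a numerically stable softmax (the function name is ours):

```python
import numpy as np

def softmax_score(logits: np.ndarray) -> np.ndarray:
    """s_SM(x; f) = max_k softmax_k(f(x)); logits has shape (batch, c)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)  # maximum softmax probability per input
```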
Free Energy Scoring Function
[8] adopts the free energy function for scoring, defined over the logit outputs with the logsumexp operation, namely,
$$s_{\mathrm{FE}}(x; f) = T \log \sum_{k} \exp\left(f_k(x)/T\right), \qquad (3)$$
where $T > 0$ is the temperature parameter, fixed to $1$ [8]. It aligns with the density of inputs to some extent, and thus is less susceptible to the overconfidence issue than softmax scoring [8].
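Eq. (3) can be computed with a stable logsumexp; a sketch under our own naming:

```python
import numpy as np

def free_energy_score(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """s_FE(x; f) = T * log sum_k exp(f_k(x) / T), via a stable logsumexp."""
    z = logits / T
    m = z.max(axis=1)  # subtract the per-row max before exponentiating
    return T * (m + np.log(np.exp(z - m[:, None]).sum(axis=1)))
```

A confident input (one dominant logit) receives a larger score than an ambiguous one with the same maximum logit magnitude spread across classes.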
4 Watermarking Strategy
This section introduces the key concepts of watermarking for classification-based OOD detection.
Definition.
A watermark $w \in \mathbb{R}^d$ is a unified pattern with the same shape as the original inputs. It is added to test-time inputs statically, and we refer to $x + w$ as a watermarked input for $x \in \mathcal{X}$. In expectation, regarding the specified scoring function $s(\cdot)$, watermarking should make the model excel at OOD detection on watermarked data.
Learning Strategy.
Given the scoring function $s(\cdot)$, it is challenging to devise the exact watermark pattern by predefined rules. Therefore, to obtain proper watermarks for OOD detection, we need to devise learning objectives with respect to watermarks that consider both ID and OOD data.

We generally have no information about the OOD distribution $\mathcal{D}^{\mathrm{OOD}}_{X,Y}$, yet we still want the model to excel at discerning ID and OOD data via scoring. To meet this challenge, we make the model produce high scores when watermarked ID data are observed; meanwhile, we regularize the watermark such that the model returns low scores when ID patterns are absent. From the lens of our model, the scores should remain low when a watermarked OOD input is given, since the watermark is not trained to perceive OOD data, whose patterns are very different from those of ID data.
Benefits of Watermarking.
Watermarking directly reprograms the model to adapt to our specified scoring task, such that the detection capability of the original model is largely improved. By contrast, previous methods typically adapt to their specified tasks only via the threshold $\tau$ in Eq. (1). However, this requires a trade-off between false positive (ID) and false negative (OOD) rates when the scoring densities are non-separable (cf. Figure 2(a)).

Further, watermarking retains the benefits of previous classification-based methods in that we do not modify the original classification training procedure, making our proposal easy to deploy in real-world systems. Although the watermark must be learned, its parameter space is low-dimensional, and the learning procedure can be conducted post hoc after the systems are deployed.
Comparison with Existing Works.
In OOD detection, this paper is the first attempt to use the reprogramming property of deep models, leading to an effective learning framework named watermarking. At first glance, our methodology may seem similar to ODIN [24], which also conducts data-level perturbation for OOD detection. However, ODIN's instance-specific perturbation relies on extra backward-forward iterations at test time, which our method does not require. Further, ODIN is designed for softmax scoring, whereas our proposal is more general for OOD detection.
5 Realizations of Watermarking Strategy
In this section, we discuss our learning framework of watermarking in detail.
Learning Objectives.
As mentioned above, we need to consider the ID and OOD situations separately, with the associated loss functions denoted by $\ell_{\mathrm{ID}}(\cdot)$ and $\ell_{\mathrm{OOD}}(\cdot)$. For the ID case, the ID training data are required, and we encourage high scores for their watermarked counterparts. By contrast, since we typically lack knowledge about the test-time OOD data, only the watermark is used here, and we expect the model to produce scores as low as possible when perceiving only the watermark.
Further, since only the watermark is adopted for training in the OOD case, the learned watermark is quite sensitive with respect to the detection model, i.e., the model may return different predictions when facing small perturbations. Thus, watermarked OOD inputs may not be guaranteed low scores. To this end, the watermark is further perturbed during training. Here we adopt Gaussian noise, leading to the perturbed watermark of the form $\epsilon + w$, with $\epsilon \sim \mathcal{N}(0, \sigma_1 \mathbf{I}_d)$ independent and identically distributed (i.i.d.) $d$-dimensional Gaussian noise (mean $0$ and standard deviation $\sigma_1 \mathbf{I}_d$). Then, the overall risk can be written as
$$\mathcal{L}_n(w) = \underbrace{\sum_{i} \ell_{\mathrm{ID}}(x_i + w, y_i; f)}_{\mathcal{L}^{\mathrm{ID}}_n(w)} + \beta \underbrace{\sum_{j} \ell_{\mathrm{OOD}}(\epsilon_j + w; f)}_{\mathcal{L}^{\mathrm{OOD}}_n(w)}, \qquad (4)$$
with $\beta \geq 0$ the trade-off parameter, $\mathcal{L}^{\mathrm{ID}}_n(w)$ the risk for ID data, and $\mathcal{L}^{\mathrm{OOD}}_n(w)$ the risk for OOD data.
Optimization.
To find a proper watermark, we use first-order gradient updates to iteratively update the watermark's elements. However, data-level optimization remains difficult in deep learning, where the results may get stuck at suboptimal points [25]. A common approach is to use the signum of the first-order gradients, guiding the update of the current watermark via
$$w \leftarrow w - \alpha\, \mathrm{sign}(\nabla_w \mathcal{L}_n(w)), \qquad (5)$$
where $\mathrm{sign}(\cdot)$ denotes the signum function and $\alpha > 0$ is the step size [26].
Further, for generality and insensitivity, we prefer a solution that lies in a neighbourhood with uniformly low loss, i.e., with a smooth loss landscape [27]. Therefore, we adopt sharpness-aware minimization (SAM) [28], an effective optimization framework that seeks both a low loss value and a smooth loss landscape. Specifically, given the original risk $\mathcal{L}_n(w)$, the SAM problem is
$$\mathcal{L}^{\mathrm{SAM}}_n(w) = \underbrace{\max_{\|\kappa\|_2 \leq \rho} \left[\mathcal{L}_n(w + \kappa) - \mathcal{L}_n(w)\right]}_{\text{sharpness}} + \mathcal{L}_n(w) = \max_{\|\kappa\|_2 \leq \rho} \mathcal{L}_n(w + \kappa), \qquad (6)$$
where $\rho \geq 0$ is a constraint. For efficiency, SAM takes the first-order Taylor expansion w.r.t. $\kappa$ around $0$, obtaining an approximate solution of the form³:
$$\kappa = \rho\, \mathrm{sign}(\nabla_w \mathcal{L}_n(w)) \frac{|\nabla_w \mathcal{L}_n(w)|^{q-1}}{\left(\|\nabla_w \mathcal{L}_n(w)\|_q^q\right)^{1/p}}, \qquad (7)$$
where $1/p + 1/q = 1$, and we set $p = q = 2$ for simplicity. Therefore, the estimated form of SAM is written as $\mathcal{L}_n(w + \kappa)$, with the corresponding update rule
$$w \leftarrow w - \alpha\, \mathrm{sign}(\nabla_w \mathcal{L}_n(w + \kappa)), \qquad (8)$$
yielding an efficient optimization algorithm that induces an effective watermark.
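One SAM-style update step, combining Eqs. (7) and (8) with $p = q = 2$ (for which $\kappa$ reduces to $\rho\, \nabla_w \mathcal{L}_n(w) / \|\nabla_w \mathcal{L}_n(w)\|_2$), might look as follows. We use a finite-difference gradient purely so the sketch is self-contained; in practice $\nabla_w \mathcal{L}_n$ is obtained by backpropagation:

```python
import numpy as np

def numerical_grad(loss, w, eps=1e-5):
    """Central-difference gradient of a scalar loss; stands in for autograd."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d.flat[i] = eps
        g.flat[i] = (loss(w + d) - loss(w - d)) / (2 * eps)
    return g

def sam_sign_step(w, loss, alpha=0.01, rho=0.05):
    """Eq. (7): perturb w toward the worst case within an L2 ball of radius rho;
    Eq. (8): take a signed-gradient descent step at the perturbed point."""
    g = numerical_grad(loss, w)
    kappa = rho * g / (np.linalg.norm(g) + 1e-12)  # p = q = 2 solution
    return w - alpha * np.sign(numerical_grad(loss, w + kappa))
```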
The Overall Algorithm.
In summary, we describe the overall learning framework. To begin with, the watermark is initialized with i.i.d. Gaussian noise with mean $0$ and a small standard deviation $\sigma_2 \mathbf{I}_d$, and the learning procedure consists of three stages at each update step:

Negative sampling: a set of noise data $\epsilon$ is sampled, assumed to be of the same size $m$ as the mini-batch of the ID sample;

Risk calculation: the risks for ID and OOD data are computed, and the overall risk is given by their sum with the trade-off parameter $\beta$ as in Eq. (4);

Watermark updating: the first-order gradient guides the pixel-level update of the watermark, using the signum of gradients and SAM to make a reliable update as in Eq. (8).

The learned watermark is added to test-time inputs for OOD detection, and the detection model with the pre-defined scoring function is then deployed. Appendix B summarizes our learning framework of watermarking. Moreover, two specifications of watermarking are discussed in the following.
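The three stages can be assembled into a compact loop. This is a sketch under our own naming: a finite-difference gradient stands in for backpropagation, and `risk_fn` is assumed to implement Eq. (4):

```python
import numpy as np

def learn_watermark(x_id, y_id, risk_fn, d, steps=100, m=8,
                    sigma2=0.01, alpha=0.01, rho=0.05, fd_eps=1e-5, seed=0):
    """Learn a watermark w via the three-stage procedure described above."""
    rng = np.random.default_rng(seed)
    w = sigma2 * rng.standard_normal(d)  # Gaussian initialization (std sigma2)

    def grad(loss, v):  # central differences; a stand-in for autograd
        g = np.zeros_like(v)
        for i in range(v.size):
            e = np.zeros_like(v)
            e.flat[i] = fd_eps
            g.flat[i] = (loss(v + e) - loss(v - e)) / (2 * fd_eps)
        return g

    for _ in range(steps):
        noise = rng.standard_normal((m, d))              # 1) negative sampling
        loss = lambda v: risk_fn(v, x_id, y_id, noise)   # 2) risk calculation
        g = grad(loss, w)
        kappa = rho * g / (np.linalg.norm(g) + 1e-12)    # SAM perturbation
        w = w - alpha * np.sign(grad(loss, w + kappa))   # 3) watermark update
    return w
```

After training, the returned `w` is simply added to every test-time input before scoring.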
³With an abuse of notation, we denote the estimated solution in SAM as $\kappa$ for simplicity.