Watermarking for Out-of-distribution Detection
Qizhou Wang1, Feng Liu2, Yonggang Zhang1, Jing Zhang3,
Chen Gong4,5, Tongliang Liu6, Bo Han1
1Department of Computer Science, Hong Kong Baptist University
2School of Mathematics and Statistics, The University of Melbourne
3School of Computer Science, The University of Sydney
4PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of MoE
5Jiangsu Key Lab of Image and Video Understanding for Social Security,
School of Computer Science and Engineering, Nanjing University of Science and Technology
6TML Lab, The University of Sydney
{csqzwang, csygzhang, bhanml}@comp.hkbu.edu.hk
fengliu.ml@gmail.com chen.gong@njust.edu.cn
{jing.zhang1, tongliang.liu}@sydney.edu.au
Abstract
Out-of-distribution (OOD) detection aims to identify OOD data based on represen-
tations extracted from well-trained deep models. However, existing methods largely
ignore the reprogramming property of deep models and thus may not fully unleash
their intrinsic strength: without modifying parameters of a well-trained deep model,
we can reprogram this model for a new purpose via data-level manipulation (e.g.,
adding a specific feature perturbation to the data). This property motivates us to
reprogram a classification model to excel at OOD detection (a new task), and thus
we propose a general methodology named watermarking in this paper. Specifically,
we learn a unified pattern that is superimposed onto features of original data, and
the model’s detection capability is largely boosted after watermarking. Extensive
experiments verify the effectiveness of watermarking, demonstrating the signifi-
cance of the reprogramming property of deep models in OOD detection. The code
is publicly available at: github.com/qizhouwang/watermarking.
1 Introduction
Deep learning systems in an open world often encounter out-of-distribution (OOD) inputs whose
label spaces are disjoint with that of training data, known as in-distribution (ID) data. For safety-
critical applications, deep models should make reliable predictions for ID data, meanwhile detecting
OOD data and avoiding making predictions for the detected ones. This leads to the OOD detection
task [1,2,3,4], which has attracted intensive attention in the real world.
Identifying OOD data remains non-trivial since deep models can be overconfident with them [5]. As a promising technique, classification-based OOD detection [6] relies on various scoring functions derived from classification models well trained with ID data (i.e., well-trained models), taking inputs with small scores as OOD cases. In general, the scoring functions can be defined by logit outputs [7, 8], gradients [9], and embedding features [1, 10]. Without interfering with the well-trained models or requiring extra computation, these methods exploit the inherent capability of models learned from only ID data. Such advantages can be critical in reality, where the cost of re-training is prohibitively high and the acquisition of true OOD data is very difficult [6].
Equal contributions.
Correspondence to Bo Han (bhanml@comp.hkbu.edu.hk) and Tongliang Liu (tongliang.liu@sydney.edu.au).
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.15198v1 [cs.LG] 27 Oct 2022
Although promising progress has been achieved, previous works largely ignore the reprogramming property [11] of deep models: a well-trained model can be repurposed for a new task by a proper transformation of its original inputs (e.g., a universal feature perturbation), without modifying any model parameter. For example, a model pre-trained on the ImageNet [12] dataset can be reprogrammed for classifying biomedical images [13]. This property indicates the possibility of adapting a well-trained model for effective OOD detection, motivating us to make the first attempt to investigate whether the reprogramming property of deep models can help address OOD detection, i.e., can we reprogram well-trained deep models for OOD detection (a new task)?
In this paper, we propose a novel method, watermarking, which reprograms a well-trained model by adding a watermark to original inputs, such that the model can detect OOD data well. The watermark is a static pattern with the same shape as the original inputs, added to test-time inputs (cf. Figure 1). The pre-defined scoring strategy (e.g., free energy scoring [8]) is expected to be enhanced, with an enlarged gap in OOD scores between watermarked ID and OOD data (cf. Figure 2).
Figure 1: Watermarking on CIFAR-10 [14] with free energy scoring [8]. The left figure is the learned watermark; the middle figure is an original input; the right figure is the watermarked result.
It is non-trivial to find a proper watermark due to our lack of knowledge about unseen OOD data in advance. To address this issue, we propose a learning framework for effective watermarking. The insight is to make a well-trained model produce high scores for watermarked ID inputs, while regularizing the watermark such that the model returns low scores when no ID pattern is perceived. In this case, the model yields a relatively high score for a watermarked ID input, while the score remains low for OOD data (cf. Figure 2): for a watermarked OOD input, the model encounters the watermark without seeing any ID pattern. In our realization, we adopt several representative scoring strategies, devising specified learning objectives and proposing a reliable optimization algorithm to learn an effective watermark.
To understand our watermarking, Figure 1 depicts the watermark learned on the CIFAR-10 [14] dataset with free energy scoring [8]. As we can see, the centre area of the learned watermark largely preserves the original input pattern, containing the semantic message that primarily guides detection. By contrast, the edge area of the original input is superimposed with the specific pattern of the watermark, which may encode knowledge hidden in the model that boosts OOD detection. Overall, watermarking preserves the meaningful pattern of original inputs in detection, with improved detection capability learned from the trained model and ID data.
Figure 2 demonstrates the effect of our learned watermark, using free energy scoring as an example. After watermarking, the scoring distributions are much more concentrated, and the gap between ID (i.e., CIFAR-10) and OOD (i.e., SVHN [15] and Texture [16] datasets) data is enlarged notably. We conduct extensive experiments on a wide range of OOD evaluation benchmarks, and the results verify the effectiveness of our proposal.
The success of watermarking is rooted in the following aspects: (1) a model well trained for classification has the potential to be reprogrammed for OOD detection, since the two are related tasks; (2) reprogramming has been widely studied, ranging from image classification to time-series analysis [12, 13], making our proposal general across various domains; and (3) OOD detection suffers from a lack of knowledge about real-world OOD distributions, and fortunately, with only low-dimensional data-level manipulation, watermarking can largely mitigate this issue of limited data. Overall, this data-level manipulation is orthogonal to existing methods, and thus provides a new road in OOD detection and can inspire more ways to design OOD detection methods in the future.
2 Related Works
To begin with, we briefly review related works in OOD detection and model reprogramming. Please refer to Appendix A for a detailed discussion.
[Figure 2: score-density plots for six OOD datasets (SVHN, LSUN-C, LSUN, iSUN, Places365, Texture), with panels (a) without watermarking and (b) with watermarking; the horizontal axes show the free energy score and the vertical axes show density.]
Figure 2: Experimental results before (a) / after (b) watermarking, with CIFAR-10 being the ID dataset and SVHN and Texture being the OOD datasets. Data with large (small) OOD scores should be taken as ID (OOD) data, and a larger gap between the scoring distributions of ID and OOD data ensures better detection performance. After watermarking, the gap between ID and OOD data is enlarged, demonstrating the improved capability of the original model in OOD detection. The horizontal axes are omitted for illustration; please refer to Figure 4 for a complete version.
OOD Detection
discerns ID and OOD data by their gaps regarding specified metrics/scores, and existing methods can be roughly divided into three categories [6]: classification-based methods, density-based methods, and distance-based methods. Specifically, classification-based methods [7, 9, 8, 10] use representations extracted from well-trained models for OOD scoring; distance-based methods [17, 18, 19] measure the distance of inputs from class centers in the embedding space; and density-based methods estimate input density with probabilistic models [1, 3, 20], identifying OOD data by their small likelihood values. Distance-based and density-based methods may suffer from computational complexity [1] and difficulty in optimization [21]. Therefore, more researchers focus on developing classification-based methods, which have made notable progress on benchmark datasets recently [9, 8].
Model Reprogramming
repurposes well-trained models for new tasks with only data-level manipulation [11], indicating that deep models are competent for different jobs without changing any model parameter. In previous works, the data-level manipulation typically refers to a static padding pattern (different from our proposal) learned for the target task, which is added to test-time data. The effectiveness of model reprogramming has been verified across image classification [11, 13] and time-series analysis [22, 23]. In this paper, we use the reprogramming property of deep models for effective OOD detection, which has been overlooked previously.
3 Preliminary
Let $\mathcal{X} \subseteq \mathbb{R}^d$ be the input space and $\mathcal{Y} = \{1, \ldots, c\}$ be the label space. We consider the ID distribution $\mathcal{D}^{\mathrm{ID}}_{X,Y}$ defined over $\mathcal{X} \times \mathcal{Y}$, the training sample $S_n = \{(x_i, y_i)\}_{i=1}^{n}$ of size $n$ independently drawn from $\mathcal{D}^{\mathrm{ID}}_{X,Y}$, and a classification model $f: \mathcal{X} \to \mathbb{R}^c$ (with logit outputs) well trained on $S_n$.
Based on the model $f(\cdot)$, the goal of classification-based OOD detection is to design a detection model $g: \mathcal{X} \to \{0, 1\}$ that can distinguish test-time inputs from the ID distribution $\mathcal{D}^{\mathrm{ID}}_{X}$ from those from the OOD distribution $\mathcal{D}^{\mathrm{OOD}}_{X}$. In general, $\mathcal{D}^{\mathrm{OOD}}_{X}$ is defined as an irrelevant distribution whose label set has no intersection with $\mathcal{Y}$, and thus should not be predicted by $f(\cdot)$. Overall, with $0$ denoting the OOD case and $1$ the ID case, the detection model $g(\cdot)$ is defined as
$$g(x; \tau) = \begin{cases} 1 & s(x; f) \geq \tau \\ 0 & s(x; f) < \tau \end{cases}, \qquad (1)$$
where $\tau \in \mathbb{R}$ is a threshold and $s: \mathcal{X} \to \mathbb{R}$ is the scoring function defined by $f(\cdot)$, whose parameters are fixed. Here, we focus on two representative methods in classification-based OOD detection, namely, softmax scoring and free energy scoring.
Softmax Scoring Function
[7] uses the maximum softmax prediction in OOD detection, with the scoring function $s_{\mathrm{SM}}(\cdot)$ given by
$$s_{\mathrm{SM}}(x; f) = \max_{k} \mathrm{softmax}_k f(x), \qquad (2)$$
where $\mathrm{softmax}_k(\cdot)$ denotes the $k$-th element of the softmax outputs. In general, with a large (small) $s_{\mathrm{SM}}(x; f)$, the detection model takes the input $x$ as an ID (OOD) case.
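As a sketch of Eq. (2), with a numerically stable softmax (the function name is ours):

```python
import numpy as np

def softmax_score(logits: np.ndarray) -> np.ndarray:
    """s_SM(x; f) = max_k softmax_k(f(x)); logits has shape (batch, c)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)  # maximum softmax probability per input
```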
Free Energy Scoring Function
[8] adopts the free energy function for scoring, defined over the logit outputs with the logsumexp operation, namely,
$$s_{\mathrm{FE}}(x; f) = T \log \sum_{k} \exp\left(f_k(x)/T\right), \qquad (3)$$
where $T > 0$ is the temperature parameter, fixed to $1$ [8]. It aligns with the density of inputs to some extent, and thus is less susceptible to the overconfidence issue than softmax scoring [8].
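Eq. (3) can be computed with a stable logsumexp; a sketch under our own naming:

```python
import numpy as np

def free_energy_score(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """s_FE(x; f) = T * log sum_k exp(f_k(x) / T), via a stable logsumexp."""
    z = logits / T
    m = z.max(axis=1)  # subtract the per-row max before exponentiating
    return T * (m + np.log(np.exp(z - m[:, None]).sum(axis=1)))
```

A confident input (one dominant logit) receives a larger score than an ambiguous one with the same maximum logit magnitude spread across classes.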
4 Watermarking Strategy
This section introduces the key concepts of watermarking for classification-based OOD detection.
Definition.
A watermark $w \in \mathbb{R}^d$ is a unified pattern with the same shape as the original inputs. It is added to test-time inputs statically, and we refer to $x + w$ as a watermarked input for $x \in \mathcal{X}$. In expectation, regarding the specified scoring function $s(\cdot)$, watermarking should make the model excel at OOD detection on watermarked data.
Learning Strategy.
Given the scoring function $s(\cdot)$, it is challenging to devise the exact watermark pattern by predefined rules. Therefore, to obtain proper watermarks for OOD detection, we need to devise learning objectives with respect to watermarks that consider both ID and OOD data.

We generally have no information about the OOD distribution $\mathcal{D}^{\mathrm{OOD}}_{X,Y}$, yet we still want the model to excel at discerning ID and OOD data via scoring. To meet this challenge, we make the model produce high scores when watermarked ID data are observed; meanwhile, we regularize the watermark such that the model returns low scores when ID patterns are absent. From the lens of our model, the scores should remain low when a watermarked OOD input is given, since the watermark is not trained to perceive OOD data, whose patterns are very different from those of ID data.
Benefits of Watermarking.
Watermarking directly reprograms the model to adapt to our specified scoring task, such that the detection capability of the original model is largely improved. By contrast, previous methods typically adapt to their specified tasks only via the threshold $\tau$ in Eq. (1). However, this requires a trade-off between false positive (ID) and false negative (OOD) rates when the scoring densities are non-separable (cf. Figure 2(a)).

Further, watermarking retains the benefits of previous classification-based methods in that we do not modify the original classification training procedure, making our proposal easy to deploy in real-world systems. Although the watermark must be learned, its parameter space is low-dimensional, and the learning procedure can be conducted post hoc after the systems are deployed.
Comparison with Existing Works.
In OOD detection, this paper is the first attempt to use the reprogramming property of deep models, leading to an effective learning framework named watermarking. At first glance, our methodology may seem similar to ODIN [24], which also conducts data-level perturbation for OOD detection. However, ODIN's instance-specific perturbation relies on extra backward-forward iterations at test time, which our method does not require. Further, ODIN is designed for softmax scoring, whereas our proposal is more general for OOD detection.
5 Realizations of Watermarking Strategy
In this section, we discuss our learning framework of watermarking in detail.
Learning Objectives.
As mentioned above, we need to consider the ID and OOD situations separately, with the associated loss functions denoted by $\ell_{\mathrm{ID}}(\cdot)$ and $\ell_{\mathrm{OOD}}(\cdot)$. For the ID case, the ID training data are required, and we encourage high scores for their watermarked counterparts. By contrast, since we typically lack knowledge about the test-time OOD data, only the watermark is used here, and we expect the model to produce scores as low as possible when perceiving only the watermark.
Further, since only the watermark is adopted for training in the OOD case, the learned watermark is quite sensitive with respect to the detection model, i.e., the model may return different predictions when facing small perturbations. Thus, watermarked OOD inputs may not be guaranteed low scores. To this end, the watermark is further perturbed during training. Here we adopt Gaussian noise, leading to the perturbed watermark of the form $\epsilon + w$, with $\epsilon \sim \mathcal{N}(0, \sigma_1 \mathbf{I}_d)$ independent and identically distributed (i.i.d.) $d$-dimensional Gaussian noise (mean $0$ and standard deviation $\sigma_1 \mathbf{I}_d$). Then, the overall risk can be written as
$$\mathcal{L}_n(w) = \underbrace{\sum_{i} \ell_{\mathrm{ID}}(x_i + w, y_i; f)}_{\mathcal{L}^{\mathrm{ID}}_n(w)} + \beta \underbrace{\sum_{j} \ell_{\mathrm{OOD}}(\epsilon_j + w; f)}_{\mathcal{L}^{\mathrm{OOD}}_n(w)}, \qquad (4)$$
with $\beta \geq 0$ the trade-off parameter, $\mathcal{L}^{\mathrm{ID}}_n(w)$ the risk for ID data, and $\mathcal{L}^{\mathrm{OOD}}_n(w)$ the risk for OOD data.
Optimization.
To find a proper watermark, we use first-order gradient updates to iteratively update the watermark's elements. However, data-level optimization remains difficult in deep learning, where the results may get stuck at suboptimal points [25]. A common approach is to use the signum of the first-order gradients, guiding the update of the current watermark via
$$w \leftarrow w - \alpha\, \mathrm{sign}(\nabla_w \mathcal{L}_n(w)), \qquad (5)$$
where $\mathrm{sign}(\cdot)$ denotes the signum function and $\alpha > 0$ is the step size [26].
Further, for generality and insensitivity, we prefer a solution that lies in a neighbourhood with uniformly low loss, i.e., with a smooth loss landscape [27]. Therefore, we adopt sharpness-aware minimization (SAM) [28], an effective optimization framework that seeks both a low loss value and a smooth loss landscape. Specifically, given the original risk $\mathcal{L}_n(w)$, the SAM problem is
$$\mathcal{L}^{\mathrm{SAM}}_n(w) = \underbrace{\max_{\|\kappa\|_2 \leq \rho} \left[\mathcal{L}_n(w + \kappa) - \mathcal{L}_n(w)\right]}_{\text{sharpness}} + \mathcal{L}_n(w) = \max_{\|\kappa\|_2 \leq \rho} \mathcal{L}_n(w + \kappa), \qquad (6)$$
where $\rho \geq 0$ is a constraint. For efficiency, SAM takes the first-order Taylor expansion w.r.t. $\kappa$ around $0$, obtaining an approximate solution of the form³:
$$\kappa = \rho\, \mathrm{sign}(\nabla_w \mathcal{L}_n(w)) \frac{|\nabla_w \mathcal{L}_n(w)|^{q-1}}{\left(\|\nabla_w \mathcal{L}_n(w)\|_q^q\right)^{1/p}}, \qquad (7)$$
where $1/p + 1/q = 1$, and we set $p = q = 2$ for simplicity. Therefore, the estimated form of SAM is written as $\mathcal{L}_n(w + \kappa)$, with the corresponding update rule
$$w \leftarrow w - \alpha\, \mathrm{sign}(\nabla_w \mathcal{L}_n(w + \kappa)), \qquad (8)$$
yielding an efficient optimization algorithm that induces an effective watermark.
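One SAM-style update step, combining Eqs. (7) and (8) with $p = q = 2$ (for which $\kappa$ reduces to $\rho\, \nabla_w \mathcal{L}_n(w) / \|\nabla_w \mathcal{L}_n(w)\|_2$), might look as follows. We use a finite-difference gradient purely so the sketch is self-contained; in practice $\nabla_w \mathcal{L}_n$ is obtained by backpropagation:

```python
import numpy as np

def numerical_grad(loss, w, eps=1e-5):
    """Central-difference gradient of a scalar loss; stands in for autograd."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d.flat[i] = eps
        g.flat[i] = (loss(w + d) - loss(w - d)) / (2 * eps)
    return g

def sam_sign_step(w, loss, alpha=0.01, rho=0.05):
    """Eq. (7): perturb w toward the worst case within an L2 ball of radius rho;
    Eq. (8): take a signed-gradient descent step at the perturbed point."""
    g = numerical_grad(loss, w)
    kappa = rho * g / (np.linalg.norm(g) + 1e-12)  # p = q = 2 solution
    return w - alpha * np.sign(numerical_grad(loss, w + kappa))
```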
The Overall Algorithm.
In summary, we describe the overall learning framework. To begin with, the watermark is initialized with i.i.d. Gaussian noise with mean $0$ and a small standard deviation $\sigma_2 \mathbf{I}_d$, and the learning procedure consists of three stages at each update step:

Negative sampling: a set of noise data $\epsilon$ is sampled, assumed to be of the same size $m$ as the mini-batch of the ID sample;

Risk calculation: the risks for ID and OOD data are computed, and the overall risk is given by their sum with the trade-off parameter $\beta$ as in Eq. (4);

Watermark updating: the first-order gradient guides the pixel-level update of the watermark, using the signum of gradients and SAM to make a reliable update as in Eq. (8).

The learned watermark is added to test-time inputs for OOD detection, and the detection model with the pre-defined scoring function is then deployed. Appendix B summarizes our learning framework of watermarking. Moreover, two specifications of watermarking are discussed in the following.
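The three stages can be assembled into a compact loop. This is a sketch under our own naming: a finite-difference gradient stands in for backpropagation, and `risk_fn` is assumed to implement Eq. (4):

```python
import numpy as np

def learn_watermark(x_id, y_id, risk_fn, d, steps=100, m=8,
                    sigma2=0.01, alpha=0.01, rho=0.05, fd_eps=1e-5, seed=0):
    """Learn a watermark w via the three-stage procedure described above."""
    rng = np.random.default_rng(seed)
    w = sigma2 * rng.standard_normal(d)  # Gaussian initialization (std sigma2)

    def grad(loss, v):  # central differences; a stand-in for autograd
        g = np.zeros_like(v)
        for i in range(v.size):
            e = np.zeros_like(v)
            e.flat[i] = fd_eps
            g.flat[i] = (loss(v + e) - loss(v - e)) / (2 * fd_eps)
        return g

    for _ in range(steps):
        noise = rng.standard_normal((m, d))              # 1) negative sampling
        loss = lambda v: risk_fn(v, x_id, y_id, noise)   # 2) risk calculation
        g = grad(loss, w)
        kappa = rho * g / (np.linalg.norm(g) + 1e-12)    # SAM perturbation
        w = w - alpha * np.sign(grad(loss, w + kappa))   # 3) watermark update
    return w
```

After training, the returned `w` is simply added to every test-time input before scoring.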
³With an abuse of notation, we denote the estimated solution in SAM as $\kappa$ for simplicity.