Learning Sample Reweighting for
Accuracy and Adversarial Robustness
Chester Holtz
Computer Science and Engineering
University of California San Diego
La Jolla, CA 92093
chholtz@eng.ucsd.edu
Tsui-Wei Weng
Halıcıoğlu Data Science Institute
University of California San Diego
La Jolla, CA 92093
lweng@ucsd.edu
Gal Mishne
Halıcıoğlu Data Science Institute
University of California San Diego
La Jolla, CA 92093
gmishne@ucsd.edu
Abstract
There has been great interest in enhancing the robustness of neural network classifiers to defend against adversarial perturbations through adversarial training, while balancing the trade-off between robust accuracy and standard accuracy. We propose a novel adversarial training framework that learns to reweight the loss associated with individual training samples based on a notion of class-conditioned margin, with the goal of improving robust generalization. We formulate weighted adversarial training as a bilevel optimization problem, with the upper-level problem corresponding to learning a robust classifier and the lower-level problem corresponding to learning a parametric function that maps from a sample's multi-class margin to an importance weight. Extensive experiments demonstrate that our approach consistently improves both clean and robust accuracy compared to related methods and state-of-the-art baselines.
1 Introduction
While neural networks have been extremely successful in tasks such as image classification and speech recognition, recent work [29, 12] has demonstrated that neural network classifiers can be arbitrarily fooled by small, adversarially-chosen perturbations of their input. Notably, Su et al. [28] demonstrated that neural network classifiers which can correctly classify “clean” images may be vulnerable to targeted attacks, e.g., misclassify those same images when only a single pixel is changed.
Recent work has shown a common failing among techniques that uniformly encourage robustness. In particular, there exists an intrinsic tradeoff between robustness and accuracy [40]. Bao et al. [3] investigate this tradeoff from the perspective of classification-calibrated loss theory. Rice et al. [22] empirically showed that during adversarial training, networks often irreversibly lose robustness after training for a short time. They dubbed this phenomenon adversarial overfitting and proposed early stopping as a remedy. The significance of label noise and memorization in the context of adversarial overfitting was demonstrated by Sanyal et al. [26]—in particular, that poor training samples induce fragility to adversarial perturbations due to the tendency of neural networks to interpolate the training data. Methods based on weight and logit smoothing have been proposed as an alternative to early stopping [5], as well as techniques for dataset augmentation [20, 13] and local smoothing [36, 35].
In a different approach to addressing adversarial overfitting, Geometry-Aware Instance Reweighted Adversarial Training (GAIRAT; [42]), Weighted Margin-aware Minimax Risk (WMMR; [39]), and Margin-Aware Instance reweighting Learning (MAIL; [30]) control the influence of training examples via importance or loss weighting. Intuitively, the samples assigned a low weight correspond to samples on which the classifier is already sufficiently robust. Generally, these methods are well-motivated—e.g., by [34], who conclude that a good set of weights (large (small) weights for samples close to (far from) the decision boundary) is tied to generalization. However, existing methods rely on approximations of the margin and employ heuristic weighting schemes that require careful choices of hyperparameters.
Building upon these observations, we present BiLAW (Bilevel Learnable Adversarial reWeighting), an approach that explicitly learns a parametric function (e.g., represented by a small feed-forward network) that assigns weights to the loss suffered by a classifier on individual training samples. The sample weights are learned as a function of the classifier's multiclass margins at the samples, according to the weights' effect on robust generalization. We employ a bi-level optimization formulation [4] and leverage a validation set, where the upper-level objective corresponds to learning the parameters of a robust classifier, while the lower-level objective corresponds to learning a function that predicts sample weights that improve robustness on the validation set. Our approach alternates between iteratively updating the parametric sample weights and updating the classifier network parameters.
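To make the alternating scheme concrete, the following is a minimal PyTorch sketch (assuming torch >= 2.0 for torch.func.functional_call). The WeightNet architecture, the margin-vector construction, the single-step lookahead with inner learning rate lr_inner, and the use of a clean cross-entropy loss in place of a robust loss are illustrative assumptions rather than the exact BiLAW procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

class WeightNet(nn.Module):
    """Hypothetical weight network: maps a k-dimensional margin vector to a weight in (0, 1)."""
    def __init__(self, k, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, margins):               # margins: (batch, k)
        return torch.sigmoid(self.net(margins)).squeeze(-1)

def margin_vector(logits, y):
    """Illustrative per-sample multi-class margin vector: true-class logit minus each logit."""
    return logits.gather(1, y[:, None]) - logits   # (batch, k)

def bilaw_style_step(clf, wnet, opt_clf, opt_wnet, x_tr, y_tr, x_val, y_val, lr_inner=0.1):
    # 1) Update the weight network: differentiate the validation loss of a one-step
    #    lookahead of the classifier trained under the weighted training loss.
    params = {n: p.detach().clone().requires_grad_(True) for n, p in clf.named_parameters()}
    logits = functional_call(clf, params, (x_tr,))
    w = wnet(margin_vector(logits, y_tr))
    w = w / (w.sum() + 1e-8)                  # w_i >= 0, sum_i w_i = 1
    tr_loss = (w * F.cross_entropy(logits, y_tr, reduction="none")).sum()
    grads = torch.autograd.grad(tr_loss, list(params.values()), create_graph=True)
    lookahead = {n: p - lr_inner * g for (n, p), g in zip(params.items(), grads)}
    val_loss = F.cross_entropy(functional_call(clf, lookahead, (x_val,)), y_val)
    opt_wnet.zero_grad(); val_loss.backward(); opt_wnet.step()

    # 2) Update the classifier with the (now fixed) learned sample weights.
    logits = clf(x_tr)
    with torch.no_grad():
        w = wnet(margin_vector(logits, y_tr))
        w = w / (w.sum() + 1e-8)
    clf_loss = (w * F.cross_entropy(logits, y_tr, reduction="none")).sum()
    opt_clf.zero_grad(); clf_loss.backward(); opt_clf.step()
```

In practice, the clean cross-entropy above would be replaced by an adversarial (robust) loss, and the two updates would be interleaved over minibatches of training and validation data.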
Contributions
As far as we know, this is the first work to explore a learning-based approach to sample weighting in the context of adversarial training. Prior work [42, 37, 30] only used heuristics to estimate the weights and did not involve any learning components. Our contributions include:

1. We propose BiLAW, a new adversarial training method based on learning sample weights as a parametric function mapping from multi-class margins. Our method can be formulated as a bi-level optimization problem that can be solved efficiently thanks to recent advances in meta-learning.
2. We motivate and extend the notion of the robust margin of a classifier at a particular sample to the multi-class setting, and show that the magnitude of a sample's learned weight directly corresponds to the vulnerability of the classifier at that sample.
3. We evaluate the performance of BiLAW on MNIST, F-MNIST, and CIFAR-10 and demonstrate that it significantly improves clean accuracy by up to 6% and robust test accuracy by up to 5% compared to TRADES and other state-of-the-art sample reweighting methods on CIFAR-10.
2 Preliminaries and Related Work
In this section, we briefly present background terminology pertaining to adversarially robust classification, sample reweighting, and bilevel optimization.
Notations
Let $f: \mathbb{R}^d \to [0,1]^k$ be a feedforward ReLU network with $l$ hidden layers and weights $\theta$; for example, $f$ may map from a $d$-dimensional image to a $k$-dimensional vector corresponding to likelihoods for $k$ classes.

Given a training set of $m$ sample-label pairs $(x_i, y_i)$ drawn from a training data distribution $\mathcal{D}$, we associate a weight $w_i$ with each training sample. Informally, these weights characterize the effect of the sample on the generalization of the network (i.e., samples with large weights promote robust generalization and vice versa). Given a loss function $\ell: \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$, we denote the empirical weighted training loss suffered by a network with parameters $\theta$ on $m$ training samples with weights $w$ by $L_{\mathrm{tr}}(\theta, w) = \sum_{i=1}^{m} w_i \, \ell(y_i, f(x_i;\theta))$, such that $w_i \geq 0$ and $\sum_i w_i = 1$. For brevity, we write $\ell_i(\theta) = \ell(y_i, f(x_i;\theta))$. Additionally, if $w$ is left unspecified, $L$ corresponds to the unweighted mean over empirical losses. Likewise, the unweighted validation loss over $n$ samples is denoted $L_{\mathrm{val}}(\theta)$.
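As a concrete illustration of the notation above, here is a minimal PyTorch sketch of the weighted empirical training loss $L_{\mathrm{tr}}(\theta, w)$; the choice of cross-entropy for $\ell$ and the explicit renormalization of $w$ are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_training_loss(model, x, y, w):
    """L_tr(theta, w) = sum_i w_i * l_i(theta), with w_i >= 0 and sum_i w_i = 1."""
    w = w.clamp(min=0.0)
    w = w / w.sum()                                              # enforce the simplex constraint
    per_sample = F.cross_entropy(model(x), y, reduction="none")  # l_i(theta), here cross-entropy
    return (w * per_sample).sum()
```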
Figure 1: (a) Diagram of the multiclass margin. Larger samples denote samples that should be assigned large weight, e.g., samples that are misclassified or close to the decision boundary. Green (red) arrows denote entries in the multiclass margin vector for a correctly (incorrectly) classified sample. (b) Sorted logit order and frequency of adversarial classification: the number of instances where the prediction on an adversarial sample corresponds to its $i$-th largest logit in CIFAR-10 (ignoring the $0$-th logit/samples where the prediction does not change). Colors represent the perturbation budget used during adversarial training (i.e., degrees of robustness). Perturbations are computed using $\ell_\infty$-PGD with 10 iterations and a budget of 0.031.

2.1 Robust classification and adversarial overfitting

Consider the network $f: \mathbb{R}^d \to \mathbb{R}^k$, where the input is $d$-dimensional and the output is a $k$-dimensional vector of likelihoods, with the $j$-th entry corresponding to the likelihood that the image belongs to the $j$-th class. The associated classification is then $c(x;\theta) = \arg\max_{j \in [1,k]} f_j(x;\theta)$. In adversarial machine learning, we are not just concerned that the classification be correct, but we also want to
be robust against adversarial examples, i.e., small perturbations to the input which may change the classification to an incorrect class. We define the notion of $\epsilon$-robustness below:

Definition 2.1 ($\epsilon$-robust). $f$ parameterized by $\theta$ is called $\epsilon$-robust with respect to norm $p$ at $x$ if the classification is consistent for a small ball of radius $\epsilon$ around $x$:
$$c(x+\delta;\theta) = c(x;\theta), \quad \forall \delta: \|\delta\|_p \leq \epsilon. \quad (1)$$

Note that the $\epsilon$-robustness of $f$ at $x$ is intimately related to the uniform Lipschitz smoothness of $f$ around $x$. Recall that a function $f$ has finite Lipschitz constant $L > 0$ with respect to norm $\|\cdot\|$ if
$$\exists L \geq 0 \ \text{s.t.} \ |f(x) - f(x')| \leq L \cdot \|x - x'\|, \quad \forall x, x' \in X. \quad (2)$$
An immediate consequence of Eq. (1) and Eq. (2) is that if $f$ is uniformly $L$-Lipschitz, then $f$ is $\epsilon$-robust at $x$ with $\epsilon = \frac{1}{2L}(P_a - P_b)$, where $P_a$ is the likelihood of the most likely outcome and $P_b$ is the likelihood of the second most likely outcome [25]. The piecewise linearity of ReLU networks facilitates the extension of this consequence to the locally Lipschitz regime [36, 35]; here $L$ corresponds to the norm of the affine map characterized by $f$ conditioned on input $x$. These properties were previously used [39, 37, 30] to characterize the robustness of a network at a sample (and the weight associated with the sample).
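A small sketch of this relationship, assuming (as a simplification) that the per-class outputs are the quantities on which the Lipschitz bound holds:

```python
import torch

def lipschitz_certified_radius(outputs, L):
    """eps = (P_a - P_b) / (2L): radius within which an L-Lipschitz classifier's
    prediction cannot change (cf. Eqs. (1)-(2)); `outputs` holds per-class scores."""
    top2 = torch.topk(outputs, k=2, dim=-1).values   # two largest scores per sample
    gap = top2[..., 0] - top2[..., 1]                # P_a - P_b
    return gap / (2.0 * L)
```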
The minimal $\ell_p$-norm perturbation $\delta_p^*$ required to switch a sample's label is given by the solution to the following optimization problem:
$$\delta_p^* = \arg\min_{\delta} \|\delta\|_p \quad \text{s.t.} \quad c(x;\theta) \neq c(x+\delta;\theta).$$
A significant amount of existing work relies on first-order approximations and Hölder's inequality to recover $\delta^*$, justifying the popularity of inducing robustness by controlling global and local Lipschitz constants. More concretely, given an $\ell_p$ norm and radius $\epsilon$, a typical goal of robust machine learning is to learn classifiers that minimize the robust loss on a training dataset:
$$\min_{\theta} \ \mathbb{E}_{(x,y)\sim\mathcal{D}} \ \max_{\|\delta\|_p \leq \epsilon} \ \ell(y, f(x+\delta;\theta)).$$
For brevity, we will denote the robust analogue of a loss $L$ as $\hat{L}$ (likewise, the pointwise loss $\ell$ as $\hat{\ell}$), indicating that this is the robust counterpart of $L$, differentiated by the “inner” maximization problem.
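The inner maximization is typically approximated with projected gradient descent (PGD). The sketch below is one common $\ell_\infty$ variant; the step size, random start, and the omission of clipping to the valid input range are simplifying assumptions for brevity, not details taken from this paper.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate max_{||delta||_inf <= eps} l(y, f(x + delta)) by projected gradient ascent."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)  # random start
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()     # ascend the loss
            delta.clamp_(-eps, eps)          # project back into the eps-ball
    return (x + delta).detach()              # adversarial example (input-range clipping omitted)
```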
2.2 Margin-aware Reweighting
In the framework of cost-sensitive learning, weights are assigned to the loss associated with individual samples and the goal is to minimize the empirical weighted training loss:
$$L_{\mathrm{tr}}(\theta, w) := \sum_{i=1}^{m} w_i \, \ell_i(\theta).$$
Previous work in margin-aware adversarial training [41, 42, 39, 2, 9] typically substitutes the robust loss $\hat{L}_{\mathrm{tr}}$ for $L_{\mathrm{tr}}$ and largely focuses on designing heuristic functions of various notions of margin to use for the sample weights $w_i$.

For example, in GAIRAT [42, 41, 9], the margin is defined as the least number of PGD steps, denoted $\kappa$, that leads the classifier to make an incorrect prediction. The sample's weight is computed as $\omega_{\mathrm{GAIRAT}}(x_i) = \frac{1}{2}\left(1 + \tanh\left(\lambda + 5(1 - 2\kappa/K)\right)\right)$ with hyperparameters $K$ and $\lambda$. A small $\kappa$ indicates that the sample lies close to the decision boundary. Larger $\kappa$ values imply that the associated samples lie far from the decision boundary and are therefore more robust, requiring smaller weights. However, due to the non-linearity of the loss surface in practice, PGD-based attacks with finite iterations may suffer from the same issues that plague standard iterative first-order methods in non-convex settings. In other words, $\kappa$ is heavily dependent on the optimization path taken by PGD. This is demonstrated by GAIRAT's vulnerability to sophisticated attacks, e.g., AutoAttack [8].
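For reference, the GAIRAT weighting function can be written down directly; the default value of $\lambda$ below is an illustrative choice, not one prescribed by the text.

```python
import torch

def gairat_weight(kappa, K, lam=-1.0):
    """w = 0.5 * (1 + tanh(lambda + 5 * (1 - 2*kappa/K))); small kappa (few PGD steps
    needed to flip the prediction) yields a large weight."""
    kappa = torch.as_tensor(kappa, dtype=torch.float32)
    return 0.5 * (1.0 + torch.tanh(lam + 5.0 * (1.0 - 2.0 * kappa / K)))
```

For example, with $K = 10$, a sample whose prediction flips after a single PGD step receives a weight close to 1, while a sample that never flips within $K$ steps receives a weight close to 0.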
Zhang et al. [41] define the margin as the difference between the loss a network suffers at a clean sample and at its adversarial variant. Zeng et al. [39], Wang et al. [30], and Balaji et al. [2] propose a definition of margin corresponding to taking differences between logits, as follows.

Definition 2.2 (Zeng et al. [39], Wang et al. [30]). The margin of a classifier $f$ on sample $(x, y)$ is the difference between the confidence of $f$ in the true label $y$ and the maximal probability of an incorrect label $t$: $\mathrm{margin}(x, y;\theta) = p(f(x;\theta) = y) - \max_{t \neq y} p(f(x;\theta) = t)$.
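A direct transcription of Definition 2.2, assuming the class probabilities are obtained from a softmax over the logits:

```python
import torch
import torch.nn.functional as F

def def22_margin(logits, y):
    """margin(x, y; theta) = p(y) - max_{t != y} p(t); positive iff correctly classified."""
    p = F.softmax(logits, dim=-1)
    p_true = p.gather(1, y[:, None]).squeeze(1)       # confidence in the true label
    p_rest = p.scatter(1, y[:, None], float("-inf"))  # mask out the true class
    return p_true - p_rest.max(dim=-1).values
```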
Given this definition, Zeng et al. [39] and Wang et al. [30] propose to use exponential (WMMR) and sigmoidal (MAIL) weighting functions, respectively: $\omega_{\mathrm{WMMR}}(x_i) = \exp(-\alpha m)$ with parameter $\alpha$, and $\omega_{\mathrm{MAIL}}(x_i) = \mathrm{sigmoid}(-\gamma(m - \beta))$ with parameters $\gamma$ and $\beta$, where $m$ denotes the margin of Definition 2.2. WMMR and MAIL rely on the local linearity of ReLU networks and on the fact that, for samples near the margin, the relative scale of predicted class-likelihoods directly corresponds to the distance to the decision boundary. However, similarly to GAIRAT's $\kappa$, even for samples very close to the decision boundary, simple functions of the difference between class likelihoods may not necessarily correspond to the true distance to the decision boundary.
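For comparison with the learned mapping proposed here, the two heuristics can be sketched as follows; the signs and the default parameter values are assumptions chosen so that small margins receive large weights, consistent with the description above.

```python
import torch

def wmmr_weight(margin, alpha=1.0):
    """Exponential weighting in the style of WMMR: small margin -> large weight."""
    return torch.exp(-alpha * margin)

def mail_weight(margin, gamma=10.0, beta=0.0):
    """Sigmoidal weighting in the style of MAIL: weights saturate for very small/large margins."""
    return torch.sigmoid(-gamma * (margin - beta))
```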
In contrast, we propose a more fine-grained notion of margin, the multi-class margin, and a method
to learn a mapping between the margin at a sample and its associated weight, rather than use a
predefined heuristic function.
Previous work has explored theoretical notions of a multi-class margin. For example, Zou [43] defined the margin vector in the context of boosting as a proxy for a vector of conditional class probabilities. However, this notion of margin is unaware of the true class of a sample. In contrast, the multi-class margins proposed by Saberian and Vasconcelos [24] and Cortes et al. [6] are both closely related to those of Wang et al. [30] and Zeng et al. [39], i.e., defined as the minimal distance between an arbitrary predicted logit and the logit of the true class.
In Fig. 1 we explore the relationship between the logits of a network evaluated at a clean sample and the predicted class of its adversarially perturbed variant. Methods which rely on the canonical notions of margin reasonably assume that samples at which a classifier is vulnerable have small margin according to Def. 2.2, i.e., the magnitude of the smallest difference between the logits of any class and the logit corresponding to the true class is small. However, we demonstrate in Fig. 1(b) that a significant number of predictions made by vulnerable classifiers on perturbed samples do not correspond to the classes with minimal margin. In other words, the class for which the margin is smallest does not always correspond to the adversarial class. Furthermore, this issue is exacerbated for robust networks, as shown by the difference in count distribution between networks whose relative robustness varies.
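The analysis in Fig. 1(b) can be reproduced, in spirit, by tallying where each adversarial prediction falls in the sorted clean logits; the sketch below is an illustrative reconstruction, not the authors' code.

```python
import torch

def adversarial_logit_rank(clean_logits, adv_pred):
    """Rank (1 = largest) of the adversarially-predicted class within the clean logits."""
    order = clean_logits.argsort(dim=-1, descending=True)        # classes sorted by clean logit
    rank0 = (order == adv_pred[:, None]).float().argmax(dim=-1)  # 0-indexed position
    return rank0 + 1

# A histogram of these ranks, excluding samples whose prediction does not change,
# gives the kind of count distribution plotted in Fig. 1(b).
```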
2.3 Bi-level Optimization and Meta-learning
Bilevel optimization, first introduced by Bracken and McGill [4], is an optimization framework involving nested optimization problems. A typical bilevel optimization problem takes on the form:
$$\min_{x \in \mathbb{R}^p} \Phi(x) := f(x, y^*(x)) \quad \text{s.t.} \quad y^* \in \arg\min_{y \in \mathbb{R}^p} g(x, y), \quad (3)$$
where $f$ and $g$ are respectively denoted the upper-level and lower-level objectives. The goal of the framework is to minimize the primary objective $\Phi(x)$ with respect to $x$, where $y^*(x)$ is obtained by solving the lower-level minimization problem. The framework of bilevel optimization has seen adoption by the machine learning community—in particular in the context of hyperparameter tuning