
Previous work in margin-aware adversarial training [41, 42, 39, 2, 9] typically substitutes the robust loss $\hat{L}_{tr}$ for $L_{tr}$ and largely focuses on designing heuristic functions of various notions of margin to use for the sample weight $w_i$.
For example, in GAIRAT [42, 41, 9], the margin is defined as the least number of PGD steps, denoted $\kappa$, that leads the classifier to make an incorrect prediction. The sample's weight is computed as
$$\omega_{\text{GAIRAT}}(x_i) = \frac{1}{2}\left(1 + \tanh\left(\lambda + 5\left(1 - 2\kappa/K\right)\right)\right)$$
with hyperparameters $K$ and $\lambda$. A small $\kappa$ indicates that the sample lies close to the decision boundary. Larger $\kappa$ values imply that the associated samples lie far from the decision boundary and are therefore more robust, requiring smaller weights. However, due to the non-linearity of the loss surface in practice, PGD-based attacks with finite iterations may suffer from the same issues that plague standard iterative first-order methods in non-convex settings. In other words, $\kappa$ is heavily dependent on the optimization path taken by PGD. This is demonstrated by GAIRAT's vulnerability to sophisticated attacks, e.g., AutoAttack [8].
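As a concrete illustration, the GAIRAT weighting function above can be sketched as follows; the function name and default $\lambda = 0$ are our own choices for exposition, not part of the original method's implementation:

```python
import numpy as np

def gairat_weight(kappa, K, lam=0.0):
    """GAIRAT sample weight from the PGD-step margin kappa.

    kappa: least number of PGD steps (out of K total) needed to flip
           the prediction; small kappa => sample near the boundary.
    K, lam: the hyperparameters K and lambda from the weighting rule.
    """
    return 0.5 * (1.0 + np.tanh(lam + 5.0 * (1.0 - 2.0 * kappa / K)))
```

Note that $\kappa = K/2$ with $\lambda = 0$ gives a weight of exactly $1/2$, and weights decrease monotonically as $\kappa$ grows, matching the intuition that far-from-boundary samples receive smaller weights.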
Zhang et al. [41] define the margin as the difference between the loss a network suffers at a clean sample and at its adversarial variant. Zeng et al. [39], Wang et al. [30], and Balaji et al. [2] propose a definition of margin corresponding to taking differences between logits, as follows.
Definition 2.2 (Zeng et al. [39], Wang et al. [30]). The margin of a classifier $f$ on sample $(x, y)$ is the difference between the confidence of $f$ in the true label $y$ and the maximal probability of an incorrect label $t$: $\mathrm{margin}(x, y; \theta) = p(f(x; \theta) = y) - \max_{t \neq y} p(f(x; \theta) = t)$.
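The margin of Def. 2.2 is straightforward to compute from a vector of class probabilities; a minimal sketch (function name ours):

```python
import numpy as np

def margin(probs, y):
    """Margin of Def. 2.2: probability of the true class minus the
    largest probability assigned to any incorrect class.

    probs: 1-D array of class probabilities (e.g. softmax outputs).
    y:     index of the true label.
    """
    others = np.delete(probs, y)  # probabilities of incorrect classes
    return probs[y] - others.max()
```

The margin is positive when the classifier is correct and negative when some incorrect class receives higher probability than the true class.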
Given this definition, Zeng et al. [39] and Wang et al. [30] propose to use exponential (WMMR) and sigmoidal (MAIL) weighting functions, respectively: $\omega_{\text{WMMR}}(x_i) = \exp(-\alpha m)$ with parameter $\alpha$, and $\omega_{\text{MAIL}}(x_i) = \mathrm{sigmoid}(-\gamma(m - \beta))$ with parameters $\gamma$ and $\beta$. WMMR and MAIL rely on the local linearity of ReLU networks and on the assumption that, for samples near the margin, the relative scale of predicted class likelihoods directly corresponds to the distance to the decision boundary. However, similarly to GAIRAT's $\kappa$, even for samples very close to the decision boundary, simple functions of the difference between class likelihoods may not necessarily correspond to the true distance to the decision boundary.
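For reference, the two weighting functions can be sketched directly from their formulas; the default parameter values below are illustrative assumptions, not the papers' recommended settings:

```python
import numpy as np

def wmmr_weight(m, alpha=1.0):
    # Exponential weighting: small or negative margins m receive
    # exponentially larger weights.
    return np.exp(-alpha * m)

def mail_weight(m, gamma=10.0, beta=0.0):
    # Sigmoidal weighting: sigmoid(-gamma * (m - beta)), i.e. weight
    # 1/2 at m = beta, approaching 1 for small m and 0 for large m.
    return 1.0 / (1.0 + np.exp(gamma * (m - beta)))
```

Both functions are monotonically decreasing in the margin $m$, so samples closer to (or on the wrong side of) the decision boundary are upweighted.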
In contrast, we propose a more fine-grained notion of margin, the multi-class margin, and a method
to learn a mapping between the margin at a sample and its associated weight, rather than use a
predefined heuristic function.
Previous work has explored theoretical notions of a multi-class margin. For example, Zou [43] defined the margin vector in the context of boosting as a proxy for a vector of conditional class probabilities. However, this notion of margin is unaware of the true class of a sample. In contrast, the multi-class margins proposed by Saberian and Vasconcelos [24] and Cortes et al. [6] are both closely related to those of Wang et al. [30] and Zeng et al. [39], i.e., defined as the minimal distance between an arbitrary predicted logit and the logit of the true class.
In Fig. 1 we explore the relationship between the logits of a network evaluated at a clean sample and the predicted class of its adversarially perturbed variant. Methods which rely on the canonical notions of margin reasonably assume that samples at which a classifier is vulnerable have small margin according to Def. 2.2, i.e., the magnitude of the smallest difference between the logit of any class and the logit corresponding to the true class is small. However, we demonstrate in Fig. 1(b) that a significant number of predictions made by vulnerable classifiers on perturbed samples do not correspond to the classes with minimal margin. In other words, the class for which the margin is smallest does not always correspond to the adversarial class. Furthermore, this issue is exacerbated for robust networks, as shown by the difference in count distribution between networks whose relative robustness varies.
2.3 Bi-level Optimization and Meta-learning
Bilevel optimization, first introduced by Bracken and McGill [4], is an optimization framework involving nested optimization problems. A typical bilevel optimization problem takes the form:
$$\min_{x \in \mathbb{R}^p} \Phi(x) := f(x, y^*(x)) \quad \text{s.t.} \quad y^* \in \arg\min_{y \in \mathbb{R}^p} g(x, y), \qquad (3)$$
where $f$ and $g$ are respectively denoted the upper-level and lower-level objectives. The goal of the framework is to minimize the primary objective $\Phi(x)$ with respect to $x$, where $y^*(x)$ is obtained by solving the lower-level minimization problem. The framework of bilevel optimization has seen adoption by the machine learning community, in particular in the context of hyperparameter tuning
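To make the structure of Eq. (3) concrete, here is a toy bilevel problem (our own illustrative example, not the paper's method) where the lower-level solution has a closed form, so the upper-level objective can be minimized by gradient descent using the implicit gradient:

```python
# Toy bilevel problem in the form of Eq. (3):
#   upper level:  min_x  Phi(x) = f(x, y*(x)) = (y*(x) - 1)**2
#   lower level:  y*(x) = argmin_y  g(x, y) = 0.5 * x * y**2 - y
# For x > 0, setting dg/dy = x*y - 1 = 0 gives y*(x) = 1/x, so the
# upper-level optimum is x = 1 (where y* = 1 and Phi = 0).

def lower_solution(x):
    # Closed-form argmin of the lower-level objective g(x, y) over y.
    return 1.0 / x

def upper_objective(x):
    y_star = lower_solution(x)
    return (y_star - 1.0) ** 2

# Gradient descent on Phi(x) via the implicit gradient:
# dPhi/dx = 2*(y* - 1) * dy*/dx, with dy*/dx = -1/x**2.
x = 2.0
for _ in range(200):
    y_star = lower_solution(x)
    grad = 2.0 * (y_star - 1.0) * (-1.0 / x ** 2)
    x -= 0.5 * grad
```

In practice the lower-level problem rarely has a closed form, and the inner argmin is itself approximated by gradient steps, which is what makes bilevel formulations computationally challenging.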