Asymptotically Unbiased Instance-wise Regularized
Partial AUC Optimization: Theory and Algorithm
Huiyang Shao1,2, Qianqian Xu1, Zhiyong Yang2,
Shilong Bao3,4, Qingming Huang1,2,5,6
1Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS
2School of Computer Science and Tech., University of Chinese Academy of Sciences
3State Key Lab of Info. Security, Inst. of Info. Engineering, CAS
4School of Cyber Security, University of Chinese Academy of Sciences
5BDKM, University of Chinese Academy of Sciences
6Peng Cheng Laboratory
shaohuiyang21@mails.ucas.ac.cn xuqianqian@ict.ac.cn
yangzhiyong21@ucas.ac.cn baoshilong@iie.ac.cn qmhuang@ucas.ac.cn
Abstract
The Partial Area Under the ROC Curve (PAUC), typically including One-way
Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC), measures the average
performance of a binary classifier within a specific false positive rate and/or true
positive rate interval, which is a widely adopted measure when decision constraints
must be considered. Consequently, PAUC optimization has naturally attracted
increasing attention in the machine learning community within the last few years.
Nonetheless, most of the existing methods could only optimize PAUC approxi-
mately, leading to inevitable biases that are not controllable. Fortunately, a recent
work presents an unbiased formulation of the PAUC optimization problem via dis-
tributional robust optimization. However, it is based on the pair-wise formulation
of AUC, which suffers from the limited scalability w.r.t. sample size and a slow
convergence rate, especially for TPAUC. To address this issue, we present a simpler
reformulation of the problem in an asymptotically unbiased and instance-wise
manner. For both OPAUC and TPAUC, we come to a nonconvex strongly concave
minimax regularized problem of instance-wise functions. On top of this, we em-
ploy an efficient solver that enjoys a linear per-iteration computational complexity w.r.t.
the sample size and a time complexity of O(ε^{-3}) to reach an ε-stationary point.
Furthermore, we find that the minimax reformulation also facilitates the theoretical
analysis of generalization error as a byproduct. Compared with the existing results,
we present new error bounds that are much easier to prove and could deal with
hypotheses with real-valued outputs. Finally, extensive experiments on several
benchmark datasets demonstrate the effectiveness of our method.
1 Introduction
AUC refers to the Area Under the Receiver Operating Characteristic (ROC) curve [1], where the ROC curve is obtained by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) of a given classifier for all possible thresholds. Since it is insensitive to the class distribution, AUC has become one of the standard metrics for long-tailed and imbalanced datasets [1, 22, 38]. Consequently, AUC optimization has attracted increasing attention in the machine learning community ever since the early 2000s [13, 5, 35, 17]. Over the last two decades, research on AUC optimization has evolved from the simplest linear models and decision trees [27, 10, 29, 41] to state-of-the-art deep learning architectures [21, 14, 37, 43, 42, 33]. With such remarkable success, one can now easily apply AUC optimization to deal with various real-world problems ranging from financial fraud detection [16, 4, 23] and spam email detection [24] to medical diagnosis [24, 38, 37, 43], etc.

Corresponding authors.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.03967v2 [cs.LG] 11 Oct 2022

Table 1: Comparison with existing partial AUC algorithms. The convergence rate represents the number of iterations after which an algorithm can find an ε-stationary point; ε-sp denotes an ε-stationary point. ♦ indicates a natural result of non-convex SGD. n_B^+ (n_B^- resp.) is the number of positive (negative resp.) instances in each mini-batch B.

                                  SOPA [44]            SOPA-S [44]         TPAUC [39]          Ours
Convergence Rate (OPAUC)          O(ε^{-4})            O(ε^{-4})           O(ε^{-4}) ♦         O(ε^{-3})
Convergence Rate (TPAUC)          O(ε^{-6})            O(ε^{-4})           O(ε^{-4}) ♦         O(ε^{-3})
Convergence Measure               ε-sp (non-smooth)    ε-sp                ε-sp                ε-sp
Smoothness                        non-smooth           smooth              smooth              smooth
Unbiasedness                      ✓                    ✗                   ✗                   bias O(1/κ) when ω = 0
Per-Iteration Time Complexity     O(n_B^+ · n_B^-)     O(n_B^+ · n_B^-)    O(n_B^+ · n_B^-)    O(n_B^+ + n_B^-)
[Figure 1: The comparison among different AUC variants. (a) The area under the ROC curve (AUC). (b) The one-way partial AUC (OPAUC). (c) The two-way partial AUC (TPAUC). Each panel plots TPR against FPR over [0, 1].]
However, in such long-tailed applications, we are often interested in a specific region of the ROC curve, whose area is called the Partial AUC (PAUC). As illustrated in Fig. 1, there are two types of PAUC. One-way Partial AUC (OPAUC) measures the area within an FPR interval (0 ≤ FPR ≤ β), while Two-way Partial AUC (TPAUC) measures the area with FPR ≤ β and TPR ≥ α. Unlike the full AUC, optimizing PAUC requires selecting top-ranked or/and bottom-ranked instances, leading to a hard combinatorial optimization problem. Many efforts have been made to solve this problem [6, 24, 25, 20, 39]. However, the majority of them rely on full-batch optimization and an approximation of the top (bottom)-k ranking process, which suffers from immeasurable biases and undesirable efficiency.

Most recently, researchers have started to explore mini-batch PAUC optimization for deep learning models. [39] proposed a novel end-to-end optimization framework for PAUC. This formulation has a fast convergence rate with the help of a stochastic optimization algorithm, but the estimation of PAUC is still biased. Later, [44] proposed a Distributional Robust Optimization (DRO) framework for PAUC optimization, where the bias can be eliminated by a clever reformulation and compositional SGD algorithms [28]. However, they adopt the pair-wise loss function, which has limited scalability w.r.t. sample size and an O(ε^{-4}) / O(ε^{-6}) time complexity to reach an ε-stationary point for OPAUC / TPAUC. Considering that the efficiency bottleneck comes from the pair-wise formulation, we will explore the following question in this paper:
Can we design a simpler, nearly asymptotically unbiased and instance-wise formulation
to optimize OPAUC and TPAUC in an efficient way?
To answer this question, we propose an efficient and nearly unbiased optimization algorithm (the bias vanishes asymptotically as κ → ∞ when ω = 0) for regularized PAUC maximization with a faster convergence guarantee. The comparison with previous results is listed in Tab. 1. We consider both OPAUC and TPAUC maximization: for OPAUC, we focus on maximizing PAUC in the region FPR ≤ β, and for TPAUC we focus on the region FPR ≤ β and TPR ≥ α. We summarize our contributions below.
• With a proper regularization, we propose a nonconvex strongly concave minimax instance-wise formulation for OPAUC and TPAUC maximization. On top of our proposed formulation, we employ an efficient stochastic minimax algorithm that finds an ε-stationary point within O(ε^{-3}) iterations.
• We conduct a generalization analysis of our proposed methods. Our instance-wise reformulation overcomes the interdependency issue of the original pair-wise generalization analysis, and the proof is much simpler than existing results. Moreover, compared with [25, 39], it can be applied to any real-valued hypothesis functions rather than only hard-threshold functions.
• We conduct extensive experiments on multiple imbalanced image classification tasks. The experimental results speak to the effectiveness of our proposed methods.
2 Preliminaries
Notations. In this section, we give some definitions and preliminaries about OPAUC and TPAUC. Let X ⊆ R^d be the input space and Y = {0, 1} be the label space. We denote by D_P and D_N the positive and negative instance distributions, respectively. Let S = {z_i = (x_i, y_i)}_{i=1}^n be a set of training data drawn from a distribution D_Z, where n is the number of samples. Let P (N resp.) be the set of positive (negative resp.) instances in the dataset. In this paper, we only focus on scoring functions f : X → [0, 1]. The output range can be simply implemented by any deep neural network with sigmoid outputs.
Standard AUC. The standard AUC calculates the entire area under the ROC curve. For mathematical convenience, our arguments start with the pair-wise reformulation of AUC. Specifically, as shown in [1], AUC measures the probability of a positive instance having a higher score than a negative instance:

\[ \mathrm{AUC}(f) = \Pr_{x \sim \mathcal{D}_P,\, x' \sim \mathcal{D}_N}\left[ f(x) > f(x') \right]. \tag{1} \]
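As a quick illustration (a toy example of ours, not from the paper), the empirical counterpart of Eq. (1) simply counts the fraction of correctly ordered positive-negative pairs:

```python
import numpy as np

pos = np.array([0.9, 0.8, 0.4, 0.7])   # scores f(x) of positive instances
neg = np.array([0.6, 0.1, 0.3, 0.2])   # scores f(x') of negative instances

# fraction of pairs with f(x) > f(x'), i.e., the empirical AUC
auc = np.mean(pos[:, None] > neg[None, :])
print(auc)  # 0.9375 for this toy data
```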
OPAUC. As mentioned in the introduction, instead of considering the entire region of the ROC curve, we focus on two forms of PAUC, namely OPAUC and TPAUC. According to [6], OPAUC is equivalent to the probability of a positive instance x being scored higher than a negative instance x' whose score lies in the specific range f(x') ∈ [η_β(f), 1], where η_β(f) satisfies Pr_{x'∼D_N}[f(x') ≥ η_β(f)] = β:

\[ \mathrm{OPAUC}(f) = \Pr_{x \sim \mathcal{D}_P,\, x' \sim \mathcal{D}_N}\left[ f(x) > f(x'),\; f(x') \ge \eta_\beta(f) \right]. \tag{2} \]
Practically, we do not know the exact data distributions D_P, D_N needed to calculate Eq. (2). Therefore, we turn to the empirical estimation of Eq. (2). Given a finite dataset S with n instances, let n_+ and n_- be the numbers of positive and negative instances, respectively. For OPAUC, the empirical estimation could be expressed as [24]:

\[ \widehat{\mathrm{AUC}}_\beta(f, S) = 1 - \frac{1}{n_+ n_-^\beta} \sum_{i=1}^{n_+} \sum_{j=1}^{n_-^\beta} \ell_{0,1}\!\left( f(x_i) - f(x'_{[j]}) \right), \tag{3} \]

where n_-^β = ⌊n_- · β⌋; x'_{[j]} denotes the instance with the j-th largest score among negative samples; ℓ_{0,1}(t) is the 0-1 loss, which returns 1 if t < 0 and 0 otherwise.
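To make Eq. (3) concrete, the following minimal NumPy sketch computes the empirical OPAUC on a finite sample; the function name and the toy arrays are illustrative only and are not part of the original paper.

```python
import numpy as np

def empirical_opauc(pos_scores, neg_scores, beta):
    """Empirical OPAUC of Eq. (3): 1 minus the misranking rate against the
    top-floor(n_minus * beta) scored negative instances."""
    n_beta = int(np.floor(len(neg_scores) * beta))
    # the top-ranked (largest-score) negatives are the "hard" ones
    hard_negs = np.sort(neg_scores)[::-1][:n_beta]
    # pairwise 0-1 loss: 1 whenever a positive is scored below a hard negative
    diffs = pos_scores[:, None] - hard_negs[None, :]
    return 1.0 - np.mean(diffs < 0)

pos = np.array([0.9, 0.8, 0.4, 0.7])
neg = np.array([0.6, 0.1, 0.3, 0.2, 0.5])
print(empirical_opauc(pos, neg, beta=0.4))  # uses the 2 hardest negatives -> 0.75
```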
TPAUC. More recently, [36] argued that an efficient classifier should have a low FPR and a high TPR simultaneously. Therefore, we also study a more general variant called Two-way Partial AUC (TPAUC), where the restricted region satisfies TPR ≥ α and FPR ≤ β. Similar to OPAUC, TPAUC measures the probability that a positive instance x ranks higher than a negative instance x', where f(x) ∈ [0, η_α(f)] with Pr_{x∼D_P}[f(x) ≤ η_α(f)] = α, and f(x') ∈ [η_β(f), 1] with Pr_{x'∼D_N}[f(x') ≥ η_β(f)] = β:

\[ \mathrm{TPAUC}(f) = \Pr_{x\sim\mathcal{D}_P,\, x'\sim\mathcal{D}_N}\left[ f(x) > f(x'),\; f(x) \le \eta_\alpha(f),\; f(x') \ge \eta_\beta(f) \right]. \tag{4} \]
Similar to OPAUC, for TPAUC we adopt its empirical estimation [36, 39]:

\[ \widehat{\mathrm{AUC}}_{\alpha,\beta}(f, S) = 1 - \frac{1}{n_+^\alpha n_-^\beta} \sum_{i=1}^{n_+^\alpha} \sum_{j=1}^{n_-^\beta} \ell_{0,1}\!\left( f(x_{[i]}) - f(x'_{[j]}) \right), \tag{5} \]

where n_+^α = ⌊n_+ · α⌋ and x_{[i]} denotes the instance with the i-th smallest score among all positive instances.
3 Problem Formulation
In this section, we introduce how to optimize OPAUC and TPAUC in an asymptotically unbiased and instance-wise manner. Note that Eq. (3) and Eq. (5) are hard to optimize, since it is complicated to determine the positive quantile function η_α(f) and the negative quantile function η_β(f); hence we cannot obtain the bottom-ranked positive instances and top-ranked negative instances directly. Below, we elaborate on how to tackle these challenges.
3.1 Optimizing the OPAUC
According to Eq. (3), given a surrogate loss ℓ and the finite dataset S, maximizing OPAUC and ÂUC_β(f, S) is equivalent to solving the following problems, respectively:

\[ \min_f R_\beta(f) = \mathbb{E}_{x\sim\mathcal{D}_P,\, x'\sim\mathcal{D}_N}\left[ \mathbb{I}_{f(x')\ge\eta_\beta(f)} \cdot \ell\big(f(x) - f(x')\big) \right], \tag{6} \]

\[ \min_f \hat{R}_\beta(f, S) = \frac{1}{n_+ n_-^\beta}\sum_{i=1}^{n_+}\sum_{j=1}^{n_-^\beta} \ell\big(f(x_i) - f(x'_{[j]})\big). \tag{7} \]
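For contrast with the instance-wise reformulation derived next, here is what the pair-wise empirical risk of Eq. (7) looks like with the squared surrogate; note the O(n_+ · n_-^β) pairing cost that Step 1 removes. The helper name is ours.

```python
import numpy as np

def pairwise_risk_beta(pos_scores, neg_scores, beta):
    """Pair-wise empirical risk of Eq. (7) with the squared surrogate loss."""
    n_beta = int(np.floor(len(neg_scores) * beta))
    hard_negs = np.sort(neg_scores)[::-1][:n_beta]       # top-ranked negatives
    diffs = pos_scores[:, None] - hard_negs[None, :]     # n_+ x n_beta pairs
    return np.mean((1.0 - diffs) ** 2)                   # surrogate loss (1 - t)^2
```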
Step 1: Instance-wise Reformulation. Here, to simplify the reformulation, we will use the most popular surrogate squared loss ℓ(x) = (1 − x)^2. Under this setting, the following theorem shows an instance-wise reformulation of the OPAUC optimization problem (please see Appendix F for the proof):

Theorem 1. Assume that f(x) ∈ [0, 1] for all x ∈ X, and let p = Pr[y = 1] denote the proportion of positive instances. Define F_op(f, a, b, γ, t, z) as:

\[ F_{op}(f,a,b,\gamma,t,z) = \big[(f(x)-a)^2 - 2(1+\gamma)f(x)\big]\,\frac{y}{p} \;-\; \gamma^2 \;+\; \big[(f(x)-b)^2 + 2(1+\gamma)f(x)\big]\cdot\frac{(1-y)\,\mathbb{I}_{f(x)\ge t}}{(1-p)\,\beta}, \tag{8} \]

where y = 1 for positive instances and y = 0 for negative instances. We have the following conclusions:

(a) (Population Version.) We have:

\[ \min_f R_\beta(f) \;\Leftrightarrow\; \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in[-1,1]}\; \mathbb{E}_{z\sim\mathcal{D}_Z}\left[ F_{op}(f, a, b, \gamma, \eta_\beta(f), z) \right], \tag{9} \]

where η_β(f) is chosen such that E_{x'∼D_N}[I_{f(x')≥η_β(f)}] = β.

(b) (Empirical Version.) Moreover, given a training dataset S with sample size n, denote

\[ \hat{\mathbb{E}}_{z\sim S}\left[ F_{op}(f, a, b, \gamma, \hat{\eta}_\beta(f), z) \right] = \frac{1}{n}\sum_{i=1}^{n} F_{op}(f, a, b, \gamma, \hat{\eta}_\beta(f), z_i), \]

where η̂_β(f) is the corresponding empirical quantile of the negative instances in S. We have:

\[ \min_f \hat{R}_\beta(f, S) \;\Leftrightarrow\; \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in[-1,1]}\; \hat{\mathbb{E}}_{z\sim S}\left[ F_{op}(f, a, b, \gamma, \hat{\eta}_\beta(f), z) \right]. \tag{10} \]
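The practical appeal of Thm. 1 is that F_op touches each sample exactly once, so a mini-batch estimate costs O(n_B^+ + n_B^-) rather than O(n_B^+ · n_B^-). The sketch below evaluates the empirical objective of Eq. (10) for fixed (a, b, γ) with a plug-in empirical quantile; the variable names and the use of NumPy are our own choices, not the paper's implementation.

```python
import numpy as np

def f_op_batch(scores, labels, a, b, gamma, beta, p):
    """Instance-wise objective of Eq. (8)/(10), averaged over a batch.
    `scores` = f(x) in [0,1], `labels` in {0,1}, `p` = positive-class prior."""
    neg_scores = scores[labels == 0]
    # plug-in empirical beta-quantile of the negative scores (threshold t)
    t = np.quantile(neg_scores, 1.0 - beta)
    pos_term = ((scores - a) ** 2 - 2 * (1 + gamma) * scores) * labels / p
    neg_term = ((scores - b) ** 2 + 2 * (1 + gamma) * scores) \
               * (1 - labels) * (scores >= t) / ((1 - p) * beta)
    return np.mean(pos_term + neg_term) - gamma ** 2

# a single evaluation is one pass over the batch -- no positive/negative pairing
```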
Step 2: Differentiable Sample Selection. Thm. 1 provides the support to convert the pair-wise loss into an instance-wise loss for OPAUC. However, the minimax problem Eq. (10) is still difficult to solve due to the operation I_{f(x')≥η_β(f)}, which requires selecting the top-ranked negative instances. To make the sample selection process differentiable, we adopt the following lemma.

Lemma 1. \(\sum_{i=1}^{k} x_{[i]}\) is a convex function of (x_1, ..., x_n), where x_{[i]} is the top-i element of the set {x_1, x_2, ..., x_n}. Furthermore, for x_i, i = 1, ..., n, we have

\[ \frac{1}{k}\sum_{i=1}^{k} x_{[i]} = \min_{s}\left\{ s + \frac{1}{k}\sum_{i=1}^{n}\big[ x_i - s \big]_+ \right\}, \]

where [a]_+ = max{0, a}. The population version is

\[ \mathbb{E}_x\left[ x \mid x \ge \eta(\alpha) \right] = \min_{s}\; \frac{1}{\alpha}\, \mathbb{E}_x\left[ \alpha s + [x - s]_+ \right], \]

where η(α) satisfies E_x[I_{x≥η(α)}] = α (please see Appendix F for the proof).
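Lemma 1 can be checked numerically in a few lines: the inner minimization over s reproduces the average of the top-k elements without any explicit sorting inside the optimization. This is just a sanity-check sketch on toy data of our own.

```python
import numpy as np

x = np.array([0.2, 0.9, 0.5, 0.7, 0.1])
k = 2

avg_top_k = np.sort(x)[::-1][:k].mean()                    # direct top-k average
candidates = np.linspace(x.min(), x.max(), 100001)         # brute-force the scalar min over s
atk = np.min([s + np.maximum(x - s, 0.0).sum() / k for s in candidates])

print(avg_top_k, atk)  # both equal 0.8 (up to the grid resolution)
```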
Lem. 1 gives rise to the Average Top-k (ATk) loss, a surrogate for the top-k loss that eliminates the sorting problem: optimizing the ATk loss is equivalent to selecting the top-ranked instances. For R_β(f), we can reformulate the selection term exactly as an ATk loss. Denote ℓ(x') = (f(x') − b)^2 + 2(1 + γ)f(x'). In the proof of the next theorem, we will show that ℓ(x') is an increasing function w.r.t. f(x'), namely:

\[ \mathbb{E}_{x'\sim\mathcal{D}_N}\left[ \ell(x') \mid f(x') \ge \eta_\beta(f) \right] = \min_{s}\; \frac{1}{\beta}\, \mathbb{E}_{x'\sim\mathcal{D}_N}\left[ \beta s + \big[ \ell(x') - s \big]_+ \right]. \tag{11} \]

A similar result holds for R̂_β(f, S). Then, we can reach Thm. 2 (please see Appendix F for the proof):
Theorem 2. Assume that f(x) ∈ [0, 1] for all x ∈ X. Then we have the following equivalent optimization problems for OPAUC:

\[ \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in[-1,1]}\; \mathbb{E}_{z\sim\mathcal{D}_Z}\left[ F_{op}(f,a,b,\gamma,\eta_\beta(f),z) \right] \;\Leftrightarrow\; \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in\Omega_\gamma}\; \min_{s'\in\Omega_{s'}}\; \mathbb{E}_{z\sim\mathcal{D}_Z}\left[ G_{op}(f,a,b,\gamma,z,s') \right], \tag{12} \]

\[ \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in[-1,1]}\; \hat{\mathbb{E}}_{z\sim S}\left[ F_{op}(f,a,b,\gamma,\hat{\eta}_\beta(f),z) \right] \;\Leftrightarrow\; \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in\Omega_\gamma}\; \min_{s'\in\Omega_{s'}}\; \hat{\mathbb{E}}_{z\sim S}\left[ G_{op}(f,a,b,\gamma,z,s') \right], \tag{13} \]

where Ω_γ = [b − 1, 1], Ω_{s'} = [0, 5], and

\[ G_{op}(f,a,b,\gamma,z,s') = \big[(f(x)-a)^2 - 2(1+\gamma)f(x)\big]\,\frac{y}{p} \;-\; \gamma^2 \;+\; \Big(\beta s' + \big[(f(x)-b)^2 + 2(1+\gamma)f(x) - s'\big]_+\Big)\,\frac{1-y}{\beta(1-p)}. \tag{14} \]
Step 3: Asymptotically Unbiased Smoothing. Even with Thm. 2, it is hard to optimize the min-max-min formulation in Eq. (13). A solution is to swap the order of max_γ and min_{s'} so as to reformulate it as a min-max problem. The key obstacle to this idea is the non-smooth function [·]_+. To avoid the [·]_+, we apply the softplus function [12]:

\[ r_\kappa(x) = \frac{\log\big(1 + \exp(\kappa\cdot x)\big)}{\kappa} \tag{15} \]

as a smooth surrogate. It is easy to show that r_κ(x) → [x]_+ as κ → ∞. Denote by G^κ_op(f, a, b, γ, z, s') the surrogate objective in which the [·]_+ in G_op(f, a, b, γ, z, s') is replaced with r_κ(·). We then proceed to solve the surrogate problem:

\[ \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in\Omega_\gamma}\; \min_{s'\in\Omega_{s'}}\; \mathbb{E}_{z\sim\mathcal{D}_Z}\big[ G^\kappa_{op}(f,a,b,\gamma,z,s') \big], \qquad \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in\Omega_\gamma}\; \min_{s'\in\Omega_{s'}}\; \hat{\mathbb{E}}_{z\sim S}\big[ G^\kappa_{op}(f,a,b,\gamma,z,s') \big], \tag{16} \]

respectively for the population and empirical versions. In Appendix B, we will prove that such an approximation has a convergence rate of O(1/κ).
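The smoothing step can be visualized directly: r_κ(x) upper-bounds [x]_+ and the gap shrinks at rate O(1/κ) (it is exactly log 2 / κ at x = 0). Below is a tiny numerical check of this claim; nothing here is specific to the paper's code.

```python
import numpy as np

def softplus_kappa(x, kappa):
    """Smooth surrogate r_kappa(x) = log(1 + exp(kappa * x)) / kappa of Eq. (15)."""
    # log(1 + exp(.)) written via logaddexp to avoid overflow for large kappa * x
    return np.logaddexp(0.0, kappa * x) / kappa

xs = np.linspace(-1.0, 1.0, 2001)
for kappa in (1, 10, 100, 1000):
    gap = np.max(softplus_kappa(xs, kappa) - np.maximum(xs, 0.0))
    print(kappa, gap)  # the maximal gap is log(2)/kappa, i.e., O(1/kappa)
```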
Step 4: The Regularized Problem. It is easy to check that r_κ(x) has a bounded second-order derivative. In this way, we can regard G^κ_op(f, a, b, γ, z, s') as a weakly-concave function [3] of γ. By employing an ℓ_2 regularization, we turn to a regularized form:

\[ G^{\kappa,\omega}_{op}(f,a,b,\gamma,z,s') = G^{\kappa}_{op}(f,a,b,\gamma,z,s') - \omega\cdot\gamma^2. \]

With a sufficiently large ω, G^{κ,ω}_op(f, a, b, γ, z, s') is strongly concave w.r.t. γ when all the other variables are fixed. Note that the regularization scheme inevitably introduces bias; this is a very general property of regularization. However, regularization is known to be a necessary building block to stabilize the solutions and improve generalization performance. We then reach a minimax problem in the final step.
Step 5: Min-Max Swapping. According to the minimax theorem [3], if we replace G^κ_op(f, a, b, γ, z, s') with G^{κ,ω}_op(f, a, b, γ, z, s'), the surrogate optimization problem satisfies:

\[ \min_{f,\,(a,b)\in[0,1]^2}\; \max_{\gamma\in\Omega_\gamma}\; \min_{s'\in\Omega_{s'}}\; \mathbb{E}_{z\sim\mathcal{D}_Z}\big[ G^{\kappa,\omega}_{op} \big] \;\Leftrightarrow\; \min_{f,\,(a,b)\in[0,1]^2,\, s'\in\Omega_{s'}}\; \max_{\gamma\in\Omega_\gamma}\; \mathbb{E}_{z\sim\mathcal{D}_Z}\big[ G^{\kappa,\omega}_{op} \big]. \tag{17} \]
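Eq. (17) is a standard nonconvex strongly-concave minimax problem over instance-wise terms, so it can be attacked with stochastic gradient descent-ascent style updates. The sketch below is a deliberately plain PyTorch descent-ascent loop on G^{κ,ω}_op for a linear scorer; it is meant to illustrate the structure of the problem, not the accelerated solver analyzed in the paper, and all hyper-parameters, the scorer, and the toy data pipeline are placeholders of ours.

```python
import torch
import torch.nn.functional as F

def g_op_kappa_omega(scores, labels, a, b, gamma, s, beta, p, kappa, omega):
    """Regularized smoothed objective G^{kappa,omega}_op of Eqs. (14)-(17) on a mini-batch."""
    pos = ((scores - a) ** 2 - 2 * (1 + gamma) * scores) * labels / p
    inner = (scores - b) ** 2 + 2 * (1 + gamma) * scores - s
    neg = (beta * s + F.softplus(kappa * inner) / kappa) * (1 - labels) / (beta * (1 - p))
    return (pos + neg).mean() - gamma ** 2 - omega * gamma ** 2

# toy setup: linear scorer with sigmoid output so that f(x) lies in [0, 1]
torch.manual_seed(0)
w = torch.zeros(5, requires_grad=True)
a = torch.tensor(0.5, requires_grad=True); b = torch.tensor(0.5, requires_grad=True)
s = torch.tensor(1.0, requires_grad=True)              # primal variables (descent)
gamma = torch.tensor(0.0, requires_grad=True)          # dual variable (ascent)
beta, p, kappa, omega, lr = 0.3, 0.5, 50.0, 1.0, 0.1

for step in range(200):
    x = torch.randn(64, 5); y = (torch.rand(64) < p).float()
    x[y == 1] += 0.5                                    # positives score a bit higher
    f_x = torch.sigmoid(x @ w)
    loss = g_op_kappa_omega(f_x, y, a, b, gamma, s, beta, p, kappa, omega)
    grads = torch.autograd.grad(loss, [w, a, b, s, gamma])
    with torch.no_grad():
        for v, g in zip((w, a, b, s), grads[:4]):
            v -= lr * g                                 # descent on f, a, b, s'
        gamma += lr * grads[4]                          # ascent on gamma
        a.clamp_(0, 1); b.clamp_(0, 1); s.clamp_(0, 5)  # project onto the constraint sets
        gamma.clamp_(float(b) - 1.0, 1.0)               # Omega_gamma = [b - 1, 1]
```

Note that each iteration only needs one pass over the mini-batch (no positive-negative pairing), which is exactly the O(n_B^+ + n_B^-) per-iteration cost claimed in Tab. 1.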