
classify the close-set samples but also (2) discriminate the open-set samples from the close-set ones.
In this complicated setting, how to evaluate model performance becomes a challenging problem.
Existing work has proposed several metrics, which fall into two categories:
The first category extends traditional classification metrics to the open-set scenario. To this end,
one first extends the close-set confusion matrix with unknown classes, where a threshold
decides whether an input sample belongs to the unknown classes. On top of this,
open-set F-score [2, 9, 11, 12, 15, 16] summarizes the True Positive (TP), False Positive (FP), and False Negative (FN) performance of known classes.
Youden’s index [17] takes the sum of the True Positive Rate (TPR) and True Negative Rate (TNR) performance of known classes as the performance measure.
Besides, Normalized Accuracy [15] summarizes the close-set accuracy and the open-set accuracy via a convex combination. Although it is intuitive to extend close-set metrics, we point out that these metrics are essentially inconsistent with the goal of OSR. Specifically, for open-set F-score and Youden’s index, only the FP/FN performance of known classes evaluates the open-set performance implicitly. As a result, these metrics encourage classifying open-set samples into known classes: rejecting a sample as unknown risks increasing the FN of known classes, whereas accepting an open-set sample into a known class is penalized only implicitly. Moreover, Normalized Accuracy encourages selecting the threshold that classifies more open-set samples into known classes. In extreme cases, even a close-set model (i.e., one that classifies all the open-set samples into known classes) can obtain a high performance on these metrics.
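For concreteness, these classification-based metrics are commonly instantiated as sketched below; the exact averaging and weighting conventions vary across the cited works, so this is only a reference form. Let $K$ be the number of known classes, let $\mathrm{TP}_k$, $\mathrm{FP}_k$, $\mathrm{FN}_k$ be computed from the extended (thresholded) confusion matrix, and let $\lambda \in [0,1]$ be a trade-off coefficient:
\[
\mathrm{F}_{\text{open}} = \frac{1}{K}\sum_{k=1}^{K} \frac{2\,P_k R_k}{P_k + R_k}, \qquad
P_k = \frac{\mathrm{TP}_k}{\mathrm{TP}_k + \mathrm{FP}_k}, \qquad
R_k = \frac{\mathrm{TP}_k}{\mathrm{TP}_k + \mathrm{FN}_k},
\]
\[
J = \mathrm{TPR} + \mathrm{TNR} - 1, \qquad
\mathrm{NA} = \lambda \cdot \mathrm{AKS} + (1-\lambda) \cdot \mathrm{AUS},
\]
where the constant $-1$ in Youden’s index does not affect model comparison, and AKS/AUS denote the accuracy on known (close-set) and unknown (open-set) samples, respectively.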
The second category regards OSR as a novelty detection problem [18, 19] with multiple known classes. Based on this observation, the Area Under the ROC Curve (AUC) [20, 21], which measures the ranking performance between known classes and unknown classes, has become a popular metric [3, 4, 5, 6, 8, 10]. Compared with classification-based metrics, AUC is insensitive to the selection of the threshold since it summarizes the TPR performance over all possible thresholds.
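Concretely, let $r(\cdot)$ be an open-set score that assigns larger values to samples deemed unknown, and let $\{x_i\}_{i=1}^{N_k}$ and $\{\tilde{x}_j\}_{j=1}^{N_u}$ denote the close-set and open-set test samples, respectively. The empirical AUC then takes the standard pairwise form (ties are usually counted with weight $1/2$):
\[
\mathrm{AUC}(r) = \frac{1}{N_k N_u} \sum_{i=1}^{N_k} \sum_{j=1}^{N_u} \mathbb{I}\big[ r(\tilde{x}_j) > r(x_i) \big].
\]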
However, the limitation of AUC is also obvious: the close-set performance is ignored. A natural remedy is to adopt the close-set accuracy as a complementary metric [3]. However, what we expect is a model that makes correct predictions on the close-set and the open-set simultaneously. This decoupling strategy induces a challenging multi-objective optimization problem and is also unfavorable for comparing the overall performance of different models. What’s more, simply aggregating these two metrics induces another inconsistency property.
In view of this, a natural question arises:
Does there exist a numeric metric that is consistent with the goal of OSR?
To answer this question, we propose a novel metric named OpenAUC. Specifically, the proposed metric enjoys a concise pairwise formulation, where each pair consists of a close-set sample and an open-set sample. For each pair, only if the close-set sample has been classified into the correct known class does OpenAUC check whether the open-set sample is ranked higher than the close-set one. In this sense, OpenAUC evaluates the close-set performance and the open-set performance in a coupled manner, which is consistent with the goal of OSR. What’s more, benefiting from the ranking operator, OpenAUC overcomes the sensitivity to the threshold, and further analysis shows that maximizing OpenAUC guarantees a better open-set performance under a mild assumption on the threshold.
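Following this description, OpenAUC admits the following pairwise sketch, where $f(\cdot)$ is the close-set classifier, $y_i$ is the label of close-set sample $x_i$, and $r(\cdot)$ is the open-set score as before (this is an informal sketch consistent with the description above; the formal definition and its properties are developed later in the paper):
\[
\mathrm{OpenAUC}(f, r) = \frac{1}{N_k N_u} \sum_{i=1}^{N_k} \sum_{j=1}^{N_u} \mathbb{I}\big[ f(x_i) = y_i \big] \cdot \mathbb{I}\big[ r(\tilde{x}_j) > r(x_i) \big].
\]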
Considering these advantages, we further establish an end-to-end learning method to maximize
OpenAUC. Finally, extensive experiments conducted on multiple benchmark datasets validate the
proposed metric and learning method. To sum up, the contribution of this paper is three-fold:
• We make a detailed analysis of existing metrics for OSR. The theoretical results show that existing metrics, including the classification-based ones and AUC, are essentially inconsistent with the goal of OSR due to their own limitations.
• A novel metric, named OpenAUC, is proposed. Benefiting from its concise formulation, further analysis shows that OpenAUC overcomes the limitations of existing metrics and is thus free from the inconsistency properties.
• An end-to-end learning method is proposed to optimize OpenAUC, and the empirical results on multiple benchmark datasets validate its effectiveness (an illustrative optimization sketch is given below).
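To give a flavor of what maximizing OpenAUC looks like in practice, below is a minimal, hypothetical sketch of a differentiable surrogate objective in PyTorch. It couples a standard cross-entropy term for close-set classification with a pairwise squared-hinge surrogate for the ranking indicator; the function name, tensor names, and the choice of surrogate are illustrative assumptions rather than the exact loss proposed in this paper.

import torch
import torch.nn.functional as F

def openauc_surrogate(close_logits, close_labels, close_scores, open_scores, margin=1.0):
    # close_logits: (N_k, C) classifier logits for close-set samples
    # close_labels: (N_k,)   ground-truth known-class labels
    # close_scores: (N_k,)   "unknown-ness" scores of close-set samples
    # open_scores:  (N_u,)   "unknown-ness" scores of (simulated) open-set samples
    # Close-set classification term.
    ce = F.cross_entropy(close_logits, close_labels)
    # Indicator of correctly classified close-set samples (not differentiated through),
    # mirroring the term I[f(x_i) = y_i] in the pairwise sketch above.
    correct = (close_logits.argmax(dim=1) == close_labels).float()
    # Pairwise squared-hinge surrogate for I[r(x'_j) > r(x_i)]: every open-set sample
    # should score higher than every correctly classified close-set sample.
    diff = open_scores.unsqueeze(0) - close_scores.unsqueeze(1)   # (N_k, N_u)
    rank = torch.clamp(margin - diff, min=0.0) ** 2               # (N_k, N_u)
    rank = (correct.unsqueeze(1) * rank).mean()
    return ce + rank

In this sketch, close_scores and open_scores could, for instance, be negative maximum logits, and the open-set samples used during training would have to be simulated (e.g., from auxiliary or transformed training data), since true open-set data are unavailable at training time; both choices are assumptions made for illustration only.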