2. Related Work
In open-set classification, a classifier is expected to correctly classify known test samples into their respective classes, and to correctly detect that unknown test samples do not belong to any known class. The study of unknown instances is not new in the literature. For example, novelty detection, which is also known as anomaly detection and has a high overlap with out-of-distribution detection, focuses on identifying test instances that do not belong to any training class. It can be cast as a binary classification problem that determines whether an instance belongs to any of the training classes, without deciding which specific class [4], and includes supervised, semi-supervised and unsupervised approaches [13, 23, 10].
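As a minimal illustration of treating novelty detection as such a binary decision, a one-class model can be fit on features of training samples and queried on test features. The sketch below uses scikit-learn's OneClassSVM with illustrative variable names and random stand-in data; it is not tied to any of the cited methods.

import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative stand-ins for real feature vectors.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 64))  # features of training (known) classes
test_feats = rng.normal(size=(10, 64))    # queries: may be known or novel

detector = OneClassSVM(nu=0.1, gamma="scale").fit(train_feats)
is_known = detector.predict(test_feats) == 1  # +1 means known, -1 means novel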
However, all these approaches only consider the separation of samples into known and unknown, leaving the subsequent classification of known samples into their respective classes as a second step. Ideally, both steps should be incorporated into one method. A simple approach is to threshold the maximum class probability of the SoftMax classifier, assuming that for an unknown input the probability mass would be distributed across all classes and, hence, the maximum would be low [17]. Unfortunately, unknown inputs often overlap significantly with known decision regions and tend to be misclassified as a known class with high confidence [6]. It is therefore essential to devise techniques for detecting unknown inputs that are more effective than simply thresholding SoftMax probabilities. Initial approaches include extensions of one-class and binary Support Vector Machines (SVMs) as implemented by Scheirer et al. [27], and recognition systems that continuously learn new classes [1, 25].
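The SoftMax thresholding baseline of [17] can be summarized in a few lines. The following is a generic sketch, where the threshold value, function name and tensor names are illustrative.

import torch

# Maximum-SoftMax-probability baseline: reject a sample as unknown
# when its highest class probability falls below the threshold.
def predict_open_set(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # logits: (batch, num_known_classes) network outputs
    probs = torch.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)
    labels[conf < threshold] = -1  # label -1 marks "unknown"
    return labels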
While the above methods make use only of known samples to disassociate unknown samples, other approaches require samples of some negative classes, hoping that these generalize to all unseen classes. For example, Dhamija et al. [6] utilize negative samples to train the network to provide low confidence values for all known classes when presented with a sample from an unknown class. Many researchers [8, 29, 20] utilize generative adversarial networks to produce negative samples from the known samples. Zhou et al. [30] combine pairs of known samples to define negatives, both in input space and deeper in the network. Other approaches to open-set recognition are discussed by Geng et al. [9].
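The principle of training with negative samples as in [6] can be sketched as follows: known samples receive the standard cross-entropy loss, while negatives are pushed toward a uniform distribution over the known classes, i.e., toward maximal entropy. Note that this only illustrates the idea; the exact loss formulation of [6] differs in its details.

import torch
import torch.nn.functional as F

def open_set_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # targets: class index for known samples, -1 for negative samples
    log_probs = F.log_softmax(logits, dim=1)
    known = targets >= 0
    loss = logits.new_zeros(())
    if known.any():
        loss = loss + F.nll_loss(log_probs[known], targets[known])
    if (~known).any():
        # cross-entropy against a uniform target over all known classes
        loss = loss - log_probs[~known].mean(dim=1).mean()
    return loss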
A problem shared by all the above methods is that they are evaluated on small-scale datasets with low-resolution images and low numbers of classes, such as MNIST [16], SVHN [21] and CIFAR-10 [14], where oftentimes a few random classes are used as known and the remaining classes as unknown [9]. Sometimes, other datasets serve as unknowns, e.g., when MNIST digits constitute the known classes, EMNIST letters [11] are used as negatives and/or unknowns. Similarly, when the known classes are composed of CIFAR-10, classes from CIFAR-100 or SVHN serve as negatives or unknowns [15, 6]. Only a few papers make use of large-scale datasets such as ImageNet, either using the classes of ILSVRC 2012 as known and other classes from ImageNet as unknown [2, 28], or using random partitions of ImageNet [25, 24].
Oftentimes, evaluation protocols are home-grown, which makes comparisons across algorithms very difficult. Additionally, there is no clear characterization of the similarities between known, negative and unknown classes, which makes it impossible to judge in which scenarios a method will work and in which it will not. Finally, the employed evaluation metrics are most often not designed for open-set classification and, hence, fail to address typical use-cases of open-set recognition.
3. Approach
3.1. ImageNet Protocols
Based on [3], we design three different protocols that create three different artificial open spaces, with increasing levels of visual similarity between inputs, and increasing complexity and feature overlap, of known and unknown classes. To allow for the comparison of algorithms that require negative samples for training, we carefully design and include negative classes in our protocols. This also allows us to compare how well these algorithms work on previously seen negative classes and how well on previously unseen unknown classes.
In order to define our three protocols, we make use of the WordNet hierarchy that provides a tree structure for the 1000 classes of ILSVRC 2012. In particular, we employ the robustness Python library [7] to parse the ILSVRC tree. All classes in ILSVRC are represented as leaf nodes of this tree, and we use the descendants of several intermediate nodes to form our known and unknown classes. The definition of the protocols and their open-set partitions are presented in Fig. 1; a more detailed listing of classes can be found in the supplemental material. We design the protocols such that the difficulty levels of closed- and open-set evaluation vary. While protocol P1 is easy for open-set, it is hard for closed-set classification. On the contrary, P3 is easier for closed-set classification and more difficult for open-set. Finally, P2 lies in between, but is small enough to run hyperparameter optimization that can be transferred to P1 and P3.
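The class selection can be illustrated by a plain tree traversal: given the parent-child relation over WordNet IDs (e.g., as distributed in ImageNet's wordnet.is_a.txt), we collect all ILSVRC classes below a chosen intermediate node. The following self-contained sketch uses hypothetical file paths and function names and does not reproduce the API of the robustness library [7] that we actually employ.

from collections import defaultdict

def load_children(is_a_path: str) -> dict:
    # Each line of wordnet.is_a.txt holds "parent_wnid child_wnid".
    children = defaultdict(list)
    with open(is_a_path) as f:
        for line in f:
            parent, child = line.split()
            children[parent].append(child)
    return children

def descendant_leaves(node: str, children: dict, ilsvrc_wnids: set) -> set:
    # Depth-first search collecting all ILSVRC classes below `node`.
    found, stack = set(), [node]
    while stack:
        wnid = stack.pop()
        if wnid in ilsvrc_wnids:
            found.add(wnid)
        stack.extend(children.get(wnid, ()))
    return found

# e.g., all ILSVRC classes below the "dog" node (wnid assumed to be n02084071):
# known = descendant_leaves("n02084071", load_children("wordnet.is_a.txt"),
#                           ilsvrc_wnids)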