This is a pre-print of the original paper accepted at the Winter Conference on Applications of Computer Vision (WACV) 2023.
Large-Scale Open-Set Classification Protocols for ImageNet
Andres Palechor    Annesha Bhoumik    Manuel Günther
Department of Informatics, University of Zurich, Andreasstrasse 15, CH-8050 Zurich
https://www.ifi.uzh.ch/en/aiml.html
Abstract
Open-Set Classification (OSC) intends to adapt closed-set classification models to real-world scenarios, where the classifier must correctly label samples of known classes while rejecting previously unseen unknown samples. Only recently has research started to investigate algorithms that are able to handle these unknown samples correctly. Some of these approaches address OSC by including negative samples in the training set that a classifier learns to reject, expecting that these data increase the robustness of the classifier on unknown classes. Most of these approaches are evaluated on small-scale and low-resolution image datasets like MNIST, SVHN or CIFAR, which makes it difficult to assess their applicability to the real world, and to compare them among each other. We propose three open-set protocols that provide rich datasets of natural images with different levels of similarity between known and unknown classes. The protocols consist of subsets of ImageNet classes selected to provide training and testing data closer to real-world scenarios. Additionally, we propose a new validation metric that can be employed to assess whether the training of deep learning models addresses both the classification of known samples and the rejection of unknown samples. We use the protocols to compare the performance of two baseline open-set algorithms to the standard SoftMax baseline and find that the algorithms work well on negative samples that have been seen during training, and partially on out-of-distribution detection tasks, but their performance drops in the presence of samples from previously unseen unknown classes.
1. Introduction
Automatic classification of objects in images has been an active direction of research for several decades now. The advent of Deep Learning has brought algorithms to a stage where they can handle large amounts of data and produce classification accuracies that were beyond imagination a decade before. Supervised image classification algorithms have achieved tremendous success when it comes to detecting classes from a finite number of known classes, which is commonly known as evaluation under the closed-set assumption. For example, deep learning algorithms that attempt the classification of ten handwritten digits [16] achieve more than 99% accuracy when presented with a digit, but this ignores the fact that the classifier might be confronted with non-digit images during testing [6]. Even the well-known ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [26] contains 1000 classes during training, and the test set contains samples from exactly these 1000 classes, while the real world contains many more classes, e.g., the WordNet hierarchy [19] currently knows more than 100'000 classes.¹ Training a categorical classifier that can differentiate all these classes is currently not possible – only feature comparison approaches [22] exist – and, hence, we have to deal with samples that we do not know how to classify.

Only recently, research on methods to improve classification in the presence of unknown samples has gained more traction. These are samples from previously unseen classes that might occur during deployment of the algorithm in the real world and that the algorithm needs to handle correctly by not assigning them to any of the known classes. Bendale and Boult [2] provided the first algorithm that incorporates the possibility to reject a sample as unknown into a deep network that was trained on a finite set of known classes. Later, other algorithms were developed to improve the detection of unknown samples. Many of these algorithms require training on samples from some of the unknown classes that do not belong to the known classes of interest – commonly, these classes are called known unknowns [18], but since this formulation is more confusing than helpful, we will term these classes the negative classes. For example, Dhamija et al. [6] employed samples from a different dataset, i.e., they trained their system on MNIST as known classes and selected EMNIST letters as negatives.

¹ https://wordnet.princeton.edu
Figure 1: Class Sampling in our Open-Set Protocols. We make use of the WordNet hierarchy [19] to define three protocols of different difficulties. In this figure, we show the superclasses from which we sample the final classes, all of which are leaf nodes taken from the ILSVRC 2012 dataset. Dashed lines indicate that the lower nodes are descendants, but they might not be direct children of the upper nodes. Additionally, all nodes have more descendants than those shown in the figure. The colored bars below a class indicate that its subclasses are sampled for the purposes shown in the top-left of the figure. For example, all subclasses of "Dog" are used as known classes in protocol P1, while the subclasses of "Hunting Dog" are partitioned into known and negatives in protocol P2. For protocol P3, several intermediate nodes are partitioned into known, negative and unknown classes.
Other approaches try to create negative samples by utilizing known classes in different ways, e.g., Ge et al. [8] used a generative model to form negative samples, while Zhou et al. [30] try to utilize internal representations of mixed known samples.

One issue that is inherent in all of these approaches – with only a few exceptions [2, 25] – is that they evaluate only on small-scale datasets with a few known classes, such as the 10 classes in MNIST [16], CIFAR-10 [14], SVHN [21] or mixtures of these. While many algorithms claim that they can handle unknown classes, the number of known classes is low, and it is unclear whether these algorithms can handle more known classes, or more diverse sets of unknown classes. Only lately has a large-scale open-set validation protocol been defined on ImageNet [28], but it only separates unknown samples based on visual² and not semantic similarity.

Another issue of research on open-set classification is that most of the employed evaluation criteria, such as accuracy, macro-F1 or ROC metrics, do not evaluate open-set classification as it would be used in a real-world task. Particularly, the validation metrics currently employed during training of a network do not reflect the target task and, thus, it is unclear whether the selected model is actually the best model for the desired task.

² In fact, Vaze et al. [28] do not specify their criteria to select unknown classes and only mention visual similarity in their supplemental material.
In this paper, we therefore propose large-scale open-set recognition protocols that can be used to train and test various open-set algorithms – and we will showcase the performance of three simple algorithms in this paper. We decided to build our protocols based on the well-known and well-investigated ILSVRC 2012 dataset [26], and we build three evaluation protocols P1, P2 and P3 that provide various difficulties based on the WordNet hierarchy [19], as displayed in Fig. 1. The protocols are publicly available,³ including source code for the baseline implementations and the evaluation, which enables the reproduction of the results presented in this paper. With these new protocols, we hope to foster more comparable and reproducible research into the direction of open-set object classification as well as related topics such as out-of-distribution detection. This allows researchers to test their algorithms on our protocols and directly compare with our results.
The contributions of this paper are as follows:
• We introduce three novel open-set evaluation protocols with different complexities for the ILSVRC 2012 dataset.
• We propose a novel evaluation metric that can be used for validation purposes when training open-set classifiers.
• We train deep networks with three different techniques and report their open-set performances.
• We provide all source code³ for training and evaluation of our models to the research community.
³ https://github.com/AIML-IfI/openset-imagenet
2. Related Work
In open-set classification, a classifier is expected to correctly classify known test samples into their respective classes, and correctly detect that unknown test samples do not belong to any known class. The study of unknown instances is not new in the literature. For example, novelty detection, which is also known as anomaly detection and has a high overlap with out-of-distribution detection, focuses on identifying test instances that do not belong to training classes. It can be seen as a binary classification problem that determines if an instance belongs to any of the training classes or not, but without exactly deciding which class [4], and includes approaches in supervised, semi-supervised and unsupervised learning [13, 23, 10].
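As a concrete illustration of this binary view – a minimal sketch, not taken from the cited approaches – the following treats novelty detection as one-class classification on pre-computed feature vectors; the feature dimensionality and the nu parameter are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical pre-computed feature vectors, e.g., from a CNN backbone.
train_features = np.random.randn(1000, 512)   # samples of known training classes
test_features = np.random.randn(200, 512)     # mixture of known and unknown test samples

# Fit the novelty detector on known data only; nu bounds the fraction of training outliers.
detector = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(train_features)

# +1 means "looks like the training distribution", -1 means "novel / unknown";
# which known class a sample belongs to is not decided at all.
is_known = detector.predict(test_features)
```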
However, all these approaches only consider the classification of samples into known and unknown, leaving the later classification of known samples into their respective classes as a second step. Ideally, these two steps should be incorporated into one method. An easy approach would be to threshold on the maximum class probability of the SoftMax classifier using a confidence threshold, assuming that for an unknown input, the probability would be distributed across all the classes and, hence, would be low [17]. Unfortunately, such inputs often overlap significantly with known decision regions and tend to get misclassified as a known class with high confidence [6]. It is therefore essential to devise techniques that are more effective than simply thresholding SoftMax probabilities in detecting unknown inputs. Some initial approaches include extensions of 1-class and binary Support Vector Machines (SVMs) as implemented by Scheirer et al. [27] and devising recognition systems to continuously learn new classes [1, 25].
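To make the thresholding baseline concrete, here is a minimal sketch, assuming a trained PyTorch classifier `model` (the function and variable names are ours, not from the cited works): a sample is assigned to its most likely known class only if the maximum SoftMax probability exceeds a chosen threshold, and is rejected as unknown otherwise.

```python
import torch

def classify_or_reject(model, images, theta=0.5):
    """Return a known-class index per image, or -1 if rejected as unknown.

    model  : trained classifier returning logits of shape (B, C)
    images : input batch of shape (B, 3, H, W)
    theta  : confidence threshold on the maximum SoftMax probability
    """
    model.eval()
    with torch.no_grad():
        logits = model(images)                 # (B, C) raw network outputs
        probs = torch.softmax(logits, dim=1)   # SoftMax probabilities per class
        conf, pred = probs.max(dim=1)          # highest probability and its class index
        pred[conf < theta] = -1                # reject low-confidence samples as unknown
    return pred
```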
While the above methods make use only of known samples in order to disassociate unknown samples, other approaches require samples of some negative classes, hoping that these samples generalize to all unseen classes. For example, Dhamija et al. [6] utilize negative samples to train the network to provide low confidence values for all known classes when presented with a sample from an unknown class. Many researchers [8, 29, 20] utilize generative adversarial networks to produce negative samples from the known samples. Zhou et al. [30] combined pairs of known samples to define negatives, both in input space and deeper in the network. Other approaches to open-set recognition are discussed by Geng et al. [9].
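One common instantiation of this idea, sketched below in the spirit of Dhamija et al. [6] (all tensor names and the label convention are our assumptions), assigns a uniform target distribution over the known classes to every negative sample, which pushes the network toward low confidence on all known classes for such inputs.

```python
import torch
import torch.nn.functional as F

def negative_aware_loss(logits, labels, num_known):
    """Cross-entropy with uniform targets for negative samples.

    logits    : (B, num_known) network outputs
    labels    : (B,) class indices in [0, num_known) for known samples, -1 for negatives
    num_known : number of known classes C
    """
    log_probs = F.log_softmax(logits, dim=1)
    targets = torch.zeros_like(log_probs)
    known = labels >= 0
    # One-hot targets for known samples.
    targets[known] = F.one_hot(labels[known], num_known).float()
    # Uniform targets 1/C for negative samples, i.e., maximal SoftMax entropy is rewarded.
    targets[~known] = 1.0 / num_known
    return -(targets * log_probs).sum(dim=1).mean()
```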
One problem that all the above methods share is that they are evaluated on small-scale datasets with low-resolution images and low numbers of classes. Such datasets include MNIST [16], SVHN [21] and CIFAR-10 [14], where oftentimes a few random classes are used as known and the remaining classes as unknown [9]. Sometimes, other datasets serve the role of unknowns, e.g., when MNIST forms the known classes, EMNIST letters [11] are used as negatives and/or unknowns. Similarly, the known classes are composed of CIFAR-10, while other classes from CIFAR-100 or SVHN act as negatives or unknowns [15, 6]. Only few papers make use of large-scale datasets such as ImageNet, where they either use the classes of ILSVRC 2012 as known and other classes from ImageNet as unknown [2, 28], or random partitions of ImageNet [25, 24].

Oftentimes, evaluation protocols are home-grown and, thus, the comparison across algorithms is very difficult. Additionally, there is no clear distinction of the similarities between known, negative and unknown classes, which makes it impossible to judge in which scenarios a method will work, and in which it will not. Finally, the employed evaluation metrics are most often not designed for open-set classification and, hence, fail to address typical use-cases of open-set recognition.
3. Approach
3.1. ImageNet Protocols
Based on [3], we design three different protocols to create three different artificial open spaces, with increasing levels of similarity in appearance between inputs – and increasing complexity and overlap between features – of known and unknown classes. To allow for the comparison of algorithms that require negative samples for training, we carefully design and include negative classes in our protocols. This also allows us to compare how well these algorithms work on previously seen negative classes and how well on previously unseen unknown classes.
In order to define our three protocols, we make use of the WordNet hierarchy that provides us with a tree structure for the 1000 classes of ILSVRC 2012. Particularly, we exploit the robustness Python library [7] to parse the ILSVRC tree. All the classes in ILSVRC are represented as leaf nodes of that graph, and we use descendants of several intermediate nodes to form our known and unknown classes. The definition of the protocols and their open-set partitions are presented in Fig. 1; a more detailed listing of classes can be found in the supplemental material.

We design the protocols such that the difficulty levels of closed- and open-set evaluation vary. While protocol P1 is easy for open-set, it is hard for closed-set classification. On the contrary, P3 is easier for closed-set classification and more difficult in open-set. Finally, P2 is somewhere in the middle, but small enough to run hyperparameter optimization that can be transferred to P1 and P3.
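While our protocols are built with the robustness library [7], the descendant-collection step can also be illustrated with NLTK's WordNet interface; the sketch below is only an illustration under that assumption (the synset name and the set of ILSVRC wnids are placeholders) and gathers all ILSVRC leaf classes below a chosen parent node such as "hunting dog".

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet") once

def ilsvrc_descendants(parent_synset_name, ilsvrc_wnids):
    """Collect all ILSVRC 2012 classes that are WordNet descendants of a parent node.

    parent_synset_name : e.g. "hunting_dog.n.01"
    ilsvrc_wnids       : set of the 1000 ILSVRC wnids, e.g. {"n02087046", ...}
    """
    parent = wn.synset(parent_synset_name)
    # The transitive closure over hyponyms yields every descendant synset.
    descendants = set(parent.closure(lambda s: s.hyponyms()))
    # An ImageNet wnid is the part-of-speech letter followed by the 8-digit WordNet offset.
    wnids = {f"{s.pos()}{s.offset():08d}" for s in descendants}
    return sorted(wnids & ilsvrc_wnids)

# Example (hypothetical): the hunting dog classes of protocol P2.
# known_p2 = ilsvrc_descendants("hunting_dog.n.01", ilsvrc_wnids)
```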
Table 1: ImageNet Classes used in the Protocols. This table shows the ImageNet parent classes that were used to create the three protocols. Known and negative classes are used for training the open-set algorithms, while known, negative and unknown classes are used in testing. Given are the numbers of classes: training / validation / test samples.

     Known                              Negative                           Unknown
P1   All dog classes                    Other 4-legged animal classes      Non-animal classes
     116: 116218 / 29055 / 5800         67: 69680 / 17420 / 3350           166: — / — / 8300
P2   Half of hunting dog classes        Half of hunting dog classes        Other 4-legged animal classes
     30: 28895 / 7224 / 1500            31: 31794 / 7949 / 1550            55: — / — / 2750
P3   Mix of common classes incl.        Mix of common classes incl.        Mix of common classes incl.
     animals, plants and objects        animals, plants and objects        animals, plants and objects
     151: 154522 / 38633 / 7550         97: 98202 / 24549 / 4850           164: — / — / 8200
In the first protocol P1, known and unknown classes are semantically quite distant, and also do not share too many visual features. We include all 116 dog classes as known classes – since dogs represent the largest fine-grained intermediate category in ImageNet, which makes closed-set classification difficult – and select 166 non-animal classes as unknowns. P1 can, therefore, be used to test out-of-distribution detection algorithms since knowns and unknowns are not very similar. In the second protocol P2, we only look into the animal classes. Particularly, we use several hunting dog classes as known and other classes of 4-legged animals as unknown. This means that known and unknown classes are still semantically relatively distant, but image features such as fur are shared between known and unknown. This will make it harder for out-of-distribution detection algorithms to perform well. Finally, the third protocol P3 includes ancestors of various different classes, both as known and unknown classes, by making use of the mixed_13 classes defined in the robustness library. Since known and unknown classes come from the same ancestors, it is very unlikely that out-of-distribution detection algorithms will be able to discriminate between them, and real open-set classification methods need to be applied.

To enable algorithms that require negative samples, the negative classes are selected to be semantically similar to the known classes, or at least in between the known and the unknown. It has been shown that selecting negative samples too far from the known classes does not help in creating better-suited open-set algorithms [6]. Naturally, we can only define semantic similarity based on the WordNet hierarchy, but it is unclear whether these negative classes are also structurally similar to the known classes. Tab. 1 displays a summary of the parent classes used in the protocols, and a detailed list of all classes is presented in the supplemental material.
Finally, we split our data into three partitions: one for training, one for validation and one for testing. The training and validation partitions are taken from the original ILSVRC 2012 training images by randomly splitting off 80% for training and 20% for validation. Since the training and validation partitions are composed of known and negative data only, no unknown data is provided here. The test partition is composed of the original ILSVRC validation set containing 50 images per class, and is available for all three groups of data: known, negative and unknown. This assures that no image used during testing has been seen by the network in any stage of the training.
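A minimal sketch of such an 80/20 split for one class directory is given below; the file layout, extension and random seed are assumptions, and the released protocol files should be used for exact reproduction.

```python
import random
from pathlib import Path

def split_class_images(class_dir, val_fraction=0.2, seed=42):
    """Randomly split the ILSVRC training images of one class into train and validation parts."""
    images = sorted(Path(class_dir).glob("*.JPEG"))   # all training images of this class
    rng = random.Random(seed)                         # fixed seed for a reproducible split
    rng.shuffle(images)
    n_val = int(round(val_fraction * len(images)))
    return images[n_val:], images[:n_val]             # (training files, validation files)

# Example with a hypothetical path to one known class:
# train_files, val_files = split_class_images("ILSVRC2012/train/n02087046")
```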
3.2. Open-Set Classification Algorithms
We select three different techniques to train deep networks. While other algorithms shall be tested in future work, we rely on three simple, very similar and well-known methods. In particular, all three loss functions solely utilize the plain categorical cross-entropy loss $J_{CCE}$ on top of SoftMax activations (often termed the SoftMax loss) in different settings. Generally, the weighted categorical cross-entropy loss is:
$$J_{CCE} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} w_c \, t_{n,c} \log y_{n,c} \qquad (1)$$
where $N$ is the number of samples in our dataset (note that we utilize batch processing), $t_{n,c}$ is the target label of the $n$th sample for class $c$, $w_c$ is a class weight for class $c$, and $y_{n,c}$ is the output probability of class $c$ for sample $n$ using the SoftMax activation:
$$y_{n,c} = \frac{e^{z_{n,c}}}{\sum_{c'=1}^{C} e^{z_{n,c'}}} \qquad (2)$$

of the logits $z_{n,c}$, which are the network outputs.
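As a sanity check of Eqs. (1) and (2), the weighted loss can be computed directly from the logits. The sketch below assumes one-hot targets given as integer labels; the tensor names are ours. With reduction="sum", torch.nn.functional.cross_entropy(logits, labels, weight=class_weights) produces the same sum before the 1/N averaging.

```python
import torch
import torch.nn.functional as F

def weighted_cce(logits, labels, class_weights):
    """Weighted categorical cross-entropy of Eq. (1), computed from raw logits.

    logits        : (N, C) network outputs z_{n,c}
    labels        : (N,) integer class indices encoding one-hot targets t_{n,c}
    class_weights : (C,) per-class weights w_c
    """
    log_probs = F.log_softmax(logits, dim=1)                 # log y_{n,c}, Eq. (2) in log space
    picked = log_probs[torch.arange(len(labels)), labels]    # log-probability of each target class
    return -(class_weights[labels] * picked).mean()          # -1/N * sum_n w_{c_n} log y_{n,c_n}
```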
The three different training approaches differ with respect to the targets $t_{n,c}$ and the weights $w_c$, and how