2. Related Work
In open-set classification, a classifier is expected to correctly classify known test samples into their respective classes, and to correctly detect that unknown test samples do not belong to any known class. The study of unknown instances is not new in the literature. For example, novelty detection, which is also known as anomaly detection and has a high overlap with out-of-distribution detection, focuses on identifying test instances that do not belong to any training class. It can be cast as a binary classification problem that determines whether an instance belongs to any of the training classes, without deciding which specific class [4], and includes supervised, semi-supervised and unsupervised approaches [13, 23, 10].
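As a minimal illustration of treating novelty detection as such a binary decision, a one-class model can be fit on features of training samples and queried on test features. The sketch below uses scikit-learn's OneClassSVM with illustrative variable names and random stand-in data; it is not tied to any of the cited methods.

import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative stand-ins for real feature vectors.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 64))  # features of training (known) classes
test_feats = rng.normal(size=(10, 64))    # queries: may be known or novel

detector = OneClassSVM(nu=0.1, gamma="scale").fit(train_feats)
is_known = detector.predict(test_feats) == 1  # +1 means known, -1 means novel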
However, all these approaches only consider the separation of samples into known and unknown, leaving the subsequent classification of known samples into their respective classes as a second step. Ideally, both steps should be incorporated into one method. A simple approach is to threshold the maximum class probability of the SoftMax classifier, assuming that for an unknown input the probability mass would be distributed across all classes and, hence, the maximum would be low [17]. Unfortunately, unknown inputs often overlap significantly with known decision regions and tend to be misclassified as a known class with high confidence [6]. It is therefore essential to devise techniques for detecting unknown inputs that are more effective than simply thresholding SoftMax probabilities. Initial approaches include extensions of one-class and binary Support Vector Machines (SVMs) as implemented by Scheirer et al. [27], and recognition systems that continuously learn new classes [1, 25].
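The SoftMax thresholding baseline of [17] can be summarized in a few lines. The following is a generic sketch, where the threshold value, function name and tensor names are illustrative.

import torch

# Maximum-SoftMax-probability baseline: reject a sample as unknown
# when its highest class probability falls below the threshold.
def predict_open_set(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # logits: (batch, num_known_classes) network outputs
    probs = torch.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)
    labels[conf < threshold] = -1  # label -1 marks "unknown"
    return labels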
While the above methods make use only of known samples to disassociate unknown samples, other approaches require samples of some negative classes, hoping that these generalize to all unseen classes. For example, Dhamija et al. [6] utilize negative samples to train the network to provide low confidence values for all known classes when presented with a sample from an unknown class. Many researchers [8, 29, 20] utilize generative adversarial networks to produce negative samples from the known samples. Zhou et al. [30] combine pairs of known samples to define negatives, both in input space and deeper in the network. Other approaches to open-set recognition are discussed by Geng et al. [9].
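The principle of training with negative samples as in [6] can be sketched as follows: known samples receive the standard cross-entropy loss, while negatives are pushed toward a uniform distribution over the known classes, i.e., toward maximal entropy. Note that this only illustrates the idea; the exact loss formulation of [6] differs in its details.

import torch
import torch.nn.functional as F

def open_set_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # targets: class index for known samples, -1 for negative samples
    log_probs = F.log_softmax(logits, dim=1)
    known = targets >= 0
    loss = logits.new_zeros(())
    if known.any():
        loss = loss + F.nll_loss(log_probs[known], targets[known])
    if (~known).any():
        # cross-entropy against a uniform target over all known classes
        loss = loss - log_probs[~known].mean(dim=1).mean()
    return loss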
A problem shared by all the above methods is that they are evaluated on small-scale datasets with low-resolution images and low numbers of classes, such as MNIST [16], SVHN [21] and CIFAR-10 [14], where oftentimes a few random classes are used as known and the remaining classes as unknown [9]. Sometimes, other datasets serve as unknowns, e.g., when MNIST digits constitute the known classes, EMNIST letters [11] are used as negatives and/or unknowns. Similarly, when the known classes are composed of CIFAR-10, classes from CIFAR-100 or SVHN serve as negatives or unknowns [15, 6]. Only a few papers make use of large-scale datasets such as ImageNet, either using the classes of ILSVRC 2012 as known and other classes from ImageNet as unknown [2, 28], or using random partitions of ImageNet [25, 24].
Oftentimes, evaluation protocols are home-grown, which makes comparisons across algorithms very difficult. Additionally, there is no clear characterization of the similarities between known, negative and unknown classes, which makes it impossible to judge in which scenarios a method will work and in which it will not. Finally, the employed evaluation metrics are most often not designed for open-set classification and, hence, fail to address typical use-cases of open-set recognition.
3. Approach
3.1. ImageNet Protocols
Based on [3], we design three different protocols that create three different artificial open spaces, with increasing levels of visual similarity between inputs, and increasing complexity and feature overlap, of known and unknown classes. To allow for the comparison of algorithms that require negative samples for training, we carefully design and include negative classes in our protocols. This also allows us to compare how well these algorithms work on previously seen negative classes and how well on previously unseen unknown classes.
In order to define our three protocols, we make use of the WordNet hierarchy that provides a tree structure for the 1000 classes of ILSVRC 2012. In particular, we employ the robustness Python library [7] to parse the ILSVRC tree. All classes in ILSVRC are represented as leaf nodes of this tree, and we use the descendants of several intermediate nodes to form our known and unknown classes. The definition of the protocols and their open-set partitions are presented in Fig. 1; a more detailed listing of classes can be found in the supplemental material. We design the protocols such that the difficulty levels of closed- and open-set evaluation vary. While protocol P1 is easy for open-set, it is hard for closed-set classification. On the contrary, P3 is easier for closed-set classification and more difficult for open-set. Finally, P2 lies in between, but is small enough to run hyperparameter optimization that can be transferred to P1 and P3.
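The class selection can be illustrated by a plain tree traversal: given the parent-child relation over WordNet IDs (e.g., as distributed in ImageNet's wordnet.is_a.txt), we collect all ILSVRC classes below a chosen intermediate node. The following self-contained sketch uses hypothetical file paths and function names and does not reproduce the API of the robustness library [7] that we actually employ.

from collections import defaultdict

def load_children(is_a_path: str) -> dict:
    # Each line of wordnet.is_a.txt holds "parent_wnid child_wnid".
    children = defaultdict(list)
    with open(is_a_path) as f:
        for line in f:
            parent, child = line.split()
            children[parent].append(child)
    return children

def descendant_leaves(node: str, children: dict, ilsvrc_wnids: set) -> set:
    # Depth-first search collecting all ILSVRC classes below `node`.
    found, stack = set(), [node]
    while stack:
        wnid = stack.pop()
        if wnid in ilsvrc_wnids:
            found.add(wnid)
        stack.extend(children.get(wnid, ()))
    return found

# e.g., all ILSVRC classes below the "dog" node (wnid assumed to be n02084071):
# known = descendant_leaves("n02084071", load_children("wordnet.is_a.txt"),
#                           ilsvrc_wnids)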