the single-stage direction of NCD.
However, we find that current single-stage NCD methods do not sufficiently exploit the essence of the NCD setting, namely that the labelled and unlabelled classes are disjoint. As a consequence, on the one hand, the labelled and unlabelled samples cannot be effectively separated, which weakens the discriminability of the learned features. On the other hand, because the labelled data is learned under supervision while the unlabelled data is not, i.e., an imbalanced learning process with different supervision strengths, the learned feature representations become biased toward the labelled data. In addition, we notice that although some methods [10,35] use data augmentation to generate additional samples and obtain significant performance improvements, they generally employ the mean squared error (MSE) as the consistency regularization, which cannot enforce consistency well while maintaining good generalization ability.
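Concretely, the consistency term in these methods typically takes a form along the following lines (the notation here is only illustrative: $p(\cdot)$ denotes the predicted class distribution and $\hat{x}$ an augmented view of a sample $x$):
$$\mathcal{L}_{\mathrm{mse}} = \lVert p(x) - p(\hat{x}) \rVert_2^2 .$$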
To address the above two issues, we propose to model both Inter-class and Intra-class Constraints (IIC for short), built on the symmetric Kullback-Leibler divergence (sKLD), for discovering novel classes. To be specific, an inter-class sKLD constraint is proposed to explicitly separate the classes of the labelled data from those of the unlabelled data, enhancing the discriminability of the learned feature representations. Moreover, an intra-class sKLD constraint is presented to fully exploit the relationship between samples and their augmentations. According to our experiments, this intra-class sKLD constraint also stabilizes the training process. We have conducted extensive experiments on three benchmarks, including CIFAR10 [21], CIFAR100 [21] and ImageNet [7], and show that the proposed constraints enable our method to consistently outperform existing novel class discovery methods by a large margin.
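To make the two constraints concrete, for two predicted class distributions $p$ and $q$ over the class set $\mathcal{C}$, the symmetric Kullback-Leibler divergence takes the form (up to the exact weighting used in our formulation)
$$\mathrm{sKLD}(p, q) = \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p), \qquad \mathrm{KL}(p \,\|\, q) = \sum_{c \in \mathcal{C}} p_c \log \frac{p_c}{q_c}.$$
Intuitively, the inter-class constraint encourages this divergence to be large between the predictions of labelled and unlabelled samples, whereas the intra-class constraint encourages it to be small between the predictions of a sample and its augmentations; the precise formulation is given in the method section.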
To summarize, our contributions are as follows:
• We propose a new inter-class Kullback-Leibler divergence constraint that sufficiently models the relationship between the labelled and unlabelled data, which has been somewhat overlooked in the literature, in order to learn more discriminative feature representations.
• We propose a new intra-class Kullback-Leibler divergence constraint that effectively exploits the relationship between a sample and its different transformations, thereby learning invariant feature representations.
• We evaluate the proposed constraints on three bench-
mark datasets for novel class discovery and obtain sig-
nificant performance improvements over the state-of-
the-art methods, which successfully demonstrates the
effectiveness of the proposed method.
2. Related Work
Novel class discovery (NCD) is a new task that has attracted wide attention in recent years; it aims at discovering new classes in an unlabelled dataset given a class-disjoint labelled dataset as supervision. A variety of advanced NCD
methods have been proposed and have tangibly improved
the clustering performance on multiple benchmark datasets.
The early methods of NCD include KCL [14], MCL [15]
and DTC [11]. In general, these methods first learn a feature embedding network on the labelled data and then apply it directly to the unlabelled data. Specifically, KCL and MCL propose frameworks for both cross-domain and cross-task transfer learning that leverage pairwise similarity to represent categorical information, and they learn the clustering network from pairwise similarity predictions through different objective functions, respectively. DTC extends the deep embedding clustering method [31] to a transfer learning setting and proposes a two-stage method. Notably, Han et al. [11] were the first to formalize the task of novel class discovery.
Since then, NCD methods [8,10,19,20,33,35,37,38] have been almost exclusively single-stage and can take greater advantage of both labelled and unlabelled data. RS [10] introduces a three-step learning pipeline, which first trains the representation network with all labelled and unlabelled samples using self-supervised learning, then uses ranking statistics to obtain pairwise similarities between unlabelled samples, and finally uses these similarities to discover novel classes. DualRank [35] extends RS to a two-branch framework that operates at both global and local levels, using dual ranking statistics and mutual knowledge distillation to generate pseudo labels and to ensure consistency between the two branches. To generate pairwise pseudo labels, Joint [19] employs a Winner-Take-All (WTA) hashing algorithm [32] on the shared feature space for NCD.
NCL [37] and OpenMix [38] are largely motivated by
contrastive learning [5,12] and Mixup [34], respectively.
NCL introduces a contrastive loss to learn more discrim-
inative representations. On the other hand, OpenMix uses
Mixup to mix labelled and unlabelled samples, building a
learnable relationship between the two parts of data. In-
stead of using multiple objectives, UNO [8] introduces a
unified objective function to transfer knowledge from the labelled set to the unlabelled set. More recently, Joseph et al. [20]
categorize the existing NCD methods into two categories (i.e., two-stage and single-stage methods), according to whether
the labelled and unlabelled samples are available at the
same time or not. They also propose a spacing loss to en-
force separability between labelled and unlabelled points in
the embedding space. ComEx [33] focuses on generalized NCD (GNCD), also known as generalized category discovery (GCD) [29], and proposes two groups of compositional ex-