on additional structures, such as GNNs and label estimators, which further increase network complexity.
A natural question is whether this problem can be
effectively solved without significantly increasing the
network complexity.
•It is still unclear how the missing ratio of labels
affects classification performance, which is of great
importance for balancing classifier performance against
annotation cost.
•Due to the imbalance between positive and negative labels,
most methods dealing with missing labels require at least
one positive label per instance (i.e., PPL) rather than
POL, which is more common in real life.
With these observations, this paper investigates new ap-
proaches for multi-label classification with missing labels.
The main contributions are summarized as follows:
•We propose a pseudo-label-based approach to predict
all possible categories under missing labels, which
effectively balances classifier performance against annotation
cost. The network structure in our approach is identical to
that of a classifier trained with full labels, so our
approach does not increase network complexity. The major
difference lies in the novel design of loss functions and
training schemes.
•We provide a systematic and quantitative analysis of the
impact of the labels’ missing ratio on classifier performance.
In particular, we relax the strict requirement, common in
related work [4], [5], that the label space of each instance
must contain at least one positive label. Our method is
therefore applicable to general POL settings, not only PPL.
•Comprehensive experiments verify that our approach
can be effectively applied to missing-label classification.
Specifically, our approach outperforms most existing
missing-label learning approaches and, in some cases,
even approaches trained on fully labeled datasets. More
importantly, our approach supports the POL setting, which
most existing methods cannot handle.
The rest of the paper is organized as follows. Section II
discusses the related work. The problem is formulated in Sec-
tion III and our proposed method is presented in Section IV.
Section V shows the experimental results. Finally, conclusions
are drawn in Section VI.
II. RELATED WORK
A. Multi-label Learning with Missing Labels
Recently, numerous methods have been proposed for multi-
label classification with missing labels. Herein, we briefly
review the relevant studies.
Binary Relevance (BR). A straightforward approach for
multi-label learning with missing labels is BR [1], [13], which
decomposes the task into a number of binary classification
problems, each for one label. Such an approach encounters
many difficulties, mainly due to ignoring correlations be-
tween labels. To address this issue, many correlation-enabling
extensions to binary relevance have been proposed [12],
[14]–[17]. However, most of these methods require solving
an optimization problem while keeping the entire training set
in memory. It is therefore extremely hard, if not impossible,
to apply a mini-batch strategy to fine-tune the model [2],
which limits the use of pre-trained neural networks (NNs) [18].
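To make the per-label decomposition concrete, a minimal BR baseline can be sketched as below. This is an illustrative toy implementation (plain logistic regression per label on synthetic data), not the formulation of any cited method:

```python
import numpy as np

def train_binary(X, y, lr=0.1, epochs=200):
    """Fit one logistic-regression classifier for a single label."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        g = p - y                               # gradient of BCE w.r.t. logits
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def binary_relevance_fit(X, Y):
    """Decompose the multi-label task into one binary problem per label."""
    return [train_binary(X, Y[:, k]) for k in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    """Stack independent per-label decisions into a multi-label prediction."""
    P = np.column_stack(
        [1.0 / (1.0 + np.exp(-(X @ w + b))) for w, b in models])
    return (P >= 0.5).astype(int)

# Toy data: 2 features, 3 (correlated) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = np.column_stack([(X[:, 0] > 0), (X[:, 1] > 0),
                     (X[:, 0] + X[:, 1] > 0)]).astype(int)
models = binary_relevance_fit(X, Y)
acc = (binary_relevance_predict(models, X) == Y).mean()
```

Note that the third label above is fully determined by the other two, yet BR fits it in isolation, which is exactly the label-correlation information the correlation-enabling extensions try to recover.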
Positive and Unlabeled Learning (PU-learning). PU-learning
is an alternative solution [19], which studies learning from
a small number of positive examples and a large number of
unlabeled examples. Most methods fall into three categories:
two-step techniques [20]–[22], biased learning [23], [24], and
class prior incorporation [25], [26]. All these methods require
the training data to consist of positive and unlabeled
examples [27]. In other words, they treat negative labels as
unlabeled, which discards the existing negatives and does not
make full use of the available labels.
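As a toy illustration of the biased-learning category, unlabeled examples can be treated as down-weighted negatives in a binary cross-entropy loss. The function name and the weighting constant below are illustrative assumptions, not taken from [23], [24]:

```python
import numpy as np

def biased_pu_bce(p, s, neg_weight=0.3):
    """Biased PU loss: observed positives (s=1) get full weight;
    unlabeled examples (s=0) are treated as weak negatives.

    p: predicted probabilities; s: 1 if labeled positive, else 0.
    """
    eps = 1e-7
    pos = -np.log(p + eps)       # loss for labeled positives
    neg = -np.log(1 - p + eps)   # loss when treated as a negative
    return np.where(s == 1, pos, neg_weight * neg).mean()

p = np.array([0.9, 0.2, 0.6])  # model confidences
s = np.array([1, 0, 0])        # only the first example is labeled positive
loss = biased_pu_bce(p, s)
```

The down-weighting reflects that an unlabeled example is only probably negative; note the scheme, as the text observes, has no way to exploit an explicitly observed negative label.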
Pseudo Label. Pseudo-labeling was first proposed in [28]. The
goal of pseudo-labeling is to generate pseudo-labels for unla-
beled samples [29]. There are different methods to generate
pseudo labels: the work in [28], [30] uses the predictions of
a trained NN to assign pseudo labels. Neighborhood graphs
are used in [31]. The approach in [32] updates pseudo labels
through an optimization framework. It is worth mentioning
that MixMatch-family semi-supervised learning methods
[33]–[36] achieve state-of-the-art results on multi-class
problems by utilizing pseudo labels and consistency
regularization [37]. However, these methods do not support the
creation of negative pseudo labels (i.e., labels that specify
the absence of particular classes), which degrades classifier
performance by neglecting negative labels [30]. Instead, the
work in [30] obtains reference values of pseudo labels directly
from the network predictions and then generates hard pseudo
labels by setting separate confidence thresholds for positive
and negative labels. Different from [30], we simplify this
process by studying the proportion of positive and negative
labels to generate pseudo labels.
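The dual-threshold scheme described for [30] can be sketched as follows. The specific threshold values and the encoding (with -1 marking entries left unlabeled) are illustrative assumptions:

```python
import numpy as np

def hard_pseudo_labels(probs, pos_thresh=0.9, neg_thresh=0.1):
    """Turn network predictions into hard pseudo labels:
    1 above pos_thresh, 0 below neg_thresh, -1 (still missing) otherwise."""
    labels = np.full(probs.shape, -1, dtype=int)
    labels[probs >= pos_thresh] = 1   # confident positive pseudo labels
    labels[probs <= neg_thresh] = 0   # confident negative pseudo labels
    return labels

# Predicted probabilities for 2 instances over 3 classes.
probs = np.array([[0.95, 0.05, 0.5],
                  [0.70, 0.92, 0.02]])
pl = hard_pseudo_labels(probs)
# row 0 -> [1, 0, -1]; row 1 -> [-1, 1, 0]
```

Only the confident entries receive pseudo labels; the -1 entries would be masked out of the loss, which is why the choice of the two thresholds directly trades off pseudo-label coverage against noise.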
B. Imbalance
A key characteristic of multi-label classification is the
inherent positive-negative imbalance that arises when the overall
number of labels is large [38]. Missing labels exacerbate this
imbalance and further hinder the recognition of positives [5].
Therefore, the work in [4], [5] mandates that each instance in
the training set have at least one positive label, meaning that
it focuses on the PPL setting instead of “real” POL. Obviously,
this assumption may not always hold in real-life scenarios. To
relax it, a trivial solution is to treat instances with only
negative labels as unlabeled; in this case, however, the value
of the negative labels is wasted.
In this work, we allow instances in the training set to
have only negative labels (that is, the POL setting). From this