focus on the correction of noisy samples once they have been
detected. We specifically propose a novel metric, the pseudo-loss, which retrieves correctly guessed pseudo-labels and which we show to be superior to the pseudo-label confidence previously used in the semi-supervised literature. We
find that incorrectly guessed pseudo-labels are especially
damaging to the supervised contrastive objectives that have
been used in recent contributions [23, 1, 18]. We propose an
interpolated contrastive objective: a class-conditional (supervised) term for clean or correctly corrected samples, which encourages the network to learn similar representations for images belonging to the same class, and an unsupervised term for incorrectly corrected noisy samples. This
results in Pseudo-Loss Selection (PLS), a two-stage noise
detection algorithm where the first stage detects all noisy
samples in the dataset while the second stage removes incor-
rect corrections. We then train a neural network to jointly
minimize a classification and a supervised contrastive objec-
tive. We design PLS on synthetically corrupted datasets and
validate our findings on two real-world noisy web-crawled
datasets. Figure 1 illustrates our proposed improvement to
label noise robust algorithms. Our contributions are:
• A two-stage noise detection approach using a novel metric, where we ensure that the corrected targets for noisy samples are accurate;
• A novel softly interpolated, confidence-guided contrastive loss term between supervised and unsupervised objectives to learn robust features from all images (sketched below);
• Extensive experiments on synthetically corrupted and web-crawled noisy datasets to demonstrate the performance of our algorithm.
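To make the second contribution concrete, the sketch below interpolates per sample between a SupCon-style and a SimCLR-style contrastive term using a confidence weight `w`; the exact loss forms, function name, and weighting scheme are illustrative assumptions, not the precise PLS formulation.

```python
import torch
import torch.nn.functional as F

def interpolated_contrastive_loss(z1, z2, labels, w, temperature=0.1):
    """Interpolate, per sample, between a class-conditional (supervised)
    and an instance-wise (unsupervised) contrastive objective.

    z1, z2 : L2-normalized embeddings of two views of a batch, shape (N, d)
    labels : (possibly corrected) class labels, shape (N,)
    w      : per-sample confidence that the label is correct, in [0, 1]
    """
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                        # (2N, d)
    sim = z @ z.t() / temperature
    eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))             # drop self-pairs
    log_prob = F.log_softmax(sim, dim=1)

    lbl = torch.cat([labels, labels])
    sup_pos = (lbl.unsqueeze(0) == lbl.unsqueeze(1)) & ~eye   # same class
    idx = torch.arange(n, device=z.device)
    unsup_pos = torch.zeros_like(sup_pos)
    unsup_pos[idx, idx + n] = True                        # the other view only
    unsup_pos[idx + n, idx] = True

    # SupCon-style: average log-likelihood over all same-class positives.
    sup = -log_prob.masked_fill(~sup_pos, 0.0).sum(1) / sup_pos.sum(1).clamp(min=1)
    # SimCLR-style: single positive, the second view of the same image.
    unsup = -log_prob.masked_fill(~unsup_pos, 0.0).sum(1)

    w2 = torch.cat([w, w])                                # weight both views
    return (w2 * sup + (1 - w2) * unsup).mean()
```

Samples with trusted labels (`w` near 1) are pulled toward their class, while uncertain corrections (`w` near 0) fall back to pure instance discrimination.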
2. Related work
Label noise robust algorithms
Label noise in web-crawled datasets has been shown to be a mixture of in-distribution (ID) noise and out-of-distribution (OOD) noise [2]. In-distribution noise denotes
an image that was assigned an incorrect label but can be
corrected to another label in the label distribution. Out-of-
distribution noise denotes images whose true labels lie outside the label distribution and cannot be directly corrected. While some algorithms have been designed to detect ID and OOD noise separately, others reach good results by assuming all noise
is ID. The rest of this section will introduce state-of-the-art
approaches to detect and correct noisy samples.
2.1. Label noise detection
Label noise in datasets can be detected by exploiting the natural resistance of neural networks to noise. Small-loss algorithms [3, 17, 22] observe that noisy samples tend to be learned more slowly than their clean counterparts and that
a bi-modal distribution can be observed in the training loss
where noisy samples belong to the high loss mode. A mix-
ture model is then fit to the loss distribution to retrieve the
two modes in an unsupervised manner. Other approaches
evaluate the neighbor coherence in the network feature space
where images are expected to have many neighbors from
the same class [23, 18, 25], and a hyper-parameter threshold on the number of same-class neighbors is used to identify the noisy samples. In some cases, a separate
OOD detection can be performed to differentiate between
correctable ID noise and uncorrectable OOD samples. OOD
samples are detected by evaluating the uncertainty of the cur-
rent neural network prediction. EvidentialMix [24] uses the evidential loss [26], JoSRC evaluates the Jensen-Shannon divergence between predictions [38], and DSOS [2] computes
the collision entropy. An alternative approach is to use a
clean subset to learn to detect label noise in a meta-learning
fashion [36, 10, 35, 37], but we will assume in this paper that
a trusted set is unavailable.
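As an illustration of the small-loss principle above, the following minimal sketch fits a two-component Gaussian mixture to the per-sample training losses and treats the posterior of the low-loss mode as a clean probability; the normalization and the 0.5 threshold are common practice, not prescribed by any single cited method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def clean_probability(losses):
    """Fit a two-component Gaussian mixture to per-sample training
    losses and return the posterior probability of the low-loss mode.

    losses : np.ndarray of shape (N,), one cross-entropy loss per sample.
    """
    # Normalize to [0, 1] so the fit is insensitive to the loss scale.
    l = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    l = l.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(l)
    clean_mode = gmm.means_.argmin()      # the low-loss mode is clean
    return gmm.predict_proba(l)[:, clean_mode]

# Example: treat samples whose clean posterior exceeds 0.5 as clean.
# clean_mask = clean_probability(losses) > 0.5
```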
2.2. Noise correction
Once the noisy samples have been detected, state-of-the-
art approaches guess true labels using current knowledge
learned by the network. Options include guessing using the
prediction of the network on unaugmented samples [3, 21],
semi-supervised learning [17, 23], or neighboring samples in the feature space [18]. Some approaches also simply
discard the detected noisy examples to train on the clean
data alone [11, 12, 27, 40]. In the case where a separate
out-of-distribution detection is performed, the samples can
either be removed from the dataset [24], assigned a uniform
label distribution over the classes to promote rejection by the
network [38, 2], or used in an unsupervised objective [1].
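For concreteness, here is a minimal sketch of the first correction option above, guessing targets from the network's prediction on unaugmented samples; the temperature-sharpening step is an assumed detail borrowed from common semi-supervised practice.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guess_labels(model, x_unaug, T=0.5):
    """Guess soft targets for detected noisy samples from the network's
    prediction on unaugmented inputs, with temperature sharpening.

    x_unaug : batch of unaugmented images flagged as noisy
    T       : sharpening temperature; T < 1 makes the guess more confident
    """
    probs = F.softmax(model(x_unaug), dim=1)
    sharpened = probs ** (1.0 / T)                     # temperature sharpening
    return sharpened / sharpened.sum(dim=1, keepdim=True)  # soft pseudo-labels
```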
2.3. Noise regularization
Another strategy when training on label noise datasets
is to use strong regularization either in the form of data
augmentation such as mixup [43] or of a dedicated loss term [21]. Unsupervised regularization has also been shown to
help improve the classification accuracy of neural networks
trained on label noise datasets [18, 30].
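For reference, a minimal sketch of mixup [43] applied to a classification batch; the function name and the use of a single mixing coefficient per batch are implementation assumptions.

```python
import torch
import torch.nn.functional as F

def mixup_loss(model, x, y, alpha=1.0):
    """Train on convex combinations of input pairs and their labels.

    x : input batch, shape (B, ...);  y : integer class labels, shape (B,)
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1 - lam) * x[perm]          # mix the inputs
    logits = model(x_mix)
    # Equivalent to cross-entropy against the mixed one-hot targets.
    return lam * F.cross_entropy(logits, y) \
         + (1 - lam) * F.cross_entropy(logits, y[perm])
```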
3. PLS
We consider an image dataset $X = \{x_i\}_{i=1}^{N}$ associated with one-hot encoded classification labels $Y = \{y_i\}_{i=1}^{N}$ over $C$ classes. An unknown percentage of labels in $Y$ are noisy, i.e., $y_i$ differs from the true label of $x_i$. We aim to train a neural network $\phi$ on this imperfect label noise dataset to perform accurate classification on a held-out test set.