
Is your noise correction noisy?
PLS: Robustness to label noise with two stage detection
Paul Albert, Eric Arazo, Tarun Krishna, Noel E. O’Connor, Kevin McGuinness
School of Electronic Engineering,
Insight SFI Centre for Data Analytics, Dublin City University (DCU)
paul.albert@insight-centre.org
Abstract
Designing robust algorithms capable of training accurate neural networks on uncurated datasets from the web has been the subject of much research as it reduces the need for time-consuming human labor. The focus of many previous research contributions has been on the detection of different types of label noise; however, this paper proposes to improve the correction accuracy of noisy samples once they have been detected. In many state-of-the-art contributions, a two-phase approach is adopted where the noisy samples are detected before a corrected pseudo-label is guessed in a semi-supervised fashion. The guessed pseudo-labels are then used in the supervised objective without ensuring that the label guess is likely to be correct. This can lead to confirmation bias, which reduces the noise robustness. Here we propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples. Using the pseudo-loss, we dynamically down-weight under-confident pseudo-labels throughout training to avoid confirmation bias and improve the network accuracy. We additionally propose a confidence-guided contrastive objective that learns robust representations by interpolating between a class-bound (supervised) objective for confidently corrected samples and an unsupervised objective for under-confident label corrections. Experiments demonstrate the state-of-the-art performance of our Pseudo-Loss Selection (PLS) algorithm on a variety of benchmark datasets including curated data synthetically corrupted with in-distribution and out-of-distribution noise, and two real-world web noise datasets. Our experiments are fully reproducible: github.com/PaulAlbert31/PLS.
1. Introduction
[Figure 1 depicts two pipelines applied to detected noisy samples: a standard label noise robust algorithm guesses true labels ("Turtle") and directly minimizes the classification loss, while ours adds a pseudo-loss filtering step that rejects incorrect guesses ("Alligator").]
Figure 1. Two-stage label noise mitigation on detected noisy samples. Contrary to state-of-the-art label noise robust algorithms, we filter out incorrect pseudo-labels using the pseudo-loss to avoid confirmation bias on incorrect corrections.

Standard supervised datasets for image classification using deep learning [15, 7, 20, 14] consist of large amounts of images gathered from the web which have been
heavily curated by multiple human annotators. In this paper, we propose to devise an algorithm which aims to train an accurate classification network on a web-crawled dataset [19, 32] where the human curation process was skipped. By doing so, the dataset creation time is greatly reduced, but label noise becomes an issue [2] and can greatly degrade the classification accuracy [42]. To counter the effect of noisy annotations, previous contributions have focused on detecting the noisy samples using the natural robustness of deep learning architectures to noise in early training stages [3, 4]. These algorithms identify noisy samples because they tend to be learned slower than their clean counterparts [17], because of inconsistencies with the labels of close neighbors in the feature space [23, 18], a confident prediction from the neural network in a class different from the target class [38, 21], inconsistent predictions across iterations [22, 34], and more. Once the noisy samples are identified, a corrected label is produced, yet ensuring that labels are correctly guessed is less studied in the label noise literature. Some propositions inspired by semi-supervised learning [28, 41] have been made recently by Li et al. [18], where only pseudo-labels whose value in the max softmax bin (confidence) is superior to a hyper-parameter threshold are kept, or by Song et al. [29], where low-entropy predictions indicate a confident pseudo-label. This paper proposes to
focus on the correction of noisy samples once they have been detected. We specifically propose a novel metric, the pseudo-loss, which is able to retrieve correctly guessed pseudo-labels and which we show to be superior to the pseudo-label confidence previously used in the semi-supervised literature. We find that incorrectly guessed pseudo-labels are especially damaging to the supervised contrastive objectives that have been used in recent contributions [23, 1, 18]. We propose an interpolated contrastive objective between a class-conditional (supervised) objective for the clean or correctly corrected samples, where we encourage the network to learn similar representations for images belonging to the same class, and an unsupervised objective for the incorrectly corrected noise. This results in Pseudo-Loss Selection (PLS), a two-stage noise detection algorithm where the first stage detects all noisy samples in the dataset while the second stage removes incorrect corrections. We then train a neural network to jointly minimize a classification and a supervised contrastive objective. We design PLS on synthetically corrupted datasets and validate our findings on two real-world noisy web-crawled datasets. Figure 1 illustrates our proposed improvement to label noise robust algorithms. Our contributions are:

• A two-stage noise detection using a novel metric where we ensure that the corrected targets for noisy samples are accurate;
• A novel softly interpolated confidence-guided contrastive loss term between supervised and unsupervised objectives to learn robust features from all images;
• Extensive experiments on synthetically corrupted and web-crawled noisy datasets to demonstrate the performance of our algorithm.
2. Related work
Label noise robust algorithms
Label noise in web-crawled datasets has been evidenced to be a mixture of in-distribution (ID) noise and out-of-distribution (OOD) noise [2]. In-distribution noise denotes an image that was assigned an incorrect label but can be corrected to another label in the label distribution. Out-of-distribution noise denotes images whose true label lies outside of the label distribution and which cannot be directly corrected. While some algorithms have been designed to detect ID and OOD noise separately, others reach good results by assuming all noise is ID. The rest of this section introduces state-of-the-art approaches to detect and correct noisy samples.
2.1. Label noise detection
Label noise in datasets can be detected by exploiting the natural resistance of neural networks to noise. Small-loss algorithms [3, 17, 22] observe that noisy samples tend to be learned slower than their clean counterparts and that a bi-modal distribution can be observed in the training loss, where noisy samples belong to the high-loss mode. A mixture model is then fit to the loss distribution to retrieve the two modes in an unsupervised manner. Other approaches evaluate the neighbor coherence in the network feature space, where images are expected to have many neighbors from the same class [23, 18, 25], and a hyper-parameter threshold on the number of neighbors from the same class is used to identify the noisy samples. In some cases, a separate OOD detection can be performed to differentiate between correctable ID noise and uncorrectable OOD samples. OOD samples are detected by evaluating the uncertainty of the current neural network prediction: EvidentialMix [24] uses the evidential loss [26], JoSRC evaluates the Jensen-Shannon divergence between predictions [38], and DSOS [2] computes the collision entropy. An alternative approach is to use a clean subset to learn to detect label noise in a meta-learning fashion [36, 10, 35, 37], but we will assume in this paper that a trusted set is unavailable.
2.2. Noise correction
Once the noisy samples have been detected, state-of-the-art approaches guess true labels using the current knowledge learned by the network. Options include guessing using the prediction of the network on unaugmented samples [3, 21], semi-supervised learning [17, 23], or neighboring samples in the feature space [18]. Some approaches also simply discard the detected noisy examples to train on the clean data alone [11, 12, 27, 40]. In the case where a separate out-of-distribution detection is performed, the samples can either be removed from the dataset [24], assigned a uniform label distribution over the classes to promote rejection by the network [38, 2], or used in an unsupervised objective [1].
2.3. Noise regularization
Another strategy when training on label noise datasets is to use strong regularization, either in the form of data augmentation such as mixup [43] or using a dedicated loss term [21]. Unsupervised regularization has also been shown to help improve the classification accuracy of neural networks trained on label noise datasets [18, 30].
3. PLS
We consider an image dataset $X = \{x_i\}_{i=1}^{N}$ associated with one-hot encoded classification labels $Y$ over $C$ classes. An unknown percentage of labels in $Y = \{y_i\}_{i=1}^{N}$ are noisy, i.e. $y_i$ is different from the true label of $x_i$. We aim to train a neural network $\phi$ on the imperfect label noise dataset to perform accurate classification on a held-out test set.
3.1. Detecting the noisy samples
Our contributions do not include detecting the noisy labels; we propose instead to focus on improving the correction of the noisy samples once they have been detected. We use a phenomenon known from previous research on label noise classification [3, 17, 22]: in early stages of training, the cross-entropy loss between $\phi$'s prediction on an unaugmented view of an image, $\phi(x_i)$, and the associated (possibly noisy) ground-truth label $y_i$ is observed to separate into a low-loss clean mode and a high-loss noisy mode. We therefore propose to fit a Gaussian Mixture Model (GMM) to the training loss to retrieve each mode in an unsupervised fashion. Clean samples are finally identified as belonging to the low-loss mode with a probability superior to a threshold $t = 0.95$. Alternative metrics have been proposed to retrieve noisy labels, but we find that while these approaches retrieve noisy samples very similarly for synthetic noise, the training loss is more accurate in the case of real-world noise. We justify this statement in Section 4.2.
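As a rough sketch of this first detection stage (assuming the per-sample cross-entropy losses on unaugmented images have already been collected; the function name and the min-max normalization are illustrative assumptions, not the official PLS code):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def detect_clean_samples(losses: np.ndarray, t: float = 0.95) -> np.ndarray:
    """Fit a 2-component GMM to per-sample training losses and return a
    boolean mask of samples in the low-loss (clean) mode with prob > t."""
    losses = losses.reshape(-1, 1)
    # Rescaling to [0, 1] keeps the fit stable across epochs (our assumption).
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    # The clean mode is the component with the smaller mean loss.
    clean_mode = int(np.argmin(gmm.means_.flatten()))
    p_clean = gmm.predict_proba(losses)[:, clean_mode]
    return p_clean > t
```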
3.2. Confident correction of noisy labels
3.2.1 Guessing labels for detected noisy samples
To guess the true label of detected noisy samples, we propose to use a consistency regularization approach. Given an image $x_i$ associated with a noisy label, we produce two weakly augmented views $x_{i1}$ and $x_{i2}$. Weak augmentations are random cropping after zero-padding and random horizontal flipping. Using the current state of $\phi$, we guess the pseudo-label $\hat{y}_i$ as

$$\hat{y}_i = \left( \frac{\phi(x_{i1}) + \phi(x_{i2})}{2} \right)^{\gamma}, \qquad (1)$$

with $\gamma = 2$ being a temperature hyper-parameter. We then apply a max normalization over $\hat{y}_i$ to ensure that the values of the pseudo-label are between 0 and 1.
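A minimal sketch of this label guessing step, under the assumption that $\phi$ returns logits which we pass through a softmax; the function name and augmentation handling are ours:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guess_pseudo_label(phi, x_weak1, x_weak2, gamma=2.0):
    """Average softmax predictions on two weak views, sharpen with
    temperature gamma, then max-normalize to [0, 1] (Eq. 1)."""
    p = (F.softmax(phi(x_weak1), dim=1) + F.softmax(phi(x_weak2), dim=1)) / 2
    p = p ** gamma                              # temperature sharpening
    p = p / p.max(dim=1, keepdim=True).values  # max normalization
    return p
```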
3.2.2 Correcting only confident pseudo-labels
We propose to only correct those pseudo-labels that are likely to be correctly guessed by $\phi$. This solution has already been explored in the semi-supervised literature [28, 41], where pseudo-labels are only kept if the value of the maximum probability is superior to a hyper-parameter threshold. Both prediction confidence measured by the highest probability bin [18] and prediction entropy [29] have also been successfully applied in the label noise literature. We propose to identify correct pseudo-labels by evaluating a different metric, which we name the pseudo-loss. The pseudo-loss evaluates the cross-entropy loss between the pseudo-label $\hat{y}_i$ and the prediction of the model on an unaugmented view $\phi(x_i)$:

$$l_{pseudo} = -\hat{y}_i \log \phi(x_i). \qquad (2)$$
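In code, the per-sample pseudo-loss could be computed as follows (a sketch: the softmax over $\phi$'s logits and the small `eps` for numerical stability are our assumptions):

```python
import torch

@torch.no_grad()
def pseudo_loss(phi, x, y_hat, eps=1e-8):
    """Cross-entropy between the guessed pseudo-label y_hat and the
    model's softmax prediction on the unaugmented image x (Eq. 2)."""
    p = torch.softmax(phi(x), dim=1)
    return -(y_hat * torch.log(p + eps)).sum(dim=1)  # one value per sample
```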
We observe that, much like the noise detection loss in Section 3.1, the pseudo-loss is bi-modal (see Figure 1 and Section 4.3). We propose to fit a second GMM to the pseudo-loss and to use the posterior probability of a sample belonging to the low-$l_{pseudo}$ mode (correct pseudo-label, left-most Gaussian) as $w$, a weight in the classification loss $l_{classif}$ that reduces the impact of incorrect pseudo-labels. Under-confident, high pseudo-loss samples are weighted with values close to 0 (low probability of belonging to the low pseudo-loss mode) while confident pseudo-labels are weighted with values close to 1 (high probability of belonging to the low pseudo-loss mode). The classification loss we use is a weighted cross-entropy with mixup:

$$l_{classif} = -\frac{1}{\sum_{i=1}^{N} w_{mix,i}} \sum_{i=1}^{N} w_{mix,i}\, \hat{y}_{mix,i} \log \phi(x_{mix,i}), \qquad (3)$$

where $w_{mix}$, $x_{mix}$ and $y_{mix}$ are linearly interpolated with another random sample in the mini-batch using a parameter $\lambda \sim U(0,1)$, sampled for every mini-batch (mixup [43]). We evaluate how the pseudo-loss compares to pseudo-label confidence in Section 4.3.
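A sketch of how the GMM posterior weights and the weighted mixup cross-entropy of Eq. (3) could fit together; interpolating the weights $w$ with the same $\lambda$ as the images and targets is our reading of the $w_{mix}$ notation, and the names are illustrative:

```python
import numpy as np
import torch
from sklearn.mixture import GaussianMixture

def pseudo_loss_weights(l_pseudo: np.ndarray) -> np.ndarray:
    """Posterior probability of the low pseudo-loss (correct) mode,
    used as the per-sample confidence weight w."""
    l = l_pseudo.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(l)
    low_mode = int(np.argmin(gmm.means_.flatten()))
    return gmm.predict_proba(l)[:, low_mode]

def weighted_mixup_ce(phi, x, y, w):
    """Weighted cross-entropy on mixup-interpolated samples (Eq. 3).
    x: images, y: soft/one-hot targets, w: confidence weights in [0, 1]."""
    lam = torch.rand(1).item()            # lambda ~ U(0, 1), one draw per mini-batch
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    w_mix = lam * w + (1 - lam) * w[idx]
    log_p = torch.log_softmax(phi(x_mix), dim=1)
    ce = -(y_mix * log_p).sum(dim=1)      # per-sample cross-entropy
    return (w_mix * ce).sum() / w_mix.sum()
```

Presumably the weights would be refreshed as training progresses, since the pseudo-loss distribution changes with the network state.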
3.2.3 Supervised contrastive learning
To improve the quality of representations learned by $\phi$, we propose to train a supervised contrastive objective jointly with the classification loss. We compute the contrastive features as a linear projection $g$ from the classification features to the $L_2$-normalized contrastive space. A contrastive objective aims to learn similar contrastive features for images belonging to the same class. Given a training mini-batch of images $X_b$ with associated classification labels $Y_b$, we produce a weakly augmented view $X_{b1}$ and a strongly augmented view $X^*_b$. The strong augmentations are the SimCLR augmentations [5]: random resized crop, color jitter, random grayscale, and random horizontal flipping. We compute the label similarity matrix $L = Y_b Y_b^T$ and the feature similarity matrix

$$P = \frac{g(\phi(X_{b1}))\, g(\phi(X^*_b))^T}{\mu}, \qquad (4)$$
with $\mu = 0.2$ being a temperature scaling parameter. Both $P$ and $L$ are $B \times B$ matrices, with $B$ the mini-batch size. The contrastive loss is the row-wise cross-entropy loss

$$l_{naivecont} = -\frac{1}{B} \sum_{i=1}^{B} \frac{L_i \log P_i}{\sum_{c=1}^{B} L_{i,c}}, \qquad (5)$$

where $L_i$ and $P_i$ denote row $i$ of the corresponding matrix. Because label noise is present in the datasets we train on, minimizing $l_{naivecont}$ directly is detrimental since similarities will be enforced between samples whose pseudo-labels cannot be trusted. We propose instead to account for
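For concreteness, a sketch of the naive objective $l_{naivecont}$ of Eqs. (4)-(5); treating the row-wise cross-entropy as a softmax over each row of $P$ and row-normalizing $L$ are our assumptions, and the feature arguments stand in for $g(\phi(\cdot))$:

```python
import torch
import torch.nn.functional as F

def naive_supervised_contrastive(z_weak, z_strong, y_onehot, mu=0.2):
    """Naive supervised contrastive loss: row-wise cross-entropy between
    the label similarity matrix L and the feature similarity matrix P."""
    z1 = F.normalize(z_weak, dim=1)       # L2-normalized g(phi(X_b1))
    z2 = F.normalize(z_strong, dim=1)     # L2-normalized g(phi(X_b*))
    P = z1 @ z2.T / mu                    # (B, B) scaled feature similarities
    L = y_onehot @ y_onehot.T             # (B, B) label agreements
    L = L / L.sum(dim=1, keepdim=True)    # row normalization (the 1/sum L_ic term)
    log_p = torch.log_softmax(P, dim=1)   # softmax over each row of P
    return -(L * log_p).sum(dim=1).mean()
```

The section breaks off before describing the confidence-guided interpolation, so this sketches only the naive objective that PLS improves upon.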