
Is your noise correction noisy?
PLS: Robustness to label noise with two stage detection
Paul Albert, Eric Arazo, Tarun Krishna, Noel E. O’Connor, Kevin McGuinness
School of Electronic Engineering,
Insight SFI Centre for Data Analytics, Dublin City University (DCU)
paul.albert@insight-centre.org
Abstract
Designing robust algorithms capable of training accurate neural networks on uncurated datasets from the web has been the subject of much research as it reduces the need for time-consuming human labor. The focus of many previous research contributions has been on the detection of different types of label noise; however, this paper proposes to improve the correction accuracy of noisy samples once they have been detected. In many state-of-the-art contributions, a two-phase approach is adopted where the noisy samples are detected before a corrected pseudo-label is guessed in a semi-supervised fashion. The guessed pseudo-labels are then used in the supervised objective without ensuring that the label guess is likely to be correct. This can lead to confirmation bias, which reduces the noise robustness. Here we propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples. Using the pseudo-loss, we dynamically down-weight under-confident pseudo-labels throughout training to avoid confirmation bias and improve the network accuracy. We additionally propose a confidence-guided contrastive objective that learns robust representations by interpolating between a class-bound (supervised) objective for confidently corrected samples and an unsupervised objective for under-confident label corrections. Experiments demonstrate the state-of-the-art performance of our Pseudo-Loss Selection (PLS) algorithm on a variety of benchmark datasets including curated data synthetically corrupted with in-distribution and out-of-distribution noise, and two real-world web noise datasets. Our experiments are fully reproducible: github.com/PaulAlbert31/PLS.
1. Introduction
[Figure 1 depicts two pipelines applied to detected noisy samples: a standard label noise robust algorithm guesses true labels ("Turtle") and directly minimizes the classification loss, while ours adds a pseudo-loss filtering step that rejects incorrect guesses ("Alligator").]
Figure 1. Two-stage label noise mitigation on detected noisy samples. Contrary to state-of-the-art label noise robust algorithms, we filter out incorrect pseudo-labels using the pseudo-loss to avoid confirmation bias on incorrect corrections.

Standard supervised datasets for image classification using deep learning [15, 7, 20, 14] consist of large amounts of images gathered from the web which have been
heavily curated by multiple human annotators. In this paper, we propose to devise an algorithm which aims to train an accurate classification network on a web-crawled dataset [19, 32] where the human curation process was skipped. By doing so, the dataset creation time is greatly reduced, but label noise becomes an issue [2] and can greatly degrade the classification accuracy [42]. To counter the effect of noisy annotations, previous contributions have focused on detecting the noisy samples using the natural robustness of deep learning architectures to noise in early training stages [3, 4]. These algorithms identify noisy samples because they tend to be learned slower than their clean counterparts [17], because of inconsistencies with the labels of close neighbors in the feature space [23, 18], a confident prediction from the neural network in a class different from the target class [38, 21], inconsistent predictions across iterations [22, 34], and more. Once the noisy samples are identified, a corrected label is produced, yet ensuring that labels are correctly guessed is less studied in the label noise literature. Some propositions inspired by semi-supervised learning [28, 41] have been made recently by Li et al. [18], where only pseudo-labels whose value in the max softmax bin (confidence) is superior to a hyper-parameter threshold are kept, or by Song et al. [29], where low-entropy predictions indicate a confident pseudo-label. This paper proposes to
focus on the correction of noisy samples once they have been detected. We specifically propose a novel metric, the pseudo-loss, which is able to retrieve correctly guessed pseudo-labels and which we show to be superior to the pseudo-label confidence previously used in the semi-supervised literature. We find that incorrectly guessed pseudo-labels are especially damaging to the supervised contrastive objectives that have been used in recent contributions [23, 1, 18]. We propose an interpolated contrastive objective between a class-conditional (supervised) objective for the clean or correctly corrected samples, where we encourage the network to learn similar representations for images belonging to the same class, and an unsupervised objective for the incorrectly corrected noise. This results in Pseudo-Loss Selection (PLS), a two-stage noise detection algorithm where the first stage detects all noisy samples in the dataset while the second stage removes incorrect corrections. We then train a neural network to jointly minimize a classification and a supervised contrastive objective. We design PLS on synthetically corrupted datasets and validate our findings on two real-world noisy web-crawled datasets. Figure 1 illustrates our proposed improvement to label noise robust algorithms. Our contributions are:

• A two-stage noise detection using a novel metric where we ensure that the corrected targets for noisy samples are accurate;
• A novel softly interpolated confidence-guided contrastive loss term between supervised and unsupervised objectives to learn robust features from all images;
• Extensive experiments on synthetically corrupted and web-crawled noisy datasets to demonstrate the performance of our algorithm.
2. Related work
Label noise robust algorithms
Label noise in web-crawled datasets has been evidenced to be a mixture of in-distribution (ID) noise and out-of-distribution (OOD) noise [2]. In-distribution noise denotes an image that was assigned an incorrect label but can be corrected to another label in the label distribution. Out-of-distribution noise denotes images whose true label lies outside of the label distribution and which cannot be directly corrected. While some algorithms have been designed to detect ID and OOD noise separately, others reach good results by assuming all noise is ID. The rest of this section introduces state-of-the-art approaches to detect and correct noisy samples.
2.1. Label noise detection
Label noise in datasets can be detected by exploiting the natural resistance of neural networks to noise. Small-loss algorithms [3, 17, 22] observe that noisy samples tend to be learned slower than their clean counterparts and that a bi-modal distribution can be observed in the training loss, where noisy samples belong to the high-loss mode. A mixture model is then fit to the loss distribution to retrieve the two modes in an unsupervised manner. Other approaches evaluate the neighbor coherence in the network feature space, where images are expected to have many neighbors from the same class [23, 18, 25], and a hyper-parameter threshold on the number of neighbors from the same class is used to identify the noisy samples. In some cases, a separate OOD detection can be performed to differentiate between correctable ID noise and uncorrectable OOD samples. OOD samples are detected by evaluating the uncertainty of the current neural network prediction: EvidentialMix [24] uses the evidential loss [26], JoSRC evaluates the Jensen-Shannon divergence between predictions [38], and DSOS [2] computes the collision entropy. An alternative approach is to use a clean subset to learn to detect label noise in a meta-learning fashion [36, 10, 35, 37], but we will assume in this paper that a trusted set is unavailable.
2.2. Noise correction
Once the noisy samples have been detected, state-of-the-art approaches guess true labels using the current knowledge learned by the network. Options include guessing using the prediction of the network on unaugmented samples [3, 21], semi-supervised learning [17, 23], or neighboring samples in the feature space [18]. Some approaches also simply discard the detected noisy examples to train on the clean data alone [11, 12, 27, 40]. In the case where a separate out-of-distribution detection is performed, the samples can either be removed from the dataset [24], assigned a uniform label distribution over the classes to promote rejection by the network [38, 2], or used in an unsupervised objective [1].
2.3. Noise regularization
Another strategy when training on label noise datasets is to use strong regularization, either in the form of data augmentation such as mixup [43] or using a dedicated loss term [21]. Unsupervised regularization has also been shown to help improve the classification accuracy of neural networks trained on label noise datasets [18, 30].
3. PLS
We consider an image dataset $X = \{x_i\}_{i=1}^{N}$ associated with one-hot encoded classification labels $Y$ over $C$ classes. An unknown percentage of labels in $Y = \{y_i\}_{i=1}^{N}$ are noisy, i.e. $y_i$ is different from the true label of $x_i$. We aim to train a neural network $\phi$ on the imperfect label noise dataset to perform accurate classification on a held-out test set.
3.1. Detecting the noisy samples
Our contributions do not include detecting the noisy labels; we propose instead to focus on improving the correction of the noisy samples once they have been detected. We use a phenomenon known from previous research on label noise classification [3, 17, 22]: in early stages of training, the cross-entropy loss between $\phi$'s prediction on an unaugmented view of an image, $\phi(x_i)$, and the associated (possibly noisy) ground-truth label $y_i$ is observed to separate into a low-loss clean mode and a high-loss noisy mode. We therefore propose to fit a Gaussian Mixture Model (GMM) to the training loss to retrieve each mode in an unsupervised fashion. Clean samples are finally identified as belonging to the low-loss mode with a probability superior to a threshold $t = 0.95$. Alternative metrics have been proposed to retrieve noisy labels, but we find that while these approaches retrieve noisy samples very similarly for synthetic noise, the training loss is more accurate in the case of real-world noise. We justify this statement in Section 4.2.
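As a rough sketch of this first detection stage (assuming the per-sample cross-entropy losses on unaugmented images have already been collected; the function name and the min-max normalization are illustrative assumptions, not the official PLS code):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def detect_clean_samples(losses: np.ndarray, t: float = 0.95) -> np.ndarray:
    """Fit a 2-component GMM to per-sample training losses and return a
    boolean mask of samples in the low-loss (clean) mode with prob > t."""
    losses = losses.reshape(-1, 1)
    # Rescaling to [0, 1] keeps the fit stable across epochs (our assumption).
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    # The clean mode is the component with the smaller mean loss.
    clean_mode = int(np.argmin(gmm.means_.flatten()))
    p_clean = gmm.predict_proba(losses)[:, clean_mode]
    return p_clean > t
```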
3.2. Confident correction of noisy labels
3.2.1 Guessing labels for detected noisy samples
To guess the true label of detected noisy samples, we propose to use a consistency regularization approach. Given an image $x_i$ associated with a noisy label, we produce two weakly augmented views $x_{i1}$ and $x_{i2}$. Weak augmentations are random cropping after zero-padding and random horizontal flipping. Using the current state of $\phi$, we guess the pseudo-label $\hat{y}_i$ as

$$\hat{y}_i = \left( \frac{\phi(x_{i1}) + \phi(x_{i2})}{2} \right)^{\gamma}, \qquad (1)$$

with $\gamma = 2$ being a temperature hyper-parameter. We then apply a max normalization over $\hat{y}_i$ to ensure that the values of the pseudo-label are between 0 and 1.
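A minimal sketch of this label guessing step, under the assumption that $\phi$ returns logits which we pass through a softmax; the function name and augmentation handling are ours:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guess_pseudo_label(phi, x_weak1, x_weak2, gamma=2.0):
    """Average softmax predictions on two weak views, sharpen with
    temperature gamma, then max-normalize to [0, 1] (Eq. 1)."""
    p = (F.softmax(phi(x_weak1), dim=1) + F.softmax(phi(x_weak2), dim=1)) / 2
    p = p ** gamma                              # temperature sharpening
    p = p / p.max(dim=1, keepdim=True).values  # max normalization
    return p
```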
3.2.2 Correcting only confident pseudo-labels
We propose to only correct those pseudo-labels that are likely to be correctly guessed by $\phi$. This solution has already been explored in the semi-supervised literature [28, 41], where pseudo-labels are only kept if the value of the maximum probability is superior to a hyper-parameter threshold. Both prediction confidence measured by the highest probability bin [18] and prediction entropy [29] have also been successfully applied in the label noise literature. We propose to identify correct pseudo-labels by evaluating a different metric, which we name the pseudo-loss. The pseudo-loss evaluates the cross-entropy loss between the pseudo-label $\hat{y}_i$ and the prediction of the model on an unaugmented view $\phi(x_i)$:

$$l_{pseudo} = -\hat{y}_i \log \phi(x_i). \qquad (2)$$
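In code, the per-sample pseudo-loss could be computed as follows (a sketch: the softmax over $\phi$'s logits and the small `eps` for numerical stability are our assumptions):

```python
import torch

@torch.no_grad()
def pseudo_loss(phi, x, y_hat, eps=1e-8):
    """Cross-entropy between the guessed pseudo-label y_hat and the
    model's softmax prediction on the unaugmented image x (Eq. 2)."""
    p = torch.softmax(phi(x), dim=1)
    return -(y_hat * torch.log(p + eps)).sum(dim=1)  # one value per sample
```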
We observe that, much like the noise detection loss in Section 3.1, the pseudo-loss is bi-modal (see Figure 1 and Section 4.3). We propose to fit a second GMM to the pseudo-loss and to use the posterior probability of a sample belonging to the low-$l_{pseudo}$ mode (correct pseudo-label, left-most Gaussian) as $w$, a weight in the classification loss $l_{classif}$ that reduces the impact of incorrect pseudo-labels. Under-confident, high pseudo-loss samples are weighted with values close to 0 (low probability of belonging to the low pseudo-loss mode) while confident pseudo-labels are weighted with values close to 1 (high probability of belonging to the low pseudo-loss mode). The classification loss we use is a weighted cross-entropy with mixup:

$$l_{classif} = -\frac{1}{\sum_{i=1}^{N} w_{mix,i}} \sum_{i=1}^{N} w_{mix,i}\, \hat{y}_{mix,i} \log \phi(x_{mix,i}), \qquad (3)$$

where $w_{mix}$, $x_{mix}$ and $y_{mix}$ are linearly interpolated with another random sample in the mini-batch using a parameter $\lambda \sim U(0,1)$, sampled for every mini-batch (mixup [43]). We evaluate how the pseudo-loss compares to pseudo-label confidence in Section 4.3.
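A sketch of how the GMM posterior weights and the weighted mixup cross-entropy of Eq. (3) could fit together; interpolating the weights $w$ with the same $\lambda$ as the images and targets is our reading of the $w_{mix}$ notation, and the names are illustrative:

```python
import numpy as np
import torch
from sklearn.mixture import GaussianMixture

def pseudo_loss_weights(l_pseudo: np.ndarray) -> np.ndarray:
    """Posterior probability of the low pseudo-loss (correct) mode,
    used as the per-sample confidence weight w."""
    l = l_pseudo.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(l)
    low_mode = int(np.argmin(gmm.means_.flatten()))
    return gmm.predict_proba(l)[:, low_mode]

def weighted_mixup_ce(phi, x, y, w):
    """Weighted cross-entropy on mixup-interpolated samples (Eq. 3).
    x: images, y: soft/one-hot targets, w: confidence weights in [0, 1]."""
    lam = torch.rand(1).item()            # lambda ~ U(0, 1), one draw per mini-batch
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    w_mix = lam * w + (1 - lam) * w[idx]
    log_p = torch.log_softmax(phi(x_mix), dim=1)
    ce = -(y_mix * log_p).sum(dim=1)      # per-sample cross-entropy
    return (w_mix * ce).sum() / w_mix.sum()
```

Presumably the weights would be refreshed as training progresses, since the pseudo-loss distribution changes with the network state.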
3.2.3 Supervised contrastive learning
To improve the quality of representations learned by $\phi$, we propose to train a supervised contrastive objective jointly with the classification loss. We compute the contrastive features as a linear projection $g$ from the classification features to the $L_2$-normalized contrastive space. A contrastive objective aims to learn similar contrastive features for images belonging to the same class. Given a training mini-batch of images $X_b$ with associated classification labels $Y_b$, we produce a weakly augmented view $X_{b1}$ and a strongly augmented view $X^*_b$. The strong augmentations are the SimCLR augmentations [5]: random resized crop, color jitter, random grayscale, and random horizontal flipping. We compute the label similarity matrix $L = Y_b Y_b^T$ and the feature similarity matrix

$$P = \frac{g(\phi(X_{b1}))\, g(\phi(X^*_b))^T}{\mu}, \qquad (4)$$
with $\mu = 0.2$ being a temperature scaling parameter. Both $P$ and $L$ are $B \times B$ matrices, with $B$ the mini-batch size. The contrastive loss is the row-wise cross-entropy loss

$$l_{naivecont} = -\frac{1}{B} \sum_{i=1}^{B} \frac{L_i \log P_i}{\sum_{c=1}^{B} L_{i,c}}, \qquad (5)$$

where $L_i$ and $P_i$ denote row $i$ of the corresponding matrix. Because label noise is present in the datasets we train on, minimizing $l_{naivecont}$ directly is detrimental since similarities will be enforced between samples whose pseudo-labels cannot be trusted. We propose instead to account for
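For concreteness, a sketch of the naive objective $l_{naivecont}$ of Eqs. (4)-(5); treating the row-wise cross-entropy as a softmax over each row of $P$ and row-normalizing $L$ are our assumptions, and the feature arguments stand in for $g(\phi(\cdot))$:

```python
import torch
import torch.nn.functional as F

def naive_supervised_contrastive(z_weak, z_strong, y_onehot, mu=0.2):
    """Naive supervised contrastive loss: row-wise cross-entropy between
    the label similarity matrix L and the feature similarity matrix P."""
    z1 = F.normalize(z_weak, dim=1)       # L2-normalized g(phi(X_b1))
    z2 = F.normalize(z_strong, dim=1)     # L2-normalized g(phi(X_b*))
    P = z1 @ z2.T / mu                    # (B, B) scaled feature similarities
    L = y_onehot @ y_onehot.T             # (B, B) label agreements
    L = L / L.sum(dim=1, keepdim=True)    # row normalization (the 1/sum L_ic term)
    log_p = torch.log_softmax(P, dim=1)   # softmax over each row of P
    return -(L * log_p).sum(dim=1).mean()
```

The section breaks off before describing the confidence-guided interpolation, so this sketches only the naive objective that PLS improves upon.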