BoundaryFace: A mining framework with noise
label self-correction for Face Recognition
Shijie Wu and Xun Gong
School of Computing and Artificial Intelligence, Southwest Jiaotong University,
Chengdu, Sichuan, China
xgong@swjtu.edu.cn
Abstract. Face recognition has made tremendous progress in recent
years due to advances in loss functions and the explosive growth in
training set sizes. A properly designed loss is seen as key to extracting
discriminative features for classification. Several margin-based losses have
been proposed as alternatives to the softmax loss in face recognition. How-
ever, two issues remain: 1) They overlook the importance
of hard sample mining for discriminative learning. 2) Label noise ubiq-
uitously exists in large-scale datasets, which can seriously damage the
model’s performance. In this paper, starting from the perspective of deci-
sion boundary, we propose a novel mining framework that focuses on the
relationship between a sample’s ground truth class center and its nearest
negative class center. Specifically, a closed-set noise label self-correction
module is put forward, making this framework work well on datasets
containing a lot of label noise. The proposed method consistently out-
performs SOTA methods in various face recognition benchmarks. Train-
ing code has been released at https://github.com/SWJTU-3DVision/
BoundaryFace.
Keywords: Face Recognition, Noise Label, Hard Sample Mining, Deci-
sion Boundary
1 Introduction
Face recognition is one of the most widely studied topics in the computer vision
community. Large-scale datasets, network architectures, and loss functions have
fueled the success of Deep Convolutional Neural Networks (DCNNs) on face
recognition. Particularly, with an aim to extract discriminative features, the
latest works have proposed some intuitively reasonable loss functions.
For face recognition, the current existing losses can be divided into two ap-
proaches: one deems the face recognition task to be a general classification prob-
lem, and networks are therefore trained using softmax [13,27,28,26,3,1,5,18,33];
the other approaches the problem using metric learning and directly learns an
embedding, such as [23,19,22]. Since metric-learning losses usually suffer from
combinatorial explosion of sample batches and depend on semi-hard sample mining,
they require more sophisticated sampling strategies. Classification-style loss
functions have therefore attracted increasing attention.
arXiv:2210.04567v1 [cs.CV] 10 Oct 2022
[Figure 1: three classes in feature space, with class centers C2 and C3 and Steps 1–3 marked.]
Fig. 1. The motivation of BoundaryFace. Step 1 denotes closed-set noise label self-
correction. Step 2 denotes nearest negative class matching. Step 3 denotes hard sample
handling. For a noisy hard sample, we first correct its label, then match the nearest
negative class based on the corrected label, and finally emphasize the sample using the
decision boundary formed by its ground truth class center and the nearest negative
class center.
It has been pointed out that the classical classification loss function (i.e.,
Softmax loss) cannot obtain discriminative features. Based on current testing
protocols, the probe commonly has no overlap with the training images, so it
is particularly crucial to extract features with high discriminative ability. To
this end, Center loss [31] and NormFace [27] have been successively proposed
to obtain discriminative features. Wen et al. [31] developed a center loss that
learns each subject’s center. To ensure the training process is consistent with
testing, Wang et al. [27] made the features extracted by the network and the
weight vectors of the last fully connected layer lie on the unit hypersphere.
Recently, some margin-based softmax loss functions [13,28,26,3,14] have also
been proposed to enhance intra-class compactness while enlarging inter-class
discrepancy, resulting in more discriminative features.
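To make the margin-based family concrete, the following is a minimal numeric sketch of an additive angular margin softmax loss in the style of ArcFace [3], combined with the feature and weight normalization of NormFace [27]. The function name and hyper-parameter defaults are illustrative choices, not taken from any particular implementation.

```python
import numpy as np

def margin_softmax_loss(feat, weights, label, s=30.0, m=0.5):
    """Additive angular margin loss (ArcFace-style sketch) for one sample.

    feat:    (d,) feature vector from the backbone
    weights: (C, d) last fully connected layer, one row per class
    label:   ground truth class index
    s, m:    scale and angular margin hyper-parameters
    """
    # Normalize features and class weights onto the unit hypersphere,
    # as in NormFace, so logits become cosine similarities.
    f = feat / np.linalg.norm(feat)
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = W @ f                                  # (C,) cosine to each class center
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = s * cos
    # Add the margin m to the ground-truth angle only, shrinking its logit
    # and thereby enforcing intra-class compactness.
    logits[label] = s * np.cos(theta[label] + m)
    # Standard cross-entropy over the margin-adjusted logits.
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label])
```

With m = 0 this reduces to a normalized softmax (NormFace-style) loss; a positive margin makes the ground-truth class strictly harder to satisfy, which is the mechanism shared by the margin-based losses above.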
The above approaches have achieved relatively satisfactory results. However,
there are two very significant issues that must still be addressed: 1) Previous
research has ignored the importance of hard sample mining for discriminative
learning. As illustrated in [8,2], hard sample mining is a crucial step in improving
performance. Therefore, some mining-based softmax losses have emerged. Very
recently, MV-Arc-Softmax [30] and CurricularFace [9] were proposed, both
inspired by the idea of integrating margin and mining into one framework. However,
both consider the relationship between the sample ground truth class and all
negative classes, which may complicate the optimization of the decision bound-
ary. 2) Both margin-based softmax loss and mining-based softmax loss ignore
the influence of label noise. Noise in face recognition datasets is composed of
two types: closed-set noise, in which some samples are falsely given the labels of
other identities within the same dataset, and open-set noise, in which samples
that belong to none of the dataset's classes (or are not even faces) are
mistakenly assigned one of its labels. Wang et al. [25] noted that noise,
especially closed-set noise, can seriously impact the model's performance. Unfortunately,
removing noise is expensive and, in many cases, impracticable. Intuitively,
the mining-based softmax loss functions can negatively impact the model if the
training set is noisy. That is, mining-based softmax is likely to perform less well
than baseline methods on datasets with severe noise problems. Designing a loss
function that can perform hard sample mining and tolerate noise simultaneously
is still an open problem.
In this paper, starting from the perspective of the decision boundary, we propose
a novel mining framework that tolerates closed-set noise. Fig. 1 illustrates
our motivation with the processing of a noisy hard sample. Specifically, based on the
premise of closed-set noise label correction, the framework directly emphasizes
hard sample features that are between the ground truth class center and the
nearest negative class center. We find that if a sample is closed-set noise,
there is a high probability that the sample is distributed within the nearest
negative class’s decision boundary, and the nearest negative class is likely to be
the ground truth class of the noisy sample. Based on this finding, we propose
a module that automatically discovers closed-set noise during training and dy-
namically corrects its labels. Based on this module, the mining framework can
work well on large-scale datasets under the impact of severe noise. To sum up,
the contributions of this work are:
– We propose a novel mining framework with noise label self-correction, named
BoundaryFace, that explicitly performs hard sample mining to guide
discriminative feature learning.
– The closed-set noise module can be plugged into any existing margin-based
softmax loss with negligible computational overhead. To the best of our
knowledge, this is the first solution to closed-set noise from the perspective
of the decision boundary.
– We conduct extensive experiments on popular benchmarks, which verify the
superiority of BoundaryFace over the baseline softmax and the mining-based
softmax losses.
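To make the boundary-based correction idea concrete, here is a minimal sketch of how a closed-set noise check could be implemented from cosine similarities. The decision rule and the helper name `maybe_correct_label` are our simplification for illustration, not the paper's exact formulation.

```python
import numpy as np

def maybe_correct_label(cos_sim, label, m=0.5):
    """Closed-set noise self-correction sketch (hypothetical simplification).

    cos_sim: (C,) cosine similarities between a sample's normalized feature
             and each normalized class center (rows of the last FC layer)
    label:   current (possibly noisy) label
    m:       angular margin used by the base margin-based softmax
    """
    # Find the nearest negative class: highest-cosine class other than the label.
    order = np.argsort(cos_sim)[::-1]
    neg = order[order != label][0]
    theta_neg = np.arccos(np.clip(cos_sim[neg], -1.0, 1.0))
    # Decision boundary test: if the nearest negative class still wins even
    # after being penalized by the margin, the sample sits inside that class's
    # decision region, so we treat it as closed-set noise and relabel it.
    if np.cos(theta_neg + m) > cos_sim[label]:
        return neg          # corrected label: the nearest negative class
    return label            # otherwise keep the original label
```

A module in this spirit can run alongside any margin-based softmax at essentially no extra cost, since the cosine similarities it inspects are already computed by the loss.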
2 Related Work
2.1 Margin-based softmax
Most recently, researchers have mainly focused on designing loss functions in
the field of face recognition. Since basic softmax loss cannot guarantee facial
features that are sufficiently discriminative, some margin-based softmax losses
[14,13,28,26,3,35], aiming at enhancing intra-class compactness while enlarging
inter-class discrepancy, have been proposed. Liu et al. [14] brought in multiplica-
tive margin to face recognition in order to produce discriminative feature. Liu
et al. [13] introduced an angular margin (A-Softmax) between ground truth class
and other classes to encourage larger inter-class discrepancy. Since multiplica-
tive margin could encounter optimization problems, Wang et al. [28] proposed
an additive margin to stabilize optimization procedure. Deng et al. [3] changed
the form of the additive margin, which generated a loss with clear geometric sig-
nificance. Zhang et al. [35] studied the effect of two crucial hyper-parameters
of traditional margin-based softmax losses and proposed AdaCos by analyzing
how they modulate the predicted classification probability. Although these
margin-based softmax losses have achieved relatively good performance, none of
them takes into account hard sample mining or label noise.
2.2 Mining-based softmax
There are two well-known hard sample mining methods, i.e., Focal loss [12], On-
line Hard Sample Mining (OHEM) [21]. Wang et al. [30] have shown that naively
combining them with current popular face recognition methods yields limited
improvement. Recent works, MV-Arc-Softmax [30] and CurricularFace [9], are
inspired by integrating both margin and mining into one framework. MV-Arc-
Softmax explicitly defines mis-classified samples as hard samples and adaptively
strengthens them by increasing the weights of corresponding negative cosine
similarities, eventually producing a larger feature margin between the ground
truth class and the corresponding negative target class. CurricularFace applies
curriculum learning to face recognition, focusing on easy samples in the early
stage and hard samples in the later stage. However, on the one hand, both
take the relationship between the sample ground truth class and all negative
classes into consideration, which may complicate the optimization of the de-
cision boundary; on the other hand, label noise poses some adverse effect on
mining. It is well known that the success of face recognition nowadays benefits
from large-scale training data. Noise is inevitable in these million-scale
datasets, and building a “clean enough” face dataset is both costly and
difficult. Both MV-Arc-Softmax and CurricularFace assume that the dataset
is clean (i.e., almost noiseless), but this assumption is not true in many cases.
Intuitively, the more noise the dataset contains, the worse the mining-based
softmax loss will perform. Unlike open-set noise, closed-set noise can become
part of the clean data as soon as its labels are corrected. Overall, our method
differs from currently popular mining-based softmax losses in that it conducts
hard sample mining while handling closed-set noise well, which the current
methods cannot do.
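The mining step these methods share can be sketched as a re-weighting of negative-class logits, in the spirit of MV-Arc-Softmax. The exact weighting function and hyper-parameter names below are illustrative assumptions, not the published formula.

```python
import numpy as np

def mv_softmax_logits(cos_sim, label, s=30.0, m=0.5, t=0.2):
    """Mining-based logit re-weighting sketch (MV-Arc-Softmax-style).

    cos_sim: (C,) cosine similarities to all class centers
    s, m:    scale and angular margin of the base margin-based softmax
    t:       extra emphasis applied to hard ("support vector") negatives
    """
    theta = np.arccos(np.clip(cos_sim[label], -1.0, 1.0))
    target = np.cos(theta + m)            # margin-penalized target logit
    logits = cos_sim.copy()
    # A negative class whose cosine exceeds the penalized target marks the
    # sample as mis-classified (hard); emphasize that class by enlarging
    # its logit, which increases its loss contribution.
    hard = logits > target
    hard[label] = False
    logits[hard] = logits[hard] * (t + 1.0) + t
    logits[label] = target
    return s * logits
```

Note that this re-weighting runs over all negative classes; restricting attention to the nearest negative class, as our framework does, simplifies the decision boundary being optimized.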
2.3 Metric learning loss
Triplet loss [19] is a classical metric learning algorithm. Even though the prob-
lem of combinatorial explosion has led many researchers to turn their attention
to the adaptation of traditional softmax, there are still some researchers who
explore the optimization of metric losses. Introducing the idea of proxies into
metric learning is currently the mainstream choice. Proxy-triplet losses [27,16]
replace the positive and negative samples in the standard triplet loss with positive and