BoundaryFace: A mining framework with noise
label self-correction for Face Recognition
Shijie Wu and Xun Gong
School of Computing and Artificial Intelligence, Southwest Jiaotong University,
Chengdu, Sichuan, China
xgong@swjtu.edu.cn
Abstract. Face recognition has made tremendous progress in recent
years due to advances in loss functions and the explosive growth in
training set sizes. A properly designed loss is seen as key to extracting
discriminative features for classification. Several margin-based losses have
been proposed as alternatives to the softmax loss in face recognition. How-
ever, two issues remain: 1) They overlook the importance
of hard sample mining for discriminative learning. 2) Label noise ubiq-
uitously exists in large-scale datasets, which can seriously damage the
model’s performance. In this paper, starting from the perspective of deci-
sion boundary, we propose a novel mining framework that focuses on the
relationship between a sample’s ground truth class center and its nearest
negative class center. Specifically, a closed-set noise label self-correction
module is put forward, making this framework work well on datasets
containing a lot of label noise. The proposed method consistently out-
performs SOTA methods in various face recognition benchmarks. Train-
ing code has been released at https://github.com/SWJTU-3DVision/
BoundaryFace.
Keywords: Face Recognition, Noise Label, Hard Sample Mining, Deci-
sion Boundary
1 Introduction
Face recognition is one of the most widely studied topics in the computer vision
community. Large-scale datasets, network architectures, and loss functions have
fueled the success of Deep Convolutional Neural Networks (DCNNs) on face
recognition. Particularly, with an aim to extract discriminative features, the
latest works have proposed some intuitively reasonable loss functions.
For face recognition, the current existing losses can be divided into two ap-
proaches: one deems the face recognition task to be a general classification prob-
lem, and networks are therefore trained using softmax [13,27,28,26,3,1,5,18,33];
the other approaches the problem using metric learning and directly learns an
embedding, such as [23,19,22]. Since metric-learning losses usually suffer from
combinatorial explosion of sample batches and depend on semi-hard sample mining,
they require more sophisticated sampling strategies. Classification-style loss
functions have therefore attracted increasing attention.
arXiv:2210.04567v1 [cs.CV] 10 Oct 2022
[Figure 1: three classes in feature space, with class centers C2 and C3 and Steps 1–3 marked.]
Fig. 1. The motivation of BoundaryFace. Step 1 denotes closed-set noise label self-
correction. Step 2 denotes nearest negative class matching. Step 3 denotes hard sample
handling. For a noisy hard sample, we first correct its label, then match the nearest
negative class based on the corrected label, and finally emphasize the sample using the
decision boundary formed by its ground truth class center and the nearest negative
class center.
It has been pointed out that the classical classification loss function (i.e.,
Softmax loss) cannot obtain discriminative features. Based on current testing
protocols, the probe commonly has no overlap with the training images, so it
is particularly crucial to extract features with high discriminative ability. To
this end, Center loss [31] and NormFace [27] have been successively proposed
to obtain discriminative features. Wen et al. [31] developed a center loss that
learns each subject’s center. To ensure the training process is consistent with
testing, Wang et al. [27] made the features extracted by the network and the
weight vectors of the last fully connected layer lie on the unit hypersphere.
Recently, some margin-based softmax loss functions [13,28,26,3,14] have also
been proposed to enhance intra-class compactness while enlarging inter-class
discrepancy, resulting in more discriminative features.
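To make the margin-based family concrete, the following is a minimal numeric sketch of an additive angular margin softmax loss in the style of ArcFace [3], combined with the feature and weight normalization of NormFace [27]. The function name and hyper-parameter defaults are illustrative choices, not taken from any particular implementation.

```python
import numpy as np

def margin_softmax_loss(feat, weights, label, s=30.0, m=0.5):
    """Additive angular margin loss (ArcFace-style sketch) for one sample.

    feat:    (d,) feature vector from the backbone
    weights: (C, d) last fully connected layer, one row per class
    label:   ground truth class index
    s, m:    scale and angular margin hyper-parameters
    """
    # Normalize features and class weights onto the unit hypersphere,
    # as in NormFace, so logits become cosine similarities.
    f = feat / np.linalg.norm(feat)
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = W @ f                                  # (C,) cosine to each class center
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = s * cos
    # Add the margin m to the ground-truth angle only, shrinking its logit
    # and thereby enforcing intra-class compactness.
    logits[label] = s * np.cos(theta[label] + m)
    # Standard cross-entropy over the margin-adjusted logits.
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label])
```

With m = 0 this reduces to a normalized softmax (NormFace-style) loss; a positive margin makes the ground-truth class strictly harder to satisfy, which is the mechanism shared by the margin-based losses above.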
The above approaches have achieved relatively satisfactory results. However,
there are two very significant issues that must still be addressed: 1) Previous
research has ignored the importance of hard sample mining for discriminative
learning. As illustrated in [8,2], hard sample mining is a crucial step in improving
performance. Therefore, some mining-based softmax losses have emerged. Very
recently, MV-Arc-Softmax [30] and CurricularFace [9] were proposed, both
inspired by the idea of integrating margin and mining into one framework. However,
both consider the relationship between the sample ground truth class and all
negative classes, which may complicate the optimization of the decision bound-
ary. 2) Both margin-based softmax loss and mining-based softmax loss ignore
the influence of label noise. Noise in face recognition datasets is composed of
two types: closed-set noise, in which some samples are falsely given the labels of
other identities within the same dataset, and open-set noise, in which samples
that belong to none of the dataset's classes (or are not even faces) are
mistakenly assigned one of its labels. Wang et al. [25] noted that noise,
especially closed-set noise, can seriously impact the model's performance. Unfortunately,
removing noise is expensive and, in many cases, impracticable. Intuitively,
the mining-based softmax loss functions can negatively impact the model if the
training set is noisy. That is, mining-based softmax is likely to perform less well
than baseline methods on datasets with severe noise problems. Designing a loss
function that can perform hard sample mining and tolerate noise simultaneously
is still an open problem.
In this paper, starting from the perspective of the decision boundary, we propose
a novel mining framework that tolerates closed-set noise. Fig. 1 illustrates
our motivation with the processing of a noisy hard sample. Specifically, based on the
premise of closed-set noise label correction, the framework directly emphasizes
hard sample features that are between the ground truth class center and the
nearest negative class center. We find that if a sample is closed-set noise,
there is a high probability that the sample is distributed within the nearest
negative class’s decision boundary, and the nearest negative class is likely to be
the ground truth class of the noisy sample. Based on this finding, we propose
a module that automatically discovers closed-set noise during training and dy-
namically corrects its labels. Based on this module, the mining framework can
work well on large-scale datasets under the impact of severe noise. To sum up,
the contributions of this work are:
– We propose a novel mining framework with noise label self-correction, named
BoundaryFace, that explicitly performs hard sample mining to guide
discriminative feature learning.
– The closed-set noise module can be plugged into any existing margin-based
softmax loss with negligible computational overhead. To the best of our
knowledge, this is the first solution to closed-set noise from the perspective
of the decision boundary.
– We conduct extensive experiments on popular benchmarks, which verify the
superiority of BoundaryFace over the baseline softmax and the mining-based
softmax losses.
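To make the boundary-based correction idea concrete, here is a minimal sketch of how a closed-set noise check could be implemented from cosine similarities. The decision rule and the helper name `maybe_correct_label` are our simplification for illustration, not the paper's exact formulation.

```python
import numpy as np

def maybe_correct_label(cos_sim, label, m=0.5):
    """Closed-set noise self-correction sketch (hypothetical simplification).

    cos_sim: (C,) cosine similarities between a sample's normalized feature
             and each normalized class center (rows of the last FC layer)
    label:   current (possibly noisy) label
    m:       angular margin used by the base margin-based softmax
    """
    # Find the nearest negative class: highest-cosine class other than the label.
    order = np.argsort(cos_sim)[::-1]
    neg = order[order != label][0]
    theta_neg = np.arccos(np.clip(cos_sim[neg], -1.0, 1.0))
    # Decision boundary test: if the nearest negative class still wins even
    # after being penalized by the margin, the sample sits inside that class's
    # decision region, so we treat it as closed-set noise and relabel it.
    if np.cos(theta_neg + m) > cos_sim[label]:
        return neg          # corrected label: the nearest negative class
    return label            # otherwise keep the original label
```

A module in this spirit can run alongside any margin-based softmax at essentially no extra cost, since the cosine similarities it inspects are already computed by the loss.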
2 Related Work
2.1 Margin-based softmax
Most recently, researchers have mainly focused on designing loss functions in
the field of face recognition. Since basic softmax loss cannot guarantee facial
features that are sufficiently discriminative, some margin-based softmax losses
[14,13,28,26,3,35], aiming at enhancing intra-class compactness while enlarging
inter-class discrepancy, have been proposed. Liu et al. [14] brought in multiplica-
tive margin to face recognition in order to produce discriminative feature. Liu
et al. [13] introduced an angular margin (A-Softmax) between ground truth class
and other classes to encourage larger inter-class discrepancy. Since multiplica-
tive margin could encounter optimization problems, Wang et al. [28] proposed
an additive margin to stabilize optimization procedure. Deng et al. [3] changed
the form of the additive margin, which generated a loss with clear geometric sig-
nificance. Zhang et al. [35] studied the effect of two crucial hyper-parameters
of traditional margin-based softmax losses and proposed AdaCos by analyzing
how they modulate the predicted classification probability. Although these
margin-based softmax losses have achieved relatively good performance, none of
them takes into account hard sample mining or label noise.
2.2 Mining-based softmax
There are two well-known hard sample mining methods, i.e., Focal loss [12], On-
line Hard Sample Mining (OHEM) [21]. Wang et al. [30] have shown that naively
combining them with current popular face recognition methods yields limited
improvement. Recent works, MV-Arc-Softmax [30] and CurricularFace [9], are
inspired by integrating both margin and mining into one framework. MV-Arc-
Softmax explicitly defines mis-classified samples as hard samples and adaptively
strengthens them by increasing the weights of corresponding negative cosine
similarities, eventually producing a larger feature margin between the ground
truth class and the corresponding negative target class. CurricularFace applies
curriculum learning to face recognition, focusing on easy samples in the early
stage and hard samples in the later stage. However, on the one hand, both
take the relationship between the sample ground truth class and all negative
classes into consideration, which may complicate the optimization of the de-
cision boundary; on the other hand, label noise poses some adverse effect on
mining. It is well known that the success of face recognition nowadays benefits
from large-scale training data. Noise is inevitable in these million-scale
datasets, and building a “clean enough” face dataset is both costly and
difficult. Both MV-Arc-Softmax and CurricularFace assume that the dataset
is clean (i.e., almost noiseless), but this assumption is not true in many cases.
Intuitively, the more noise the dataset contains, the worse the mining-based
softmax loss will perform. Unlike open-set noise, closed-set noise can become
part of the clean data as soon as its labels are corrected. Overall, our method
differs from currently popular mining-based softmax losses in that it conducts
hard sample mining while handling closed-set noise well, which the current
methods cannot do.
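The mining step these methods share can be sketched as a re-weighting of negative-class logits, in the spirit of MV-Arc-Softmax. The exact weighting function and hyper-parameter names below are illustrative assumptions, not the published formula.

```python
import numpy as np

def mv_softmax_logits(cos_sim, label, s=30.0, m=0.5, t=0.2):
    """Mining-based logit re-weighting sketch (MV-Arc-Softmax-style).

    cos_sim: (C,) cosine similarities to all class centers
    s, m:    scale and angular margin of the base margin-based softmax
    t:       extra emphasis applied to hard ("support vector") negatives
    """
    theta = np.arccos(np.clip(cos_sim[label], -1.0, 1.0))
    target = np.cos(theta + m)            # margin-penalized target logit
    logits = cos_sim.copy()
    # A negative class whose cosine exceeds the penalized target marks the
    # sample as mis-classified (hard); emphasize that class by enlarging
    # its logit, which increases its loss contribution.
    hard = logits > target
    hard[label] = False
    logits[hard] = logits[hard] * (t + 1.0) + t
    logits[label] = target
    return s * logits
```

Note that this re-weighting runs over all negative classes; restricting attention to the nearest negative class, as our framework does, simplifies the decision boundary being optimized.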
2.3 Metric learning loss
Triplet loss [19] is a classical metric learning algorithm. Even though the prob-
lem of combinatorial explosion has led many researchers to turn their attention
to the adaptation of traditional softmax, there are still some researchers who
explore the optimization of metric losses. Introducing the idea of proxies into
metric learning is currently the mainstream choice. Proxy-triplet losses [27,16]
replace the positive and negative samples in the standard triplet loss with positive and