Mathematical Justification of Hard Negative Mining
via Isometric Approximation Theorem
Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Eliana Cohen, Howie Choset, Lu Li
Abstract
In deep metric learning, the Triplet Loss has emerged as a
popular method to learn many computer vision and natural
language processing tasks such as facial recognition, object
detection, and visual-semantic embeddings. One issue that
plagues the Triplet Loss is network collapse, an undesirable
phenomenon where the network projects the embeddings of
all data onto a single point. Researchers predominantly solve
this problem by using triplet mining strategies. While hard
negative mining is the most effective of these strategies, exist-
ing formulations lack strong theoretical justification for their
empirical success. In this paper, we utilize the mathematical
theory of isometric approximation to show an equivalence be-
tween the Triplet Loss sampled by hard negative mining and
an optimization problem that minimizes a Hausdorff-like dis-
tance between the neural network and its ideal counterpart
function. This provides the theoretical justifications for hard
negative mining’s empirical efficacy. In addition, our novel
application of the isometric approximation theorem provides
the groundwork for future forms of hard negative mining that
avoid network collapse. Our theory can also be extended to
analyze other Euclidean space-based metric learning meth-
ods like Ladder Loss or Contrastive Learning.
Introduction
Research in deep metric learning studies techniques for
training deep neural networks to learn similarities and dis-
similarities between data samples, typically by learning a
distance metric via feature embeddings in Rn. Most ex-
tensively, deep metric learning is used in face recognition
(Schroff, Kalenichenko, and Philbin 2015; Liu et al. 2017;
Hermans, Beyer, and Leibe 2017) and other computer vi-
sion tasks (Tack et al. 2020; Chen et al. 2020a) where there
is an abundance of label values.
Common deep metric learning techniques include con-
trastive loss (Hadsell, Chopra, and LeCun 2006) and triplet
loss (Schroff, Kalenichenko, and Philbin 2015). Moreover,
each of these methods has variants to address specific ap-
plications. SimCLR (Chen et al. 2020a,b), for example, is a
recent contrastive loss variant designed to address unsuper-
vised deep metric learning with state-of-the-art performance
on ImageNet (Russakovsky et al. 2015). Ladder Loss (Zhou
et al. 2019), a generalized variant of triplet loss, improved
upon existing methods for coherent visual-semantic embed-
ding and has important applications in multiple visual and
language understanding tasks (Karpathy, Joulin, and Fei-Fei
2014; Ma, Lu, and Li 2015; Vinyals et al. 2014). Given the
success of metric learning in a wide range of applications,
we see value in investigating its underlying theories. In this
paper, we present a theoretical framework which explains
observed but previously unexplained behaviors of the Triplet
Loss.
Literature Review
We choose to analyze Triplet Loss’s underlying theory due
to its strong dependence on the triplet selection strategy.
This makes the Triplet Loss fickle to work with, as empir-
ical results have shown that randomly sampling these triplets
yields unsatisfactory results. On the other hand, success-
ful triplet selection strategies like hard negative mining can
face issues like network collapse, a phenomenon where the
network projects all data points onto a single point (Schroff,
Kalenichenko, and Philbin 2015), while more stable triplet
selection strategies do not perform as well in practice (Her-
mans, Beyer, and Leibe 2017).
In the original FaceNet paper, Schroff et al. find that
with large batch sizes (thousands), hard negative mining
leads to collapsed solutions. To address this, they instead
use a strategy they call semi-hard mining (Schroff,
Kalenichenko, and Philbin 2015). On the other hand, Hermans
et al. find that with smaller batch sizes (N = 72),
the hardest mining strategy significantly outperforms other
mining strategies and does not suffer from collapsed solu-
tions (Hermans, Beyer, and Leibe 2017). These seemingly
contradictory results showcase the need for a theoretical
framework to explain the theory of hard negative mining and
the root cause of collapsed solutions.
There has been some prior literature investigating the
phenomenon of network collapse. Xuan et al. show that
hard negative mining leads to collapsed solutions by ana-
lyzing the gradients of a simplified neural network model
(Xuan et al. 2020). However, they do not account for the
many cases where hard negative mining does work. Levi et
al. prove that, under a label randomization assumption, the
globally optimal solution to the triplet loss necessarily ex-
hibits network collapse (Levi et al. 2021). Rather than inves-
tigating functional hard mining strategies, Levi et al. instead
suggest using the less effective easy positive mining to avoid
network collapse.
In the literature, there are plenty of claims that hard negative
mining succeeds (Hermans, Beyer, and Leibe 2017; Faghri
et al. 2017), and numerous examples where it fails (Schroff,
Kalenichenko, and Philbin 2015; Ge, Gao, and Liu 2019;
Oh Song et al. 2016). Our work explains why network col-
lapse happens by using the theory of isometric approxima-
tion to better characterize the behavior of the Triplet Loss.
Background and Definitions
Establishing the notation used in the paper, let $\mathcal{X}$ be the data
manifold and let $\mathcal{Y}$ be the set of classes, with $|\mathcal{Y}| = c$ being the
number of classes. Let $h: \mathcal{X} \to \mathcal{Y}$ be the true hypothesis
function, or true labels of the data. Then the dataset
consists of pairs $\{(x_k, y_k)\}_{k=1}^{N}$ with $x_k \in \mathcal{X}$, $y_k \in \mathcal{Y}$, and
$y_k = h(x_k)$. We define the learned neural network as a func-
tion $f_\theta: \mathcal{X} \to \mathbb{R}^n$ which maps similar points in the data
manifold $\mathcal{X}$ to similar points in $\mathbb{R}^n$.

As our paper focuses on metric learning, we define the
similarity between embeddings to be the Euclidean distance
$d(r_1, r_2) = \|r_1 - r_2\|$ where $r_1, r_2 \in \mathbb{R}^n$. Further, we de-
fine the shorthand $d_\theta(x_1, x_2) = \|f_\theta(x_1) - f_\theta(x_2)\|$ where
$x_1, x_2 \in \mathcal{X}$.
Triplet Loss and Hard Negative Mining
In this section, we discuss the Triplet Loss, one of the more
successful approaches to supervised metric learning intro-
duced by Schroff et al. (Schroff, Kalenichenko, and Philbin
2015). The Triplet Loss considers samples as triplets of data,
composed of the anchor ($x \in \mathcal{X}$), positive ($x^+$), and nega-
tive ($x^-$) samples, described in (1). The similarity relation
(1a) requires that the anchor and positive samples must be of
the same class, while the dissimilarity relation (1b) requires
that the anchor and negative must be of different classes.

$$x^+ \in \{x' \in \mathcal{X} \mid h(x) = h(x')\} \qquad (1a)$$
$$x^- \in \{x' \in \mathcal{X} \mid h(x) \neq h(x')\} \qquad (1b)$$
Restating the objective of supervised metric learning, the
embedding of the anchor sample must be closer to the posi-
tive than the negative for every triplet. An example of a sat-
isfactory triplet is shown in Figure 1. Formally, we express
this relation via (2), where $\alpha$ is the margin term.

$$d_\theta(x, x^+) + \alpha \leq d_\theta(x, x^-) \quad \forall x, x^+, x^- \in \mathcal{X} \qquad (2)$$
This leads to the definition of the Triplet Loss in (3).
$$\mathcal{L}_{\mathrm{Triplet}} = \left[\, d_\theta(x, x^+) - d_\theta(x, x^-) + \alpha \,\right]_+ \qquad (3)$$

The function $[\cdot]_+ = \max(\cdot, 0)$ zeroes negative values in
order to ignore all the triplets that already satisfy the desired
relation. In addition, as the margin $\alpha$ adds only a constant
value to the loss function, its effect is negligible for small
$\alpha$. Therefore, we will assume a zero value for the margin
($\alpha = 0$) for the remainder of this paper.
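As a concrete illustration of (3), the following is a minimal PyTorch-style sketch of the Triplet Loss on a batch of precomputed embeddings; the function name, batching, and mean reduction are illustrative assumptions rather than part of the original formulation.

```python
import torch

def triplet_loss(anchor, positive, negative, alpha=0.0):
    """Triplet Loss of (3): [ d_theta(x, x+) - d_theta(x, x-) + alpha ]_+ .

    anchor, positive, negative: (B, n) embeddings f_theta(x), f_theta(x+), f_theta(x-).
    alpha: margin term (assumed zero in the paper's analysis).
    """
    d_pos = torch.norm(anchor - positive, dim=1)   # d_theta(x, x+)
    d_neg = torch.norm(anchor - negative, dim=1)   # d_theta(x, x-)
    return torch.clamp(d_pos - d_neg + alpha, min=0.0).mean()
```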
Figure 1: An example Anchor, Positive, and Negative triplet.
The blue dotted contour is the Triplet-Separated boundary
for Class A. It is computed by considering inequality (4) for
all points in Class A. Because Class B is outside the Triplet-
Separated boundary for Class A, the Triplet Loss for this
example is zero.
Definition 1. Triplet-Separated. We refer to $m$ non-empty
subsets $X_1, \dots, X_m \subset \mathbb{R}^n$ as Triplet-Separated if for ev-
ery $X_i$ and $X_j$ with $i \neq j$ we have

$$\|x - y\| \leq \|x - z\| \quad \forall x, y \in X_i,\ \forall z \in X_j \qquad (4)$$

This property can be extended to a function $f_\theta: \mathcal{X} \to \mathbb{R}^n$
by checking whether the embedding subsets $X^i_{f_\theta}$ are Triplet-
Separated, where

$$X^i_{f_\theta} = \{\, f_\theta(x) \mid x \in \mathcal{X},\ h(x) = i \,\} \qquad (5)$$

It is worth noting that $\mathcal{L}_{\mathrm{Triplet}}(f_\theta) = 0$ if and only if $f_\theta$ is
Triplet-Separated. An example of two Triplet-Separated sets
is shown in Figure 1.
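To make Definition 1 concrete, here is a minimal numpy sketch that checks inequality (4) for embeddings grouped by integer class labels; the brute-force check and the function name are illustrative assumptions, not part of the paper.

```python
import numpy as np

def is_triplet_separated(embeddings, labels):
    """Check Definition 1: ||x - y|| <= ||x - z|| for all x, y in X_i and z in X_j, i != j.

    embeddings: (N, n) array of f_theta outputs; labels: (N,) integer class labels.
    """
    # Pairwise Euclidean distances between all embeddings.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]  # True where two points share a class
    for i in range(len(labels)):
        if (~same[i]).any():
            # The farthest same-class point must be no farther than the
            # closest point of any other class (inequality (4) for anchor i).
            if dists[i][same[i]].max() > dists[i][~same[i]].min():
                return False
    return True
```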
As mentioned in the Literature Review section, the Triplet
Loss relies heavily on its triplet mining strategy to achieve
its performance for two popularly accepted reasons: First,
enumerating all $O(N^3)$ triplets of data every iteration would
be too computationally intensive to guarantee fast training.
Second, improper sampling of triplets risks network col-
lapse (Xuan et al. 2020). Our work substantiates the use of
hard negative mining, a successful triplet mining strategy, by
characterizing conditions that lead to network collapse.
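For reference, a common batch-level form of hard negative mining (in the spirit of Hermans, Beyer, and Leibe 2017) selects, for each anchor, the closest in-batch embedding with a different label as its negative. The sketch below is a minimal PyTorch version; the helper name and tensor shapes are assumptions, not the paper's implementation.

```python
import torch

def batch_hard_negatives(embeddings, labels):
    """Return, for each sample, the index of its hardest (closest) in-batch negative.

    embeddings: (B, n) outputs of f_theta; labels: (B,) integer class labels.
    """
    dists = torch.cdist(embeddings, embeddings)              # (B, B) pairwise distances
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)  # True where labels match
    # Same-class pairs can never serve as negatives, so push them to +inf.
    dists = dists.masked_fill(same_class, float("inf"))
    return dists.argmin(dim=1)                               # hardest negative per anchor
```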
Isometric Approximation
We will present a novel application of the isometric ap-
proximation theorem in Euclidean subsets in order to math-
ematically justify hard negative mining. The isometric ap-
proximation theorem primarily defines the behavior of near-
isometries, or functions that are close to isometries, as given
by Definition 2.
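Definition 2 is not reproduced in this excerpt; under the standard notion of an ε-near-isometry (a map whose pairwise distance distortion is bounded by ε), the quantity below is what "close to an isometry" measures. The sketch and its function name are assumptions for illustration only.

```python
import numpy as np

def isometric_defect(points, mapped_points):
    """Empirical isometric defect of a map f on sampled points:
    max over pairs (x, y) of | ||f(x) - f(y)|| - ||x - y|| |.
    Under the standard definition (an assumption here), f is an
    eps-near-isometry on the sample if this value is at most eps.
    """
    d_in = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d_out = np.linalg.norm(mapped_points[:, None, :] - mapped_points[None, :, :], axis=-1)
    return float(np.abs(d_out - d_in).max())
```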