
Mathematical Justification of Hard Negative Mining
via Isometric Approximation Theorem
Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Eliana Cohen, Howie Choset, Lu Li
Abstract
In deep metric learning, the Triplet Loss has emerged as a popular method for learning many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon in which the network projects the embeddings of all data onto a single point. Researchers predominantly address this problem with triplet mining strategies. While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success. In this paper, we utilize the mathematical theory of isometric approximation to show an equivalence between the Triplet Loss sampled by hard negative mining and an optimization problem that minimizes a Hausdorff-like distance between the neural network and its ideal counterpart function. This provides a theoretical justification for hard negative mining's empirical efficacy. In addition, our novel application of the isometric approximation theorem provides the groundwork for future forms of hard negative mining that avoid network collapse. Our theory can also be extended to analyze other Euclidean space-based metric learning methods like Ladder Loss or Contrastive Learning.
Introduction
Research in deep metric learning studies techniques for training deep neural networks to learn similarities and dissimilarities between data samples, typically by learning a distance metric via feature embeddings in R^n. Deep metric learning is used most extensively in face recognition (Schroff, Kalenichenko, and Philbin 2015; Liu et al. 2017; Hermans, Beyer, and Leibe 2017) and other computer vision tasks (Tack et al. 2020; Chen et al. 2020a) where there is an abundance of label values.
Common deep metric learning techniques include contrastive loss (Hadsell, Chopra, and LeCun 2006) and triplet loss (Schroff, Kalenichenko, and Philbin 2015). Moreover, each of these methods has variants that address specific applications. SimCLR (Chen et al. 2020a,b), for example, is a recent contrastive loss variant designed to address unsupervised deep metric learning with state-of-the-art performance
on ImageNet (Russakovsky et al. 2015). Ladder Loss (Zhou
et al. 2019), a generalized variant of triplet loss, improves upon existing methods for coherent visual-semantic embedding and has important applications in multiple visual and language understanding tasks (Karpathy, Joulin, and Fei-Fei 2014; Ma, Lu, and Li 2015; Vinyals et al. 2014). Given the success of metric learning in a wide range of applications, we see value in investigating its underlying theories. In this paper, we present a theoretical framework which explains observed but previously unexplained behaviors of the Triplet Loss.
Literature Review
We choose to analyze the Triplet Loss's underlying theory due to its strong dependence on the triplet selection strategy. This dependence makes the Triplet Loss fickle to work with, as empirical results have shown that randomly sampling triplets yields unsatisfactory results. On the other hand, successful triplet selection strategies like hard negative mining can face issues such as network collapse, a phenomenon where the network projects all data points onto a single point (Schroff, Kalenichenko, and Philbin 2015), while more stable triplet selection strategies do not perform as well in practice (Hermans, Beyer, and Leibe 2017).
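For reference, in the standard formulation (Schroff, Kalenichenko, and Philbin 2015), a triplet consists of an anchor a, a positive p sharing the anchor's label, and a negative n carrying a different label. With an embedding network f and margin \alpha, the loss on a single triplet is
\mathcal{L}(a, p, n) = \left[\, \|f(a) - f(p)\|^2 - \|f(a) - f(n)\|^2 + \alpha \,\right]_+ ,
and hard negative mining selects, for a given anchor, the negative closest to it, n^* = \arg\min_{n : y_n \neq y_a} \|f(a) - f(n)\|, typically searched within the current batch. A collapsed solution is the constant map f(x) \equiv c, for which every triplet incurs a loss of exactly \alpha. This notation is standard and serves only as orientation for the discussion that follows.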
In the original FaceNet paper, Schroff et al. find that with large batch sizes (in the thousands), hard negative mining leads to collapsed solutions. To address this, they instead use a strategy they call semi-hard mining (Schroff, Kalenichenko, and Philbin 2015). On the other hand, Hermans et al. find that with smaller batch sizes (N = 72), the hardest mining strategy significantly outperforms other mining strategies and does not suffer from collapsed solutions (Hermans, Beyer, and Leibe 2017). These seemingly contradictory results showcase the need for a theoretical framework that explains hard negative mining and the root cause of collapsed solutions.
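To make the difference between these two selection rules concrete, the following sketch (our NumPy illustration, not the published FaceNet or batch-hard implementations; the helper name mine_triplets and its defaults are ours) picks, for each anchor in a batch, the hardest positive together with either the overall hardest negative (semi_hard=False, essentially the batch-hard rule of Hermans, Beyer, and Leibe 2017) or the hardest negative that is still farther away than the positive but within the margin (semi_hard=True, loosely following Schroff, Kalenichenko, and Philbin 2015):

import numpy as np

def mine_triplets(emb, labels, margin=0.2, semi_hard=False):
    # Illustrative in-batch triplet selection (not the original implementations).
    # emb: (B, d) array of embeddings; labels: (B,) array of class labels.
    d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
    triplets = []
    for a in range(len(labels)):
        pos = np.where((labels == labels[a]) & (np.arange(len(labels)) != a))[0]
        neg = np.where(labels != labels[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        p = pos[np.argmax(d[a, pos])]  # hardest positive for this anchor
        if semi_hard:
            # keep negatives farther than the positive but within the margin
            ok = neg[(d[a, neg] > d[a, p]) & (d[a, neg] < d[a, p] + margin)]
            neg = ok if len(ok) else neg
        n = neg[np.argmin(d[a, neg])]  # hardest remaining negative
        triplets.append((a, p, n))
    return triplets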
There has been some prior literature investigating the phenomenon of network collapse. Xuan et al. show that hard negative mining leads to collapsed solutions by analyzing the gradients of a simplified neural network model (Xuan et al. 2020). However, they do not account for the many cases where hard negative mining does work. Levi et al. prove that, under a label randomization assumption, the globally optimal solution to the triplet loss necessarily exhibits network collapse (Levi et al. 2021). Rather than inves-