Matching Map Recovery with an Unknown Number of Outliers

Arshak Minasyan* (CREST, ENSAE, IP Paris; 5 av. Henry Le Chatelier, 91764 Palaiseau; arshak.minasyan@ensae.fr)
Tigran Galstyan (RAU, YerevaNN; 20 Charents street, 0025 Yerevan; tigran@yerevann.com)
Sona Hunanyan (Yerevan State University; 1 Alek Manukyan, 0025 Yerevan; hunan.sona@gmail.com)
Arnak Dalalyan (CREST, ENSAE, IP Paris; 5 av. Henry Le Chatelier, 91764 Palaiseau; arnak.dalalyan@ensae.fr)

Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s).
Abstract

We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature-vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If $n$ and $m$ are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality $k^* \le \min(n, m)$. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5\big(d\log(4nm/\alpha)\big)^{1/4}$, then the true matching map can be recovered with probability $1 - \alpha$. Interestingly, this threshold does not depend on $k^*$ and is the same as the one obtained in prior work in the case of $k^* = \min(n, m)$. The procedure for which the aforementioned property is proved is obtained by a data-driven selection among candidate mappings $\{\hat\pi_k : k \in [\min(n, m)]\}$. Each $\hat\pi_k$ minimizes the sum of squares of distances between two sets of size $k$. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.
1 INTRODUCTION

The problem of finding the best matching between two point clouds has been extensively studied, both theoretically and experimentally. The matching problem arises in various applications, for instance in computer vision and natural language processing. In computer vision, finding the correspondence between two sets of local descriptors extracted from two images of the same scene is a well-known example of a matching problem. In natural language processing, in particular in machine translation, the correspondence between vector representations of the same text in two different languages is another example of a matching problem. Clearly, in these problems, not all the points have a matching point, and one can hardly know in advance how many of them do. The goal of the present work is to focus on this setting and to gain a theoretical understanding of the statistical limitations of the matching problem.
To formulate the problem and to state the main result, let $X = (X_1, \ldots, X_n)$ and $X^\# = (X^\#_1, \ldots, X^\#_m)$ be two sequences of feature vectors of sizes $n$ and $m$ such that $m \ge n \ge 2$. We assume that these sequences are noisy versions of some original feature-vectors, i.e.,
$$X_i = \theta_i + \sigma\,\xi_i, \qquad X^\#_j = \theta^\#_j + \sigma^\#\,\xi^\#_j, \qquad i \in [n] \text{ and } j \in [m], \qquad (1)$$
where $\theta = (\theta_1, \ldots, \theta_n)$ and $\theta^\# = (\theta^\#_1, \ldots, \theta^\#_m)$ are two sequences of deterministic vectors from $\mathbb R^d$, corresponding to the original feature-vectors. The noise components of $X$ and $X^\#$ are two independent sequences of i.i.d. isotropic Gaussian random vectors. Formally,
$$\xi_1, \ldots, \xi_n, \xi^\#_1, \ldots, \xi^\#_m \overset{\text{i.i.d.}}{\sim} \mathcal N(0, \mathbf I_d),$$
where $\mathbf I_d$ is the identity matrix of size $d \times d$. We assume that for some $S^* \subset [n]$ of cardinality $k^*$, there exists an injective mapping $\pi^* : S^* \to [m]$ such that $\theta_i = \theta^\#_{\pi^*(i)}$ holds for all $i \in S^*$. We call the observations $(X_i : i \in S^*)$ and $(X^\#_{\pi^*(i)} : i \in S^*)$ inliers, while the other vectors from the sequences $X$ and $X^\#$ are considered to be outliers. The ultimate goal is to recover $\pi^*$ based on the observations $X$ and $X^\#$ only.
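To make the setting concrete, the following minimal sketch simulates data from model (1). The sizes $n, m, d, k^*$, the noise levels, and the Gaussian draw of the signal vectors $\theta$ are arbitrary illustrative assumptions, not values taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, k_star = 50, 60, 100, 30        # illustrative sizes, with k* <= min(n, m)
sigma, sigma_sharp = 1.0, 1.0            # noise levels of the two sequences

# Choose the support S* of size k* and an injective map pi*: S* -> [m].
S_star = rng.choice(n, size=k_star, replace=False)
pi_star = dict(zip(S_star.tolist(),
                   rng.choice(m, size=k_star, replace=False).tolist()))

# Signals: matched pairs share the same theta; all other vectors are outliers.
theta = rng.normal(0.0, 10.0, size=(n, d))
theta_sharp = rng.normal(0.0, 10.0, size=(m, d))
for i, j in pi_star.items():
    theta_sharp[j] = theta[i]

# Noisy observations, as in model (1).
X = theta + sigma * rng.standard_normal((n, d))
X_sharp = theta_sharp + sigma_sharp * rng.standard_normal((m, d))
```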
Various versions of this problem have been studied in the literature. Collier and Dalalyan [2013, 2016] considered the outlier-free case with equal sizes of the sequences $X$ and $X^\#$ (i.e., $m = n$ and $S^* = [n]$), whereas Galstyan et al. [2022] investigated the case with outliers in one of the sequences only (i.e., $m \ge n$ and $S^* = [n]$). Other variations of the matching problem under the Hamming loss have been studied by Wang et al. [2022], Chen et al. [2022b], Kunisky and Niles-Weed [2022]. These papers obtain minimax-optimal separation rates and, in most cases, despite the discrete nature of the matching problem, provide computationally tractable procedures to achieve these rates.

Figure 1: Matching as a Minimum Cost Flow (MCF) problem. The idea is to augment the bipartite graph between $X$ and $X^\#$ (whose edges carry costs $d_{i,j}$) with two nodes, a source and a sink, and $n + m$ edges. The capacities of these added edges are set to $1$, while their costs are set to $0$. Setting the total flow sent through the graph to $k$, the solution of the MCF becomes a matching of size $k$.
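The reduction of Figure 1 can be prototyped with an off-the-shelf solver. Below is a minimal sketch using networkx; the choice of networkx, the node naming, and the integer scaling of the costs (network-simplex solvers expect integral weights) are implementation assumptions, not the paper's own code.

```python
import networkx as nx
import numpy as np

def lss_via_mcf(X, X_sharp, k, scale=10**6):
    """Solve the k-LSS problem as a min-cost flow of value k (cf. Figure 1).

    Squared distances are scaled and rounded to integers because the
    network simplex algorithm works with integral edge weights."""
    n, m = len(X), len(X_sharp)
    G = nx.DiGraph()
    G.add_node("src", demand=-k)                 # k units of flow leave the source
    G.add_node("sink", demand=k)                 # and must be absorbed by the sink
    for i in range(n):                           # unit-capacity, zero-cost edges
        G.add_edge("src", ("x", i), capacity=1, weight=0)
    for j in range(m):
        G.add_edge(("y", j), "sink", capacity=1, weight=0)
    for i in range(n):                           # bipartite edges carry the costs d_{i,j}
        for j in range(m):
            w = int(scale * np.sum((X[i] - X_sharp[j]) ** 2))
            G.add_edge(("x", i), ("y", j), capacity=1, weight=w)
    flow = nx.min_cost_flow(G)
    pi_hat = {i: v[1] for i in range(n)
              for v, f in flow[("x", i)].items() if f > 0}
    phi = sum(np.sum((X[i] - X_sharp[j]) ** 2) for i, j in pi_hat.items())
    return pi_hat, phi
```

On the simulated data above, `pi_k, phi_k = lss_via_mcf(X, X_sharp, k=10)` returns a size-10 matching together with the corresponding LSS value, up to the integer rounding of the costs.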
When $S^*$ is an arbitrary subset of $[n]$, which is the setting we focus on in this work, one can wonder whether the minimax separation rate is the same as in the case of known $S^*$. Since the absence of knowledge of $S^*$ brings additional combinatorial complexity to the problem, one can also wonder whether it is still possible to reconcile statistical optimality and computational tractability. We show in this work that the answers to these questions are affirmative.
To explain our result, let us introduce the quantity
$$\kappa_{i,j} = \|\theta_i - \theta^\#_j\|_2 \big/ (\sigma^2 + \sigma^{\#2})^{1/2},$$
which is the signal-to-noise ratio of the difference $X_i - X^\#_j$ of a pair of feature-vectors. Clearly, for matching pairs this difference vanishes. Furthermore, if $\kappa_{i,j}$ vanishes or is very small for a non-matching pair, then there is an identifiability issue and consistent recovery of the underlying true matching is impossible. Therefore, a natural condition for making consistent recovery possible is to assume that the quantity
$$\bar\kappa_{\rm all} \triangleq \min_{i \in [n]}\ \min_{j \in [m] \setminus \{\pi^*(i)\}} \kappa_{i,j}$$
is bounded away from zero. A recovery procedure $\hat\pi$ is considered to be good if the threshold $\lambda$, such that $\hat\pi$ recovers $\pi^*$ with high probability as soon as $\bar\kappa_{\rm all} \ge \lambda$, is as small as possible. It was proved in [Collier and Dalalyan, 2016] that when $k^* = n = m$, one can recover $\pi^*$ with probability $1 - \alpha$ for $\lambda = 4\big(d\log(4n^2/\alpha)\big)^{1/4} \vee 8\big(\log(4n^2/\alpha)\big)^{1/2}$. Furthermore, it was proved that this threshold is minimax optimal, i.e., optimal in the family of all possible recovery procedures. This implies that there are two regimes. In the low-dimensional regime $d \lesssim \log n$, the separation rate is dimension independent. In contrast with this, the separation rate scales roughly as $d^{1/4}$ in the (moderately) high-dimensional regime $d \gtrsim \log n$.

Let us set
$$\lambda_{n,m,d,\alpha} = 4\Big(d\log\frac{4nm}{\alpha}\Big)^{1/4} \vee 8\Big(\log\frac{4nm}{\alpha}\Big)^{1/2}. \qquad (2)$$
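For later reference, here is a small helper that evaluates the threshold (2); the function name is ours, and the body is just a transcription of the displayed formula.

```python
import math

def lam_threshold(n, m, d, alpha):
    """Evaluate lambda_{n,m,d,alpha} from equation (2)."""
    t = math.log(4 * n * m / alpha)
    return max(4 * (d * t) ** 0.25, 8 * t ** 0.5)
```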
The main contributions of this work are the following.

• For any given $k \in [\min(n, m)]$, we show that the $k$ Least Sum of Squares ($k$-LSS) procedure, based on maximizing the profile likelihood among matching maps between two sets of size $k$, can be efficiently computed by solving a minimum cost flow problem. We denote the matching obtained using $k$-LSS by $\hat\pi^{\rm LSS}_k$.

• If the value $k$ turns out to be smaller than $k^*$ and $\bar\kappa_{\rm all} \ge \lambda_{n,m,d,\alpha}$, we prove that $\hat\pi^{\rm LSS}_k$ makes no mistake with probability $1 - \alpha$.

• We design a data-driven model selection algorithm that adaptively chooses $\hat k$ such that with probability $1 - \alpha$, we have $\hat k = k^*$ and $\hat\pi^{\rm LSS}_{\hat k} = \pi^*$ as soon as $\bar\kappa_{\rm all} \ge (5/4)\lambda_{n,m,d,\alpha}$.

The last item above implies that our data-driven algorithm $\hat\pi^{\rm LSS}_{\hat k}$ achieves the minimax separation rate. More surprisingly, this shows that there is no gap in statistical complexity between the problems of recovering matching maps in the outlier-free and the outliers-present-on-both-sides settings.
2 RELATED WORK

In statistical hypothesis testing, separation rates have become key objects for measuring the quality of statistical procedures; see the seminal papers [Burnashev, 1979, Ingster, 1982] as well as the monographs [Ingster and Suslina, 2003, Juditsky and Nemirovski, 2020]. Currently, this approach is widely adopted in the machine learning literature [Xing et al., 2020, Wolfer and Kontorovich, 2020, Blanchard et al., 2018, Ramdas et al., 2016, Wei et al., 2019, Collier, 2012]. Beyond the classical setting of two hypotheses, it can also be applied to multiple testing frameworks, for instance, variable selection [Ndaoud and Tsybakov, 2020, Azaïs and de Castro, 2020, Comminges and Dalalyan, 2012, 2013] or the matching problem considered here.
In computer vision, feature matching is a well-studied problem. One of the main directions is to accelerate matching algorithms using fast approximate methods (see, e.g., Malkov and Yashunin [2020], Wang et al. [2018], Harwood and Drummond [2016], Jiang et al. [2016]). Another direction is to improve the matching quality by considering alternative local descriptors [Rublee et al., 2011, Chen et al., 2010, Calonder et al., 2010] for given keypoints. The choice of keypoints is considered in Tian et al. [2020], Bai et al. [2020].
The minimum cost flow problem was first studied in the context of the Hungarian algorithm [Kuhn, 2012] and the assignment problem, which is a special case of minimum cost flow on bipartite graphs with all edges having unit capacity. The generalization of the Hungarian algorithm to graphs with arbitrary edge costs guarantees $O((n + F)m)$ time complexity, where $n$ is the number of nodes in the graph, $m$ is the number of edges, and $F$ is the total flow sent through the graph. There have also been other algorithms with similar complexity guarantees [Fulkerson, 1961, Ahuja et al., 1992]. Since then, many algorithms have been proposed for solving minimum cost flow problems in strongly polynomial time [Orlin et al., 1993, Orlin, 1993, 1996, Goldberg and Tarjan, 1989, Galil and Tardos, 1988], with the fastest runtime of around $O(nm)$. Recent advances for solving MCF problems have been proposed in Goldberg et al. [2015] and Chen et al. [2022a]; the latter proposes an algorithm with almost-linear computational time.
Permutation estimation and related problems have recently been investigated in different contexts such as statistical seriation [Flammarion et al., 2019, Giraud et al., 2021, Cai and Ma, 2022], noisy sorting [Mao et al., 2018], regression with shuffled data [Pananjady et al., 2017, Slawski and Ben-David, 2019], isotonic regression and matrices [Mao et al., 2020, Pananjady and Samworth, 2020, Ma et al., 2020], crowd labeling [Shah et al., 2021], recovery of general discrete structures [Gao and Zhang, 2019], and multitarget tracking [Chertkov et al., 2010, Kunisky and Niles-Weed, 2022].
3 MAIN THEORETICAL RESULT

This section contains the main theoretical contribution of the present work. In order to be able to recover $S^*$ and the matching map $\pi^*$, the key ingredient we use is the maximization of the profile likelihood. This corresponds to looking for the least sum of squares (LSS) of errors over all injective mappings defined on a subset of $[n]$ of size $k$. Formally, if we define
$$\mathcal P_k := \big\{\pi : S \to [m] \ \text{such that}\ S \subset [n],\ |S| = k,\ \pi \text{ is injective}\big\}$$
to be the set of all $k$-matching maps, we can define the procedure $k$-LSS as a solution to the optimization problem
$$\hat\pi^{\rm LSS}_k \in \arg\min_{\pi \in \mathcal P_k} \sum_{i \in S_\pi} \|X_i - X^\#_{\pi(i)}\|_2^2, \qquad (3)$$
where $S_\pi$ denotes the support of the function $\pi$. In the particular case of $k = n$, the optimization above is conducted over all the injective mappings from $[n]$ to $[m]$. This coincides with the LSS method from [Galstyan et al., 2022].

Let $\hat\Phi(k)$ be the error of $\hat\pi^{\rm LSS}_k$, that is,
$$\hat\Phi(k) = \min_{\pi \in \mathcal P_k} \sum_{i \in S_\pi} \|X_i - X^\#_{\pi(i)}\|_2^2.$$
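For very small instances, $\hat\Phi(k)$ can also be computed by brute force directly from the definition, which is useful as a sanity check on the flow-based solver; the sketch below enumerates all of $\mathcal P_k$ and is exponential in $k$, so it is only usable for tiny $n$, $m$, $k$.

```python
from itertools import combinations, permutations
import numpy as np

def phi_hat_bruteforce(X, X_sharp, k):
    """Exhaustive minimization of the LSS objective (3) over P_k.

    Enumerates every k-subset S of [n] and every injective image of S
    in [m]; for testing on tiny instances only."""
    n, m = len(X), len(X_sharp)
    D = np.sum((X[:, None, :] - X_sharp[None, :, :]) ** 2, axis=2)  # all d_{i,j}
    best = np.inf
    for S in combinations(range(n), k):           # domain of pi
        for T in permutations(range(m), k):       # ordered k-tuples = injective maps
            best = min(best, sum(D[i, j] for i, j in zip(S, T)))
    return best
```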
For some values of the tuning parameters $\lambda > 0$ and $\gamma > 0$, as well as for some $k_{\min} \in [n]$, initialize $k \leftarrow k_{\min}$ and

1. Compute $\hat\Phi(k)$ and $\hat\Phi(k+1)$.
2. Set $\bar\sigma_k^2 = \hat\Phi(k)/(kd)$.
3. If $k = n$ or $\hat\Phi(k+1) - \hat\Phi(k) > \frac{d + \lambda}{1 - \gamma}\,\bar\sigma_k^2$, then output $(k, \bar\sigma_k, \hat\pi^{\rm LSS}_k)$.
4. Otherwise, increase $k \leftarrow k + 1$ and go to Step 1.
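The steps above translate directly into a loop. In this sketch, `lss(k)` stands for any routine returning $(\hat\pi^{\rm LSS}_k, \hat\Phi(k))$, such as the min-cost-flow sketch earlier; its name and interface are our assumptions.

```python
def select_k(lss, n, d, lam, gamma, k_min=1):
    """Forward model selection following Steps 1-4 above.

    lss(k) -> (pi_hat_k, Phi_hat_k); lam and gamma are the tuning
    parameters lambda > 0 and gamma in (0, 1) of the criterion."""
    k = k_min
    pi_k, phi_k = lss(k)
    while k < n:
        pi_next, phi_next = lss(k + 1)                 # Step 1
        sigma2_bar = phi_k / (k * d)                   # Step 2
        if phi_next - phi_k > (d + lam) / (1 - gamma) * sigma2_bar:
            break                                      # Step 3: large jump, stop
        k, pi_k, phi_k = k + 1, pi_next, phi_next      # Step 4
    return k, (phi_k / (k * d)) ** 0.5, pi_k           # (k_hat, sigma_bar, pi_hat)
```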
In the sequel, we denote by $(\hat k, \bar\sigma_{\hat k}, \hat\pi^{\rm LSS}_{\hat k})$ the output of this procedure. Notice that we start with the value $k = k_{\min}$, which in the absence of any information on the number of inliers might be set to $k_{\min} = 1$. However, using a higher value of $k_{\min}$ might considerably speed up the procedure and improve its quality.

For appropriately chosen values of $\gamma$ and $\lambda$, as stated in the next theorem, the described procedure outputs the correct values of $k^*$ and $\pi^*$ with high probability.
Theorem 1. Let $\alpha \in (0,1)$ and let $\lambda_{n,m,d,\alpha}$ be defined by (2). If $\bar\kappa_{\rm all} > (5/4)\lambda_{n,m,d,\alpha}$, then the output $(\hat k, \hat\pi^{\rm LSS}_{\hat k})$ of the model selection algorithm with parameters $\lambda = (1/4)\lambda^2_{n,m,d,\alpha}$ and $\gamma = \lambda/d$ satisfies $\mathbb P\big(\hat\pi^{\rm LSS}_{\hat k} = \pi^*\big) \ge 1 - \alpha$.
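In terms of the helper for (2) introduced earlier, the parameter choice of Theorem 1 reads as follows (the problem sizes are illustrative).

```python
n, m, d, alpha = 50, 60, 100, 0.05        # illustrative values
lam_nmda = lam_threshold(n, m, d, alpha)  # lambda_{n,m,d,alpha} from (2)
lam = 0.25 * lam_nmda ** 2                # lambda = (1/4) lambda_{n,m,d,alpha}^2
gamma = lam / d                           # gamma = lambda / d
```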
Since the condition on the separation distance $\bar\kappa_{\rm all}$ differs from that of the known-$k^*$ case by only a slightly larger constant, from the perspective of statistical accuracy, the case of unknown $k^*$ is not more challenging than that of known $k^*$.
In the sequel, without much loss of generality, we assume that the sizes of $X$ and $X^\#$ are equal, i.e., $n = m$. Indeed, in the case $m > n$, one can add $m - n$ points arbitrarily far from the rest of the points to the smaller set $X$, obtaining equal-size sets $X^+$ and $X^\#$.
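This padding step is straightforward to implement. In the sketch below, the magnitude of the added points is an arbitrary large constant, chosen only so that the padded points are never plausibly matched as inliers.

```python
import numpy as np

def pad_to_equal_size(X, X_sharp, far=1e9):
    """If m > n, append m - n points, arbitrarily far from the data,
    to the smaller set X so that both sets have equal size."""
    n, m = len(X), len(X_sharp)
    if m > n:
        offsets = far * (1.0 + np.arange(m - n))[:, None]   # distinct far points
        X = np.vstack([X, offsets * np.ones((m - n, X.shape[1]))])
    return X, X_sharp
```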
Notice that in the optimization problem (3) the domain of $\pi$ ranges over a finite set of injective functions. For a given value of $k$, the number of such functions is $k!\binom{n}{k}^2$, thus making an exhaustive search computationally infeasible. Instead, we show in Section 5 that the optimization problem formulated in (3) can indeed be solved efficiently with complexity $\widetilde O(kn^2)$, where the notation $\widetilde O$ hides polylogarithmic factors, i.e., up to polylogarithmic factors, the computational cost is of order $kn^2$.
4 INTERMEDIATE RESULTS AND PROOF OF THEOREM 1

This section is devoted to the proof of our main result. Along the way, we establish some intermediate results that are of interest in their own right. The proofs of some technical lemmas are deferred to Appendix A.
4.1 Sub-mapping Recovery by LSS for $k \le k^*$
The first question we address in this section is under which conditions the LSS estimator $\hat\pi^{\rm LSS}_k$ from (3) recovers correct matches. Of course, the only way of correctly estimating the entire true matching is to choose $k = k^*$. However, it turns out that even if we overestimate the number of outliers and choose a value $k$ smaller than the true value $k^*$, with high probability the $k$-LSS estimator makes no wrong matches. Naturally, this result, stated in the next theorem, is valid under the condition that the relative signal-to-noise ratio of all incorrect pairs of original features is larger than some threshold.
Theorem 2 (Quality of $k$-LSS when $k \le k^*$). Let $\hat S = \mathrm{supp}(\hat\pi)$ for $\hat\pi = \hat\pi^{\rm LSS}_k$ defined by (3), let $\alpha \in (0, 1)$, and set
$$\lambda_{n,d,\alpha} = 4\big(d\log(4n^2/\alpha)\big)^{1/4} \vee 8\big(\log(4n^2/\alpha)\big)^{1/2}. \qquad (4)$$
If $k \le k^*$ and the signal-to-noise ratio satisfies the condition $\bar\kappa_{\rm all} \ge \lambda_{n,d,\alpha}$, then, with probability at least $1 - \alpha$, the support of the estimator $\hat\pi$ is included in $S^*$ and $\hat\pi$ coincides with $\pi^*$ on the set $\hat S$. Formally,
$$\mathbb P\big(\hat S \subset S^* \ \text{and}\ \hat\pi(i) = \pi^*(i),\ \forall i \in \hat S\big) \ge 1 - \alpha.$$
Proof of Theorem 2. Note that the random vectors
$$\eta_{ij} = (\sigma\,\xi_i - \sigma^\#\,\xi^\#_j)\big/\sqrt{\sigma^2 + \sigma^{\#2}}$$
are standard Gaussian, and define the following quantities:
$$\zeta_1 \triangleq \max_{i,\ j \ne \pi^*(i)} \frac{|(\theta_i - \theta^\#_j)^\top \eta_{ij}|}{\|\theta_i - \theta^\#_j\|_2}, \qquad \zeta_2 \triangleq d^{-1/2} \max_{i,j} \big|\,\|\eta_{ij}\|_2^2 - d\,\big|. \qquad (5)$$
For ease of notation, for any matching map $\pi$ we also define $L(\pi)$ as follows:
$$L(\pi) = \sum_{i \in S_\pi} \frac{\|X_i - X^\#_{\pi(i)}\|_2^2}{\sigma^2 + \sigma^{\#2}}.$$
We start with two auxiliary lemmas that will be used in other proofs as well. The proofs of these lemmas are deferred to the appendix.

Lemma 1. Let $\pi$ be any matching map that cannot be obtained as a restriction of $\pi^*$ to a subset of $[n]$. Let $S_0 \subset S^*$ be an arbitrary set satisfying $|S_0| \le |S_\pi|$ and $\{i \in S_\pi \cap S^* : \pi(i) = \pi^*(i)\} \subset S_0$, and let $\pi_0$ be the restriction of $\pi^*$ to $S_0$. On the event $\Omega_0 = \{8\zeta_1 \le \bar\kappa_{\rm all}\} \cap \{4\sqrt d\,\zeta_2 \le \bar\kappa^2_{\rm all}\}$, we have
$$L(\pi) - L(\pi_0) \ge (1/4)\,\bar\kappa^2_{\rm all} + d\,\big(|S_\pi| - |S_0|\big).$$
Let $\pi$ be any matching map from $\mathcal P_k$ that is not a restriction of $\pi^*$. Since $|S_\pi| = k \le k^*$, there necessarily exists a $\pi_0$ as in Lemma 1 such that $|S_0| = |S_\pi|$. For this $\pi_0$, we have $L(\pi) - L(\pi_0) \ge (1/4)\,\bar\kappa^2_{\rm all} > 0$. This implies that $\pi$ cannot be a minimizer of $L(\cdot)$ over $\mathcal P_k$. As a consequence, on $\Omega_0$, any minimizer of $L(\cdot)$ over $\mathcal P_k$ is a restriction of $\pi^*$. Therefore, on $\Omega_0$, we have $\hat S \subset S^*$ and $\hat\pi_k = \pi^*|_{\hat S}$. It remains to prove that $\mathbb P(\Omega_0) \ge 1 - \alpha$.
Lemma 2. Let $\Omega_{0,x} = \{8\zeta_1 \le x\} \cap \{4\sqrt d\,\zeta_2 \le x^2\}$ with $\zeta_1, \zeta_2$ defined as in (5). Then, for every $x > 0$, $\mathbb P(\Omega_{0,x}^\complement)$ is upper bounded by
$$2n^2 \exp\Big\{-\frac{x^2}{128}\Big\} + \exp\Big\{-\frac{x^2}{128\,d}\,\big(x^2 \wedge 4d\big)\Big\}.$$
We apply Lemma 2 with $x = \bar\kappa_{\rm all}$ to show that $\mathbb P(\Omega_0) \ge 1 - \alpha$. Clearly, a sufficient condition for the latter is
$$2n^2 \exp\big\{-\bar\kappa^2_{\rm all}/128\big\} \le \alpha/2, \qquad 2n^2 \exp\Big\{-\frac{(\bar\kappa_{\rm all}/16)^2}{d}\,\big(2\bar\kappa^2_{\rm all} \wedge 8d\big)\Big\} \le \alpha/2.$$
This system is equivalent to
$$\bar\kappa_{\rm all} \ge 8\Big(2\log\frac{4n^2}{\alpha}\Big)^{1/2} \qquad \text{and} \qquad \bar\kappa_{\rm all} \ge 4\Big(\frac d2\,\log\frac{4n^2}{\alpha}\Big)^{1/4}.$$
Therefore, if the signal-to-noise ratio satisfies
$$\bar\kappa_{\rm all} \ge 4\big(d\log(4n^2/\alpha)\big)^{1/4} \vee 8\big(\log(4n^2/\alpha)\big)^{1/2},$$
we have $\mathbb P(\Omega_0) \ge 1 - \alpha$.
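As a quick numerical illustration of Theorem 2 (not a proof), one can run the flow-based solver on the simulated data from the earlier sketches with some $k < k^*$ and check that no returned match is wrong; `lss_via_mcf`, `X`, `X_sharp`, `pi_star`, and `k_star` all refer to those sketches.

```python
# Empirical check of Theorem 2: for k <= k*, pi_hat_k should make no wrong match
# with high probability when the signal-to-noise ratio is large.
k = k_star // 2                                   # deliberately underestimate k*
pi_hat, _ = lss_via_mcf(X, X_sharp, k)
wrong = [i for i, j in pi_hat.items() if pi_star.get(i) != j]
print(f"k = {k}: {len(wrong)} wrong matches among {len(pi_hat)}")
```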
4.2 Matching Map Recovery for Unknown $k^*$

If no information on $k^*$ is available, and the goal is to recover the entire mapping $\pi^*$, one can proceed by model selection. More precisely, one can compute the collection of estimators $\{\hat\pi^{\rm LSS}_k : k \in [n]\}$ and select one of them using a suitable criterion. To define the selection criterion proposed in this paper, let us remark that
$$\hat\Phi(k) = \min_{\pi \in \mathcal P_k} \sum_{i \in S_\pi} \|X_i - X^\#_{\pi(i)}\|_2^2$$
is an increasing function of $k$. The increments of this function for $k \le k^*$ are not large, since they essentially correspond to the squared norm of a pure noise vector, distributed according to a scaled $\chi^2$ distribution with $d$ degrees of freedom. The main idea behind the criterion we propose below is that the increment of $\hat\Phi$ at $k^*$ is significantly larger than the previous ones.