Learning Multi-resolution Functional Maps with
Spectral Attention for Robust Shape Matching
Lei Li
LIX, École Polytechnique, IP Paris
lli@lix.polytechnique.fr
Nicolas Donati
LIX, École Polytechnique, IP Paris
nicolas.donati@polytechnique.edu
Maks Ovsjanikov
LIX, École Polytechnique, IP Paris
maks@lix.polytechnique.fr
Abstract
In this work, we present a novel non-rigid shape matching framework based
on multi-resolution functional maps with spectral attention. Existing functional
map learning methods all rely on the critical choice of the spectral resolution
hyperparameter, which can severely affect the overall accuracy or lead to overfitting,
if not chosen carefully. In this paper, we show that spectral resolution tuning can
be alleviated by introducing spectral attention. Our framework is applicable in
both supervised and unsupervised settings, and we show that it is possible to train
the network so that it can adapt the spectral resolution, depending on the given
shape input. More specifically, we propose to compute multi-resolution functional
maps that characterize correspondence across a range of spectral resolutions, and
introduce a spectral attention network that helps to combine this representation into
a single coherent final correspondence. Our approach is not only accurate with near-
isometric input, for which a high spectral resolution is typically preferred, but also
robust and able to produce reasonable matching even in the presence of significant
non-isometric distortion, which poses great challenges to existing methods. We
demonstrate the superior performance of our approach through experiments on a
suite of challenging near-isometric and non-isometric shape matching benchmarks.
1 Introduction
Shape matching is a critical task in 3D shape analysis and has been paramount to a broad spectrum of
downstream applications, including registration, deformation, and texture transfer [1, 2], to name
a few. The algorithmic challenge of robust shape matching primarily lies in the fact that shapes
may undergo significant variations, such as arbitrary non-rigid deformations. Earlier works tackling
non-rigid shape correspondence conventionally built upon hand-crafted features and pipelines [3],
while with the advent of deep learning, the research focus has largely shifted to data-driven and
learning-based approaches for improved matching robustness and accuracy [4].
To learn for non-rigid shape matching, a growing body of literature [7, 8, 9, 10, 11, 12] advocates the
use of spectral techniques, in particular, the functional map representation [13], which compactly
encodes correspondences as small-sized matrices using a reduced spectral basis. A number of
advances have been made to the functional map-based networks in terms of probe feature learning
[5, 14], differentiable map regularization [5], supervised [7] and unsupervised learning [9, 8, 15], among
many others. Despite this progress, existing works nearly always learn functional maps in a single
spectral resolution (the number of basis functions used), which is often set empirically. However,
the functional map resolution plays a crucial role in the non-rigid shape matching performance,
as observed in existing literature [9]. As a concrete example, Fig. 1-(a) shows that the matching
performance of a state-of-the-art supervised learning method GeomFmaps [5] fluctuates significantly
when trained with a different map resolution (i.e., size of the functional map). Therefore, to improve
robustness, it is desirable to enable networks to adaptively change the resolution in a data-dependent
manner: for near-isometric shapes, adopting a higher spectral resolution allows high-frequency details
to be leveraged for more precise matching; while for non-isometric shapes, adopting a lower spectral
resolution is advantageous to obtain approximate but potentially more robust maps.
Figure 1: (a) The unstable matching performance of GeomFmaps [5] w.r.t. the critical Spectral
Resolution hyperparameter on an animal shape dataset SMAL [6]. (b) Correspondence visualization
by texture transfer for near-isometric (top) and non-isometric (bottom) shapes. GeomFmaps is trained
with ground truth supervision, while our approach is not.
arXiv:2210.06373v1 [cs.CV] 12 Oct 2022
Motivated by the above discussion, in this work, we propose a novel learning-based functional map
framework that learns to adaptively combine multi-resolution maps with a mechanism that we call
spectral attention, which can accommodate both near-isometric and non-isometric shapes at the same
time. Specifically, our framework consists of two novel components (Fig. 2): (1) multi-resolution
functional maps and (2) the spectral attention module.
Given as input a pair of non-rigid shapes, we first use a functional map network to estimate a series
of maps with varying spectral resolution. Next, we feed the obtained functional maps to a spectral
attention network to predict a weight for each map. The attention weights are used to combine
all the intermediate maps into a final coherent map. To enable such an assembly, we design a
differentiable spectral upsampling module that can transform the intermediate maps to the same
spectral resolution within a learnable network. Finally, to train our network, we propose to impose
penalties on the intermediate multi-resolution functional maps as well as the final map. This is
different from existing approaches, e.g., [7, 9, 5, 10, 16], which work with and penalize a single
hand-picked spectral resolution. Our method can be trained in both supervised and unsupervised
settings and can directly benefit from other advances in deep functional map training, such as
improved architectures or regularization. To evaluate our model, we perform a comprehensive set of
experiments on several challenging non-rigid shape matching datasets, where our model achieves
superior matching performance over existing methods.
In a nutshell, the main contributions of our work are as follows: (1) We introduce a powerful
non-rigid shape matching framework equipped with multi-resolution functional maps with spectral
attention for handling diverse shape inputs. (2) We propose a novel spectral attention network and a
differentiable spectral upsampling module for robust functional map learning. (3) We demonstrate the
superior performance of our model compared to existing approaches through extensive experiments
on challenging non-rigid shape matching benchmarks. Our code and data are publicly available (see footnote 1).
2 Related Work
In shape analysis, the field of non-rigid shape matching is both extensive and well-studied. In the
following paragraphs, we review the works that are most closely related to our approach. A more
complete overview can be found in recent surveys [17,18], and more recently [4] (Section 4).
Footnote 1: https://github.com/craigleili/AttentiveFMaps
Functional Maps
Our method is based on the functional maps framework, first introduced in [13], and extended in
various works such as [19, 20, 21, 22, 23, 24] among others (see [25] for an overview). This general
approach is based on encoding maps between shapes using a reduced basis representation.
Consequently, the problem of map optimization becomes both linear and more compact. Besides, this
framework allows natural constraints such as near-isometry or bijectivity to be represented as
linear-algebraic regularization. It has also been extended to the partial setting [26, 27].
One of the bottlenecks of this framework is the estimation of so-called “descriptor functions” that
are key to the functional map computation. Early methods have relied on axiomatic features, mainly
based on multi-scale diffusion-based descriptors, e.g., HKS and WKS [28,29].
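To make the reduced-basis encoding concrete, the block below writes out, in the notation later used in Sec. 3, how function transfer and the constraints mentioned above become linear-algebraic conditions on a small matrix. This is a summary of standard facts from the functional maps literature rather than a formulation specific to this paper; the symbols $\mathbf{a}$, $\mathbf{b}$, $\mathbf{C}_{12}$ and $\mathbf{C}_{21}$ are introduced here for illustration only. With truncated bases the identities hold only approximately and are typically imposed as soft penalties.

```latex
% Assumed notation: \Phi_i are truncated Laplace-Beltrami eigenbases,
% a and b are coefficient vectors, \Delta_i are diagonal eigenvalue matrices.
\begin{align*}
f \approx \boldsymbol{\Phi}_1 \mathbf{a}, \quad
g \approx \boldsymbol{\Phi}_2 \mathbf{b}, \quad
\mathbf{b} &= \mathbf{C}\,\mathbf{a}
  && \text{(function transfer is a $k \times k$ matrix product)} \\
\mathbf{C}\,\boldsymbol{\Delta}_1 &= \boldsymbol{\Delta}_2\,\mathbf{C}
  && \text{(isometry: the map commutes with the Laplace-Beltrami operators)} \\
\mathbf{C}^\top \mathbf{C} &= \mathbf{I}
  && \text{(local area preservation: the map is orthonormal)} \\
\mathbf{C}_{12}\,\mathbf{C}_{21} &= \mathbf{I}
  && \text{(bijectivity: maps in the two directions are mutual inverses)}
\end{align*}
```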
Learning-based methods
Several approaches have proposed to learn maps between shapes by formulating it as a dense
segmentation problem, e.g., [30, 31, 32, 33, 34, 35]. However, these methods
(1) usually require many labeled training shapes, which can be hard to obtain, and (2) tend to overfit
to the training connectivity, making the methods unstable to triangulation change.
Closer to our approach are deep shape matching methods that also rely on the functional map
framework, pioneered by FMNet [7]. In FMNet, SHOT descriptors [36] are given as input to the
network, whose goal is to refine these descriptors in order to yield a functional map as close to the
ground-truth as possible. The key advantage of this approach is that it directly estimates and optimizes
for the map itself, thus injecting more structure into the learning problem. FMNet introduced the idea
of learning for shape pairs, using the same feature extractor (in their case, a SHOT MLP-based
refiner) for the source and target shapes in a Siamese fashion to produce improved output descriptors
for functional map estimation. However, later experiments conducted in [5] have highlighted that
SHOT-based pipelines suffer greatly from connectivity overfitting. Thus, in more recent works, the
authors in [5, 15, 14] advocate for learning directly from shapes’ geometry, while exploiting strong
regularizers for functional map estimation.
The major upside of using the functional map framework for deep shape matching is that it relies
on the intrinsic information of shapes, which results in overall good generalization from training to
testing, especially across pose changes, which involve minimal intrinsic deformation.
Unsupervised spectral learning
The methods described above are supervised deep shape matching pipelines. While these methods
usually give good correspondence prediction, they need ground-truth supervision at training time.
Consequently, other methods have focused on training for shape matching using the functional map
framework, without ground-truth supervision. This was originally performed directly on top of FMNet
by enforcing either geodesic distance preservation [8, 37], or natural properties on the output
functional map [9], as well as by promoting cycle consistency [38].
To disambiguate symmetries present in many organic shapes, some works choose to rely on so-called
“weak supervision”, by rigidly aligning all shapes (on the same three axes) as a pre-processing step
[15, 39], and then using the extrinsic embedding information to resolve the symmetry ambiguity. This,
however, limits their utility to correspondences between shapes with the same rigid alignment as the
training set. Another solution is to use input signals that are independent of the shape alignment, such
as SHOT [36] descriptors as done in the original FMNet. One of these recent methods [11] makes
use of optimal transport on top of this SHOT-refiner to align the shapes at different spectral scales.
This method, like ours, computes the functional map at different scales via progressive upsampling,
but it only keeps the last map as the output whereas we propose to let the network learn the best
combination of different resolutions. Additionally, this method is dependent on the SHOT input,
which makes it unstable towards change in triangulation. In-network refinement is also performed in
DG2N [40], but not in the spectral space.
Attention-based spectral learning
The attention mechanism was originally introduced in deep learning for natural language processing,
and consists of placing relative weights on different words of an input sentence [41]. This mechanism
can be applied in different contexts, including that of shape analysis. Indeed, attention learned in the
feature domain can be used to focus on different parts of a 3D shape, for instance in partial shape
matching, as done in [10]. As we show in this paper, attention can also be used in the spectral domain
by letting the network focus on different levels of detail depending on the input shapes and their
resulting functional maps at different spectral resolutions. Indeed, the utility of considering different
resolutions of a functional map, e.g., via upsampling of its size, has been highlighted in [42, 43]. Here
we propose to let the network learn to adaptively combine all the intermediate functional maps into a
final coherent correspondence.
3 Background
Our work proposes a learning-based framework for non-rigid shape matching by building upon the
functional map representation [13], and especially its learning-based variant GeomFmaps, introduced
in [5]. Before describing our approach in Sec. 4, we first briefly describe the basic learning pipeline
with functional maps for shape correspondence. We refer the interested reader to relevant works
[25, 7, 8, 9, 5] for more technical details.
Deep Functional Map Pipeline
We consider a pair of shapes $\mathcal{S}_1$ and $\mathcal{S}_2$, represented as triangle meshes with
$n_1$ and $n_2$ vertices, respectively. The goal is to compute a high quality dense correspondence
between these shapes in an efficient way. The basic learning pipeline estimates a functional
map between $\mathcal{S}_1$ and $\mathcal{S}_2$ using the following four steps [25].
• Compute the first $k$ eigenfunctions of the Laplace-Beltrami operator [44] on each shape,
which will be used as a basis for decomposing smooth functions on these shapes. The
Laplacian is discretized as $\mathbf{S}^{-1}\mathbf{W}$, where $\mathbf{S}$ is the diagonal matrix of
lumped area elements for mesh vertices and $\mathbf{W}$ is the classical cotangent weight matrix [45].
The eigenfunctions are stored as columns in matrices $\boldsymbol{\Phi}_1 \in \mathbb{R}^{n_1 \times k}$
and $\boldsymbol{\Phi}_2 \in \mathbb{R}^{n_2 \times k}$.
• Second, a set of descriptors (also known as probe, or feature functions) on each shape
are extracted by a feature extractor network [7, 5], here denoted by $\mathcal{F}_\Theta$ with learnable
parameters $\Theta$. These feature functions are expected to be approximately preserved by the
unknown map. We denote the learned feature functions as
$\mathcal{F}_\Theta(\mathcal{S}_1) = \mathbf{G}_1 \in \mathbb{R}^{n_1 \times d}$ and
$\mathcal{F}_\Theta(\mathcal{S}_2) = \mathbf{G}_2 \in \mathbb{R}^{n_2 \times d}$, where $d$ is the
number of descriptors. After projecting them onto the respective eigenbases, the resulting coefficients
are stored as columns of matrices $\mathbf{A}_1, \mathbf{A}_2 \in \mathbb{R}^{k \times d}$, respectively.
• Next, we compute the optimal functional map $\mathbf{C} \in \mathbb{R}^{k \times k}$ by solving:
$$\mathbf{C} = \arg\min_{\mathbf{C}} \|\mathbf{C}\mathbf{A}_1 - \mathbf{A}_2\|^2 + \alpha \|\mathbf{C}\boldsymbol{\Delta}_1 - \boldsymbol{\Delta}_2\mathbf{C}\|^2, \tag{1}$$
where the first term promotes preservation of the probe functions, and the second term
regularizes the map by measuring its commutativity with the Laplace-Beltrami operators [13, 5],
which in the reduced basis become diagonal matrices of the eigenvalues $\boldsymbol{\Delta}_1$ and
$\boldsymbol{\Delta}_2$.
• As a last step, the estimated map $\mathbf{C}$ can be converted to a point-to-point map, commonly
by nearest neighbor search between the aligned spectral embeddings $\boldsymbol{\Phi}_1\mathbf{C}^\top$
and $\boldsymbol{\Phi}_2$, with possible post-refinement applied [46, 22, 42, 47].
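The first, second and last steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the stiffness matrix `W` and the lumped masses are taken as given dense arrays (assembling cotangent weights from a mesh is omitted), a dense eigensolver stands in for the sparse solver one would use on real meshes, and the nearest neighbor search is brute force.

```python
import numpy as np

def laplacian_eigenbasis(W, mass, k):
    """First k eigenpairs of the generalized problem W phi = lambda S phi.

    W    : (n, n) symmetric stiffness matrix (cotangent weights in the paper).
    mass : (n,) diagonal of the lumped mass matrix S.
    Returns evals (k,) and Phi (n, k), normalized so that Phi.T @ S @ Phi = I.
    """
    inv_sqrt = 1.0 / np.sqrt(mass)
    # S^{-1/2} W S^{-1/2} is symmetric and has the same eigenvalues as S^{-1} W.
    W_sym = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    evals, psi = np.linalg.eigh(W_sym)  # eigenvalues in ascending order
    return evals[:k], inv_sqrt[:, None] * psi[:, :k]

def project(Phi, mass, G):
    """Spectral coefficients A = Phi^T S G of per-vertex functions G (n, d)."""
    return Phi.T @ (mass[:, None] * G)

def fmap_to_pointwise(C, Phi1, Phi2):
    """For every vertex of shape 2, return the index of its match on shape 1,
    found by nearest neighbor between rows of Phi1 C^T and rows of Phi2."""
    emb1 = Phi1 @ C.T                                           # (n1, k)
    d2 = ((Phi2[:, None, :] - emb1[None, :, :]) ** 2).sum(-1)   # (n2, n1)
    return d2.argmin(axis=1)
```

Because the basis is orthonormal with respect to the mass matrix, projection reduces to a single weighted matrix product instead of a least-squares solve.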
To train the feature extractor network $\mathcal{F}_\Theta$, one defines a set of training shape pairs,
and another set of shape pairs for testing. As shown in [5], the solution to Eq. (1) can be obtained in
closed form within a neural network in a differentiable manner, and constitutes what is called “FMReg”
in Fig. 2. Using this insight, during training time, the network aims at reducing a loss
$\mathcal{L}(\mathbf{C})$, defined on the output functional map
$\mathbf{C}(\mathcal{F}_\Theta(\mathcal{S}_1), \mathcal{F}_\Theta(\mathcal{S}_2))$ estimated from the
learned descriptors using the closed form solution of Eq. (1). Through backpropagation, the parameters
$\Theta$ are then updated to make the network produce better features for the next pair of shapes.
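The closed-form solution of Eq. (1) can be illustrated as follows. Since entry $(i, j)$ of $\mathbf{C}\boldsymbol{\Delta}_1 - \boldsymbol{\Delta}_2\mathbf{C}$ equals $C_{ij}(\lambda^1_j - \lambda^2_i)$, the objective decouples into one regularized least-squares problem per row of $\mathbf{C}$. The NumPy sketch below follows that derivation directly; the function name `solve_fmap` is hypothetical, and the exact form of the regularizer used inside the “FMReg” layer of [5] may differ from this plain squared-difference penalty.

```python
import numpy as np

def solve_fmap(A1, A2, evals1, evals2, alpha):
    """Minimize ||C A1 - A2||^2 + alpha * ||C Delta1 - Delta2 C||^2 over C.

    A1, A2         : (k, d) spectral descriptor coefficients.
    evals1, evals2 : (k,) eigenvalues, i.e. the diagonals of Delta1 and Delta2.
    Row i of C solves (A1 A1^T + alpha * D_i) c_i = A1 a2_i, where
    D_i = diag((evals1 - evals2[i])^2) and a2_i is row i of A2.
    """
    k = A1.shape[0]
    gram = A1 @ A1.T   # (k, k), shared by all rows
    rhs = A1 @ A2.T    # (k, k), column i is the right-hand side for row i
    C = np.empty((k, k))
    for i in range(k):
        penalty = alpha * (evals1 - evals2[i]) ** 2
        C[i] = np.linalg.solve(gram + np.diag(penalty), rhs[:, i])
    return C
```

Every operation here is a linear solve, so the same computation is differentiable with respect to the descriptors when written in an autodiff framework. Note that with `alpha = 0` the system is only well-posed when `A1` has full row rank, which requires at least as many descriptors as basis functions.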
We stress that in this pipeline, the size $k$ of the functional map is a critical non-learned
hyperparameter, which can strongly affect matching results (as highlighted in Fig. 1 and in existing
literature [9]).
4 Method
The main goal of our work is to robustly and adaptively estimate functional maps for shape pairs with
diverse geometric properties, including both near-isometric and non-isometric transformations. In
this section, we describe the technical details of our proposed non-rigid shape matching framework.
We illustrate the whole pipeline in Fig. 2. Our framework has two main stages: multi-resolution
functional map learning (Sec. 4.1) and spectral attention learning (Sec. 4.2).
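As a rough illustration of how maps of different spectral resolutions might be brought to a common size and merged, consider the NumPy sketch below. It rests on several assumptions that the text so far does not confirm: upsampling is done by a hard nearest neighbor conversion followed by re-projection (in the spirit of the upsampling works [42, 43]), which is not differentiable, whereas the module proposed here is; the attention scores are passed in as plain `logits` instead of being predicted by a network; and the combination is taken to be a softmax-weighted sum. All function names are hypothetical.

```python
import numpy as np

def nearest_rows(query, ref):
    """Index of the closest row of ref for every row of query."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def upsample_fmap(C, Phi1, Phi2, mass2, K):
    """Lift a k x k map to K x K through a pointwise map.

    Phi1, Phi2 : (n1, K) and (n2, K) eigenbases, columns ordered by eigenvalue.
    mass2      : (n2,) lumped vertex areas of shape 2.
    Uses a hard nearest neighbor, so this stand-in is NOT differentiable.
    """
    k = C.shape[0]
    # Match each vertex of shape 2 to a vertex of shape 1 at resolution k.
    T = nearest_rows(Phi2[:, :k], Phi1[:, :k] @ C.T)
    # Re-encode the pointwise map in the larger basis: C_K = Phi2^T S2 Phi1[T].
    return Phi2[:, :K].T @ (mass2[:, None] * Phi1[T, :K])

def combine_maps(maps, logits, Phi1, Phi2, mass2):
    """Softmax-weighted sum of the maps after upsampling to the largest size."""
    K = max(C.shape[0] for C in maps)
    up = np.stack([upsample_fmap(C, Phi1, Phi2, mass2, K) for C in maps])
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, up, axes=1), w
```

In this sketch a large weight on a low-resolution map favors a coarse but stable correspondence, while a large weight on a high-resolution map favors fine detail, which mirrors the trade-off between non-isometric and near-isometric inputs discussed in the introduction.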
4.1 Multi-resolution Functional Maps
In the first stage of our framework, given the input shapes $\mathcal{S}_1$ and $\mathcal{S}_2$, we follow the basic learning
pipeline, as described in Sec. 3, to infer multi-resolution functional maps for extensively characterizing