Functional Maps
Our method is based on the functional maps framework, first introduced in [13] and extended in various works such as [19,20,21,22,23,24], among others (see [25] for an overview). This general approach encodes maps between shapes using a reduced basis representation. Consequently, the problem of map optimization becomes both linear and more compact. In addition, this framework allows natural constraints such as near-isometry or bijectivity to be expressed as linear-algebraic regularization. It has also been extended to the partial setting [26,27].
One of the bottlenecks of this framework is the estimation of so-called “descriptor functions” that are key to the functional map computation. Early methods relied on axiomatic features, mainly multi-scale diffusion-based descriptors, e.g., HKS and WKS [28,29].
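To make the linearity of this formulation concrete, below is a minimal sketch of the core functional map estimation step: given descriptor functions already projected into reduced (e.g., Laplace-Beltrami) bases on two shapes, the map is a small matrix recovered by least squares. The function name and setup are illustrative, not the paper's actual implementation.

```python
import numpy as np

def estimate_functional_map(desc_src, desc_tgt):
    """Estimate a functional map C that transports source spectral
    coefficients to target ones, by solving C @ desc_src ≈ desc_tgt
    in the least-squares sense.

    desc_src: (k_src, p) descriptor coefficients in the source basis
    desc_tgt: (k_tgt, p) descriptor coefficients in the target basis
    """
    # lstsq solves A x = b; we want C with C A = B, i.e. A^T C^T = B^T.
    C_t, *_ = np.linalg.lstsq(desc_src.T, desc_tgt.T, rcond=None)
    return C_t.T

# Toy check: with enough descriptors, a known ground-truth map is recovered.
rng = np.random.default_rng(0)
C_true = rng.standard_normal((20, 30))
A = rng.standard_normal((30, 50))   # 50 descriptor functions
B = C_true @ A
C_est = estimate_functional_map(A, B)
print(np.allclose(C_est, C_true))  # True (well-determined system)
```

In practice, regularization terms (e.g., Laplacian commutativity) are added to this least-squares problem, but the estimate remains the solution of a linear system, which is what makes the optimization compact.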
Learning-based methods
Several approaches have proposed to learn maps between shapes by formulating matching as a dense segmentation problem, e.g., [30,31,32,33,34,35]. However, these methods (1) usually require many labeled training shapes, which can be hard to obtain, and (2) tend to overfit to the training connectivity, making them unstable under changes in triangulation.
Closer to our approach are deep shape matching methods that also rely on the functional map framework, pioneered by FMNet [7]. In that work, SHOT descriptors [36] are given as input to a network whose goal is to refine them so as to yield a functional map as close to the ground truth as possible. The key advantage of this approach is that it directly estimates and optimizes for the map itself, thus injecting more structure into the learning problem. FMNet introduced the idea of learning on shape pairs, using the same feature extractor (in their case, an MLP-based SHOT refiner) for the source and target shapes in a Siamese fashion to produce improved output descriptors for functional map estimation. However, later experiments conducted in [5] highlighted that SHOT-based pipelines suffer greatly from connectivity overfitting. Thus, in more recent works, the authors of [5,15,14] advocate learning directly from shape geometry, while exploiting strong regularizers for functional map estimation.
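The Siamese design mentioned above amounts to applying one shared feature extractor to both shapes of a pair. A minimal sketch, with a stand-in two-layer MLP and hypothetical descriptor dimensions (none of this is the actual FMNet architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny shared per-point MLP; the same weights serve both shapes.
W1 = rng.standard_normal((64, 32)) * 0.1
W2 = rng.standard_normal((32, 16)) * 0.1

def refine(desc):
    """Refine raw per-point descriptors with the shared weights
    (identical parameters for source and target: Siamese setup)."""
    return np.maximum(desc @ W1, 0.0) @ W2   # ReLU MLP

# Hypothetical raw input descriptors (e.g., SHOT) on two shapes.
desc_src = rng.standard_normal((1000, 64))  # 1000 source vertices
desc_tgt = rng.standard_normal((1200, 64))  # 1200 target vertices

feat_src = refine(desc_src)   # (1000, 16)
feat_tgt = refine(desc_tgt)   # (1200, 16)
# The refined features would then be projected onto each shape's
# spectral basis and used to estimate the functional map.
print(feat_src.shape, feat_tgt.shape)
```

Weight sharing is what makes the learned refinement consistent across the pair: corresponding points on the two shapes are pushed toward similar feature vectors, which is exactly what the downstream functional map estimation requires.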
The major upside of using the functional map framework for deep shape matching is that it relies
on the intrinsic information of shapes, which results in overall good generalization from training to
testing, especially across pose changes, which involve minimal intrinsic deformation.
Unsupervised spectral learning
The methods described above are supervised deep shape matching pipelines. While they usually give good correspondence predictions, they require ground-truth supervision at training time. Consequently, other methods have focused on training for shape matching within the functional map framework without ground-truth supervision. This was originally performed directly on top of FMNet by enforcing either geodesic distance preservation [8,37] or natural properties on the output functional map [9], as well as by promoting cycle consistency [38].
To disambiguate the symmetries present in many organic shapes, some works rely on so-called “weak supervision”, rigidly aligning all shapes (on the same three axes) as a pre-processing step [15,39], and then using the extrinsic embedding information to resolve the symmetry ambiguity. This, however, limits their utility to correspondences between shapes with the same rigid alignment as the training set. Another solution is to use input signals that are independent of the shape alignment, such as SHOT descriptors [36], as done in the original FMNet. One recent method [11] makes use of optimal transport on top of this SHOT refiner to align the shapes at different spectral scales. This method, like ours, computes the functional map at different scales via progressive upsampling, but it only keeps the last map as the output, whereas we propose to let the network learn the best combination of the different resolutions. Additionally, this method depends on the SHOT input, which makes it unstable under changes in triangulation. In-network refinement is also performed in DG2N [40], but not in the spectral space.
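The “natural properties” used as unsupervised training signals are typically structural penalties on the functional map itself. Below is a hedged sketch of two common ones, orthogonality (near-isometry) and two-cycle bijectivity; the exact losses vary across the cited works, so this is only an illustration of the idea.

```python
import numpy as np

def structural_penalties(C12, C21):
    """Unsupervised penalties encouraging natural map properties:
    orthogonality (near-isometry) of the forward map, and
    bijectivity via the composition of forward and backward maps.
    C12: (k2, k1) map from shape 1 to shape 2; C21: (k1, k2)."""
    I = np.eye(C12.shape[0])
    orth = np.sum((C12 @ C12.T - I) ** 2)   # near-isometry
    bij = np.sum((C12 @ C21 - I) ** 2)      # 2-cycle consistency
    return orth, bij

# A rotation matrix is orthogonal and inverted by its transpose,
# so both penalties vanish for this toy pair of maps.
theta = 0.3
C12 = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
C21 = C12.T
orth, bij = structural_penalties(C12, C21)
print(round(orth, 8), round(bij, 8))  # 0.0 0.0
```

Because these penalties depend only on the predicted maps, they can replace ground-truth correspondences as the training objective, which is what makes the unsupervised pipelines above possible.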
Attention-based spectral learning
The attention mechanism was originally introduced in deep learning for natural language processing, and consists in assigning relative weights to different words of an input sentence [41]. This mechanism can be applied in different contexts, including that of shape analysis. Indeed, attention learned in the feature domain can be used to focus on different parts of a 3D shape, for instance in partial shape matching, as done in [10]. As we show in this paper, attention can also be used in the spectral domain, by letting the network focus on different levels of detail depending on the input shapes and their resulting functional maps at different spectral resolutions. Indeed, the utility of considering different resolutions of a functional map, e.g., via upsampling of its size, has been highlighted in [42,43]. Here, we propose to let the network learn to adaptively combine all the intermediate functional maps into a final coherent correspondence.
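One simple way to realize such an adaptive combination is to convert each intermediate functional map into a soft point-to-point correspondence of a common size, and blend them with softmax attention weights. The sketch below uses random placeholders for the per-scale maps and learned scores; it illustrates the combination mechanism only, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def combine_soft_maps(soft_maps, scores):
    """Blend soft correspondence matrices obtained from functional
    maps at different spectral resolutions, using attention weights
    (softmax over per-scale scores). Inputs are placeholders for
    network outputs."""
    w = softmax(scores)                          # one weight per scale
    combined = sum(wi * P for wi, P in zip(w, soft_maps))
    return combined, w

# Toy example: three "scales", each a row-stochastic soft map.
rng = np.random.default_rng(2)
maps = []
for _ in range(3):
    M = rng.random((5, 4))
    maps.append(M / M.sum(axis=1, keepdims=True))
scores = np.array([0.5, 2.0, -1.0])              # stand-in for learned scores
combined, w = combine_soft_maps(maps, scores)
# A convex combination of row-stochastic maps is row-stochastic.
print(np.allclose(combined.sum(axis=1), 1.0))  # True
```

Since the softmax weights are differentiable in the scores, the network can learn end-to-end which spectral resolutions to trust for a given input pair, rather than always keeping only the finest map.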