Neural Shape Deformation Priors
Jiapeng Tang1  Lev Markhasin2  Bi Wang2  Justus Thies3  Matthias Nießner1
1Technical University of Munich 2Sony Europe RDC Stuttgart
3Max Planck Institute for Intelligent Systems, Tübingen, Germany
https://tangjiapeng.github.io/projects/NSDP/
Figure 1: Neural shape deformation priors allow for intuitive shape manipulation of existing source
meshes. A user can create novel shapes by dragging handles (red circles) defined on the region of
interest (red regions) to desired locations (blue circles).
Abstract
We present Neural Shape Deformation Priors, a novel method for shape manipulation that predicts mesh deformations of non-rigid objects from user-provided
handle movements. State-of-the-art methods cast this problem as an optimization
task, where the input source mesh is iteratively deformed to minimize an objective
function according to hand-crafted regularizers such as ARAP [54]. In this work,
we learn the deformation behavior based on the underlying geometric properties of
a shape, while leveraging a large-scale dataset containing a diverse set of non-rigid
deformations. Specifically, given a source mesh and desired target locations of
handles that describe the partial surface deformation, we predict a continuous
deformation field that is defined in 3D space to describe the space deformation.
To this end, we introduce transformer-based deformation networks that represent
a shape deformation as a composition of local surface deformations. These networks learn a set of local latent codes anchored in 3D space, from which a set of continuous deformation functions for local surfaces is learned. Our method can be applied to
challenging deformations and generalizes well to unseen deformations. We validate
our approach in experiments using the DeformingThing4D dataset, and compare to
both classic optimization-based and recent neural network-based methods.
1 Introduction
Editing and deforming 3D shapes is a key component in animation creation and computer-aided
design pipelines. Given as little user input as possible, the goal is to create new deformed instances
of the original 3D shape which look natural and behave like real objects or animals. The user input is
assumed to be very sparse, such as vertex handles that can be dragged around. For example, users
can animate a 3D model of an animal by dragging its feet forward. This problem is severely ill-posed and typically under-constrained, as many possible deformations are consistent with the provided partial surface deformations of the handles, especially for large surface deformations.
Thus, strong priors encoding deformation regularity are necessary to tackle this problem. Physics
and differential geometry provide solutions that use various analytical priors which define natural-
looking mesh deformations, such as elasticity [62, 1], Laplacian smoothness [31, 55, 77], and rigidity [54, 57, 29] priors. They update mesh vertex coordinates by iteratively optimizing energy
functions that satisfy constraints from both the pre-defined deformation priors and given handle
locations. Although these algorithms can preserve geometric details of the original source model,
they still have limited capacity to model realistic deformations, since the deformation priors are
region independent, e.g., the head region deforms in a similar way as the tail of an animal, resulting
in unrealistic deformation states.
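As a concrete example of such an energy, the widely used as-rigid-as-possible (ARAP) prior [54] penalizes the deviation of each deformed one-ring neighborhood from a best-fitting rotation:

E_ARAP(V') = Σ_{i=1}^{|V|} Σ_{j ∈ N(i)} w_ij ‖(v'_i − v'_j) − R_i (v_i − v_j)‖²,

where v_i and v'_i denote the original and deformed positions of vertex i, N(i) is its one-ring neighborhood, w_ij are cotangent weights, and R_i is a per-vertex rotation; the energy is minimized subject to the handle constraints by alternating between fitting the rotations R_i and solving a sparse linear system for the positions V'. Note how the prior is the same at every vertex, independent of what body part the vertex belongs to.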
Hence, motivated by the recent success of deep neural networks for 3D shape modeling [33, 42, 13, 68, 58, 14, 44, 26, 2, 18, 10, 63, 60, 12], we propose to learn shape deformation priors of a
specific object class, e.g., quadruped animals, to complete surface deformations beyond observed
handles. We formulate the following properties of such a learned model: (1) it should be robust to varying mesh quality and numbers of vertices, (2) the source mesh should not be limited to a canonical pose (i.e., the input mesh can have an arbitrary pose), and (3) it should generalize well to new deformations.
Towards these goals, we represent deformations as a continuous deformation field which is defined
in the near-surface region to describe the space deformation caused by the corresponding surface
deformation. The continuity property enables us to manipulate meshes with an arbitrary number of vertices and with disconnected components. To handle source meshes in arbitrary poses, we learn shape
deformations via canonicalization. Specifically, the overall deformation process consists of two
stages: arbitrary-to-canonical transformation and canonical-to-arbitrary transformation. To obtain
more detailed surface deformations and better generalization capabilities to unseen deformations,
we propose to learn local deformation fields conditioned on local latent codes encoding geometry-
dependent deformation priors, instead of global deformation fields conditioned on a single latent
code. To this end, we propose Transformer-based Deformation Networks (TD-Nets), which learn encoder-based local deformation fields on point cloud approximations of the input mesh. Concretely,
TD-Nets encode an input point cloud with surface geometry information and incomplete deformation
flow into a sparse set of local latent codes and a global feature vector by using the vector attention
blocks proposed in [74]. The deformation vectors of spatial points are estimated by an attentive
decoder, which aggregates the information of neighboring local latent codes of a spatial point based
on the feature similarity relationships. The aggregated feature vectors are finally passed to a multi-layer perceptron (MLP) to predict displacement vectors, which can be applied to the source mesh to
compute the final output mesh.
To summarize, we introduce transformer-based local deformation field networks that are capable of learning shape deformation priors for the task of user-driven shape manipulation. The deformation networks learn a set of anchor features based on a vector attention mechanism, which enhances the global deformation context and selects the most informative local deformation descriptors for displacement vector estimation, leading to improved generalization to new deformations.
In comparison to classical hand-crafted deformation priors as well as recent neural network-based
deformation predictors, our method achieves more accurate and natural shape deformations.
2 Related Work
User-guided shape manipulation lies at the intersection of computer graphics and computer vision.
Our proposed method is related to polygonal mesh geometry processing, neural field representations,
as well as vision transformers.
Optimization-based Shape Manipulation.
Classical methods formulate shape manipulation as
a mathematical optimization problem. They perform mesh deformations by either deforming the
vertices [5, 53] or the 3D space [23, 3, 29, 37, 51]. Performing mesh deformation without any other
information about the target shape than limited user-provided correspondences is an under-constrained problem. Hence, optimization methods require deformation priors to constrain the deformation regularity as well as the smoothness of the deformed surface. Various analytic
priors have been proposed which encourage smooth surface deformations, such as elasticity [62, 1], Laplacian smoothness [31, 55, 77], and rigidity [54, 57, 29]. These methods use efficient linear solvers
to iteratively optimize energy functions that satisfy constraints from both the pre-defined deformation
prior and provided handle movements. Recently, NFGP [69] was proposed to optimize neural networks with non-linear deformation regularizations. Specifically, it performs shape deformations by warping the neural implicit fields of the source model through a deformation vector field, which is constrained by modeling implicitly represented surfaces as elastic shells. NeuralMLS [52] learns a geometry-aware weight function of a shape and given control points for moving least squares (MLS) deformation, which smoothly interpolates the control point displacements over space.
Although they can preserve many geometric details of the source shape, they struggle to model
complex deformations, as local surfaces are simply constrained to be transformed in a similar manner.
In contrast, we aim to learn deformation priors based on local geometries to infer hidden surface
deformations.
Learning-based Shape Reconstruction and Manipulation.
Learning-based shape manipulation
has been studied to learn shape priors based on shape auto-encoding or auto-decoding. The methods of [76, 15, 20, 25] map a class of shapes into a latent space. During inference, given handle positions as input, they
find an optimal latent code whose 3D interpretation is the most similar to the observation. In
contrast, we learn explicit deformation priors to directly predict 3D surface deformations. Jakab et
al. [24] proposed to control shapes via unsupervised 3D keypoint discovery. Instead, we use partial
surface deformations represented by handle displacements as input observations, rather than keypoint
displacements. Several methods [25, 41, 7, 30, 61, 50, 66, 8] use deep neural networks to complete non-rigid shapes from partial scans. Our task is related to this line of work, but shape manipulation from user input requires completion of the deformation field rather than of the shape itself. In
contrast to shape completion, our setting is more under-constrained, as the user-provided handle
correspondences are very sparse and more incomplete than partial point clouds from scans. Recent
methods for clothed-human body reconstruction choose to canonicalize the captured scan into a pre-
defined T-pose [65, 35, 11] using the skeletal deformation model of SMPL [32] or STAR [40], which
can also be used to later animate the human. Inspired by this, we also perform a canonicalization to
enable editing of source meshes with arbitrary poses, before applying the actual deformation towards
the target pose handles.
Continuous Neural Fields.
Continuous neural field representations have been widely used in
3D shape modeling [33, 13, 42] and 4D dynamics capture [39, 61, 7, 41, 30]. Recent work that
represents 3D shapes as continuous signed distance fields [2, 68, 18, 10, 63] or occupancy fields [33, 13, 14, 34, 44, 26, 59, 60, 17, 72] can theoretically obtain volumetric reconstructions with infinite
resolutions, as they are not bound to the resolution of a discrete grid structure. Similarly, we learn
continuous deformation fields defined in 3D space for shape deformations [58, 25, 69, 21]. Due to
the continuity of the deformation fields, our method is not limited by the number of mesh vertices,
or disconnected components. Different from ShapeFlow [25], OFlow [39], LPDC-Net [61], and NPMs [41], which learn a deformation field from a single latent code, and inspired by local implicit field learning [14, 44, 60, 17, 71], we model the deformation field as a composition of local deformation
functions, improving the representation capability of describing complex deformations as well as
generalization to new deformations.
Visual Transformers.
Recently, transformer architectures [64] from natural language processing have revolutionized many computer vision tasks, including image classification [16, 67], object recognition [9], semantic segmentation [75], and 3D reconstruction [6, 70, 17, 71, 46]. We refer the reader to [19] for a detailed survey of visual transformers. In this work, we propose to use
a transformer architecture to learn deformation fields. Given the input point cloud sampled from
the source mesh with partial deformation flow (defined by the user handles), we employ the vector
attention blocks from Point Transformer [74] as the main point cloud processing module to extract a
sparse set of local latent codes, enhancing the global understanding of deformation behaviours. Based
on the obtained local deformation descriptors, our attentive deformation decoder learns to attend to
the most informative features from nearby local codes to predict a deformation field.
3 Approach
Given a source mesh S = {V, F}, where V and F denote the set of vertices and the set of faces, respectively, we aim to deform S to obtain a target mesh T by selecting a sparse set of mesh vertices H = {h_i}_{i=1}^ℓ as handles and dragging them to target locations O = {o_i}_{i=1}^ℓ. The key idea in this work is to use deformation priors to complete hidden surface deformations. Specifically, the goal
Figure 2: Overview. Given a source mesh S with sparse handles H (red circles) and their respective target locations O (blue circles) as input, our method deforms the mesh to the target mesh T via canonicalization C. The backward and forward deformation networks store the deformation priors that allow our method to produce consistent and natural-looking outputs.
is to learn a continuous deformation field D defined in 3D space, from which we can obtain the deformed mesh T' = {V + D(V), F} through vertex deformations of the source mesh S. The overall
pipeline of the proposed approach is shown in Figure 2. Our method can be applied to input meshes
in arbitrary poses by leveraging learned shape deformation via canonicalization (see Section 3.1).
To represent the underlying deformation prior, we propose neural deformation fields as described in Section 3.2, which can be learned from large-scale deformation datasets (see Section 3.3).
3.1 Learning Shape Deformations via Canonicalization
To ensure robustness w.r.t. varying input mesh quality (topology and resolution), we operate on point clouds instead of meshes. Specifically, we sample a point cloud P_S = {p_i}_{i=1}^n ∈ R^{n×3} of size n = 5000 from S. We define the target handle point locations P_O = {o_i}_{i=1}^n ∈ R^{n×3}, where we use zeros to represent unknown point flows. Further, to avoid the ambiguity of zero point flow, we define the corresponding binary user handle mask M = {b_i}_{i=1}^n ∈ R^n, where b_i = 1 if p_i is a handle and b_i = 0 otherwise.
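As an illustration of how these inputs could be assembled, the following is a minimal sketch, not the authors' released code; the use of trimesh, the handle arrays, and the choice to store per-point flows (target minus source) in P_O are our assumptions.

import numpy as np
import trimesh  # assumed available for surface sampling

def build_inputs(mesh, handle_idx, handle_targets, n=5000):
    """handle_idx indexes into the sampled points (for simplicity we assume
    the handles coincide with samples); handle_targets are their desired locations."""
    # P_S: n points sampled from the source surface
    p_s, _ = trimesh.sample.sample_surface(mesh, n)
    p_s = np.asarray(p_s, dtype=np.float32)             # (n, 3)
    # P_O: zero flow everywhere except at handles, where the flow is known
    p_o = np.zeros_like(p_s)                            # (n, 3)
    p_o[handle_idx] = handle_targets - p_s[handle_idx]
    # M: binary mask disambiguating "zero flow" from "unknown flow"
    m = np.zeros(n, dtype=np.float32)                   # (n,)
    m[handle_idx] = 1.0
    return p_s, p_o, m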
To learn the shape transformation between two arbitrary non-rigidly deformed poses, one can learn
deformation fields that directly map the source deformed space to the target space. However, it would be difficult to learn the deformation priors well, as there are infinitely many pairs of deformation states to map between. To decrease the learning complexity, we introduce a canonical space as an intermediate state. We divide the shape transformation process into two steps: a backward deformation that aligns the
source deformed space to canonical space, and a forward deformation that maps the canonical space to
the target deformation space. Concretely, P_S is passed into the backward transformation network to learn the backward deformation field D_b, which transforms the input shape P_S into a canonical pose P'_C. Similarly, a set of m = 5000 non-surface query points Q_S = {q_i}_{i=1}^m ∈ R^{m×3}, randomly sampled in the 3D space of S, is also mapped to canonical space through Q'_C = Q_S + D_b(Q_S). Lastly, given P'_C, M, and P_O as input, a forward transformation network is learned to represent the forward deformation field D_f, which predicts the final locations Q'_T = Q'_C + D_f(Q'_C).
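To make the two-stage composition explicit, inference could be sketched as follows; backward_net.field and forward_net.field are hypothetical wrappers returning callable deformation fields, not the paper's actual API.

def deform_via_canonicalization(backward_net, forward_net, p_s, p_o, m, verts):
    """verts: source mesh vertices V, treated as query points."""
    # Backward stage: map the observed pose into the canonical pose.
    d_b = backward_net.field(p_s, p_o, m)   # callable deformation field D_b
    p_c = p_s + d_b(p_s)                    # canonicalized point cloud P'_C
    q_c = verts + d_b(verts)                # canonicalized query points Q'_C
    # Forward stage: map canonical space to the target pose given the handles.
    d_f = forward_net.field(p_c, p_o, m)    # callable deformation field D_f
    return q_c + d_f(q_c)                   # Q'_T: new vertex positions; faces F are reused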
3.2 Transformer-based Deformation Networks (TD-Nets)
The deformation via canonicalization is based on two deformation field predictors (forward and
backward deformations). Both networks share the same architecture, thus, in the following, we
will only describe the forward deformation network, as visualized in Figure 3, while the backward
deformation network is analogous. It consists of a transformer-based deformation encoder and a
vector cross attention-based decoder network.
Point transformer encoder.
Given a point set P_C with handle locations P_O and a binary mask M as inputs, we use point transformer layers from [74] to build our encoder modules. The point transformer layer is based on the vector attention mechanism [73]. Let X = {x_i, f_i}_i and Y = {y_i, g_i}_i be the query and key-value sequences, where x_i and y_i denote the coordinates of query and key-value points with corresponding feature vectors f_i and g_i.
Figure 3: Transformer-based Forward Deformation Networks. Given a canonical mesh C with handle positions H (red circles) and desired handle locations O (blue circles), we perform surface sampling to obtain a point cloud P_C with additional channels for the handle mask M and the point flow P_O. A point-transformer encoder is devised to extract a sparse set of local latent codes Z = {c_i, z_i}_i from this point cloud, where c_i are the anchor positions of the latent features z_i. For a specific point q in 3D space (i.e., a vertex of the source mesh), a vector cross attention (VCA) block conditioned on the global code z_glo is used to effectively fuse the information of the k nearest neighbouring latent codes Z_q of q into z_q. Using a multi-layer perceptron (MLP) conditioned on z_q, we predict the deformed location q' in the target space.
The vector cross attention operator VCA is defined as:

VCA(X, Y): f'_i = Σ_{j ∈ N_i} ρ(γ(φ(g_j) − ψ(f_i) + δ)) ⊙ (α(g_j) + δ),   (1)
where f'_i are the aggregated features; φ, ψ, and α are linear projections, each implemented by a fully-connected layer; γ is a mapping function implemented by a two-layer MLP to predict attention vectors; and ρ is the attention weight normalization function, in our case softmax. δ := θ(x_i − y_j) is the positional embedding module [64, 36], implemented by two linear layers with a single ReLU [38].
It leverages the relative positional information of x_i and y_j to benefit the network training. With the definition of VCA, the vector self-attention operator VSA can be defined as:

VSA(X) := VCA(X, X).   (2)
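For concreteness, a PyTorch sketch of the VCA operator in Eq. (1) under assumed tensor shapes (no batch dimension, neighborhoods N_i computed by brute-force kNN) could look as follows; this is our reading of the operator, not the reference implementation. VSA is then obtained by passing the same set as both query and key-value sequence.

import torch
import torch.nn as nn

class VCA(nn.Module):
    """Vector cross attention, Eq. (1): queries X = (x, f), key-values Y = (y, g)."""
    def __init__(self, dim, k=16):
        super().__init__()
        self.k = k
        self.phi = nn.Linear(dim, dim)    # phi: projects key features g_j
        self.psi = nn.Linear(dim, dim)    # psi: projects query features f_i
        self.alpha = nn.Linear(dim, dim)  # alpha: projects value features g_j
        # gamma: two-layer MLP predicting per-channel attention vectors
        self.gamma = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # theta: positional embedding of relative coordinates x_i - y_j
        self.theta = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, f, y, g):
        # x: (N, 3), f: (N, C); y: (M, 3), g: (M, C)
        idx = torch.cdist(x, y).topk(self.k, largest=False).indices   # (N, k) = N_i
        g_j, y_j = g[idx], y[idx]                                     # (N, k, C), (N, k, 3)
        delta = self.theta(x.unsqueeze(1) - y_j)                      # positional term
        attn = self.gamma(self.phi(g_j) - self.psi(f).unsqueeze(1) + delta)
        attn = torch.softmax(attn, dim=1)                             # rho over N_i
        return ((self.alpha(g_j) + delta) * attn).sum(dim=1)          # f'_i, (N, C)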
Based on VCA and VSA, we can define two basic modules to build our encoder network, i.e. the
point transformer block (PTB) and the point abstraction block (PAB). The definition of the point
transformer block PTB is a combination of a BatchNorm (BN) layer [22], VSA, and a residual connection, formulated as:

PTB(X) := BN(X + VSA(X)).   (3)
For each point X_i, it encapsulates the information from its k_enc = 16 nearest neighbors while keeping the point's position x_i unchanged. The point abstraction block PAB consists of farthest point sampling (FPS), BN, VCA, and VSA, and is defined as follows:

PAB(X) := BN(FPS(X) + VSA(VCA(FPS(X), X))).   (4)
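Under the same assumptions as the VCA sketch above, Eqs. (2)-(4) compose into two short modules; farthest_point_sampling is a hypothetical helper returning sample indices (most point-cloud libraries provide one).

class PTB(nn.Module):
    """Point transformer block, Eq. (3): BN(X + VSA(X)), positions unchanged."""
    def __init__(self, dim, k=16):
        super().__init__()
        self.vsa = VCA(dim, k)          # VSA(X) = VCA(X, X), Eq. (2)
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x, f):
        return x, self.bn(f + self.vsa(x, f, x, f))

class PAB(nn.Module):
    """Point abstraction block, Eq. (4): BN(FPS(X) + VSA(VCA(FPS(X), X)))."""
    def __init__(self, dim, n_out, k=16):
        super().__init__()
        self.n_out = n_out
        self.vca, self.vsa = VCA(dim, k), VCA(dim, k)
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x, f):
        idx = farthest_point_sampling(x, self.n_out)  # hypothetical FPS helper
        xs, fs = x[idx], f[idx]                       # FPS(X)
        h = self.vca(xs, fs, x, f)                    # VCA(FPS(X), X)
        h = self.vsa(xs, h, xs, h)                    # VSA(.)
        return xs, self.bn(fs + h)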
The point cloud P_C, with the handle mask M and flow P_O as additional channels, is passed to a point transformer block (PTB) to obtain a feature point cloud Z_0 = {c^0_i, z^0_i}_{i=1}^n. By using two consecutive point abstraction blocks (PABs) with intermediate set sizes of n_1 = 500 and n_2 = 100, we obtain Z_1 = {c^1_i, z^1_i}_{i=1}^{n_1} and Z_2 = {c^2_i, z^2_i}_{i=1}^{n_2}. To enhance global deformation priors, we stack 4 point transformer blocks with full self-attention, whose k_enc is set to 100, to exchange global information across the whole set Z_2. By doing so, we obtain a sparse set of local deformation descriptors Z = {c_i, z_i}_{i=1}^{100} anchored at {c_i}. Finally, we perform a global max-pooling operation followed by two linear layers to obtain the global latent vector z_glo.
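Chaining these stages with the set sizes stated above (5000 -> 500 -> 100) could look as follows; the feature width and the input embedding layer are our choices, not values from the paper.

class TDNetEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.embed = nn.Linear(3 + 3 + 1, dim)   # xyz, flow P_O, and mask M channels
        self.ptb_in = PTB(dim, k=16)             # Z_0: 5000 local codes
        self.pab1 = PAB(dim, n_out=500, k=16)    # Z_1: 500 local codes
        self.pab2 = PAB(dim, n_out=100, k=16)    # Z_2: 100 local codes
        # four PTBs with full self-attention: k equals the whole set size
        self.global_ptbs = nn.ModuleList(PTB(dim, k=100) for _ in range(4))
        self.to_global = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, p_c, p_o, m):
        f = self.embed(torch.cat([p_c, p_o, m[:, None]], dim=-1))
        x, f = self.ptb_in(p_c, f)
        x, f = self.pab1(x, f)
        x, f = self.pab2(x, f)
        for blk in self.global_ptbs:
            x, f = blk(x, f)                     # final anchors {c_i} and codes {z_i}
        z_glo = self.to_global(f.max(dim=0).values)  # global max-pool + two FCs
        return x, f, z_glo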
Attentive deformation decoder.
Based on the learned local latent codes Z = {c_i, z_i}_{i=1}^{100} and the global latent vector z_glo, the deformation decoder defines the forward deformation function D_f: R^3 → R^3, which maps a point q from the canonical space of C to the 3D space of T. Similar to tri-linear interpolation in grid-based implicit field learning, a straightforward way to find the corresponding feature vector z_q is to use a weighted combination of the k_dec = 16 nearby local codes Z_q = {c_k, z_k}_{k=1}^{k_dec}. Intuitively, the weight is inversely proportional to the Euclidean distance between q and the anchoring location c_k [44]. However, distance-based feature queries ignore the feature similarity between the query point and its neighboring local codes; our attentive decoder therefore fuses Z_q into z_q with a vector cross attention block, attending to the most informative deformation descriptors, before an MLP maps z_q to the displacement of q.
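A sketch of such an attentive decoder, reusing the VCA module from above, is shown below; folding z_glo into the query feature is our simplification of the conditioning depicted in Figure 3.

class AttentiveDecoder(nn.Module):
    def __init__(self, dim=128, k_dec=16):
        super().__init__()
        self.query_init = nn.Linear(3 + dim, dim)  # lift (q, z_glo) to a query feature
        self.vca = VCA(dim, k_dec)                 # attends over nearby local codes
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, q, anchors, codes, z_glo):
        # q: (Q, 3) query points; anchors: (100, 3) = {c_i}; codes: (100, C) = {z_i}
        f_q = self.query_init(torch.cat([q, z_glo.expand(len(q), -1)], dim=-1))
        z_q = self.vca(q, f_q, anchors, codes)     # fused per-query feature z_q
        return q + self.mlp(z_q)                   # deformed location q' = q + D_f(q)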