ALHAIJA ET AL. XDGAN - 3D SHAPE GENERATION IN 2D SPACE 1 XDGAN Multi-Modal 3D Shape Generation in 2D Space_2

2025-04-30 0 0 6.22MB 16 页 10玖币

侵权投诉

ALHAIJA ET AL.: XDGAN - 3D SHAPE GENERATION IN 2D SPACE 1

XDGAN: Multi-Modal 3D Shape

Generation in 2D Space

Hassan Abu Alhaija1

habualhaija@nvidia.com

Alara Dirik2

alara@pch-innovations.com

André Knörig2

andre@pch-innovations.com

Sanja Fidler1,3

sﬁdler@nvidia.com

Maria Shugrina1

mshugrina@nvidia.com

1NVIDIA

2PCH Innovations GmbH

3University of Toronto

Abstract

Generative models for 2D images has recently seen tremendous progress in

quality, resolution and speed as a result of the efﬁciency of 2D convolutional ar-

chitectures. However it is difﬁcult to extend this progress into the 3D domain

since most current 3D representations rely on custom network components. This

paper addresses a central question: Is it possible to directly leverage 2D im-

age generative models to generate 3D shapes instead? To answer this, we pro-

pose XDGAN, an effective and fast method for applying 2D image GAN ar-

chitectures to the generation of 3D object geometry combined with additional

surface attributes, like color textures and normals. Speciﬁcally, we propose a

novel method to convert 3D shapes into compact 1-channel geometry images

and leverage StyleGAN3 and image-to-image translation networks to generate

3D objects in 2D space. The generated geometry images are quick to convert to

3D meshes, enabling real-time 3D object synthesis, visualization and interactive

editing. Moreover, the use of standard 2D architectures can help bring more 2D

advances into the 3D realm. We show both quantitatively and qualitatively that

our method is highly effective at various tasks such as 3D shape generation, sin-

gle view reconstruction and shape manipulation, while being signiﬁcantly faster

and more ﬂexible compared to recent 3D generative models.

1 Introduction

Generative Adversarial Networks [15] have achieved remarkable progress in generat-

ing high-resolution realistic images, typically using convolutional architectures such

as [5,23]. Extending these advances to the 3D domain remains a challenge and

an active area of research, with newest methods introducing implicit or explicit 3D

awareness into the GAN generative process [7,8,17,34,55]. In this work, we show

that it is possible to directly leverage a 2D GAN architecture designed for images to

arXiv:2210.03007v1 [cs.CV] 6 Oct 2022

2ALHAIJA ET AL.: XDGAN - 3D SHAPE GENERATION IN 2D SPACE

Generation

sample 1 sample 2

sample 3 sample 4

Projection

original projection (all)

w/o texture w/o normals

Figure 1: XDGAN allows generation of high-resolution textured 3D meshes, sup-

ports projection of 3D models into the latent space, where generation of multiple tex-

tures and semantic editing are possible. Results from the model trained with HQCars

dataset.

generate high-quality 3D shapes. The key to our approach is parameterization of 3D

shapes as 2D planar geometry images [18], which we use as the training dataset for

an image-based generator.

Perhaps one of the greatest challenges to the development of generative mod-

els for 3D content is converging on the right representation. Direct extension of 2D

pixel grids to 3D voxel grids [50] suffers from high memory demands of 3D convo-

lutions, limiting resolution. Point cloud samples of surface geometry, while popular

for generative tasks [2,6,28], are limited in their ability to model sharp features or

high-resolution textures. Recently implicit 3D representations, such as NeRF [33]

and DeepSDF [35], as well as hybrids, have shown great promise for generative tasks

[7,11,52]. However, these approaches typically are too slow for interactive appli-

cations and their output does not interface with existing 3D tools, calling for costly

or lossy conversion of models before use. To this date, 3D meshes, augmented with

normals and textures, remain the most widely adopted 3D representation in 3D soft-

ware, movies and games. Our method uses 2D generative architectures to produce

ﬁxed-topology textured meshes with a single forward pass, and is thus immediately

practical.

We present XDGAN (X-Dimensional GAN), a method for using standard fast 2D

GAN architectures for generating high-resolution 3D meshes with additional surface

properties like textures and normals. XDGAN is trained on a collection of 3D shapes,

which are ﬁrst unwrapped into planar geometry images. Each pixel of a geometry

image represents a vertex position of a ﬁxed topology mesh, and is thus a direct rep-

resentation of surface geometry. To our knowledge, ours is the ﬁrst work to show that

2D convolutional generators can produce high-ﬁdelity 3D shapes by operating on pla-

nar parameterization of 3D geometry, an idea that can help bring more advances in 2D

architectures to the 3D realm. Speciﬁcally, we experiment with StyleGAN [23,24]

modiﬁed to train on higher-precision geometry images that represent 3D shapes. In

ALHAIJA ET AL.: XDGAN - 3D SHAPE GENERATION IN 2D SPACE 3

order to augment generated geometry with additional per-vertex properties, we show

that it is possible to directly apply an image-to-image translation framework [36] to

predict these properties for each geometry image. We demonstrate that our approach

can generate diverse, high-quality textured 3D meshes, beating existing generators

in quality of surface detail, presence of color or practical usefulness of output repre-

sentation. Further, we demonstrate that powerful properties of the StyleGAN latent

space can also be exploited for 3D generation, reconstruction, as well as supervised

and unsupervised manipulation, just as it has been shown for 2D images.

2 Related Work

Generative Adversarial Networks (GANs) have become a popular technique for im-

age generation since their introduction by Goodfellow et al. [15]. Striking progress

in the quality and resolution of GAN-generated images has been achieved in recent

years [5,22,23,24], including in conditional settings [36]. While some degree of

view control of such models has been exploited for downstream 3D tasks [56], these

architectures have remained primarily conﬁned to generation in the 2D domain. Most

recently, approaches that leverage an intermediate 3D representation to improve 3D

consistency and view control of GANs have also been proposed [4,7,8,17,34]. Our

method is orthogonal to this line of work as we show that it is possible to leverage

unmodiﬁed 2D GAN architectures, to learn 3D geometry directly. Beyond uncondi-

tional generation, latent spaces of large-scale GANs have been successfully used for

both coarse and ﬁne-grained manipulations of images. While some techniques focus

on editing real images [1,3,26,30,40], others devise meaningful exploration of the

GAN latent space. Supervised approaches typically use pre-trained attribute classi-

ﬁers to optimize edit directions [14,43,44]. Other works show that it is possible to

ﬁnd meaningful directions in latent space in an unsupervised way [20,21,49]. In our

work, we use a modiﬁed version of InterFaceGAN [44] to demonstrate how latent

space manipulation techniques, originally proposed for 2D image manipulation, can

be leveraged for 3D generation as well.

While there have been remarkable advances in 3D shape generation in the recent

years, deployment of 3D generative models poses challenges due to generation qual-

ity and speed, as well as compatibility of the output format with downstream tasks

(see Tb.1for summery).

The most direct 3D analog of the 2D pixel array is the 3D voxel grid, and early

generative approaches adapted convolutional architectures to generate objects by op-

erating on this 3D grid [50,51]. Due to ﬁxed resolution of the voxel grid and high

memory demands of 3D convolutions these approaches are limited to low quality

outputs. Later methods mitigate high memory requirements, for example, by using

octrees to represent the 3D space more efﬁciently [47] or through local generation

[54], but are still constrained in their ability to represent high-resolution smoothly

varying geometry and texture (Tb.1, A).

Due to their simplicity, point clouds, or unordered (x,y,z)samples of surface ge-

ometry, are a popular representation for 3D generation [2,6,28], reconstruction [29]

and segmentation [13,38,39]. Despite continued progress in generative modeling

4ALHAIJA ET AL.: XDGAN - 3D SHAPE GENERATION IN 2D SPACE

Representations

Voxels Points Implicit Mesh

[47,54] [31] [11,19,35] [45] [48] [16]Ours

Real-time generation 3 3 73 3 3 3

Real-time rendering 3 3 73 3 3 3

High-quality surface 7 7 37 7 7 3

Texture 7 7 7 7 7 3 3

Variable topology 3 3 3 7 7 37

Table 1: Comparison of 3D generative methods by the representation used. We in-

clude quantitative and qualitative comparisons with the starred methods in our exper-

iments.

of point clouds [53,57], they cannot overcome the inherent limitations of the repre-

sentation, which cannot model sharp surface details, or be rendered as high-ﬁdelity

textured meshes. We quantitatively compare our method against the recently pub-

lished Diffusion Point Cloud (DPC) [31] (See also Tb.1, B). DPC proposes a prob-

abilistic diffusion model and uses point cloud representation of 3D shapes for shape

generation, auto-encoding and shape completion.

Recently there has also been a surge in using implicit 3D representations, such as

learned signed distance functions (SDFs) [19,35] or occupancy [11,32]. Like point

clouds and voxel grids, these representations allow modeling varying object topology,

but can also represent high-resolution surface detail. Most related to ours among im-

plicit methods is IM-GAN [11], which proposes to use an implicit decoder IM-NET

in conjunction with features learned by a latent-GAN model [2] to yield a general

3D generative model. Because the outputs of this and related methods are implicit

in the network weights, high-resolution results typically cannot be visualized or ex-

ported in real-time, requiring several seconds on state-of-the-art GPU, and precluding

interactive applications (See Tb.1, C). In addition, a number of hybrid approaches are

emerging. For example, ShapeFormer [52] employs transformers in the space of dis-

crete implicit shape elements to support shape completion, and DMTet [42] produces

detailed tetrahedral meshes using sparse SDF samples. While DMTet generates a

mesh-based representation quickly, like our method, it is only designed to work when

conditioned on a rough input shape, not as a general generator.

3 Methodology

Our approach leverages the power of 2D convolutional generator networks in order

to generate 3D geometry. Given a training set of (optionally textured) 3D objects,

we ﬁrst convert them into geometry images and corresponding normal maps, textures

or other attribute maps (§3.1). The one-channel geometry images represent vertex

positions of a ﬁxed topology mesh, and are used to train an image-based GAN model

to generate such plausible geometry images (§3.2). In order to augment output with

per-vertex attributes like textures and normals, we similarly train an image-to-image

translation network on these aligned attribute maps (§3.2). After training (Fig.2a),

these two image-based generators produce detailed textured 3D meshes in real-time

(§2b). Fast generation makes interactive exploration of the GAN latent space espe-

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

ALHAIJAETAL.:XDGAN-3DSHAPEGENERATIONIN2DSPACE1XDGAN:Multi-Modal3DShapeGenerationin2DSpaceHassanAbuAlhaija1habualhaija@nvidia.comAlaraDirik2alara@pch-innovations.comAndréKnörig2andre@pch-innovations.comSanjaFidler1,3sdler@nvidia.comMariaShugrina1mshugrina@nvidia.com1NVIDIA2PCHInnovationsGmbH3Univers...

展开>> 收起<<

ALHAIJA ET AL. XDGAN - 3D SHAPE GENERATION IN 2D SPACE 1 XDGAN Multi-Modal 3D Shape Generation in 2D Space_2.pdf

共16页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

ALHAIJA ET AL. XDGAN - 3D SHAPE GENERATION IN 2D SPACE 1 XDGAN Multi-Modal 3D Shape Generation in 2D Space_2

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: