XDGAN: Multi-Modal 3D Shape Generation in 2D Space
Hassan Abu Alhaija1
habualhaija@nvidia.com
Alara Dirik2
alara@pch-innovations.com
André Knörig2
andre@pch-innovations.com
Sanja Fidler1,3
sfidler@nvidia.com
Maria Shugrina1
mshugrina@nvidia.com
1NVIDIA
2PCH Innovations GmbH
3University of Toronto
Abstract
Generative models for 2D images have recently seen tremendous progress in quality, resolution and speed as a result of the efficiency of 2D convolutional architectures. However, it is difficult to extend this progress into the 3D domain since most current 3D representations rely on custom network components. This paper addresses a central question: is it possible to directly leverage 2D image generative models to generate 3D shapes instead? To answer this, we propose XDGAN, an effective and fast method for applying 2D image GAN architectures to the generation of 3D object geometry combined with additional surface attributes, like color textures and normals. Specifically, we propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space. The generated geometry images are quick to convert to 3D meshes, enabling real-time 3D object synthesis, visualization and interactive editing. Moreover, the use of standard 2D architectures can help bring more 2D advances into the 3D realm. We show both quantitatively and qualitatively that our method is highly effective at various tasks such as 3D shape generation, single-view reconstruction and shape manipulation, while being significantly faster and more flexible than recent 3D generative models.
1 Introduction
Generative Adversarial Networks [15] have achieved remarkable progress in generating high-resolution realistic images, typically using convolutional architectures such as [5,23]. Extending these advances to the 3D domain remains a challenge and an active area of research, with the newest methods introducing implicit or explicit 3D awareness into the GAN generative process [7,8,17,34,55]. In this work, we show that it is possible to directly leverage a 2D GAN architecture designed for images to generate high-quality 3D shapes. The key to our approach is the parameterization of 3D shapes as 2D planar geometry images [18], which we use as the training dataset for an image-based generator.

Figure 1: XDGAN generates high-resolution textured 3D meshes and supports projection of 3D models into its latent space, where generation of multiple textures and semantic editing are possible. Results are from the model trained on the HQCars dataset. (Panels: Generation — samples 1-4; Projection — original, projection (all), w/o texture, w/o normals.)
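To make the representation concrete, the following is a minimal sketch of a geometry image: a regular 2D grid whose pixels store surface point coordinates rather than colors. A sphere with a closed-form parameterization and a three-channel (x, y, z) layout are used purely for illustration; the actual pipeline unwraps arbitrary meshes following [18] and uses the compact one-channel encoding described in the abstract.

```python
import numpy as np

def sphere_geometry_image(resolution=64, radius=1.0):
    """Toy geometry image: each pixel holds an (x, y, z) surface point.

    A real pipeline would unwrap an arbitrary mesh onto the plane [18];
    a sphere is used here only because it has a closed-form
    parameterization.
    """
    # Regular grid over the (u, v) parameter domain.
    u = np.linspace(0.0, 2.0 * np.pi, resolution)       # azimuth
    v = np.linspace(1e-3, np.pi - 1e-3, resolution)     # inclination
    uu, vv = np.meshgrid(u, v)

    # Stack coordinates into an image-shaped array of shape (H, W, 3).
    geom = np.stack([
        radius * np.sin(vv) * np.cos(uu),
        radius * np.sin(vv) * np.sin(uu),
        radius * np.cos(vv),
    ], axis=-1).astype(np.float32)
    return geom

geom_img = sphere_geometry_image()
print(geom_img.shape)  # (64, 64, 3) -- an "image" a 2D CNN can consume
```

Because the result is an ordinary dense array, an off-the-shelf 2D convolutional generator can consume it without architectural changes.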
Perhaps one of the greatest challenges in developing generative models for 3D content is converging on the right representation. The direct extension of 2D pixel grids to 3D voxel grids [50] suffers from the high memory demands of 3D convolutions, which limits resolution. Point cloud samples of surface geometry, while popular for generative tasks [2,6,28], are limited in their ability to model sharp features or high-resolution textures. Recently, implicit 3D representations such as NeRF [33] and DeepSDF [35], as well as hybrids, have shown great promise for generative tasks [7,11,52]. However, these approaches are typically too slow for interactive applications, and their output does not interface with existing 3D tools, calling for costly or lossy conversion of models before use. To this day, 3D meshes, augmented with normals and textures, remain the most widely adopted 3D representation in 3D software, movies and games. Our method uses 2D generative architectures to produce fixed-topology textured meshes with a single forward pass, and is thus immediately practical.
We present XDGAN (X-Dimensional GAN), a method for using standard, fast 2D GAN architectures to generate high-resolution 3D meshes with additional surface properties like textures and normals. XDGAN is trained on a collection of 3D shapes, which are first unwrapped into planar geometry images. Each pixel of a geometry image represents a vertex position of a fixed-topology mesh, and is thus a direct representation of surface geometry. To our knowledge, ours is the first work to show that 2D convolutional generators can produce high-fidelity 3D shapes by operating on a planar parameterization of 3D geometry, an idea that can help bring more advances in 2D architectures to the 3D realm. Specifically, we experiment with StyleGAN [23,24] modified to train on higher-precision geometry images that represent 3D shapes. In order to augment the generated geometry with additional per-vertex properties, we show that it is possible to directly apply an image-to-image translation framework [36] to predict these properties for each geometry image. We demonstrate that our approach can generate diverse, high-quality textured 3D meshes, beating existing generators in surface detail, presence of color and practical usefulness of the output representation. Further, we demonstrate that powerful properties of the StyleGAN latent space can also be exploited for 3D generation, reconstruction, and supervised and unsupervised manipulation, just as has been shown for 2D images.
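Standard image GANs operate on 8-bit RGB, whereas vertex coordinates require finer quantization to avoid visible faceting. The snippet below is a minimal sketch, assuming a simple uniform 16-bit encoding, of how coordinates could be packed into a higher-precision image and recovered; it illustrates the idea rather than the specific encoding used in XDGAN, which may differ.

```python
import numpy as np

def encode_uint16(coords, lo, hi):
    """Quantize float coordinates in [lo, hi] into a 16-bit image."""
    t = np.clip((coords - lo) / (hi - lo), 0.0, 1.0)
    return np.round(t * 65535.0).astype(np.uint16)

def decode_uint16(img, lo, hi):
    """Invert the quantization back to float coordinates."""
    return img.astype(np.float32) / 65535.0 * (hi - lo) + lo

# Stand-in geometry image: per-pixel vertex coordinates in [-1, 1].
coords = np.random.uniform(-1.0, 1.0, size=(256, 256, 3)).astype(np.float32)
img = encode_uint16(coords, lo=-1.0, hi=1.0)
recon = decode_uint16(img, lo=-1.0, hi=1.0)

# Worst-case error is about half a quantization step (~1.5e-5 here),
# far below what an 8-bit image would allow.
print(np.abs(recon - coords).max())
```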
2 Related Work
Generative Adversarial Networks (GANs) have become a popular technique for image generation since their introduction by Goodfellow et al. [15]. Striking progress in the quality and resolution of GAN-generated images has been achieved in recent years [5,22,23,24], including in conditional settings [36]. While some degree of view control of such models has been exploited for downstream 3D tasks [56], these architectures have remained primarily confined to generation in the 2D domain. Most recently, approaches that leverage an intermediate 3D representation to improve the 3D consistency and view control of GANs have also been proposed [4,7,8,17,34]. Our method is orthogonal to this line of work, as we show that it is possible to leverage unmodified 2D GAN architectures to learn 3D geometry directly. Beyond unconditional generation, the latent spaces of large-scale GANs have been successfully used for both coarse and fine-grained manipulation of images. While some techniques focus on editing real images [1,3,26,30,40], others devise meaningful explorations of the GAN latent space. Supervised approaches typically use pre-trained attribute classifiers to optimize edit directions [14,43,44]. Other works show that it is possible to find meaningful directions in latent space in an unsupervised way [20,21,49]. In our work, we use a modified version of InterFaceGAN [44] to demonstrate how latent space manipulation techniques, originally proposed for 2D image manipulation, can be leveraged for 3D generation as well.
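For concreteness, the following sketch reproduces the core of the supervised approach in the style of InterFaceGAN: fit a linear classifier on latent codes labelled by an attribute and take the unit normal of its decision hyperplane as the edit direction. The latent codes and labels below are synthetic stand-ins; in practice they would come from generator samples scored by a pre-trained attribute classifier or manual annotation.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-ins: 512-D latent codes and a binary attribute label.
rng = np.random.default_rng(0)
w_codes = rng.normal(size=(2000, 512)).astype(np.float32)
labels = (w_codes[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)

# Fit a linear decision boundary separating the two attribute classes.
svm = LinearSVC(C=1.0).fit(w_codes, labels)

# The unit normal of the separating hyperplane is the edit direction.
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# Editing: move a latent code along the direction by a chosen strength;
# the edited code is then decoded by the generator as usual.
w_edited = w_codes[0] + 3.0 * direction
```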
While there have been remarkable advances in 3D shape generation in recent years, deploying 3D generative models poses challenges due to generation quality and speed, as well as the compatibility of the output format with downstream tasks (see Table 1 for a summary).
The most direct 3D analog of the 2D pixel array is the 3D voxel grid, and early generative approaches adapted convolutional architectures to generate objects by operating on this 3D grid [50,51]. Due to the fixed resolution of the voxel grid and the high memory demands of 3D convolutions, these approaches are limited to low-quality outputs. Later methods mitigate the high memory requirements, for example by using octrees to represent the 3D space more efficiently [47] or through local generation [54], but are still constrained in their ability to represent high-resolution, smoothly varying geometry and texture (Table 1, A).
                       Voxels    Points   Implicit           Mesh
                       [47,54]   [31]     [11,19,35]   [45]  [48]  [16]  Ours
Real-time generation     ✓         ✓         ✗           ✓     ✓     ✓     ✓
Real-time rendering      ✓         ✓         ✗           ✓     ✓     ✓     ✓
High-quality surface     ✗         ✗         ✓           ✗     ✗     ✗     ✓
Texture                  ✗         ✗         ✗           ✗     ✗     ✓     ✓
Variable topology        ✓         ✓         ✓           ✗     ✗     ✓     ✗

Table 1: Comparison of 3D generative methods by the representation used. We include quantitative and qualitative comparisons with the starred methods in our experiments.

Due to their simplicity, point clouds, or unordered (x, y, z) samples of surface geometry, are a popular representation for 3D generation [2,6,28], reconstruction [29] and segmentation [13,38,39]. Despite continued progress in generative modeling of point clouds [53,57], they cannot overcome the inherent limitations of the representation, which cannot model sharp surface details or be rendered as high-fidelity textured meshes. We quantitatively compare our method against the recently published Diffusion Point Cloud (DPC) model [31] (see also Table 1, B). DPC proposes a probabilistic diffusion model and uses a point cloud representation of 3D shapes for shape generation, auto-encoding and shape completion.
Recently there has also been a surge in the use of implicit 3D representations, such as learned signed distance functions (SDFs) [19,35] or occupancy fields [11,32]. Like point clouds and voxel grids, these representations allow modeling varying object topology, but they can also represent high-resolution surface detail. Most related to ours among implicit methods is IM-GAN [11], which uses an implicit decoder, IM-NET, in conjunction with features learned by a latent-GAN model [2] to yield a general 3D generative model. Because the outputs of this and related methods are implicit in the network weights, high-resolution results typically cannot be visualized or exported in real time, requiring several seconds on a state-of-the-art GPU and precluding interactive applications (see Table 1, C). In addition, a number of hybrid approaches are emerging. For example, ShapeFormer [52] employs transformers in the space of discrete implicit shape elements to support shape completion, and DMTet [42] produces detailed tetrahedral meshes from sparse SDF samples. While DMTet generates a mesh-based representation quickly, like our method, it is only designed to work when conditioned on a rough input shape, not as a general generator.
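To illustrate why implicit outputs are costly to export, consider the standard conversion path: evaluate the learned SDF on a dense grid and extract the zero level set with marching cubes. The sketch below substitutes an analytic sphere SDF for a trained network; with a neural SDF, the dense evaluation alone requires on the order of resolution³ forward passes, which is what keeps these methods out of real-time use.

```python
import numpy as np
from skimage import measure

def sphere_sdf(points, radius=0.5):
    """Analytic SDF standing in for a trained implicit network."""
    return np.linalg.norm(points, axis=-1) - radius

# Dense evaluation grid: cost grows cubically with resolution.
res = 128
grid = np.linspace(-1.0, 1.0, res)
xs, ys, zs = np.meshgrid(grid, grid, grid, indexing="ij")
points = np.stack([xs, ys, zs], axis=-1)

sdf = sphere_sdf(points.reshape(-1, 3)).reshape(res, res, res)

# Extract the zero level set as a triangle mesh.
verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
print(verts.shape, faces.shape)
```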
3 Methodology
Our approach leverages the power of 2D convolutional generator networks to generate 3D geometry. Given a training set of (optionally textured) 3D objects, we first convert them into geometry images and corresponding normal maps, textures or other attribute maps (§3.1). The one-channel geometry images represent vertex positions of a fixed-topology mesh and are used to train an image-based GAN model to generate plausible geometry images (§3.2). In order to augment the output with per-vertex attributes like textures and normals, we similarly train an image-to-image translation network on these aligned attribute maps (§3.2). After training (Fig. 2a), these two image-based generators produce detailed textured 3D meshes in real time (Fig. 2b). Fast generation makes interactive exploration of the GAN latent space espe-
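As a minimal sketch of the mesh-conversion step described above, the code below treats every pixel of a geometry image as a vertex, connects neighbouring pixels into triangles using the fixed grid topology, and attaches per-vertex colors read from an aligned attribute map. The three-channel position layout and the naive quad-splitting connectivity are illustrative assumptions; the one-channel encoding and the specific parameterization used by XDGAN would replace them.

```python
import numpy as np

def geometry_image_to_mesh(geom_img, color_img=None):
    """Convert an (H, W, 3) geometry image into a triangle mesh.

    geom_img  : per-pixel vertex positions (assumed 3-channel layout).
    color_img : optional aligned (H, W, 3) attribute map, e.g. RGB texture.
    Returns (vertices, faces, colors); faces index into the vertex array.
    """
    h, w, _ = geom_img.shape
    vertices = geom_img.reshape(-1, 3)
    colors = color_img.reshape(-1, 3) if color_img is not None else None

    # Fixed grid topology: two triangles per pixel quad.
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([np.stack([a, b, c], 1), np.stack([b, d, c], 1)])

    return vertices, faces, colors

# Stand-ins for a generated geometry image and its predicted texture map.
geom = np.random.rand(64, 64, 3).astype(np.float32)
tex = np.random.rand(64, 64, 3).astype(np.float32)
v, f, c = geometry_image_to_mesh(geom, tex)
print(v.shape, f.shape)  # (4096, 3) (7938, 3)
```

Because the connectivity is fixed and known in advance, the conversion reduces to a reshape plus index arithmetic, which runs in milliseconds even at high resolution and is what enables real-time synthesis and interactive editing.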