
In order to augment generated geometry with additional per-vertex properties, we show
that it is possible to directly apply an image-to-image translation framework [36] to
predict these properties for each geometry image. We demonstrate that our approach
can generate diverse, high-quality textured 3D meshes, surpassing existing generators
in the quality of surface detail, the presence of color, and the practical usefulness of
the output representation. Further, we demonstrate that the powerful properties of the
StyleGAN latent space can also be exploited for 3D generation, reconstruction, and
supervised and unsupervised manipulation, just as has been shown for 2D images.
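To make the translation step concrete, the sketch below shows the input/output contract of such a geometry-image translator in PyTorch. It is a minimal, hypothetical stand-in rather than the actual pix2pix architecture of [36] (which uses a U-Net generator trained with an adversarial loss): a small encoder-decoder mapping a 3-channel geometry image of per-pixel (x, y, z) surface samples to a 3-channel color image in the same layout, so each predicted pixel aligns with a mesh vertex.

```python
import torch
import torch.nn as nn

class GeomToTexture(nn.Module):
    """Hypothetical minimal translator: 3-channel geometry image
    (per-pixel xyz surface samples) -> 3-channel per-vertex colors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, geometry_image):
        # Output shares the geometry image's spatial layout, so the
        # predicted colors align pixel-for-pixel with mesh vertices.
        return self.net(geometry_image)

geom = torch.randn(1, 3, 256, 256)   # stand-in geometry image
texture = GeomToTexture()(geom)      # shape (1, 3, 256, 256)
```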
2 Related Work
Generative Adversarial Networks (GANs) have become a popular technique for im-
age generation since their introduction by Goodfellow et al. [15]. Striking progress
in the quality and resolution of GAN-generated images has been achieved in recent
years [5,22,23,24], including in conditional settings [36]. While some degree of
view control of such models has been exploited for downstream 3D tasks [56], these
architectures have remained primarily confined to generation in the 2D domain. Most
recently, approaches that leverage an intermediate 3D representation to improve 3D
consistency and view control of GANs have also been proposed [4,7,8,17,34]. Our
method is orthogonal to this line of work, as we show that it is possible to leverage
unmodified 2D GAN architectures to learn 3D geometry directly. Beyond uncondi-
tional generation, latent spaces of large-scale GANs have been successfully used for
both coarse and fine-grained manipulations of images. While some techniques focus
on editing real images [1,3,26,30,40], others devise meaningful exploration of the
GAN latent space. Supervised approaches typically use pre-trained attribute classi-
fiers to optimize edit directions [14,43,44]. Other works show that it is possible to
find meaningful directions in latent space in an unsupervised way [20,21,49]. In our
work, we use a modified version of InterFaceGAN [44] to demonstrate how latent
space manipulation techniques, originally proposed for 2D image manipulation, can
be leveraged for 3D generation as well.
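To illustrate the supervised recipe, the following sketch shows the core of an InterFaceGAN-style edit in Python (our modified version differs in details not shown here). All names and data are stand-ins: in practice the latent codes would be sampled from the trained generator and the binary labels would come from a pre-trained attribute classifier.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in data: W-space codes and attribute labels from a classifier.
latents = np.random.randn(10000, 512)
labels = (latents[:, 0] > 0).astype(int)

# Fit a linear hyperplane separating the attribute in latent space.
svm = LinearSVC(C=1.0, max_iter=10000).fit(latents, labels)

# The unit normal of the hyperplane is the semantic edit direction.
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

def edit(w, alpha):
    """Move a latent code along the attribute direction by strength alpha."""
    return w + alpha * direction

w = np.random.randn(512)
w_edited = edit(w, alpha=3.0)  # feed back through the generator to render
```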
While there have been remarkable advances in 3D shape generation in recent
years, deployment of 3D generative models poses challenges due to generation qual-
ity and speed, as well as compatibility of the output format with downstream tasks
(see Tab. 1 for a summary).
The most direct 3D analog of the 2D pixel array is the 3D voxel grid, and early
generative approaches adapted convolutional architectures to generate objects by op-
erating on this 3D grid [50,51]. Due to the fixed resolution of the voxel grid and the
high memory demands of 3D convolutions, these approaches are limited to low-quality
outputs. Later methods mitigate the high memory requirements, for example, by using
octrees to represent the 3D space more efficiently [47] or through local generation
[54], but are still constrained in their ability to represent high-resolution, smoothly
varying geometry and texture (Tab. 1, A).
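A quick back-of-the-envelope computation makes the memory argument concrete. The snippet below is illustrative only (the constants are ours, not from [50,51]): dense float32 grids grow cubically with resolution, and intermediate multi-channel feature grids inside a 3D convolutional network grow even faster, which is why voxel generators are typically restricted to coarse resolutions such as 64^3.

```python
# Back-of-the-envelope memory cost of dense voxel representations.
BYTES_PER_FLOAT32 = 4

def grid_megabytes(resolution, channels=1):
    """Memory of one dense float32 feature grid at a given resolution."""
    return resolution ** 3 * channels * BYTES_PER_FLOAT32 / 2 ** 20

for res in (32, 64, 128, 256):
    print(f"{res}^3 occupancy grid: {grid_megabytes(res):8.1f} MB; "
          f"64-channel feature grid: {grid_megabytes(res, 64):8.1f} MB")
# 256^3 alone costs 64 MB per channel; a 64-channel activation grid
# at that resolution already needs ~4 GB per sample.
```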
Due to their simplicity, point clouds, or unordered (x, y, z) samples of surface ge-
ometry, are a popular representation for 3D generation [2,6,28], reconstruction [29],
and segmentation [13,38,39]. Despite continued progress in generative modeling