1. Introduction
The design process in the Architecture, Engineering, and Construction (AEC) industry involves many stakeholders, including professionals such as engineers, architects, and planners, and non-specialists such as clients, citizens, and users. Each stakeholder contributes to all the design aspects that Vitruvius called ‘Firmitas, Utilitas, Venustas’, which translate as solidity, usefulness, and beauty. Early in the design process, all parties must reach a shared understanding of these Vitruvian values to avoid any misrepresentation later on [10,123]. A critical factor in establishing a shared understanding is the ability to convey the information quickly, using a medium all stakeholders can understand [72].
However, during the initial meetings, design ideas are shared through media such as 2D diagrams [16], front views, floorplans [48,67,68,89], and sketches on paper [101]. These media often represent the design information in a few lines, leading to partial and incomplete representations of the overall mass. As a result, many stakeholders struggle to visualize the actual 3D representation of the building, which results in varied interpretations and ultimately inhibits a shared understanding of the design. [73] notes that the inability of stakeholders to interpret 2D designs leads to reductions in productivity, rework, wastage, and cost overruns. [49] points out that this mode of design practice makes the communication of designs difficult, since these representations lack the 3D information (such as proportion, volume, and overall mass) needed during later phases.
To alleviate this challenge, this research aims to generate 3D geometries from sketches, grounding its theory in sketch modeling, an active area of research since the 1990s [45,118]. Sketch modeling has two major approaches: learning-based methods and non-learning-based methods. Non-learning-based methods require specific, well-defined inputs to construct 3D geometries; as a result, they operate with fixed viewpoints and specific sketching styles, reducing the designer’s flexibility. Learning-based methods have therefore been employed to resolve these issues, allowing for more flexibility, as detailed in Section 3. Learning-based methods, also called data-driven methods, generate a 3D shape from a partial sketch by learning a joint probability distribution over shapes and sketches from a dataset.
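To make this formulation concrete, and anticipating the OccNet-based implementation adapted in Section 5.1, one simplified way to write such a model (our notation, introduced here only for illustration) is as a conditional occupancy function

f_\theta : \mathbb{R}^3 \times \mathcal{Z} \to [0,1], \qquad z = g_\phi(s),

where g_\phi embeds the input sketch s into a latent code z, and f_\theta predicts whether a 3D query point lies inside the shape. Both networks can then be trained jointly by minimizing a binary cross-entropy over points p_j sampled around the ground-truth shape, with occupancy labels o_j \in \{0,1\}:

\mathcal{L}(\theta, \phi) = \sum_{j=1}^{K} \mathrm{BCE}\big(f_\theta(p_j, g_\phi(s)),\, o_j\big).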
Currently, these techniques have only focused on decontextualized shapes such as furniture and mechanical parts, whose positions and orientations do not directly affect their representation. However, this is different for buildings: their design is affected by the building’s location and orientation.
Therefore, this research takes a step in this direction by considering location and orientation within the reconstruction process of deep learning models. Vitruvio is a flexible and contextual method that reconstructs a 3D representation of a building from a single perspective sketch. It provides the flexibility to generate a building mass from a partial sketch drawn from any perspective viewpoint. To accomplish this task, we build our own dataset (Section 4), dubbed Manhattan1k, which preserves the contextual information of buildings, specifically their locations and orientations.
To summarize, the contribution is threefold:
1. We explain the use of a learning-based method for
sketch-to-3D applications where the final 3D building
shapes depend on a single perspective sketch.
2. We develop Vitruvio by adapting the Occupancy Network (OccNet) [61] to our building dataset and improving its accuracy and efficiency (Section 5.1); a simplified, illustrative sketch of this kind of model follows this list.
3. We show qualitatively and quantitatively that the building orientation affects the reconstruction performance of our network (Section 5.2).
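To give a sense of how such a learning-based, OccNet-style pipeline is typically wired together, the following is a minimal PyTorch sketch: a small CNN encodes the perspective sketch into a latent code, and an MLP decoder predicts occupancy for sampled 3D query points conditioned on that code. All names, layer sizes, and the dummy training step are hypothetical simplifications for illustration only; they are not the released Vitruvio implementation (see the repository for the actual code).

```python
# Illustrative OccNet-style conditional occupancy model (hypothetical names and sizes).
import torch
import torch.nn as nn

class SketchEncoder(nn.Module):
    """Encodes a single-channel perspective sketch into a latent code z."""
    def __init__(self, z_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, z_dim)

    def forward(self, sketch):                      # sketch: (B, 1, H, W)
        return self.fc(self.conv(sketch).flatten(1))  # latent code: (B, z_dim)

class OccupancyDecoder(nn.Module):
    """Predicts occupancy logits for 3D query points conditioned on z."""
    def __init__(self, z_dim=256, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, z):                   # points: (B, N, 3), z: (B, z_dim)
        z = z.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.mlp(torch.cat([points, z], dim=-1)).squeeze(-1)  # logits: (B, N)

# Dummy training step: binary cross-entropy on sampled points (illustration only).
encoder, decoder = SketchEncoder(), OccupancyDecoder()
loss_fn = nn.BCEWithLogitsLoss()
sketch = torch.rand(4, 1, 224, 224)                 # batch of sketches
points = torch.rand(4, 2048, 3) * 2 - 1             # query points in [-1, 1]^3
occ_gt = torch.randint(0, 2, (4, 2048)).float()     # ground-truth occupancies
loss = loss_fn(decoder(points, encoder(sketch)), occ_gt)
loss.backward()
```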
After presenting the related works and their limitations in Section 2, the remainder of this paper introduces our methods in Sections 3 and 4, and the experiments that validate the above-mentioned hypotheses and claims in Section 5. Finally, Sections 6, 7, and 8 present the discussion and limitations of our method. The code has been released open source: https://github.com/CDInstitute/Vitruvio
2. Related Work
This section introduces previous work in sketch modeling and the limitations that led to a learning-based approach.
Sketch modeling has been an active area of research since the late 1990s [45,118]. This modeling process can be approached in two ways: as a series of operations or as Single View Reconstruction (SVR) [30,97].
The former is typically adopted by CAD software. It requires specific, well-defined inputs, such as strokes, to construct 3D geometries. Through a series of sketches from different viewpoints [26], or a series of procedures [40,55,56,69,70,87], these simplified CAD interfaces provide complete control of the 3D geometry at the expense of artistic style. Thus, the models developed have been view- and style-dependent [126], operating with fixed viewpoints and specific sketching styles.
The latter, SVR, leverages a more flexible approach. It uses computer vision techniques to reconstruct 3D shapes from a single sketch without requiring a digital surface. SVR has recently gained attention thanks to advances in learning-based methods [37,39,47,98,106,126,127], inspired by image-based reconstruction, where the geometric output is represented in