
[Figure 2: A comparison of explicit and implicit data representations. The example shows a 1-dimensional point cloud at the top (points A–G at x = 0.75, 1.5, 2.0, 4.5, 5.25, 5.75, 7.25), and an explicit and an implicit memory layout at the bottom. An example of aliasing can be observed, as points E and D collide into the same memory cell. Wasted memory space can be seen as empty memory cells.]
[Figure 3: A comparison of 3D voxel representations (resolution r, grid dimensions W × H × L), 2D image representations, and unordered 1-dimensional Bag-of-Points representations (N points with x, y, z features). As the resolution r becomes more fine-grained, the sparsity of rasterized representations increases. However, a coarse resolution may cause multiple points to collide within a single representation cell (as seen in the top right of the 2D projection).]
performed on them. The multi-dimensional representations perform a rasterization of the space into a finite number of grid cells, aligning each memory cell to a section of 3D or 2D space. The decision of representation dimensionality is orthogonal to the memory layout, as multi-dimensional rasterizations can also be sparsely stored in one-dimensional data structures [15]. Figure 3 illustrates different rasterization dimensionalities.
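To make the orthogonality of dimensionality and memory layout concrete, the following minimal sketch stores a 3D voxel rasterization in a one-dimensional hash map, so only occupied cells consume memory. The function name `sparse_voxelize` and its parameters are illustrative, not taken from any cited approach.

```python
import numpy as np

def sparse_voxelize(points, resolution):
    """Rasterize a 3D point cloud into a sparse voxel grid.

    The multi-dimensional rasterization is kept in a one-dimensional
    hash map keyed by integer voxel index, so empty cells cost nothing.
    """
    grid = {}
    for p in points:
        # Integer voxel index along x, y, z for this point.
        key = tuple((p // resolution).astype(int))
        grid.setdefault(key, []).append(p)
    return grid

# Toy point cloud: the first two points collide in one voxel (aliasing).
pts = np.array([[0.2, 0.3, 0.1], [0.4, 0.1, 0.3], [2.5, 0.0, 1.0]])
grid = sparse_voxelize(pts, resolution=1.0)
print(len(grid))  # 2 occupied voxels instead of a dense W*H*L array
```

A dense layout of the same space at this resolution would allocate every cell up front; the sparse layout trades constant-time indexed access for memory proportional to the number of occupied voxels.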
Coordinate system
The third design decision (fig. 1, third layer) we observe concerns the choice of coordinate system, which is used for rasterizations of multi-dimensional spaces. Rasterization divides 2D or 3D space into chunks of finite size. This partition is typically performed along regular intervals across coordinate axes. Therefore, the coordinate axes chosen for this division also impact how the resulting representation partitions 3D space. Here, we mainly differentiate between Cartesian coordinate systems, which refer to absolute positions in 3D Euclidean space [12], and polar coordinate systems, which refer to locations by combinations of angles and distance measurements [19, 20]. Figure 4 (left) illustrates how different coordinate systems can lead to different rasterizations of two-dimensional space.
Spherical coordinates are an extension of polar coordinates which uses two angles and one distance measurement to index 3D space. Projecting spherical coordinates along the radial axis results in a 2D range-image representation. There are also various coordinate systems which combine polar and Cartesian geometry for different axes. Figure 4 shows an example of this: cylindrical coordinates use a polar coordinate system for the X-Y plane and a Cartesian axis for the Z-direction. Similarly, some approaches use a polar coordinate system in a 2D bird's eye view (BEV) projection, which projects points along a Cartesian z-axis [19]. For polar coordinate systems, the coordinate origin is typically chosen as the center of the LiDAR sensor in order to minimize aliasing [19, 20].
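The cylindrical partition described above can be sketched as follows: the X-Y plane is binned by radius and azimuth angle around the sensor origin, while the z-axis keeps a regular Cartesian partition. The function name `cylindrical_voxel_index` and the bin sizes are illustrative assumptions, not taken from the cited works.

```python
import math

def cylindrical_voxel_index(x, y, z, d_rho, d_phi, d_z):
    """Map a Cartesian point to a cylindrical voxel index (rho, phi, z).

    The X-Y plane is partitioned in polar coordinates (radius, angle)
    with the origin at the sensor center; the z-axis stays Cartesian.
    """
    rho = math.hypot(x, y)   # radial distance from the sensor origin
    phi = math.atan2(y, x)   # azimuth angle in [-pi, pi]
    return (int(rho // d_rho),
            int((phi + math.pi) // d_phi),  # shift to [0, 2*pi) before binning
            int(z // d_z))

# Two points at the same range but different azimuth share the radial
# and z bins while landing in different angular bins.
print(cylindrical_voxel_index(10.0, 1.0, 1.1, d_rho=0.5, d_phi=math.pi / 90, d_z=0.2))
print(cylindrical_voxel_index(1.0, 10.0, 1.1, d_rho=0.5, d_phi=math.pi / 90, d_z=0.2))
```

Note that the angular bins cover a larger area the further they are from the origin, which is one reason the origin placement at the sensor center matters for aliasing.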
Feature aggregation
As a final design decision (fig. 1, fourth layer), we differentiate approaches by their choice of mathematical operation applied to compute the resulting point cloud representation. To compute a feature representation for a point in a 3D point cloud, almost all deep learning approaches aggregate information about its local or global neighborhood. This aggregation typically requires finding other points within the neighborhood of the point whose features are to be computed. Next, these features are aggregated using parametric or non-parametric mathematical operations. The decision for a rasterized representation often directly leads to the use of convolutions [13, 11], although other operations are certainly possible. For Bag-of-Points representations, the choice of feature aggregation method becomes the main differentiating factor. Some approaches use convolutions in non-rasterized spaces [21, 16, 22], while others perform feature aggregation through different variants of weighted or non-weighted pooling of local neighbors [23, 14, 18]. For brevity, we group these into neighbor-based approaches in figure 1 (bottom right). A final group of approaches, which exists for smaller point clouds but has not yet found application in the large-scale point clouds used in autonomous driving, is point cloud transformers. These approaches use local or global (point cloud-wide) attention mechanisms to exchange features of points inside a point cloud [24].
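The neighbor-based pooling family can be illustrated with a minimal non-parametric aggregator: for each point, find its k nearest neighbors and max-pool their features channel-wise. This is a sketch of the general pooling pattern, not an implementation of any specific cited method; the name `knn_max_pool` is our own.

```python
import numpy as np

def knn_max_pool(points, feats, k):
    """For each point, gather its k nearest neighbors (by Euclidean
    distance, including the point itself) and max-pool their features
    channel-wise. Brute-force O(n^2); fine for a small illustration.
    """
    # Pairwise squared Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(-1)
    # Indices of the k nearest neighbors per point.
    nn = np.argsort(dist, axis=1)[:, :k]
    # Channel-wise max over each neighborhood: (n, k, c) -> (n, c).
    return feats[nn].max(axis=1)

rng = np.random.default_rng(0)
pts = rng.random((8, 3))
feats = rng.random((8, 4))
agg = knn_max_pool(pts, feats, k=3)
print(agg.shape)  # (8, 4): one aggregated feature vector per point
```

Parametric variants of this pattern insert a learned transformation (e.g. a per-point MLP) before the pooling step; attention-based transformers replace the fixed max with learned, data-dependent weights over the neighborhood.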