VoxelCache Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks

2025-05-06 0 0 1.11MB 13 页 10玖币
侵权投诉
VoxelCache: Accelerating Online Mapping
in Robotics and 3D Reconstruction Tasks
Sankeerth Durvasula
University of Toronto
Canada
Raymond Kiguru
University of Toronto
Canada
Samarth Mathur
PES Institute of Technology
India
Jenny Xu
University of Toronto
Canada
Jimmy Lin
University of Toronto
Canada
Nandita Vijaykumar
University of Toronto
Canada
ABSTRACT
Real-time 3D mapping is a critical component in many important
applications today including robotics, AR/VR, and 3D visualization.
3D mapping involves continuously fusing depth maps obtained
from depth sensors in phones, robots, and autonomous vehicles
into a single 3D representative model of the scene. Many important
applications, e.g., global path planning and trajectory generation
in micro aerial vehicles, require the construction of large maps at
high resolutions. In this work, we identify mapping, i.e., construc-
tion and updates of 3D maps to be a critical bottleneck in these
applications. The memory required and access times of these maps
limit the size of the environment and the resolution with which
the environment can be feasibly mapped, especially in resource
constrained environments such as autonomous robot platforms and
portable devices. To address this challenge, we propose VoxelCache:
a hardware-software technique to accelerate map data access times
in 3D mapping applications. We observe that mapping applications
typically access voxels in the map that are spatially co-located to
each other. We leverage this temporal locality in voxel accesses to
cache indices to blocks of voxels to enable quick lookup and avoid
expensive access times. We evaluate VoxelCache on popularly used
mapping and reconstruction applications on both GPUs and CPUs.
We demonstrate an average speedup of 1.47X (up to 1.66X) and
1.79X (up to 1.91X) on CPUs and GPUs respectively.
CCS CONCEPTS
Computer systems organization
Robotics;
Embedded sys-
tems;Other architectures;
KEYWORDS
Mapping, SLAM, reconstruction, caching, hardware acceleration
1 INTRODUCTION
3D mapping is an essential component in many important applica-
tions today, such as in autonomous robotics, AR/VR applications,
and 3D reconstruction. Mapping involves the real-time construc-
tion of a representation (map) of the environment. This is typically
done by continually processing and integrating distance informa-
tion to objects in the scene that is obtained from sensors such as
depth cameras and LiDAR. Mapping is an integral part of tasks
such as SLAM (Simultaneous Localization and Mapping), which
involves determining the location/pose of a mobile robot within the
constructed map of the environment, and 3D reconstruction, which
is used to construct and render 3D scenes/objects from 2D images.
These tasks are deployed in a number of important applications.
For example, augmented reality libraries for mobile phones like
Google’s AR-Core [
2
] and Apple’s ARkit [
1
] use SLAM for mo-
tion and environment estimation. 3D Reconstruction is used for a
number of visualization tasks, like 3D renderings of real estate oor-
plans for apartment rentals, and free viewpoint television, which
enables interactively viewing large city models from dierent any
viewpoints used for street view.
One common approach to represent the environment as a map is
to use a voxel grid, in which the 3D space is discretized into a grid
and each element of the grid (called a voxel) stores local information
that characterizes that point in space. Typically, the distance to the
closest occupied point in space is saved in each voxel, referred
to as the Signed Distance Field (SDF) [
31
]. Mapping continually
updates the information in each voxel by integrating the depth input
information obtained from sensors with repeated calls to a map
update routine. It is often necessary to have both fast map updates
that can process high input frame rates and generating maps at high
resolution (more voxels to represent a given region). For example,
in robotics perception, faster updates allow for a faster control
loop feedback which enables robust and agile maneuvering. In
online scene reconstruction, a fast and high resolution map allows
a more detailed render to be generated at a higher frame rate. In
autonomous navigation, a higher resolution map would result in
the application inferring ner details of its surroundings, enabling
more optimized trajectories during path planning [
16
,
29
,
30
,
40
,
41
].
However, enabling fast map updates at very high resolutions is
challenging, especially for mapping large environments. We nd
that a single map update takes anywhere from around
50𝑚𝑠
to
500𝑚𝑠
, depending on the required resolution of the map. This is
due to both lengthy access latencies and a large number of accesses
to voxel data per map update. Furthermore, higher map resolutions
signicantly increase the number of voxels required to be updated.
For example, to have a
2𝑋
resolution requires each map update to
access and update 8𝑋the number of voxels.
Thus, mapping large environments necessitates fast access to
information stored in voxels for vast stretches of the environment
in memory in a scalable manner. Voxel hashing [
28
] is a commonly
used technique that oers a memory ecient representation while
having reasonable access latencies to voxel data, making it ideal for
a number of high speed mapping tasks to dynamically represent
large areas of the environment [
8
10
,
12
,
13
,
19
,
20
,
22
,
28
,
32
35
].
It uses a hash table to map coordinates in space to an
𝑛×𝑛×𝑛
block of voxels (voxel block). However, we nd that the percentage
of time that is spent in accessing voxel SDF information from the
arXiv:2210.08729v1 [cs.AR] 17 Oct 2022
PACT ’22, October 10–12, 2012, Chicago,IL Durvasula, et al.
voxel hash is up to
60%
of the overall map update time, leading to
high latency map updates. Another commonly used technique is
to use octrees [
15
]. State-of-the-art octree implementations store
blocks of voxels as leaves of a tree data structure, but still incur
similar voxel data access latencies as that of voxel hashing [24].
Our goal is to accelerate this voxel data retrieval time during
map updates in order to have the low latency map updates at
high-resolution that are crucial for many important applications.
We observe that while mapping requires a large number of up-
dates/accesses (in the order of millions) to voxels, many of which
are repeated accesses to the same voxels.
We introduce VoxelCache, a hardware-software mechanism to
enable fast map updates by ensuring low latency accesses to voxel
data. The key idea behind VoxelCache is to leverage temporal local-
ity in accesses to the voxels by caching the pointer to the voxel cor-
responding to each 3D coordinate. VoxelCache enables fast lookups
to the voxel blocks by using the on-chip caches to save the voxel
block pointers for recently accessed voxels. This avoids the need to
perform expensive hash table computations for each lookup in the
case of voxel hashing or expensive tree traversal in octrees.
We demonstrate that our approach achieves speedups of
1.47𝑋
(upto
1.66𝑋
) and
1.79𝑋
(upto
1.91𝑋
) on average in mapping update
for various commonly-used SLAM and 3D reconstruction frame-
works on CPUs and GPUs respectively. The contributions of this
work are as follows:
We perform a detailed characterization of various 3D map-
ping and reconstruction tasks, and we observe that voxel
hash resolution forms a signicant performance bottleneck
for mapping.
We introduce VoxelCache, a hardware-software mechanism
to accelerate mapping updates at higher resolutions by en-
abling faster accesses to voxel data.
We evaluate VoxelCache for various SLAM, online 3D re-
construction frameworks that construct an SDF map of its
environment on both CPUs and GPUs. We nd that Voxel-
Cache is able to provide signicant speedups in map update
latencies for these workloads.
2 BACKGROUND AND MOTIVATION
2.1 Representation of Voxel Maps in Memory
A straightforward approach to store voxel data is to use a 3D array
indexed by the spatial coordinate of the voxel [
18
,
37
]. However,
this approach is not scalable and becomes impractical for large
maps or at high resolution as it requires a signicant amount of
memory for storage. For example, a 3D array to represent occupancy
information for a relatively small
10𝑚×10𝑚×10𝑚
region with
voxels of resolution 2𝑐𝑚 would require 500𝑀𝐵.
2.1.1 Voxel Data Structures.
Voxel hashing
[
28
] provides an e-
cient way to store maps of large areas in a scalable manner while
incurring low access/insertion latencies. Compared to a 3D array
representation of voxels which densely represents both empty and
occupied spaces, voxel hashing only allocates voxels for areas that
are occupied. Unallocated voxels are considered to be empty mak-
ing it highly scalable with respect to memory. Voxel hashing thus
involves saving allocated blocks of voxels (of size
𝑛×𝑛×𝑛
) that
kx, ky, kz
hash()
Buckets
Sequence of
data-dependent loads
Block Pointers
Figure 1: Retrieving block pointer with a hash table lookup
is indexed the coordinates of voxel block using a hash table. Map-
ping a large scene involves a number of allocations of blocks of
voxels. A large block size would decrease the number of hash table
entries needed, but would greatly increase the memory footprint
occupied by the block. Since the target platforms for a number
of SLAM/mapping and reconstruction frameworks are embedded
systems which have constraints on memory and compute power, a
large block size is generally not preferred. A block of dimensions
8×8×8is usually chosen.
Octrees
[
15
] are another popular approach for storing voxel in-
formation of large regions of the environment in memory. Octrees
use a tree data structure to store voxel data in space. Each node in
the tree corresponds to a cubic volume in 3D space. A node may
contain
8
children, which represent 8 equal octants of the cube, or
it may be a leaf node, which corresponds to an individual voxel. An
access to a voxel involves traversing through the tree data structure
starting from the root node to the leaf node. Since each access to a
voxel requires a tree traversal, octrees incur a high access latency,
making online mapping a lot more time-consuming compared to
implementations using voxel hashing. However, some recent works
on online mapping with octrees propose specialized implementa-
tions that achieve similar performance as implementations with
voxel hashing [7, 24].
2.1.2 Signed Distance Fields and Map Updates. Signed distance
eld (SDF) information stored in voxels have previously been used
to represent 3D volumes in graphics applications, but are increas-
ingly deployed in computer vision and simultaneous localization
and mapping (SLAM) tasks due to the exible representation and
accurate 3D reconstructions it is able to provide [
18
,
31
]. A signed
distance is computed for each voxel in the map, and represents the
estimated distance to the closest occupied surface in space. While
signed distance elds provide high quality environment represen-
tation, they have large memory requirements. A voxel grid map to
store the signed distance is the most commonly used approach in
mapping literature [
9
,
13
,
18
,
22
,
26
,
32
35
,
37
,
38
]. In this work, we
mainly consider frameworks whose map construction maintains
and updates signed distance information stored in voxel grids.
The map update routine receives depth measurement informa-
tion of dierent points in space from either depth cameras / LiDAR
or stereo vision cameras, and a pose estimate of the sensor from
odometry measurements or external pose measurements as the
input. Real time computation of true SDF from depth sensor mea-
surements over a large volume is computationally expensive [
18
].
VoxelCache: Accelerating Online Mapping
in Robotics and 3D Reconstruction Tasks PACT ’22, October 10–12, 2012, Chicago,IL
One common approach to obtain an approximate SDF is to com-
pute the truncated signed distance eld (TSDF) in which the point
cloud captured by the sensor is rst transformed from the camera’s
coordinates to the global coordinates. Multiple rays are cast from
the camera center to each point in the point cloud. The distance
from each point to surrounding voxel centers along the ray up to
a truncation distance is calculated and assigned to these voxels.
Another approach to estimate an approximate SDF is to compute
the Euclidean Signed Distance Field (ESDF) in which the distance
from each occupied voxel center (closest voxel to each point in the
point cloud captured by the sensor) to all neighboring voxels up to
a clear radius is computed. These distances computed are used to
update the signed distance of corresponding voxels.
2.2 Characterizing Mapping Overheads
Figure 1 illustrates how a voxel block pointer is retrieved from its
coordinates by computing the hash function and iterating through
a data-dependent sequence of load and key comparison operations.
This is a high latency operation because of data-dependent memory
accesses and the high branch mispredict rates in the control ow.
A large number of these operations are performed at each map
update leading to a signicantly higher map update time. These
operations form a signicant portion of the overall map update
time. An analysis of the contribution of this latency as a percentage
of runtime and absolute number of accesses to the voxel hash table
is shown in Figure 2. Note that in the case of
infi-CPU
, despite
the relatively high number of accesses we observe a relatively
smaller percentage of cycles contributing to the map update time.
This is because
infi-CPU
uses a memoization scheme in software
which caches the recently accessed block pointer at each hash table
bucket. While reducing the number of hash table accesses, it incurs
additional overhead for memoization at every access.
fiesta voxblox inft-CPU opchisl rfsn geomean
20
30
40
50
60
0
2M
4M
6M
8M
cycles(%) accesses
% cycles/update
num. accesses
Figure 2: Analysis of the number of voxel block accesses
and fraction of overall runtime attributed to voxel block ac-
cesses
At higher resolutions, the number of accesses made at every
update increases greatly, leading to a signicant increase in map
update time. Figure 3 shows the impact of resolution on map update
time. On average, the map update time increases by
9𝑋
between
grid sizes 0.15𝑚and 0.05𝑚.
Figure 4 shows the number of accesses and the access time as
a percentage of the overall scene integration time of Supereight [
24
],
which uses an ecient octree implementation for the ICL-NUIM [
14
]
living room traj2 dataset. We observe that the average number of
accesses at smaller voxel grid sizes is up to
1.5×106
, and the ac-
cesses take up to
40%
of the scene integration time. In addition, a
signicant portion of the time taken for generating meshes using
fiesta voxblox inft-CPU opchisl c-blox mhash rfsn inft-GPU geomean
0
2
4
6
8
10
12
0.15m 0.1m 0.05m
Update Time
Figure 3: Normalized map update time with varying voxel
grid size
ray marching involves (up to 45%) voxel access times.
3.75cm 1.88cm 0.93cm gemoean
0
10
20
30
40
0
0.5M
1M
1.5M
cycles(%) accesses
% cycles/update
num. accesses
Figure 4: Fraction of time attributed to octree voxel accesses
in Supereight [24]
2.3 Characterizing Key Access Patterns
Figure 5 shows the number of distinct voxel blocks accessed on
average during a map update at a voxel resolution of
0.10𝑚
. On
comparing this with the overall number of accesses during each
map update step in Figure 2, we infer that there is massive reuse in
accesses to voxel blocks. There are two sources that contribute to
this reuse:
A single update makes repeated accesses to the voxels in a small
portion of the large scene. This is because a number of rays cast
during raycast intersect the same blocks of voxels when estimating
the TSDF. When computing the ESDF, voxels surrounding each
point from a point cloud captured by the sensor are updated (refer
to Section 2.1.2). Since these points (projected from a single depth
image) are close by in space, there is a large degree of overlap
between update regions of occupied voxels. As a result, the voxels
in these overlapped regions are accessed multiple times during each
map update.
There is a large overlap in the blocks of voxels accessed in two
consecutive updates, as it involves updating the SDF in the same
portion of the scene.
Figure 5: Number of distinct voxels accessed at each map up-
date
To empirically verify repetitive key access patterns, we analyze
the gaps between accesses to the same key (in terms of map ac-
cesses) and count the number of occurrences of each gap for the
摘要:

VoxelCache:AcceleratingOnlineMappinginRoboticsand3DReconstructionTasksSankeerthDurvasulaUniversityofTorontoCanadaRaymondKiguruUniversityofTorontoCanadaSamarthMathurPESInstituteofTechnologyIndiaJennyXuUniversityofTorontoCanadaJimmyLinUniversityofTorontoCanadaNanditaVijaykumarUniversityofTorontoCanada...

展开>> 收起<<
VoxelCache Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks.pdf

共13页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:13 页 大小:1.11MB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 13
客服
关注