VoxelCache Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks

2025-05-06 2 0 1.11MB 13 页 10玖币

侵权投诉

VoxelCache: Accelerating Online Mapping

in Robotics and 3D Reconstruction Tasks

Sankeerth Durvasula

University of Toronto

Canada

Raymond Kiguru

University of Toronto

Canada

Samarth Mathur

PES Institute of Technology

India

Jenny Xu

University of Toronto

Canada

Jimmy Lin

University of Toronto

Canada

Nandita Vijaykumar

University of Toronto

Canada

ABSTRACT

Real-time 3D mapping is a critical component in many important

applications today including robotics, AR/VR, and 3D visualization.

3D mapping involves continuously fusing depth maps obtained

from depth sensors in phones, robots, and autonomous vehicles

into a single 3D representative model of the scene. Many important

applications, e.g., global path planning and trajectory generation

in micro aerial vehicles, require the construction of large maps at

high resolutions. In this work, we identify mapping, i.e., construc-

tion and updates of 3D maps to be a critical bottleneck in these

applications. The memory required and access times of these maps

limit the size of the environment and the resolution with which

the environment can be feasibly mapped, especially in resource

constrained environments such as autonomous robot platforms and

portable devices. To address this challenge, we propose VoxelCache:

a hardware-software technique to accelerate map data access times

in 3D mapping applications. We observe that mapping applications

typically access voxels in the map that are spatially co-located to

each other. We leverage this temporal locality in voxel accesses to

cache indices to blocks of voxels to enable quick lookup and avoid

expensive access times. We evaluate VoxelCache on popularly used

mapping and reconstruction applications on both GPUs and CPUs.

We demonstrate an average speedup of 1.47X (up to 1.66X) and

1.79X (up to 1.91X) on CPUs and GPUs respectively.

CCS CONCEPTS

•Computer systems organization →

Robotics;

Embedded sys-

tems;Other architectures;

KEYWORDS

Mapping, SLAM, reconstruction, caching, hardware acceleration

1 INTRODUCTION

3D mapping is an essential component in many important applica-

tions today, such as in autonomous robotics, AR/VR applications,

and 3D reconstruction. Mapping involves the real-time construc-

tion of a representation (map) of the environment. This is typically

done by continually processing and integrating distance informa-

tion to objects in the scene that is obtained from sensors such as

depth cameras and LiDAR. Mapping is an integral part of tasks

such as SLAM (Simultaneous Localization and Mapping), which

involves determining the location/pose of a mobile robot within the

constructed map of the environment, and 3D reconstruction, which

is used to construct and render 3D scenes/objects from 2D images.

These tasks are deployed in a number of important applications.

For example, augmented reality libraries for mobile phones like

Google’s AR-Core [

] and Apple’s ARkit [

] use SLAM for mo-

tion and environment estimation. 3D Reconstruction is used for a

number of visualization tasks, like 3D renderings of real estate oor-

plans for apartment rentals, and free viewpoint television, which

enables interactively viewing large city models from dierent any

viewpoints used for street view.

One common approach to represent the environment as a map is

to use a voxel grid, in which the 3D space is discretized into a grid

and each element of the grid (called a voxel) stores local information

that characterizes that point in space. Typically, the distance to the

closest occupied point in space is saved in each voxel, referred

to as the Signed Distance Field (SDF) [

]. Mapping continually

updates the information in each voxel by integrating the depth input

information obtained from sensors with repeated calls to a map

update routine. It is often necessary to have both fast map updates

that can process high input frame rates and generating maps at high

resolution (more voxels to represent a given region). For example,

in robotics perception, faster updates allow for a faster control

loop feedback which enables robust and agile maneuvering. In

online scene reconstruction, a fast and high resolution map allows

a more detailed render to be generated at a higher frame rate. In

autonomous navigation, a higher resolution map would result in

the application inferring ner details of its surroundings, enabling

more optimized trajectories during path planning [

However, enabling fast map updates at very high resolutions is

challenging, especially for mapping large environments. We nd

that a single map update takes anywhere from around

50𝑚𝑠

500𝑚𝑠

, depending on the required resolution of the map. This is

due to both lengthy access latencies and a large number of accesses

to voxel data per map update. Furthermore, higher map resolutions

signicantly increase the number of voxels required to be updated.

For example, to have a

2𝑋

resolution requires each map update to

access and update 8𝑋the number of voxels.

Thus, mapping large environments necessitates fast access to

information stored in voxels for vast stretches of the environment

in memory in a scalable manner. Voxel hashing [

] is a commonly

used technique that oers a memory ecient representation while

having reasonable access latencies to voxel data, making it ideal for

a number of high speed mapping tasks to dynamically represent

large areas of the environment [

–

It uses a hash table to map coordinates in space to an

𝑛×𝑛×𝑛

block of voxels (voxel block). However, we nd that the percentage

of time that is spent in accessing voxel SDF information from the

arXiv:2210.08729v1 [cs.AR] 17 Oct 2022

PACT ’22, October 10–12, 2012, Chicago,IL Durvasula, et al.

voxel hash is up to

60%

of the overall map update time, leading to

high latency map updates. Another commonly used technique is

to use octrees [

]. State-of-the-art octree implementations store

blocks of voxels as leaves of a tree data structure, but still incur

similar voxel data access latencies as that of voxel hashing [24].

Our goal is to accelerate this voxel data retrieval time during

map updates in order to have the low latency map updates at

high-resolution that are crucial for many important applications.

We observe that while mapping requires a large number of up-

dates/accesses (in the order of millions) to voxels, many of which

are repeated accesses to the same voxels.

We introduce VoxelCache, a hardware-software mechanism to

enable fast map updates by ensuring low latency accesses to voxel

data. The key idea behind VoxelCache is to leverage temporal local-

ity in accesses to the voxels by caching the pointer to the voxel cor-

responding to each 3D coordinate. VoxelCache enables fast lookups

to the voxel blocks by using the on-chip caches to save the voxel

block pointers for recently accessed voxels. This avoids the need to

perform expensive hash table computations for each lookup in the

case of voxel hashing or expensive tree traversal in octrees.

We demonstrate that our approach achieves speedups of

1.47𝑋

(upto

1.66𝑋

) and

1.79𝑋

(upto

1.91𝑋

) on average in mapping update

for various commonly-used SLAM and 3D reconstruction frame-

works on CPUs and GPUs respectively. The contributions of this

work are as follows:

•

We perform a detailed characterization of various 3D map-

ping and reconstruction tasks, and we observe that voxel

hash resolution forms a signicant performance bottleneck

for mapping.

•

We introduce VoxelCache, a hardware-software mechanism

to accelerate mapping updates at higher resolutions by en-

abling faster accesses to voxel data.

•

We evaluate VoxelCache for various SLAM, online 3D re-

construction frameworks that construct an SDF map of its

environment on both CPUs and GPUs. We nd that Voxel-

Cache is able to provide signicant speedups in map update

latencies for these workloads.

2 BACKGROUND AND MOTIVATION

2.1 Representation of Voxel Maps in Memory

A straightforward approach to store voxel data is to use a 3D array

indexed by the spatial coordinate of the voxel [

]. However,

this approach is not scalable and becomes impractical for large

maps or at high resolution as it requires a signicant amount of

memory for storage. For example, a 3D array to represent occupancy

information for a relatively small

10𝑚×10𝑚×10𝑚

region with

voxels of resolution 2𝑐𝑚 would require 500𝑀𝐵.

2.1.1 Voxel Data Structures.

Voxel hashing

[

] provides an e-

cient way to store maps of large areas in a scalable manner while

incurring low access/insertion latencies. Compared to a 3D array

representation of voxels which densely represents both empty and

occupied spaces, voxel hashing only allocates voxels for areas that

are occupied. Unallocated voxels are considered to be empty mak-

ing it highly scalable with respect to memory. Voxel hashing thus

involves saving allocated blocks of voxels (of size

𝑛×𝑛×𝑛

) that

kx, ky, kz

hash()

Buckets

Sequence of

data-dependent loads

Block Pointers

Figure 1: Retrieving block pointer with a hash table lookup

is indexed the coordinates of voxel block using a hash table. Map-

ping a large scene involves a number of allocations of blocks of

voxels. A large block size would decrease the number of hash table

entries needed, but would greatly increase the memory footprint

occupied by the block. Since the target platforms for a number

of SLAM/mapping and reconstruction frameworks are embedded

systems which have constraints on memory and compute power, a

large block size is generally not preferred. A block of dimensions

8×8×8is usually chosen.

Octrees

[

] are another popular approach for storing voxel in-

formation of large regions of the environment in memory. Octrees

use a tree data structure to store voxel data in space. Each node in

the tree corresponds to a cubic volume in 3D space. A node may

contain

children, which represent 8 equal octants of the cube, or

it may be a leaf node, which corresponds to an individual voxel. An

access to a voxel involves traversing through the tree data structure

starting from the root node to the leaf node. Since each access to a

voxel requires a tree traversal, octrees incur a high access latency,

making online mapping a lot more time-consuming compared to

implementations using voxel hashing. However, some recent works

on online mapping with octrees propose specialized implementa-

tions that achieve similar performance as implementations with

voxel hashing [7, 24].

2.1.2 Signed Distance Fields and Map Updates. Signed distance

eld (SDF) information stored in voxels have previously been used

to represent 3D volumes in graphics applications, but are increas-

ingly deployed in computer vision and simultaneous localization

and mapping (SLAM) tasks due to the exible representation and

accurate 3D reconstructions it is able to provide [

]. A signed

distance is computed for each voxel in the map, and represents the

estimated distance to the closest occupied surface in space. While

signed distance elds provide high quality environment represen-

tation, they have large memory requirements. A voxel grid map to

store the signed distance is the most commonly used approach in

mapping literature [

–

]. In this work, we

mainly consider frameworks whose map construction maintains

and updates signed distance information stored in voxel grids.

The map update routine receives depth measurement informa-

tion of dierent points in space from either depth cameras / LiDAR

or stereo vision cameras, and a pose estimate of the sensor from

odometry measurements or external pose measurements as the

input. Real time computation of true SDF from depth sensor mea-

surements over a large volume is computationally expensive [

VoxelCache: Accelerating Online Mapping

in Robotics and 3D Reconstruction Tasks PACT ’22, October 10–12, 2012, Chicago,IL

One common approach to obtain an approximate SDF is to com-

pute the truncated signed distance eld (TSDF) in which the point

cloud captured by the sensor is rst transformed from the camera’s

coordinates to the global coordinates. Multiple rays are cast from

the camera center to each point in the point cloud. The distance

from each point to surrounding voxel centers along the ray up to

a truncation distance is calculated and assigned to these voxels.

Another approach to estimate an approximate SDF is to compute

the Euclidean Signed Distance Field (ESDF) in which the distance

from each occupied voxel center (closest voxel to each point in the

point cloud captured by the sensor) to all neighboring voxels up to

a clear radius is computed. These distances computed are used to

update the signed distance of corresponding voxels.

2.2 Characterizing Mapping Overheads

Figure 1 illustrates how a voxel block pointer is retrieved from its

coordinates by computing the hash function and iterating through

a data-dependent sequence of load and key comparison operations.

This is a high latency operation because of data-dependent memory

accesses and the high branch mispredict rates in the control ow.

A large number of these operations are performed at each map

update leading to a signicantly higher map update time. These

operations form a signicant portion of the overall map update

time. An analysis of the contribution of this latency as a percentage

of runtime and absolute number of accesses to the voxel hash table

is shown in Figure 2. Note that in the case of

infi-CPU

, despite

the relatively high number of accesses we observe a relatively

smaller percentage of cycles contributing to the map update time.

This is because

infi-CPU

uses a memoization scheme in software

which caches the recently accessed block pointer at each hash table

bucket. While reducing the number of hash table accesses, it incurs

additional overhead for memoization at every access.

fiesta voxblox inft-CPU opchisl rfsn geomean

cycles(%) accesses

% cycles/update

num. accesses

Figure 2: Analysis of the number of voxel block accesses

and fraction of overall runtime attributed to voxel block ac-

cesses

At higher resolutions, the number of accesses made at every

update increases greatly, leading to a signicant increase in map

update time. Figure 3 shows the impact of resolution on map update

time. On average, the map update time increases by

9𝑋

between

grid sizes 0.15𝑚and 0.05𝑚.

Figure 4 shows the number of accesses and the access time as

a percentage of the overall scene integration time of Supereight [

which uses an ecient octree implementation for the ICL-NUIM [

]

living room traj2 dataset. We observe that the average number of

accesses at smaller voxel grid sizes is up to

1.5×106

, and the ac-

cesses take up to

40%

of the scene integration time. In addition, a

signicant portion of the time taken for generating meshes using

fiesta voxblox inft-CPU opchisl c-blox mhash rfsn inft-GPU geomean

0.15m 0.1m 0.05m

Update Time

Figure 3: Normalized map update time with varying voxel

grid size

ray marching involves (up to 45%) voxel access times.

3.75cm 1.88cm 0.93cm gemoean

0.5M

1.5M

cycles(%) accesses

% cycles/update

num. accesses

Figure 4: Fraction of time attributed to octree voxel accesses

in Supereight [24]

2.3 Characterizing Key Access Patterns

Figure 5 shows the number of distinct voxel blocks accessed on

average during a map update at a voxel resolution of

0.10𝑚

. On

comparing this with the overall number of accesses during each

map update step in Figure 2, we infer that there is massive reuse in

accesses to voxel blocks. There are two sources that contribute to

this reuse:

•

A single update makes repeated accesses to the voxels in a small

portion of the large scene. This is because a number of rays cast

during raycast intersect the same blocks of voxels when estimating

the TSDF. When computing the ESDF, voxels surrounding each

point from a point cloud captured by the sensor are updated (refer

to Section 2.1.2). Since these points (projected from a single depth

image) are close by in space, there is a large degree of overlap

between update regions of occupied voxels. As a result, the voxels

in these overlapped regions are accessed multiple times during each

map update.

•

There is a large overlap in the blocks of voxels accessed in two

consecutive updates, as it involves updating the SDF in the same

portion of the scene.

fiesta voxblox inft-CPU opchisl c-blox mhash rfsn inft-GPU geomean

100

120

140

160

distinct accesses

Figure 5: Number of distinct voxels accessed at each map up-

date

To empirically verify repetitive key access patterns, we analyze

the gaps between accesses to the same key (in terms of map ac-

cesses) and count the number of occurrences of each gap for the

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

VoxelCache:AcceleratingOnlineMappinginRoboticsand3DReconstructionTasksSankeerthDurvasulaUniversityofTorontoCanadaRaymondKiguruUniversityofTorontoCanadaSamarthMathurPESInstituteofTechnologyIndiaJennyXuUniversityofTorontoCanadaJimmyLinUniversityofTorontoCanadaNanditaVijaykumarUniversityofTorontoCanada...

展开>> 收起<<

VoxelCache Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks.pdf

共13页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

VoxelCache Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: