Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time Keshav Gupta Peter Zhi Xuan Li Sertac Karaman Vivienne Sze

2025-05-03 0 0 2.07MB 7 页 10玖币
侵权投诉
Efficient Computation of Map-scale Continuous Mutual Information on Chip
in Real Time
Keshav Gupta, Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Emails: {keshav21,peterli,sertac,sze}@mit.edu
Abstract Exploration tasks are essential to many emerging
robotics applications, ranging from search and rescue to space
exploration. The planning problem for exploration requires
determining the best locations for future measurements that
will enhance the fidelity of the map, for example, by reducing
its total entropy. A widely-studied technique involves computing
the Mutual Information (MI) between the current map and
future measurements, and utilizing this MI metric to decide
the locations for future measurements. However, computing
MI for reasonably-sized maps is slow and power hungry,
which has been a bottleneck towards fast and efficient robotic
exploration. In this paper, we introduce a new hardware
accelerator architecture for MI computation that features a
low-latency, energy-efficient MI compute core and an optimized
memory subsystem that provides sufficient bandwidth to keep
the cores fully utilized. The core employs interleaving to counter
the recursive algorithm, and workload balancing and numerical
approximations to reduce latency and energy consumption.
We demonstrate this optimized architecture with a Field-
Programmable Gate Array (FPGA) implementation, which can
compute MI for all cells in an entire 201-by-201 occupancy grid
(e.g., representing a 20.1m-by-20.1m map at 0.1m resolution)
in 1.55 ms while consuming 1.7 mJ of energy, thus finally
rendering MI computation for the whole map real time and
at a fraction of the energy cost of traditional compute plat-
forms. For comparison, this particular FPGA implementation
running on the Xilinx Zynq-7000 platform is two orders of
magnitude faster and consumes three orders of magnitude less
energy per MI map compute, when compared to a baseline
GPU implementation running on an NVIDIA GeForce GTX
980 platform. The improvements are more pronounced when
compared to CPU implementations of equivalent algorithms.
I. INTRODUCTION
Exploration problems arise in an increasing set of appli-
cations, particularly as robotic vehicles are being utilized in
deep sea and space exploration missions as well as search and
rescue scenarios. In most applications, the goal is to create
a map of the environment, e.g., from range measurements,
as quickly as possible. A key problem, then, is to find the
best locations for future measurements in order to achieve
this goal.
Consider an instance of this problem involving a robot
equipped with a laser scanner. A typical computation cy-
cle in this process involves making a scan of the robot’s
surroundings, updating the partially-unknown map with the
information from the scan, determining the best location for
the next scan and moving to that location. An information-
theoretic approach to determining the best scan locations
involves finding the location with the highest Mutual In-
Crossbar Interconnect
Crossbar Interconnect
Bresenham
Ray Caster
Control
FSM
Core 1
Occupancy Grid Bank 1
Mutual Information Bank 1
ATU 1
Core 2
Occupancy Grid Bank 2
Mutual Information Bank 2
ATU 2
Core 3
Occupancy Grid Bank 3
Mutual Information Bank 3
ATU 3
Core 16
Occupancy Grid Bank 16
Mutual Information Bank 16
ATU 16
Ray Origin Coordinate
Current Coordinate Offset
Bank Address
Cell Occupancy (read)
Partial MI at Cell (read)
Updated MI for Cell (write)
Cell Width, Ray Reset
Legend:
Memory Subsystem MI Cores
Fig. 1: (top) Top-level hardware architecture comprised of
Address Translation Units (ATU), crossbar interconnects,
occupancy grid map, MI map, and 16 FCMI computation
cores. (bottom) An example input occupancy grid, cor-
responding MI map contours generated from a reference
C++ implementation and our FPGA implementation, and the
absolute error between the two.
formation (MI) between the potential future measurement
and the existing map. Unfortunately, computing MI at all
map locations is slow and power hungry, even for small-size
maps [1], which makes it especially infeasible for usage in
medium to small scale robots with the energy budget of a
few kiloJoules. Most heuristics used to reduce the scale of
this computation, such as calculating the MI at sparse points
instead of the entire map, fail to provide guarantees, e.g., on
arXiv:2210.03623v1 [cs.AR] 7 Oct 2022
the quality of the resulting motion [2]–[4].
To accelerate the computation of MI, several novel al-
gorithms were proposed. For instance, the efficient com-
putation of Cauchy-Schwarz Quadratic Mutual Information
(CSQMI) is introduced in [5], [6]. In addition, the high-
throughput computation of Shannon Mutual information is
proposed in the Fast Shannon Mutual Information (FSMI)
algorithm [7]. More recently, the computation of Shannon MI
is reformulated recursively and continuously along the range
measurements in the Fast Continuous Mutual Information
(FCMI) algorithm [8], which leads to a significant reduction
in amount of compute needed for MI across an entire map.
Although these algorithms reduce the time complexity for MI
computation, their optimized implementations on CPUs and
GPUs still achieve a throughput that is orders-of-magnitude
below what is required for autonomous exploration, espe-
cially in large environments. This motivates the design of
specialized hardware accelerators for computing MI.
Hardware accelerators have been proposed for robotics
applications such as motion-planning [9]–[12] and visual-
inertial odometry [13], [14]. To our best knowledge, the
first and only hardware accelerator for computing MI was
proposed in [15], which contains 16 high-throughput com-
putation cores that compute FSMI. For moderately-sized
environments (i.e., a map size of 200 ×200 to 512 ×512
cells), this accelerator computes MI for the entire map in
near real time (i.e., less than 10Hz), which is far below the
rate in which the robot makes range measurements (around
30-60Hz).
The computation of mutual information may seem like a
task that can be easily parallelized. One might be tempted
to think, for example, that computational cores dedicated to
various parts of the map can work in parallel to compute the
mutual information for all cells in the map all at the same
time. Indeed, under certain widely-utilized assumptions, the
computation decouples and cells can be evaluated separately.
Unfortunately, however, the computation does not scale, not
because it can not be decoupled, but because it becomes
impossible to feed all cores with the data that they need for
their computations. More specifically, all cores need to access
different portions of the map, which resides in memory, all
at the same time. The memory bandwidth does not scale
to keep all cores busy. In fact, we show in this paper that
naively scaling up the number cores does not lead to a fast
computation of the mutual information metric, as most cores
end up underutilized due to insufficient memory bandwidth.
The main contribution of this paper is a scalable, multi-
core hardware architecture for computing mutual information
for a 2D occupancy grid map. This architecture has two
key components: (i) a memory management system that
provides sufficient bandwidth to all cores, and (ii) a low-
latency, energy-efficient core (i.e. processing element) design
that fully utilizes all available bandwidth. Specifically, the
memory management system exploits the memory access
pattern of the cores and splits the storage of the occupancy
grid map into smaller memory banks so that all cores access
different banks every cycle. The core contains a feedback
architecture designed to handle the dependencies that are
introduced by the recursive nature of the FCMI algorithm [8],
so that all compute units within the core are fully utilized.
Each core is time-interleaved to compute the FCMI along
multiple rays in order to avoid stalls associated with the
feedback architecture.
The proposed hardware architecture is implemented with
16 cores on a Xilinx Zynq-7000 XC7Z045 FPGA. For a
robot equipped with a 60-ray range sensor utilizing an occu-
pancy grid map of size 201×201, our FPGA implementation
computes MI for the entire map in 1.55 ms (647 Hz, or
much faster than real time), which is 381×faster than an
optimized C++ implementation running on a ARM Cortex
A57 (embedded cellphone processor) and 71×faster than
an optimized CUDA implementation running on a NVIDIA
GTX 980 GPU. In addition, our FPGA implementation
consumes 1.7mJ of energy per MI map compute, which
is 1314×lower than ARM Cortex A57 and 2650×lower
than NVIDIA GTX 980 GPU. A CPU on a robot can
therefore offload Mutual Information computation to such an
accelerator, while interfacing with the sensors and actuators
and performing motion planning. The extremely low latency
and energy consumption of our accelerator allows robots to
explore efficiently without relying on heuristic approxima-
tions even with the energy budget of a few Joules.
II. PRELIMINARIES
A. The FCMI Algorithm
Occupancy grid map [16] is a common probabilistic
representation of the 2D environment and used in several
well known autonomous exploration systems, such as the
frontier-exploration algorithm [17] and Shannon MI-based
mapping algorithms [18]–[22]. In an information theoretic
framework, MI is used to quantify the expected information
gain from potential range measurements in an unknown
environment. The unknown environment is represented by
an occupancy grid map, which is updated continuously
by the robot throughout an exploration trial. The FCMI
algorithm [8] computes the mutual information between the
map and potential range measurement rays emanating from
the sensors (e.g., LiDAR) on such robot.
Let Z={Zθ1, Zθ2. . .}represent range measurements
made by sensor rays with angles Θ = {θ1, θ2, . . .}emanating
from robot’s location x. Let θdenote the angular resolution
of the sensor rays, which is constant across all rays. The
Shannon MI Ix(M;Z)between the measurements Zmade
at location xand map Mis defined as
Ix(M;Z) = X
θΘ
Ix(M;Zθ)∆θ.(1)
Note that the MI between a single range measurement Zθ
and map Mcan be written as
Ix(M;Zθ) = hx(Zθ)hx(Zθ|M),(2)
where hx(Zθ)is the entropy of the range measurement,
and hx(Zθ|M)is the conditional entropy of the range
measurement given the map.
摘要:

EfcientComputationofMap-scaleContinuousMutualInformationonChipinRealTimeKeshavGupta,PeterZhiXuanLi,SertacKaraman,VivienneSzeMassachusettsInstituteofTechnology,Cambridge,Massachusetts02139Emails:fkeshav21,peterli,sertac,szeg@mit.eduAbstract—Explorationtasksareessentialtomanyemergingroboticsapplicati...

展开>> 收起<<
Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time Keshav Gupta Peter Zhi Xuan Li Sertac Karaman Vivienne Sze.pdf

共7页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:7 页 大小:2.07MB 格式:PDF 时间:2025-05-03

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 7
客服
关注