the quality of the resulting motion [2]–[4].
To accelerate the computation of MI, several novel algorithms have been proposed. For instance, an efficient computation of Cauchy-Schwarz Quadratic Mutual Information (CSQMI) is introduced in [5], [6]. In addition, high-throughput computation of Shannon mutual information is proposed in the Fast Shannon Mutual Information (FSMI) algorithm [7]. More recently, the Fast Continuous Mutual Information (FCMI) algorithm [8] reformulates the computation of Shannon MI recursively and continuously along the range measurements, which leads to a significant reduction in the amount of computation needed to evaluate MI across an entire map.
Although these algorithms reduce the time complexity for MI
computation, their optimized implementations on CPUs and
GPUs still achieve a throughput that is orders of magnitude below what is required for autonomous exploration, especially in large environments. This motivates the design of
specialized hardware accelerators for computing MI.
Hardware accelerators have been proposed for robotics
applications such as motion-planning [9]–[12] and visual-
inertial odometry [13], [14]. To our best knowledge, the
first and only hardware accelerator for computing MI was proposed in [15]; it contains 16 high-throughput computation cores that compute FSMI. For moderately sized environments (i.e., map sizes of 200×200 to 512×512 cells), this accelerator computes MI for the entire map in near real time (i.e., at less than 10 Hz), which is far below the rate at which the robot makes range measurements (around 30–60 Hz).
The computation of mutual information may seem like a task that can be easily parallelized. One might be tempted to think, for example, that computation cores dedicated to different parts of the map could compute the mutual information for all cells in the map simultaneously. Indeed, under certain widely used assumptions, the computation decouples and cells can be evaluated separately. Unfortunately, the computation does not scale, not because it cannot be decoupled, but because it becomes impossible to feed all cores with the data they need for their computations. More specifically, all cores need to access different portions of the map, which resides in memory, at the same time, and the memory bandwidth does not scale to keep all cores busy. In fact, we show in this paper that naively scaling up the number of cores does not lead to a fast computation of the mutual information metric, as most cores end up underutilized due to insufficient memory bandwidth.
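The bottleneck can be illustrated with a back-of-the-envelope model (the function names and counts below are illustrative, not measurements of the accelerator): with a single shared memory port, map reads from all cores serialize and per-core utilization falls as 1/N, whereas with one bank per core and a conflict-free access pattern, reads proceed in parallel.

```python
# Illustrative sketch of the memory bottleneck, not the accelerator's
# actual numbers: compare cycle counts for a shared single-ported memory
# versus per-core banks with a conflict-free access pattern.

def cycles_single_port(num_cores, reads_per_core):
    # A single port serves one read per cycle, so all reads serialize.
    return num_cores * reads_per_core

def cycles_banked(num_cores, reads_per_core, num_banks):
    # If cores target pairwise-distinct banks each cycle, contention only
    # appears when cores outnumber banks.
    cores_per_bank = -(-num_cores // num_banks)  # ceiling division
    return cores_per_bank * reads_per_core

reads = 1000
for n in (1, 4, 16):
    util_shared = reads / cycles_single_port(n, reads)
    util_banked = reads / cycles_banked(n, reads, num_banks=n)
    print(f"{n:2d} cores: per-core utilization "
          f"{util_shared:.2f} shared vs {util_banked:.2f} banked")
```

Under this toy model, 16 cores sharing one port are each busy only 1/16 of the time, which is the underutilization effect described above.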
The main contribution of this paper is a scalable, multi-
core hardware architecture for computing mutual information
for a 2D occupancy grid map. This architecture has two
key components: (i) a memory management system that provides sufficient bandwidth to all cores, and (ii) a low-latency, energy-efficient core (i.e., processing element) design that fully utilizes all available bandwidth. Specifically, the
memory management system exploits the memory access
pattern of the cores and splits the storage of the occupancy
grid map into smaller memory banks so that all cores access
different banks every cycle. The core contains a feedback
architecture designed to handle the dependencies that are
introduced by the recursive nature of the FCMI algorithm [8],
so that all compute units within the core are fully utilized.
Each core is time-interleaved to compute the FCMI along
multiple rays in order to avoid stalls associated with the
feedback architecture.
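As a rough illustration of the banking idea (the modulo interleaving below is a hypothetical scheme for exposition, not necessarily the paper's actual address mapping): if map cells are interleaved across banks by their flattened index, cores that start at distinct offsets and advance in lockstep touch pairwise-distinct banks on every cycle.

```python
# Hypothetical bank-interleaving sketch (illustrative, not the paper's
# actual mapping): flatten each 2D cell coordinate and assign it to a
# bank by modulo, so lockstep cores never collide on a bank.

NUM_BANKS = 16   # one bank per core, matching the 16-core design
MAP_WIDTH = 201  # map width used in the paper's evaluation

def bank_of(row, col, num_banks=NUM_BANKS, width=MAP_WIDTH):
    # Flatten the 2D cell coordinate, then interleave across banks.
    return (row * width + col) % num_banks

# 16 cores each read one cell per cycle, starting at consecutive columns
# of the same row and advancing one column per cycle in lockstep.
for cycle in range(4):
    banks = [bank_of(0, core + cycle) for core in range(NUM_BANKS)]
    assert len(set(banks)) == NUM_BANKS  # no two cores share a bank
```

Real ray-casting access patterns are less regular than this lockstep scan, which is why the memory management system must be designed around the cores' actual access pattern rather than a fixed stride.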
The proposed hardware architecture is implemented with 16 cores on a Xilinx Zynq-7000 XC7Z045 FPGA. For a robot equipped with a 60-ray range sensor using an occupancy grid map of size 201×201, our FPGA implementation computes MI for the entire map in 1.55 ms (647 Hz, i.e., much faster than real time), which is 381× faster than an optimized C++ implementation running on an ARM Cortex-A57 (an embedded cellphone processor) and 71× faster than an optimized CUDA implementation running on an NVIDIA GTX 980 GPU. In addition, our FPGA implementation consumes 1.7 mJ of energy per MI map computation, which is 1314× lower than the ARM Cortex-A57 and 2650× lower than the NVIDIA GTX 980 GPU. A CPU on a robot can therefore offload mutual information computation to such an accelerator while interfacing with the sensors and actuators and performing motion planning. The extremely low latency and energy consumption of our accelerator allow robots to explore efficiently, without relying on heuristic approximations, even with an energy budget of a few Joules.
II. PRELIMINARIES
A. The FCMI Algorithm
The occupancy grid map [16] is a common probabilistic representation of a 2D environment and is used in several well-known autonomous exploration systems, such as the frontier-exploration algorithm [17] and Shannon MI-based mapping algorithms [18]–[22]. In an information-theoretic framework, MI is used to quantify the expected information gain from potential range measurements in an unknown environment. The unknown environment is represented by an occupancy grid map, which the robot updates continuously throughout an exploration trial. The FCMI algorithm [8] computes the mutual information between the map and potential range measurement rays emanating from the sensors (e.g., LiDAR) on such a robot.
Let $Z = \{Z_{\theta_1}, Z_{\theta_2}, \ldots\}$ represent range measurements made by sensor rays with angles $\Theta = \{\theta_1, \theta_2, \ldots\}$ emanating from the robot's location $x$. Let $\Delta\theta$ denote the angular resolution of the sensor rays, which is constant across all rays. The Shannon MI $I_x(M; Z)$ between the measurements $Z$ made at location $x$ and the map $M$ is defined as
$$I_x(M; Z) = \sum_{\theta \in \Theta} I_x(M; Z_\theta)\,\Delta\theta. \tag{1}$$
Note that the MI between a single range measurement $Z_\theta$ and the map $M$ can be written as
$$I_x(M; Z_\theta) = h_x(Z_\theta) - h_x(Z_\theta \mid M), \tag{2}$$
where $h_x(Z_\theta)$ is the entropy of the range measurement, and $h_x(Z_\theta \mid M)$ is the conditional entropy of the range measurement given the map.
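A toy numeric instance of Eq. (2) for a single binary map cell (the measurement model probabilities below are illustrative assumptions, not part of FCMI): the mutual information is the measurement entropy minus the conditional entropy given the map state.

```python
# Toy illustration of Eq. (2), not the FCMI algorithm itself: a binary
# cell m and a binary beam reading z ("hit" / "no hit") with assumed
# hit probabilities per map state.

from math import log2

def H(p):
    # Binary Shannon entropy in bits.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

p_occ = 0.5       # prior P(m = occupied); 0.5 models a fully unknown cell
p_hit_occ = 0.9   # assumed P(z = hit | m = occupied)
p_hit_free = 0.1  # assumed P(z = hit | m = free)

p_hit = p_occ * p_hit_occ + (1 - p_occ) * p_hit_free  # marginal P(z = hit)
h_z = H(p_hit)                                                    # h(Z)
h_z_given_m = p_occ * H(p_hit_occ) + (1 - p_occ) * H(p_hit_free)  # h(Z|M)
mi = h_z - h_z_given_m
print(f"I(M; Z) = {mi:.3f} bits")
```

An unknown cell with an informative sensor model yields high MI; as the cell's occupancy becomes certain (or the sensor becomes uninformative), both terms converge and the MI drops toward zero, which is exactly why MI identifies promising sensing locations.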