tool for perception. Sonar has a low signal-to-noise ratio,
low resolution, and often lacks 3D information. Moreover,
there are no (or poor) initial conditions relating the robots
in a multi-robot system without GPS.
In this work, we propose a system for distributed multi-
robot SLAM that uses an imaging sonar’s perceptual data
as a basis for state estimation, and we validate it over real-
world datasets. In addition to utilizing real perceptual data,
we do not provide any initial conditions relating the robots a
priori, and we do not rely upon inter-robot ranging, permit-
ting an exclusive reliance on indirect encounters between
robots. Lastly, we consider the bandwidth limitations of
wireless acoustic communications and design our system for
compatibility with realistic constraints (we note that beyond
bandwidth, there are additional limitations associated with
real-world acoustic comms [3]). Our key contributions are:
•The first underwater multi-robot SLAM system devel-
oped for use with real imaging sonar data.
•A high-performance pipeline for robustly identifying,
registering, and rejecting outliers of indirect encounters
using real sonar data with no initial conditions.
•A strategy to manage bandwidth usage, making real
world application with acoustic modems a possibility.
•A validation of our system on two real-world data-sets.
In the following sections, we discuss related work, mathe-
matically define our problem, present our system and results.
II. RELATED WORK
A. Underwater Multi-Robot SLAM
In the underwater space, inter-robot constraints generally
come in two forms [2]: direct encounters where robots
observe one another via acoustic ranging, and indirect en-
counters, where robots observe the same targets in the envi-
ronment and may derive inter-robot measurement constraints
relating one another. We note that direct encounters require
synchronized clocks, which may not be practical over long
periods without GPS clock corrections due to clock drift.
We first consider [4], where a mobile base station is used
to localize a team of low-cost vehicles lacking perceptual
sensors using an acoustic beacon. A similar concept is
considered in [5], except instead of a team of robots, a
leader-follower arrangement is used. [6] considers inter-
robot ranging without fixed acoustic beacons and performs
a simulation study comparing the use of fixed beacons to a
cooperative localization solution. [7] considers an algorithm
for processing inter-robot acoustic pulses in a distributed
manner. [1] proposes a pose-graph-based method for cooper-
ative localization of a team of robots using dead-reckoning,
GPS, and inter-robot ranging. The outcome of this work is a
system where a robot maintains an understanding of its state,
and the team’s state. Moreover, when GPS measurements
occur at the surface, their effect is shared across robots. [2]
integrates perception into the above, and commonly observed
features are shared and integrated into robot pose-graphs.
[2] assumes that robots share point-landmark observations
in their survey area, but the framework is only tested in
simulation. Further, due to GPS, each robot is effectively
localized in a common frame, enabling the sharing of range-
bearing measurements to commonly observed targets without
a need to solve the complex data association problem. Recall,
GPS may not always be available due to under-ice operations
or tactical situations (e.g., jamming). [8] considers the robot
map merging problem using only similarity in feature space,
in this case, ship hull curvature. While this work is able to
merge trajectories and lower the data transmission require-
ments between robots, it requires a highly descriptive feature
vector. It is only validated over a single dimension, whereas
we consider a 3DOF system.
There are some notable examples of underwater multi-
robot SLAM using cameras [9]–[12] and cameras with other
navigation sensors [13]. Cameras, however are not robust
in all water conditions. While these works are informative,
because we operate in turbid conditions with low visibility,
we do not consider them further.
Another area to consider is the concept of multi-session
SLAM, where a robot is provided with a prior SLAM run
and merges the map it builds with the prior run as the mission
progresses. This is examined for ship hull inspection [14],
[15], bathymetric mapping [16], ship-wreck reconstruction
[17] and environmental monitoring [18]. However, multi-
session does not account for communicating information
between robots, as it is a single agent system with prior
information. Additionally, it may be the case that the robot
has some knowledge of its location in the prior map, con-
straining the inter-robot loop closure search space.
In contrast to the above, we utilize no notion of an initial
guess relating robot reference frames. Second, we consider
the bandwidth limitations of real acoustic modem hardware
and take steps to manage network utilization. Lastly, we
implement a fully functioning system to detect and estimate
indirect encounters in sonar data: inter-robot loop closures.
B. Place Recognition With Sonar
Place recognition (loop closure) is fundamental for a
perception-driven multi-robot system. Place recognition has
been widely studied in LIDAR sensing [19]–[21]. These
works assemble a 2D bird’s eye view image of a 3D LIDAR
scan with coarse discretization to compare scenes. They also
derive a 1D descriptor to support scene search and retrieval.
Place recognition has also been studied with sonar-
equipped AUVs. Firstly, [22] considers building scene graphs
to compare scenes and evaluate rigid-body transformations
but requires at least fifteen objects in each scene to run.
Machine learning has also been used to support this task in
[23], [24]. However, few public sonar datasets exist, and this
is often a research area unto itself. Iterative closest point
(ICP) based loop closure is used in our prior work [25]
to support single-agent active SLAM. This work uses sonar
derived point clouds with ICP; when ICP provides a trans-
form between keyframes, we estimate overlap between the
point clouds. Point cloud overlap, then pairwise consistent
measurement set maximization (PCM) [26], are used to reject
outlier loop closures. Inliers are integrated into the graph-
based pose SLAM solution. We note the extension of the