
principle, global localization in triangle meshes can be done
using particle filters [14]. Chen et al. introduced a Monte
Carlo Localization (MCL) method for meshes, which utilizes
LiDARs and OpenGL-simulated range images for importance
sampling [15]. However, particle filter-based approaches
incur high computational loads when tracking the robot's pose.
To overcome this issue, relative localization methods such as
iterative closest point (ICP) are often used to track the robot’s
pose [4], [5], [21]–[23]. All of them register a measured
point cloud to a reference point cloud. Several publications,
spanning from computer graphics to robotics research, have
presented ICP-like methods that can register point clouds to
reference meshes instead [11], [24]–[28].
In 2019, Mejia-Parra et al. solved point-cloud-to-mesh
registration with a novel approach that finds the closest surface
point for each point of the cloud using hash grids [27]. They
evaluated their approach by registering point clouds to single
meshes of closed objects. However, dealing with meshes of
large environments, as in [15], introduces challenges beyond
the scope of their methodology. For example, triangle mesh
maps of interiors usually consist of walls with two parallel
surfaces. If the initial pose estimate deviates only slightly
from the optimum, points are matched to the wrong
wall surface. In addition, large meshes would drastically
increase the size of the hash table, making it impossible to
use on mobile robots. In 2021, Vizzo et al. presented an
approach for mesh-based SLAM named PUMA [11]. They
generated a mesh from a scan and registered the following
scan to it by using ray casting correspondences (RCC). Their
correspondences seem to be suitable to solve the mentioned
issues of [27]. However, real-time capability was not achieved
even for fractions of the entire mesh map, making the approach
unsuitable for online operation on autonomous vehicles.
With MICP-L, we address the problems of [11] and [27] by
introducing a lightweight method to find RC correspondences
and to optimize the pose estimate, utilizing NVIDIA's RTX
hardware for acceleration. We achieve faster-than-real-time
localization even in large-scale mesh maps and overcome
convergence issues of classical point cloud-based ICP [29].
III. MICP-L
MICP-L continuously registers range data acquired from
one or multiple range sensors to a triangle mesh map, starting
at an initial pose estimate. The general workflow can be
summarized as follows:
1) Find ray casting (RC) or closest point (CP) correspondences,
2) Estimate the optimal transformation parameters ∆T through covariance reduction and SVD, using either P2P or P2L as the error measure,
3) Apply ∆T to the pose guess and repeat step 1).
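The iterative loop above can be sketched as follows. This is a minimal illustration, not the MICP-L implementation; `find_correspondences` and `estimate_delta` are hypothetical stand-ins for the RCC/CPC search and the SVD-based solver described later.

```python
import numpy as np

def register(scan, find_correspondences, estimate_delta, T,
             max_iters=20, eps=1e-6):
    """Iteratively refine the pose guess T (4x4 matrix).

    Sketch only: find correspondences under the current pose,
    estimate the incremental transform dT, apply it, repeat
    until the increment becomes negligible.
    """
    for _ in range(max_iters):
        D, M = find_correspondences(scan, T)  # paired scan/map points
        dT = estimate_delta(D, M)             # 4x4 incremental transform
        T = dT @ T                            # apply correction to the pose guess
        # stop when dT is (numerically) the identity
        if (np.linalg.norm(dT[:3, 3]) < eps
                and np.allclose(dT[:3, :3], np.eye(3), atol=eps)):
            break
    return T
```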
A. Closest Point Correspondences
For point-cloud maps, locating the nearest model point to
a dataset point is typically sped up by pre-building kd-trees
or hash grids over the model. In mesh maps, the task differs
slightly: the closest point on a mesh surface does not necessarily
coincide with a vertex, so the nearest point on each face must be
searched. We use the BVH implementation of Embree [30] to
accelerate this operation.

Fig. 2: A 2D map consisting of two rooms. Left: conventional
closest point correspondences (CPC) fail because they match the
wall surface of the wrong room. Right: RCCs overcome this problem
by first casting a ray along the real measurement's direction (red
line) to find the next intersection with the map (green circle).
For point-to-plane (P2L) optimization, the actual correspondence
is the projection onto the plane (dark blue arrow).
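The per-face closest-point query can be sketched with the standard region-based point-to-triangle test; the brute-force loop below is only a stand-in for the BVH-accelerated search performed by Embree, and all function names are ours.

```python
import numpy as np

def closest_point_on_triangle(p, a, b, c):
    # Standard Voronoi-region test: classify p against the
    # vertex, edge, and face regions of triangle (a, b, c).
    ab, ac, ap = b - a, c - a, p - a
    d1, d2 = ab @ ap, ac @ ap
    if d1 <= 0 and d2 <= 0:
        return a                                   # vertex region a
    bp = p - b
    d3, d4 = ab @ bp, ac @ bp
    if d3 >= 0 and d4 <= d3:
        return b                                   # vertex region b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return a + (d1 / (d1 - d3)) * ab           # edge ab
    cp = p - c
    d5, d6 = ab @ cp, ac @ cp
    if d6 >= 0 and d5 <= d6:
        return c                                   # vertex region c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return a + (d2 / (d2 - d6)) * ac           # edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return b + w * (c - b)                     # edge bc
    denom = 1.0 / (va + vb + vc)
    return a + ab * (vb * denom) + ac * (vc * denom)  # face interior

def closest_point_on_mesh(p, vertices, faces):
    # Brute force over all faces; a BVH would prune this search.
    best, best_d = None, np.inf
    for f in faces:
        q = closest_point_on_triangle(p, *vertices[f])
        d = np.linalg.norm(p - q)
        if d < best_d:
            best, best_d = q, d
    return best
```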
B. Ray Casting Correspondences
Dealing with meshes allows us to use specialized RTX
chips on the NVidia GPUs to efficiently compute ray-to-mesh
intersections. We use this technique to accelerate finding so-
called ray casting correspondences (RCC), introduced with
PUMA [11]. For computing RCCs, we first convert each
measurement i of a range sensor into a ray representation,
consisting of origin, direction, and range. Next, we use the
given pose estimate to trace virtual rays along the paths
that the real sensor would scan. The intersection with the
map represents the actual point that would be measured if
the sensor were at the estimated pose. We then project the
Cartesian point of the real scan measurement onto the plane
of intersection to determine the map correspondence. These
RCCs allow matching scan points only to the surface closest
to the range sensor, as shown in Fig. 2. This reduces the
likelihood of drifting into other rooms during registration,
as the subsequent experiments show.
We use Rmagine, a library to flexibly build ray tracing-
based sensor simulations [3], for all ray casts. At run time, it
decides whether to compute the RCCs on the CPU [30] or on
an NVIDIA GPU, utilizing the RTX units for hardware
acceleration [31] if available.
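A single RCC can be illustrated as below: trace a ray from the estimated sensor pose along the real measurement's direction, take the first mesh intersection, and project the real measured point onto the intersected plane. This is a CPU sketch with our own function names (Möller–Trumbore intersection instead of the Rmagine/RTX back end), not the paper's implementation.

```python
import numpy as np

def ray_triangle(orig, d, a, b, c, eps=1e-9):
    # Moeller-Trumbore ray/triangle intersection; returns hit distance or None.
    e1, e2 = b - a, c - a
    pvec = np.cross(d, e2)
    det = e1 @ pvec
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    tvec = orig - a
    u = (tvec @ pvec) * inv
    if u < 0 or u > 1:
        return None
    qvec = np.cross(tvec, e1)
    v = (d @ qvec) * inv
    if v < 0 or u + v > 1:
        return None
    t = (e2 @ qvec) * inv
    return t if t > eps else None

def rcc(T, direction, rng, vertices, faces):
    """One ray casting correspondence (sketch).

    T: 4x4 pose estimate; direction: unit ray in the sensor frame;
    rng: measured range. Returns (scan point, map correspondence) or None.
    """
    o = T[:3, 3]                       # ray origin at the estimated pose
    d = T[:3, :3] @ direction          # ray direction in the map frame
    best_t, best_n = np.inf, None
    for f in faces:                    # brute force; RTX/BVH would accelerate
        a, b, c = vertices[f]
        t = ray_triangle(o, d, a, b, c)
        if t is not None and t < best_t:
            n = np.cross(b - a, c - a)
            best_t, best_n = t, n / np.linalg.norm(n)
    if best_n is None:
        return None
    hit = o + best_t * d               # simulated measurement on the mesh
    p = o + rng * d                    # real measurement under the pose estimate
    m = p - ((p - hit) @ best_n) * best_n  # project onto the intersected plane
    return p, m
```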
C. Correction
Finding RC or CP correspondences returns a list of scan
points D and map points M, where the i-th scan point dᵢ
corresponds to the i-th map point mᵢ. All points are located
in the same coordinate system, e.g., the robot's base frame.
Next, we search for the transformation parameters ∆T,
consisting of a rotation ∆R and a translation ∆t, that best
fit the scan points (dataset) D to the map points (model) M.
This can be done by minimizing the following error function:

E(∆R, ∆t) = (1/n) Σᵢ₌₁ⁿ ∥mᵢ − (∆R dᵢ + ∆t)∥₂²    (1)
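Equation (1) admits a closed-form minimizer via centroid subtraction and an SVD of the cross-covariance matrix. The sketch below shows the standard rigid-only variant of this idea; it is our own illustration, not the paper's solver.

```python
import numpy as np

def estimate_delta(D, M):
    """Closed-form minimizer of the point-to-point error in Eq. (1).

    D, M: (n, 3) arrays of corresponding dataset and model points.
    Returns the optimal rotation dR and translation dt.
    """
    cd, cm = D.mean(axis=0), M.mean(axis=0)        # centroids
    H = (D - cd).T @ (M - cm)                      # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    dR = Vt.T @ S @ U.T                            # optimal rotation
    dt = cm - dR @ cd                              # optimal translation
    return dR, dt
```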
Our solver is based on Umeyama’s method [32] and enhances
the computation of the covariance matrix for both CPU and