Edge-based Monocular Thermal-Inertial Odometry
in Visually Degraded Environments
Yu Wang, Haoyao Chen*, Member, IEEE, Yufeng Liu, Shiwu Zhang, Member, IEEE
Abstract—State estimation based on conventional visual-inertial odometry is challenging in complex illumination environments due to the severe degradation suffered by the visual camera. The thermal infrared camera, in contrast, operates around the clock and is less affected by illumination variation. However, most existing visual data association algorithms are incompatible with thermal infrared data, which contains heavy noise and exhibits low contrast. Motivated by the observation that thermal radiation varies most significantly at the edges of objects, this study proposes ETIO, the first edge-based monocular thermal-inertial odometry for robust localization in visually degraded environments. Instead of the raw image, we use the binarized image produced by edge extraction for pose estimation to overcome the poor quality of thermal infrared images. An adaptive feature tracking strategy, ADT-KLT, is then developed for robust data association based on the limited edge information and its distance distribution. Finally, a pose graph optimization performs real-time estimation over a sliding window of recent states by combining IMU pre-integration with the reprojection errors of all edge feature observations. We evaluated the proposed system on public datasets and in real-world experiments and compared it against state-of-the-art methods. The results verify that ETIO enables accurate and robust localization around the clock.
Index Terms—Thermal-inertial odometry, edge information, visual degradation.
I. INTRODUCTION
ACCURATE and robust state estimation in GNSS-denied environments is an active research field due to its wide applications in simultaneous localization and mapping (SLAM), 3D reconstruction, and active exploration. The sensor suite consisting of a monocular camera and an IMU, which provide complementary information, is the minimal setup for recovering metric six-degrees-of-freedom (DOF) motion [1]. Since both the camera and the IMU are lightweight and low-cost, monocular visual-inertial odometry (VIO) is a common solution for localization and navigation [2]. Existing VIO frameworks have matured in stable environments. However, environments in disaster areas are uncertain and prone to extreme light distributions, dynamic illumination variation, or visual obscurants such as dust, fog, and smoke [3]. Such visual degradation severely reduces the reliability of VIO-based estimation.
This work was supported in part by the National Natural Science Foundation of China under Grants U1713206 and U21A20119. (Corresponding author: Haoyao Chen.)
Y. Wang, H. Chen, and Y. Liu are with the School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen, P.R. China (e-mail: hychen5@hit.edu.cn).
S. Zhang is with the Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China.
The thermal infrared camera, which operates in the long-wave infrared spectrum and captures thermal-radiometric information, has attracted increasing attention in recent years. Compared with visual cameras, thermal infrared cameras offer clear advantages in disaster areas owing to their all-day perceptual capability [4]. However, applying thermal infrared cameras directly within existing VIO frameworks is challenging for the following reasons. First, the captured images typically have low contrast [5]. Second, many information-rich visual textures, such as colors and streaks, are lost in thermal images because the thermal radiation of an object is often indistinguishable from that of its surroundings. Lastly, nonuniformity correction (NUC) or flat-field correction (FFC) is performed during camera operation to eliminate accumulated non-zero-mean noise [6]. Such blackouts not only introduce periods of data interruption but may also significantly change the image appearance between consecutive frames.
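For illustration only, the following minimal sketch shows the kind of preprocessing that such degradation motivates: reducing a raw 16-bit thermal frame to a binarized edge map. The min-max rescaling and Canny detector used here are assumptions for the sketch, not the edge-extraction method of ETIO.

```python
# Illustrative sketch only (not the authors' implementation): reduce a raw
# 16-bit thermal frame to a binarized edge map. Canny and min-max rescaling
# are stand-ins; ETIO's actual edge extraction is described later in the paper.
import cv2
import numpy as np

def thermal_edge_map(raw16: np.ndarray) -> np.ndarray:
    """Rescale a 16-bit thermal image and return a binary edge image."""
    # Min-max rescaling to 8 bits; this step can amplify sensor noise,
    # which is one reason raw-intensity features are unreliable on thermal data.
    img8 = cv2.normalize(raw16, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Mild smoothing to suppress fixed-pattern noise typical of LWIR sensors.
    img8 = cv2.GaussianBlur(img8, (5, 5), 0)
    # Keep only the strongest thermal-radiation gradients (object edges).
    return cv2.Canny(img8, 30, 90)
```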
Current thermal-inertial odometry (TIO) solutions are mainly adapted from conventional VIO. Feature-based thermal odometry methods that require dedicated contrast enhancement of infrared images for feature extraction have been developed [7]–[9]. However, such preprocessing introduces additional noise, resulting in incorrect correspondences. Full 14-bit or 16-bit radiometric data from the thermal infrared camera have been used directly for motion estimation [3], [10] to avoid the significant appearance changes caused by the rescaling operation. However, these approaches require enabling NUC in long-term operation to address temperature drift and are not directly compatible with 8-bit images. By selecting the most reliable modality through several metrics, ROVTIO [11] fuses asynchronous thermal, visual, and inertial measurements for odometry estimation, allowing the system to autonomously switch between modalities according to the environmental conditions.
With the development of deep learning, neural networks have been introduced into pose estimation from thermal infrared images. TP-TIO [12], which utilizes a CNN for feature detection and an IMU-aided, full-radiometric KLT method for feature tracking, is the first tightly coupled deep thermal-inertial odometry algorithm. Combining a hallucination network with a selective fusion mechanism, Saputra et al. [13] proposed DeepTIO, an end-to-end deep neural odometry architecture for pose regression. Building on DeepTIO, Saputra et al. [14] recently presented a complete thermal-inertial SLAM system, including neural abstraction, graph-based optimization, and global-descriptor-based neural loop closure detection. Combining the advantages of conventional and learning-based methods, Jiang et al. [15]