TRADE: Object Tracking with 3D Trajectory and Ground Depth
Estimates for UAVs
Pedro F. Proença, Patrick Spieler, Robert A. Hewitt, Jeff Delaune
Abstract—We propose TRADE for robust tracking and 3D
localization of a moving target in cluttered environments
from UAVs equipped with a single camera. Ultimately, TRADE
enables 3D-aware target following.
Tracking-by-detection approaches are vulnerable to target
switching, especially between similar objects. Thus, TRADE
predicts and incorporates the target 3D trajectory to select
the right target from the tracker’s response map. Unlike static
environments, depth estimation of a moving target from a
single camera is an ill-posed problem. Therefore, we propose
a novel 3D localization method for ground targets on complex
terrain. It reasons about scene geometry by combining ground
plane segmentation, depth-from-motion and single-image depth
estimation. The benefits of TRADE are demonstrated in terms
of tracking robustness and depth accuracy on several dynamic
scenes simulated in this work. Additionally, we demonstrate
autonomous target following using a thermal camera by running
TRADE on a quadcopter’s onboard computer.
I. INTRODUCTION
Object tracking and 3D localization from a UAV has
several applications involving target following (e.g. defense,
disaster response, wildlife monitoring). Commercial solutions
for consumer drones (e.g. Skydio, DJI) already exist, relying
on GPS beacons, stereo cameras and visual object trackers.
However, persistent tracking and 3D localization of a
non-cooperative target from a single camera remains a
challenging problem.
In terms of persistent tracking, tracking-by-detection has
become the dominant paradigm [1,2] in generic visual
object tracking thanks to learned discriminative and efficient
models. Despite its success, this approach alone leads to
target switching, especially between similar objects (as shown
in Fig. 1). This is mainly due to the absence of an object
motion model. In this work, we propose to predict the 3D
trajectory of a ground object with visual-inertial odometry
(VIO) to prevent target switching by selecting the peak of
the tracker’s correlation filter response map that is closest
to the predicted location.
Unlike static environments, estimating the depth of a
moving object from a single camera is an ill-posed problem
without knowledge of the object’s motion. Thus we propose
a solution for ground targets by combining single-image
depth estimation with depth estimates from camera motion.
While the former is used to obtain dense depth of the
moving target relative to its surroundings, the latter provides
sparse accurate depth measurements from the terrain. Our
localization method can then infer the ground plane from the
Jet Propulsion Laboratory, California Institute of Technology,
Pasadena, CA, USA
Fig. 1: Top: Drone autonomously following a remote-
controlled car using a thermal camera with onboard TRADE:
https://youtu.be/QUzky1LFqpY. Bottom: Two scenes from
our synthetic dataset showing the benefit of our approach.
Bottom-Left: The tracker’s correlation filter response [1]
shows two similar peaks, which lead to target switching
between cars. Our approach uses the trajectory predictions
(seen in magenta) to select the right peak. Bottom-Right:
UAV tracking a truck from a rooftop. Our ground plane
segmentation (shown in green) allows us to reject features
from the building top to estimate the correct ground plane.
Tracked features are color-coded based on depth. For more
details refer to: https://youtu.be/MGPK65gm9GI
scene geometry (i.e. ground plane segmentation) to raycast
the object’s depth. Moreover, a temporal plane fusion step
is proposed to account for temporarily occluded or textureless
ground, and for missing depth-from-motion while hovering.
Our contributions are the following:
• A novel 3D localization method for a dynamic ground
object that is robust to high terrain relief.
• Coupling object 3D trajectory forecasting and camera
pose with a Discriminative Correlation Filter tracker to
avoid target switching.
• A photorealistic UAV tracking dataset with ground-truth
depth and poses, and an extensive evaluation.
• A demonstration of real-time UAV target following with
TRADE running onboard.
arXiv:2210.03270v1 [cs.RO] 7 Oct 2022
[Fig. 2 diagram, blocks: Visual-Inertial Odometry, Trajectory
Prediction, Visual Object Tracker, and Object Localization
(ROI Feature Tracking, Ground Plane Mask, Robust Plane
Fitting, Single Image Depth, Temporal Fusion).]
Fig. 2: Overview of our object tracking and 3D localization
system.
II. RELATED WORK
Recently, in generic visual object tracking, several
learning-based efforts have been made to address the problem
of target switching [3,4]. In Multi-Object Tracking [5,6],
this problem is formulated as a data association problem
which is commonly addressed by using a constant-velocity
Kalman Filter and the Hungarian algorithm. However, this
Kalman filter operates in image space, where optical flow
is non-linear, and it assumes a static camera. Camera motion
compensation is applied in [6] based on image registration,
and in [7] based on homography warping, which assumes the
homography is estimated from the object’s ground plane. In
[8], target 3D trajectory is modeled in a SLAM factor graph
but it relies on a stereo-camera. In [9,10] trajectory models
are learned for human motion using LSTMs.
Object 3D localization from a UAV has been addressed
using GPS receivers [11], a laser range finder [12], georeferenced
topographic maps [13] and a flat-earth assumption [14].
There has been extensive work [15–17] using ground plane
estimation for 3D object localization. The most similar to
our work is [15], which uses depth estimates from Visual
Odometry and a barometer to estimate the plane normals
and height but this also assumes the scene is planar. In
terms of monocular object 3D localization from the ground,
[18] proposes to estimate 3D car poses by combining 2D
bounding boxes, orientation regression and the object
dimensions. Single-image depth networks [19,20] have been
demonstrating compelling results on several datasets (e.g.
KITTI). In this work, we investigate how these generalize
to aerial downward-looking cameras.
III. SYSTEM OVERVIEW
Our system pipeline is shown in Fig. 2. Firstly, a Discrim-
inative Correlation Filter (DCF) tracker is initialized as usual
with a bounding box on an initial frame. The bounding box is
then used to initialize the ROI Feature Tracking by detecting
Harris corners within a Region of Interest (ROI) surrounding
the bounding box, which is excluded from the ROI. The
ROI is then shifted as the bounding box moves through
tracking. Using this dedicated feature tracking module allows
us to maintain a dense distribution of features around the
object without adding overhead to the VIO.
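The ring-shaped detection region described above can be sketched as follows. This is a minimal NumPy sketch; the function name, margin value and mask convention are illustrative assumptions, with the resulting mask intended for a corner detector such as OpenCV's goodFeaturesToTrack (which accepts a mask argument and a Harris option).

```python
import numpy as np

def roi_mask(shape, bbox, margin=40):
    """Build a ring-shaped detection mask around the target bounding box:
    features are detected in an expanded region around the box, while the
    box itself is excluded so no features land on the moving target.
    shape: (rows, cols); bbox: (x, y, w, h) in pixels."""
    h_img, w_img = shape
    x, y, w, h = bbox
    mask = np.zeros((h_img, w_img), dtype=np.uint8)
    # Expanded ROI around the bounding box, clipped to the image
    mask[max(0, y - margin):min(h_img, y + h + margin),
         max(0, x - margin):min(w_img, x + w + margin)] = 255
    # Exclude the bounding box itself (the moving target)
    mask[y:y + h, x:x + w] = 0
    return mask
```

The mask can then be passed to, e.g., `cv2.goodFeaturesToTrack(gray, maxCorners, qualityLevel, minDistance, mask=mask, useHarrisDetector=True)` and shifted whenever the tracker updates the bounding box.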
For every frame, depth is estimated and refined for the ROI
tracks given the camera poses from VIO. These tracks can
be backprojected to a point cloud and used to fit a plane.
However, since not all tracks are from the object’s ground
plane, we
first select tracks based on our ground plane segmentation
which relies on a single image depth model to provide dense
depth for both the target and the ROI. However since this is
only relative depth, we use the ROI feature depth estimates
to effectively scale it.
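The scale alignment between the relative single-image depth and the sparse metric feature depths can be sketched with a robust median-ratio estimator. This is an assumption for illustration; the paper does not specify the exact estimator used.

```python
import numpy as np

def scale_relative_depth(rel_depth, feat_uv, feat_metric_depth):
    """Scale a relative (up-to-scale) single-image depth map so that it
    agrees with sparse metric depths from camera motion.
    rel_depth: HxW relative depth map; feat_uv: Nx2 pixel coordinates (u, v);
    feat_metric_depth: N metric depths at those pixels.
    Uses the median of per-feature ratios for robustness to outliers."""
    u = feat_uv[:, 0].astype(int)
    v = feat_uv[:, 1].astype(int)
    rel_at_feats = rel_depth[v, u]
    valid = rel_at_feats > 1e-6          # avoid division by degenerate depths
    scale = np.median(feat_metric_depth[valid] / rel_at_feats[valid])
    return scale * rel_depth
```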
Given the resulting ground plane mask, the selected ROI
tracks with 3D coordinates are used in a RANSAC multi-
plane fitting routine. Since the ground plane segmentation
can fail and the ROI features may not be enough, we use a
temporally-fused plane model, which aggregates the inlier
points from the last RANSAC plane fitting in a buffer
together with inliers from past frames. The temporal fusion
also includes a gating strategy to enforce temporal consis-
tency. The aggregated points are used in a RANSAC multi-
plane fitting loop once again to estimate the final plane. Then,
given the target image coordinates from the DCF tracker and
the camera pose we can raycast the 3D location. This is then
used to update the trajectory model, whose predictions for the
next frames are used to guide the DCF Tracker, as described
in the next section. The remaining sections provide more
details for each module of our pipeline.
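The core plane-fitting step above can be sketched as a minimal single-plane RANSAC with a least-squares refinement. The multi-plane loop, gating strategy and temporally-fused inlier buffer described in the text are omitted here; the function name, iteration count and inlier threshold are illustrative assumptions.

```python
import numpy as np

def ransac_plane(points, iters=200, inlier_thresh=0.1, rng=None):
    """Fit a plane n·p + d = 0 to an Nx3 point cloud with RANSAC,
    then refine on the inlier set via SVD (least-squares plane)."""
    rng = rng or np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample
        n /= norm
        d = -n.dot(sample[0])
        inliers = np.abs(points @ n + d) < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refinement: plane normal is the singular vector
    # associated with the smallest singular value of the centered inliers
    pts = points[best_inliers]
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, -n.dot(centroid), best_inliers
```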
IV. TRACKING WITH TRAJECTORY ESTIMATES
Visual object trackers generally output a 2D score map
(shown in Fig. 1) that maps to locations in an image
search window around the previous target location. Then, the
location with the highest score is simply selected as the new
target location.
Instead, we first center the search window around the
location predicted by our trajectory model which is projected
to the current image using the camera pose. We then perform
peak selection on the score map: first, we normalize the score
map with a softmax function; then, using non-maximum
suppression, we select as location candidates the peaks
within a certain fraction of the maximum peak and take the
candidate closest to the search window origin.
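The peak-selection step can be sketched as follows. This is a minimal NumPy sketch under assumptions: the NMS radius and peak fraction are illustrative values, and the search-window origin (the projected trajectory prediction) is taken to be the map center.

```python
import numpy as np

def select_peak(score_map, peak_frac=0.5, nms_radius=2):
    """Softmax-normalize a DCF response map, keep NMS peaks within
    peak_frac of the global maximum, and return the (row, col) of the
    candidate closest to the map center (the predicted location)."""
    s = np.exp(score_map - score_map.max())
    s /= s.sum()
    h, w = s.shape
    peaks = []
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - nms_radius), min(h, y + nms_radius + 1)
            x0, x1 = max(0, x - nms_radius), min(w, x + nms_radius + 1)
            # Local maximum within the NMS window, above the peak threshold
            if s[y, x] == s[y0:y1, x0:x1].max() and s[y, x] >= peak_frac * s.max():
                peaks.append((y, x))
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0  # search-window origin
    return min(peaks, key=lambda p: (p[0] - cy) ** 2 + (p[1] - cx) ** 2)
```

With two similar peaks (the case in Fig. 1), this returns the peak nearest the trajectory prediction rather than the global maximum.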
As a trajectory model, we use a linear Kalman filter to
estimate the state {p, v, a}: respectively, the object’s absolute
3D location, velocity and acceleration. To prevent unbounded
motion during temporary tracking loss, we use a damping
factor both in velocity and acceleration instead of a constant
model in the state transition. The state is updated using
only the 3D location observation residuals. The process and
measurement noise covariances were set empirically.
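The damped state transition can be sketched per axis as follows. The damping factors are illustrative assumptions (the paper sets its parameters empirically); the matrix replaces the constant-acceleration transition's unit diagonal entries for v and a with decay factors, so predictions shrink toward zero velocity during tracking loss.

```python
import numpy as np

def damped_cv_transition(dt, gamma_v=0.9, gamma_a=0.8):
    """Per-axis state transition for a Kalman filter over state [p, v, a]
    with damping on velocity and acceleration:
        p' = p + dt*v + 0.5*dt^2*a
        v' = gamma_v*v + dt*a
        a' = gamma_a*a
    gamma_v, gamma_a in (0, 1) bound the motion during temporary
    tracking loss, unlike a constant-acceleration model."""
    return np.array([
        [1.0, dt,      0.5 * dt * dt],
        [0.0, gamma_v, dt],
        [0.0, 0.0,     gamma_a],
    ])
```

In a full filter, this 3x3 block is repeated for each of the three spatial axes, and only the position components are updated from the 3D location observations.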
V. ROBUST OBJECT 3D LOCALIZATION
Our object localization is based on the projection of the
object bounding box center on the ground plane. However,
as illustrated in Fig. 3a, a camera off-nadir angle β leads to
a lateral error x̃ = h tan β, where h is the height at which
the ray intersects the object. To reduce this error, we lift the
ground plane by an estimate of half the object height before
raycasting the depth. The next subsections cover all modules
of our localization approach.
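The lifted-plane raycast can be sketched as follows. Sign conventions are assumptions for illustration: the plane is n·p + d = 0 with a unit normal n pointing up away from the ground, so lifting the plane by half the object height corresponds to subtracting that offset from d.

```python
import numpy as np

def raycast_target(origin, direction, n, d, obj_height=0.0):
    """Intersect the camera ray through the bbox center with the ground
    plane n·p + d = 0, lifted by half the object height along the unit
    normal n to reduce the off-nadir lateral error h*tan(beta).
    Returns the 3D intersection, or None if the ray misses the plane."""
    n = n / np.linalg.norm(n)            # assume d is consistent with unit n
    d_lift = d - 0.5 * obj_height        # lift the plane by half the height
    denom = n.dot(direction)
    if abs(denom) < 1e-9:
        return None                      # ray parallel to the plane
    t = -(d_lift + n.dot(origin)) / denom
    if t < 0:
        return None                      # intersection behind the camera
    return origin + t * direction
```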