
[Fig. 2: Overview of our object tracking and 3D localization system. Module labels in the figure: Visual-Inertial Odometry, Trajectory Prediction, Visual Object Tracker, ROI Feature Tracking, Single Image Depth, Ground Plane Mask, Robust Plane Fitting, Temporal Fusion, Object Localization.]
II. RELATED WORK
Recently, in generic visual object tracking, several
learning-based efforts have been made to address the problem
of target switching [3,4]. In Multi-Object Tracking [5,6],
this problem is formulated as a data association problem
which is commonly addressed using a constant-velocity
Kalman filter and the Hungarian algorithm. However, this
Kalman filter operates in image space, where optical flow
is non-linear, and it assumes a static camera. Camera motion
compensation is performed in [6] based on image registration and
in [7] based on homography warping, which assumes the
homography is estimated from the object's ground plane. In
[8], the target's 3D trajectory is modeled in a SLAM factor graph,
but this relies on a stereo camera. In [9,10], trajectory models
are learned for human motion using LSTMs.
Object 3D localization from a UAV has been addressed
using GPS receivers [11], a laser range finder [12], georeferenced
topographic maps [13], and a flat-earth assumption [14].
There has been extensive work [15–17] using ground plane
estimation for 3D object localization. The most similar to
our work is [15], which uses depth estimates from Visual
Odometry and a barometer to estimate the plane normals
and height, but this also assumes the scene is planar. In
terms of monocular object 3D localization from the ground,
[18] proposes to estimate 3D car poses by combining 2D
bounding boxes, orientation regression, and the object dimensions.
Single-image depth networks [19,20] have demonstrated
compelling results on several datasets (e.g., KITTI). In this
work, we investigate how these models generalize
to aerial downward-looking cameras.
III. SYSTEM OVERVIEW
Our system pipeline is shown in Fig. 2. Firstly, a Discrim-
inative Correlation Filter (DCF) tracker is initialized as usual
with a bounding box on an initial frame. The bounding box is
then used to initialize the ROI Feature Tracking by detecting
Harris corners within a Region of Interest (ROI) that surrounds,
but excludes, the bounding box. The
ROI is then shifted as the bounding box moves through
tracking. Using this dedicated feature tracking module allows us
to maintain a dense distribution of features around the object,
without adding overhead to the VIO.
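A minimal sketch of how this ROI feature initialization could look, using OpenCV's Harris-based corner detection restricted to a ring-shaped mask around the bounding box; the function name, margin, and detector parameters are illustrative assumptions, not the exact implementation:

import cv2
import numpy as np

def init_roi_tracks(gray, bbox, margin=0.5, max_corners=200):
    """Detect Harris corners in a ring ROI around (but excluding) the bbox.

    gray   : single-channel image
    bbox   : (x, y, w, h) target bounding box
    margin : ROI extends by margin * bbox size on each side (illustrative)
    """
    x, y, w, h = bbox
    dx, dy = int(margin * w), int(margin * h)
    H, W = gray.shape
    # Outer ROI rectangle, clipped to the image bounds.
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(W, x + w + dx), min(H, y + h + dy)

    mask = np.zeros_like(gray, dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255
    mask[y:y + h, x:x + w] = 0  # exclude the target box itself

    # Harris corners restricted to the ring-shaped ROI mask.
    corners = cv2.goodFeaturesToTrack(
        gray, maxCorners=max_corners, qualityLevel=0.01,
        minDistance=7, mask=mask, useHarrisDetector=True, k=0.04)
    return corners  # Nx1x2 array of (x, y) points, or None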
For every frame, depth is estimated and refined for the ROI
tracks given the camera poses from VIO. These tracks can
be backprojected to a point cloud, to which a plane can be fit. However,
since not all tracks are from the object’s ground plane, we
first select tracks based on our ground plane segmentation
which relies on a single image depth model to provide dense
depth for both the target and the ROI. However, since this is
only relative depth, we use the ROI feature depth estimates
to effectively scale it.
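One simple way to perform this scaling is sketched below, assuming the network outputs (up-to-scale) depth rather than inverse depth: a robust median ratio between the triangulated ROI track depths and the network's depth at the same pixels. This is an illustration, not necessarily the exact scheme used:

import numpy as np

def scale_relative_depth(rel_depth, track_uv, track_depth):
    """Scale a relative (up-to-scale) depth map to metric depth.

    rel_depth   : HxW relative depth from a single-image network
    track_uv    : Nx2 integer pixel coordinates of ROI feature tracks
    track_depth : N metric depths of those tracks (VIO-posed triangulation)
    """
    u, v = track_uv[:, 0], track_uv[:, 1]
    rel_at_tracks = rel_depth[v, u]
    valid = (rel_at_tracks > 1e-6) & (track_depth > 1e-6)
    # Robust global scale from the median ratio at the sparse tracks.
    scale = np.median(track_depth[valid] / rel_at_tracks[valid])
    return scale * rel_depth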
Given the resulting ground plane mask, the selected ROI
tracks with 3D coordinates are used in a RANSAC multi-
plane fitting routine. Since the ground plane segmentation
can fail and the ROI features may not be enough, we use a
temporally-fused plane model, which aggregates the inlier
points from the last RANSAC plane fitting in a buffer
together with inliers from past frames. The temporal fusion
also includes a gating strategy to enforce temporal consis-
tency. The aggregated points are used in a RANSAC multi-
plane fitting loop once again to estimate the final plane. Then,
given the target image coordinates from the DCF tracker and
the camera pose, we can raycast the 3D location. This is then
used to update the trajectory model, whose predictions for the
next frames are used to guide the DCF Tracker, as described
in the next section. The remaining sections provide more
details for each module of our pipeline.
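The sketch below illustrates the plane-fitting part of this pipeline with a single-plane RANSAC routine and a temporal buffer of inliers gated by a normal-angle test. It is a simplification of the multi-plane fitting described above, and all thresholds, buffer sizes, and the gating criterion are illustrative assumptions:

import numpy as np
from collections import deque

def ransac_plane(points, iters=200, thresh=0.05):
    """Fit one plane (n, d), with n.p + d = 0, to Nx3 points by RANSAC."""
    if len(points) < 3:
        return None, np.zeros(len(points), dtype=bool)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue
        n /= norm
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model, best_inliers

class TemporalPlaneFusion:
    """Buffer recent RANSAC inliers and refit on the aggregate each frame."""
    def __init__(self, max_frames=10, max_normal_angle_deg=15.0):
        self.buffer = deque(maxlen=max_frames)
        self.max_angle = np.deg2rad(max_normal_angle_deg)
        self.plane = None

    def update(self, points):
        model, inliers = ransac_plane(points)
        if model is not None:
            n, _ = model
            # Gating: reject fits whose normal deviates too much
            # from the currently fused plane.
            if self.plane is None or np.arccos(
                    np.clip(abs(n @ self.plane[0]), 0.0, 1.0)) < self.max_angle:
                self.buffer.append(points[inliers])
        if self.buffer:
            fused_pts = np.concatenate(self.buffer, axis=0)
            self.plane, _ = ransac_plane(fused_pts)
        return self.plane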
IV. TRACKING WITH TRAJECTORY ESTIMATES
Visual object trackers generally output a 2D score map
(shown in Fig. 1) that maps to locations in an image
search window around the previous target location. Then, the
location with the highest score is simply selected as the new
target location.
Instead, we first center the search window around the
location predicted by our trajectory model, which is projected
to the current image using the camera pose. We then perform
peak selection on the score map: first we normalize the score
map with a softmax function; then, using Non-Maximum
Suppression, we select as location candidates the peaks
within a certain fraction of the maximum peak and take the
peak closest to the search window origin.
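A possible implementation of this peak selection is sketched below. The softmax normalization and peak-fraction threshold follow the description above, while the max-filter NMS, window size, and threshold values are illustrative choices:

import numpy as np
from scipy.ndimage import maximum_filter

def select_peak(score_map, peak_fraction=0.5, nms_size=5):
    """Pick the candidate peak closest to the search-window center."""
    # Softmax normalization over the whole score map.
    s = np.exp(score_map - score_map.max())
    s /= s.sum()
    # Local maxima via a max-filter NMS, kept only if within a
    # fraction of the global maximum.
    peaks = (s == maximum_filter(s, size=nms_size)) & (s >= peak_fraction * s.max())
    ys, xs = np.nonzero(peaks)
    # The window is centered on the projected trajectory prediction,
    # so the candidate closest to the center is preferred.
    cy, cx = (np.array(s.shape) - 1) / 2.0
    i = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return xs[i], ys[i]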
As a trajectory model, we use a linear Kalman filter to
estimate the state {p, v, a}, respectively the object's absolute
3D location, velocity and acceleration. To prevent unbounded
motion during temporary tracking loss, we apply a damping
factor to both velocity and acceleration in the state transition,
instead of a constant model. The state is updated using
only the 3D location observation residuals. The process and
measurement noise covariances were set empirically.
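The sketch below shows one way to build such a damped constant-acceleration transition and a position-only measurement model for the state {p, v, a}; the damping factors and the generic predict/update steps are illustrative, not the exact values or implementation used:

import numpy as np

def make_damped_transition(dt, gamma_v=0.98, gamma_a=0.9):
    """9x9 transition for state [p, v, a] (3D each) with damping factors
    on velocity and acceleration (gamma values are illustrative)."""
    I = np.eye(3)
    Z = np.zeros((3, 3))
    return np.block([
        [I, dt * I, 0.5 * dt**2 * I],
        [Z, gamma_v * I, dt * I],
        [Z, Z, gamma_a * I],
    ])

# Measurement model: only the 3D position is observed.
H = np.hstack([np.eye(3), np.zeros((3, 6))])

def kf_predict(x, P, F, Q):
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, R):
    y = z - H @ x                      # 3D location observation residual
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(9) - K @ H) @ P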
V. ROBUST OBJECT 3D LOCALIZATION
Our object localization is based on the projection of the
object bounding box center onto the ground plane. However,
as illustrated in Fig. 3.a, a camera off-nadir angle $\beta$ leads to a
lateral error $\tilde{x} = h \tan\beta$, where $h$ is the height at which
the ray intersects the object. To reduce this error, we lift the
ground plane by an estimate of half the object height before
raycasting the depth. The next subsections cover all modules
of our localization approach.
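To make the geometry concrete, the sketch below lifts a ground plane along its normal by half an (assumed known) object height and intersects the camera ray through the bounding-box center with the lifted plane. The variable names, the camera-to-world pose convention, and the assumption that the plane normal points upward are ours, for illustration only:

import numpy as np

def localize_target(uv, K, R_wc, t_wc, plane_n, plane_d, object_height):
    """Raycast pixel uv onto the ground plane lifted by half the object height.

    K             : 3x3 camera intrinsics
    R_wc, t_wc    : camera-to-world rotation and translation (camera center)
    plane_n, d    : ground plane n.x + d = 0 in world frame, n unit, pointing up
    Returns the 3D target location in world coordinates, or None if the ray
    is (near) parallel to the plane.
    """
    # Lift the plane by half the object height along its upward normal,
    # so the ray terminates near the object's vertical center and the
    # lateral error h*tan(beta) from the off-nadir angle is reduced.
    d_lifted = plane_d - 0.5 * object_height

    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_world = R_wc @ ray_cam
    denom = plane_n @ ray_world
    if abs(denom) < 1e-9:
        return None
    s = -(plane_n @ t_wc + d_lifted) / denom
    return t_wc + s * ray_world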