Uncertainty-based Meta-Reinforcement Learning for
Robust Radar Tracking
arXiv:2210.14532v1 [cs.LG] 26 Oct 2022
1st Julius Ott
Infineon Technologies AG
julius.ott@infineon.com
2nd Lorenzo Servadei
Infineon Technologies AG
lorenzo.servadei@infineon.com
3rd Gianfranco Mauro
Infineon Technologies AG
gianfranco.mauro@infineon.com
4th Thomas Stadelmayer
Infineon Technologies AG
thomas.stadelmayer@infineon.com
5th Avik Santra
Infineon Technologies AG
avik.santra@infineon.com
6th Robert Wille
Technical University of Munich
robert.wille@tum.de
Abstract—Nowadays, Deep Learning (DL) methods often over-
come the limitations of traditional signal processing approaches.
Nevertheless, DL methods are barely applied in real-life applica-
tions. This is mainly due to limited robustness and distributional
shift between training and test data. To this end, recent work
has proposed uncertainty mechanisms to increase their reliability.
In addition, meta-learning aims at improving the generalization
capability of DL models. By taking advantage of that, this paper
proposes an uncertainty-based Meta-Reinforcement Learning
(Meta-RL) approach with Out-of-Distribution (OOD) detection.
The presented method performs a given task in unseen
environments and provides information about their complexity
by determining first- and second-order statistics on the
estimated reward. Using this complexity information, the
proposed algorithm can indicate when tracking is reliable.
To evaluate the proposed method, we benchmark it on a radar-
tracking dataset. There, we show that our method outperforms
related Meta-RL approaches on unseen tracking scenarios in
peak performance by 16% and the baseline by 35% while
detecting OOD data with an F1-Score of 72%. This shows that
our method is robust to environmental changes and reliably
detects OOD scenarios.
Index Terms—Radar Sensors, Reinforcement Learning, Meta-
Learning, Out-of-Distribution Detection
I. INTRODUCTION
Radar sensors are gaining momentum in the modern semi-
conductor industry. Various modulation types, independence
of light conditions, low-cost and privacy-friendly features
lead the radar technology to be successfully employable in
applications such as people detection and object tracking [1].
To keep up this pace of advancement, attention is turning to
applications like multi-person tracking, which is critical in
several areas such as automotive safety, medical services, or
logistics [2], [3]. Multi-person tracking attempts to estimate
the position of each target in a scene. For estimating the
track positions, the Unscented Kalman Filter (UKF) is often
used in radar tasks [4]. This method performs a Bayesian
estimation of the target position, incorporating a
nonlinear dynamic movement model. The nonlinear transition
is approximated by the unscented transform, described in
[5]. By design, the UKF relies on hyperparameters. These
tunable hyperparameters describe the dynamics of the sensor
system and real-world scenarios. Particularly for radar data,
occlusions, non-human disturbances, and limited resolution are
significant challenges for robust tracking. Thus, the choice of
optimal hyperparameters varies with the environment, and of-
ten their initial choice is suboptimal. Given the environment’s
variety of settings and noise, finding the best hyperparameters
for any possible scenario is infeasible.
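The sensitivity to these hyperparameters can be made concrete with a toy sketch (our own illustration, not the paper's setup): a linear Kalman filter, the linear special case of the UKF, tracks a synthetic 1-D constant-velocity target, with `q` and `r` playing the role of the tunable process- and measurement-noise hyperparameters.

```python
import numpy as np

def kalman_track(measurements, q, r, dt=0.1):
    """Linear constant-velocity Kalman filter over 1-D position readings.

    q: process-noise scale (how much we trust the motion model)
    r: measurement-noise variance (how much we trust the sensor)
    Both are tunable hyperparameters, analogous to the UKF's.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity transition
    H = np.array([[1.0, 0.0]])                 # only position is measured
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],  # white-noise-acceleration model
                      [dt**2 / 2, dt]])
    R = np.array([[r]])
    x, P = np.zeros((2, 1)), np.eye(2)         # initial state and covariance
    estimates = []
    for z in measurements:
        x = F @ x                              # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                    # update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0, 0])
    return np.array(estimates)

rng = np.random.default_rng(0)
truth = 0.5 * 0.1 * np.arange(50)              # target moving at 0.5 m/s
meas = truth + rng.normal(0.0, 0.3, size=50)   # noisy range readings

smooth = kalman_track(meas, q=0.01, r=0.09)    # trusts the motion model
jumpy = kalman_track(meas, q=10.0, r=0.09)     # trusts the measurements
print("MSE q=0.01:", round(float(np.mean((smooth - truth) ** 2)), 4))
print("MSE q=10.0:", round(float(np.mean((jumpy - truth) ** 2)), 4))
```

The same data tracked with two different `q` settings yields visibly different errors, which is exactly why a fixed hyperparameter choice cannot suit every scene.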
Recent work focused on estimating the underlying dynamics
of a UKF with specific neural networks [6] or using Reinforce-
ment Learning (RL) to provide the best hyperparameters for
a given scene, as shown in [7], [8]. Although this approach is
promising, it often lacks robustness, and the data distribution
at inference time might differ from the one used for training.
Furthermore, the UKF might fail to track in overcrowded
scenarios, given the limitations of its underlying model.
In real-life applications such as automotive radar sensing,
robustness is also critical when Deep Reinforcement
Learning (DRL) is employed. However, neural networks are
known to fail when the test data distribution is far off the
training data distribution [9]. As a consequence, meta-learning
is often used in the literature to close the gap between training
and test data distributions [10], aiming to adapt quickly to
novel tasks. Although model-based meta-learning algorithms have
obtained excellent results, they are limited to the network de-
sign stage [11]. For this reason, research has recently focused
on model-agnostic meta-learning, which enables learning tasks
independently of the machine learning model. This is granted
through task-specific optimization and is widely applicable in
domains such as few-shot classification and RL. In the case of
Meta-RL, context variables are a promising way to incorporate
task-specific information, as shown in [12]. In this method,
the context variable is learned by a neural network from task-
specific data. Although the method is performant, it requires
computing a context variable using multiple data samples from
each additional task. Storing this data during inference is inef-
ficient. Hence, our proposed method uses domain information
to formulate a context variable. Our approach does not need
to store data: it computes the context variable from easy-
to-obtain input distribution statistics, which we refer to as the
context prior. Furthermore, as shown in the experiments, our
method leads to an improved domain generalization compared
to other state-of-the-art approaches. Nevertheless, in several
real-world applications (e.g., radar-tracking), improved domain
generalization cannot address the limitations inherent to the
task itself. For example, tracking in small, crowded
scenarios with obstacles becomes increasingly difficult. In order
to assess the reliability of our Meta-RL method, we develop
an uncertainty mechanism via bootstrapped networks. The
uncertainty mechanism is combined with the context prior that
encodes information about the task difficulty. In this approach,
scenes where tracking is prone to failure are classified as
OOD, thus indicating the reliability of the tracker in the
current scenario.
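As an illustration only (hypothetical shapes, weights, and threshold; the paper's actual networks and statistics differ), the bootstrapped-ensemble idea can be sketched as follows: several reward heads each produce an estimate, their mean and standard deviation give the first- and second-order statistics, and strong disagreement flags a scene as OOD.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for K bootstrapped reward heads; in the real method each head
# is a neural network trained on a different bootstrap resample.
K, state_dim = 8, 4
heads = [rng.normal(size=state_dim) for _ in range(K)]

def reward_statistics(state):
    """First- and second-order statistics of the ensemble's reward estimates."""
    preds = np.array([w @ state for w in heads])
    return preds.mean(), preds.std()

def is_ood(state, std_threshold=2.0):
    """Flag a scene as out-of-distribution when the heads disagree strongly."""
    _, std = reward_statistics(state)
    return std > std_threshold

in_dist = np.array([0.1, 0.2, -0.1, 0.05])    # small, familiar input
far_off = np.array([10.0, -12.0, 9.0, 11.0])  # extreme input: heads disagree
print(is_ood(in_dist), is_ood(far_off))
```

Randomly initialized heads agree closely on familiar-scale inputs but diverge on extreme ones, which is the effect the bootstrapped ensemble exploits.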
In summary, the contributions of this paper are the
following:
1) Meta-RL for domain generalization without additional
memory footprint using context priors
2) Enhanced OOD detection with context priors that encode
task difficulty
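The first contribution, the context prior, can be illustrated with a toy sketch (the chosen statistics and shapes are our assumptions, not the paper's exact definition): the context variable is computed from cheap per-frame input statistics, so no support-set data has to be stored.

```python
import numpy as np

def context_prior(frames):
    """Build a context variable from input-distribution statistics.

    frames: array of shape (T, H, W), e.g. a short history of Range-Angle
    Images. Only the summary statistics are kept, not the raw data.
    """
    per_frame_mean = frames.mean(axis=(1, 2))
    per_frame_std = frames.std(axis=(1, 2))
    # First- and second-order statistics of the input distribution
    return np.array([
        per_frame_mean.mean(),  # average signal level
        per_frame_mean.std(),   # how much the level fluctuates over time
        per_frame_std.mean(),   # average spread within a frame
        per_frame_std.std(),    # how much that spread fluctuates
    ])

rng = np.random.default_rng(2)
quiet_scene = rng.normal(0.0, 0.1, size=(16, 32, 32))
busy_scene = rng.normal(0.0, 1.5, size=(16, 32, 32))
print(context_prior(quiet_scene).round(3))
print(context_prior(busy_scene).round(3))
```

A scene with more clutter produces a clearly different context vector than a quiet one, even though only four numbers are retained per scene.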
The remainder of the paper proceeds as follows: in Section II,
we introduce the radar-tracking problem and how to tackle it
with RL. Afterward, we explain the specific signal processing
in Section III. In the same section, we show how the input data
distribution is used to compute an informative context variable.
At the end of this section, using the context variable, we
propose a Meta-RL algorithm for environment generalization
and detection of OOD scenarios. Finally, we evaluate our
method on a multi-target radar tracking dataset against related
Meta-RL methods in Section IV. Our proposed approach
outperforms comparable Meta-RL approaches in terms of peak
performance by 16% on the test scenarios and the baseline of
fixed parameters by 35%. In the same way, it detects OOD
scenarios with an F1-score of 72%. Thus, our approach is
more robust to environmental changes and reliably detects
OOD scenarios.
Finally, in Section V, we summarize our results and give an
outlook on future work.
II. BACKGROUND AND MOTIVATION
In this section, we review the background and related work.
In Section II-A, we first outline the principle of radar tracking.
Afterward, we explain how RL can be used to optimize radar
tracking. Additionally, we extend this concept by introducing
the fundamentals of Meta-RL and Uncertainty-based RL.
A. Radar Tracking
Frequency Modulated Continuous Wave (FMCW) radars
can estimate the range, Angle of Arrival (AoA), and velocity of
targets. In the case of radar tracking, we use the range and AoA
to determine the target position. The typical radar tracking
pipeline can be divided into signal processing, detection, clus-
tering, and tracking, as shown in [13]. A high-level description
is given in Figure 1. The signal processing stage processes the
sensor data from each radar antenna to estimate the reflected
signal's range and angle. The resulting image is a so-called
Range-Angle Image (RAI). Afterward, the RAI is convolved
with a window that determines the signal threshold based
on the surrounding noise. Usually, a Constant False Alarm
Rate (CFAR) algorithm [14] or a variation thereof defines
the threshold. A clustering algorithm groups nearby detected
signals, and the respective cluster means are input to the track-
ing stage. In this part of the pipeline, the track management
determines whether to assign the measurement to a track, open
a new track, discard the measurement or delete non-active
tracks. Before updating the track, the measurement has to be
filtered by the tracking filter based on the last position and an
underlying movement model. The UKF is a commonly used
tracking filter [15].

Fig. 1: High-level description of a Radar Tracking Pipeline:
radar data passes through signal processing, detection (over
range and angle), clustering, track management, and the track
filter to produce target proposals, with hyperparameters
steering the stages.

The presented tracking pipeline heavily
relies on hyperparameters. Namely, the tracking performance
depends on the gating threshold for assigning tracks and the
covariance matrix of the measurement and state transition
models. Typically, those hyperparameters are determined by
an expert user who evaluates the Normalized Estimation Error
Squared (NEES) on recorded data with ground-truth positions.
However, this approach is unlikely to perform well once the
radar is deployed in a different environment. Thus, recent work
proposed to use RL to tackle the combinatorial problem of
finding the best set of parameters for any scenario [16].
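The CFAR detection stage mentioned above can be sketched in a simplified one-dimensional form (real radar pipelines typically run a 2-D variant over the Range-Angle Image; the parameters here are illustrative):

```python
import numpy as np

def ca_cfar(signal, guard=2, train=8, scale=4.0):
    """1-D cell-averaging CFAR: each cell is thresholded against the mean
    of its surrounding training cells, skipping guard cells around the
    cell under test. Returns a boolean detection mask."""
    n = len(signal)
    detections = np.zeros(n, dtype=bool)
    for i in range(train + guard, n - train - guard):
        # training cells on both sides of the cell under test
        left = signal[i - guard - train : i - guard]
        right = signal[i + guard + 1 : i + guard + 1 + train]
        noise_level = np.mean(np.concatenate([left, right]))
        detections[i] = signal[i] > scale * noise_level
    return detections

rng = np.random.default_rng(3)
noise = rng.exponential(1.0, size=128)  # square-law detected noise floor
noise[64] += 40.0                       # a strong target return
hits = np.flatnonzero(ca_cfar(noise))
print(hits)
```

Because the threshold adapts to the local noise estimate, the false-alarm rate stays roughly constant as the noise floor changes, which is the property the name CFAR refers to.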
B. Reinforcement Learning
In RL, the problem is formalized as a Markov Decision
Process (MDP) $(S, A, R, p, \gamma)$, where $S$ is the state space
(the radar sensor input), $A$ is the action space (the
hyperparameters), $R$ is the reward (the tracking performance,
as shown in [16]), $p_\pi$ is the unknown transition probability
between states under policy $\pi$, and $\gamma$ is the discount
factor. Let $\tau = (s_t, a_t, r_t, s_{t+1})$ denote the transition
from state $s_t$ at time step $t$ to the next state $s_{t+1}$
after taking action $a_t$ and receiving reward $r_t$. In
traditional RL, the goal is to maximize the sum of expected
rewards

$$\sum_t \mathbb{E}_{(s_t, a_t) \sim p_\pi}\left[ r(s_t, a_t) \right]. \quad (1)$$

This can be achieved by value iteration methods [17]. There,
we define a Q-value $Q(s_t, a_t)$ for each state-action pair
that estimates the expected reward. Afterward, for each state