
context prior. Furthermore, as shown in the experiments, our
method leads to improved domain generalization compared
to other state-of-the-art approaches. Nevertheless, in several
real-world applications (e.g., radar tracking), improved domain
generalization cannot address the limitations inherent to the
task itself. For example, tracking in small, crowded scenes
with obstacles becomes increasingly difficult. To assess the
reliability of our Meta-RL method, we develop an uncertainty
mechanism via bootstrapped networks. The uncertainty
mechanism is combined with the context prior, which encodes
information about the task difficulty. In this approach, scenes
where tracking is prone to failure are classified as OOD, thus
quantifying the reliability of the tracker in the current scenario.
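As a rough sketch of this mechanism (with hypothetical names and interfaces, not the paper's implementation), the disagreement between bootstrapped value heads can serve as an uncertainty score that, weighted by a task-difficulty prior, flags a scene as OOD:

```python
import numpy as np

def ood_score(head_q_values: np.ndarray, difficulty_prior: float) -> float:
    """Epistemic uncertainty from bootstrapped heads, scaled by task difficulty.

    head_q_values:    array of shape (K, num_actions), one row per head.
    difficulty_prior: scalar in [0, 1]; a hypothetical stand-in for the
                      context prior's task-difficulty estimate.
    """
    # Disagreement between heads: mean per-action standard deviation.
    disagreement = head_q_values.std(axis=0).mean()
    # Harder tasks raise the score, making an OOD flag more likely.
    return disagreement * (1.0 + difficulty_prior)

def is_ood(head_q_values: np.ndarray, difficulty_prior: float,
           threshold: float = 0.5) -> bool:
    # Scenes whose weighted disagreement exceeds the threshold are
    # classified as OOD, i.e., tracking is prone to fail there.
    return ood_score(head_q_values, difficulty_prior) > threshold
```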
In summary, the contributions of this paper are the following:
1) Meta-RL for domain generalization using context priors,
without additional memory footprint
2) Enhanced OOD detection with context priors that encode
task difficulty
The remainder of the paper proceeds as follows: in Section II,
we introduce the radar-tracking problem and how to tackle it
with RL. Afterward, we explain the specific signal processing
in Section III. In the same section, we show how the input data
distribution is used to compute an informative context variable.
At the end of this section, using the context variable, we
propose a Meta-RL algorithm for environment generalization
and detection of OOD scenarios. In Section IV, we evaluate our
method on a multi-target radar-tracking dataset against related
Meta-RL methods. Our proposed approach outperforms
comparable Meta-RL approaches in peak performance by 16%
on the test scenarios and a fixed-parameter baseline by 35%.
Moreover, it detects OOD scenarios with an F1-score of 72%.
Thus, our approach is more robust to environmental changes
and reliably detects OOD scenarios.
Finally, in Section V, we summarize our results and give an
outlook on future work.
II. BACKGROUND AND MOTIVATION
In this section, we review the background and related work.
In Section II-A, we first outline the principle of radar tracking.
Afterward, we explain how RL can be used to optimize radar
tracking. Additionally, we extend this concept by introducing
the fundamentals of Meta-RL and Uncertainty-based RL.
A. Radar Tracking
Frequency Modulated Continuous Wave (FMCW) radars
can estimate the range, Angle of Arrival (AoA), and velocity of
targets. In the case of radar tracking, we use the range and AoA
to determine the target position. The typical radar tracking
pipeline can be divided into signal processing, detection, clus-
tering, and tracking, as shown in [13]. A high-level description
is given in Figure 1. The signal processing stage processes the
sensor data from each radar antenna to estimate the reflected
signal’s range and angle. The resulting image is a so-called
Range-Angle Image (RAI). Afterward, the RAI is convolved
with a window that determines the signal threshold based
on the surrounding noise. Usually, a Constant False Alarm
Rate (CFAR) algorithm [14] or a variation thereof defines
the threshold. A clustering algorithm groups nearby detected
signals, and the respective cluster means are input to the track-
ing stage. In this part of the pipeline, the track management
determines whether to assign the measurement to a track, open
a new track, discard the measurement, or delete inactive
tracks. Before updating the track, the measurement has to be
filtered by the tracking filter based on the last position and an
underlying movement model. The Unscented Kalman Filter
(UKF) is a commonly used tracking filter [15].

Fig. 1: High-level description of a radar tracking pipeline:
radar data passes through signal processing, detection,
clustering, track management, and the track filter to produce
target proposals, with hyperparameters governing each stage.

The presented tracking pipeline heavily
relies on hyperparameters. Namely, the tracking performance
depends on the gating threshold for assigning tracks and the
covariance matrices of the measurement and state-transition
models. Typically, these hyperparameters are determined by
an expert user from recorded data with ground-truth positions
by evaluating the Normalized Estimation Error Squared (NEES).
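For reference, the NEES at time step $k$ is commonly defined as
$$\epsilon_k = (\mathbf{x}_k - \hat{\mathbf{x}}_k)^\top \mathbf{P}_k^{-1} (\mathbf{x}_k - \hat{\mathbf{x}}_k),$$
where $\mathbf{x}_k$ is the ground-truth state, $\hat{\mathbf{x}}_k$ the filter estimate, and $\mathbf{P}_k$ the filter's state covariance; for a consistent filter, $\epsilon_k$ averages to the state dimension.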
However, this approach is unlikely to perform well once the
radar is deployed in a different environment. Thus, recent work
proposed to use RL to tackle the combinatorial problem of
finding the best set of parameters for any scenario [16].
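To make the detection stage concrete, the following is a minimal sketch of a one-dimensional cell-averaging CFAR detector (a simplified member of the CFAR family in [14]; function and parameter names are illustrative, not from the pipeline above):

```python
import numpy as np

def ca_cfar_1d(power, num_train=8, num_guard=2, scale=4.0):
    """Minimal cell-averaging CFAR along one range profile.

    power:     1D array of received power per range cell.
    num_train: training cells on each side used to estimate the noise floor.
    num_guard: guard cells on each side excluded around the cell under test.
    scale:     threshold factor controlling the false-alarm rate.
    Returns a boolean detection mask.
    """
    n = len(power)
    detections = np.zeros(n, dtype=bool)
    for i in range(num_train + num_guard, n - num_train - num_guard):
        # Estimate the local noise floor from the training cells,
        # skipping the guard cells around the cell under test.
        left = power[i - num_guard - num_train : i - num_guard]
        right = power[i + num_guard + 1 : i + num_guard + 1 + num_train]
        noise = np.concatenate([left, right]).mean()
        detections[i] = power[i] > scale * noise
    return detections
```

In the pipeline of Figure 1, an analogous window is applied across both dimensions of the RAI, and the surviving cells are forwarded to the clustering stage.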
B. Reinforcement Learning
In RL, the problem is formalized as a Markov Decision
Process (MDP) $(\mathcal{S}, \mathcal{A}, R, p, \gamma)$, where $\mathcal{S}$ is the state space
given by the radar sensor input, $\mathcal{A}$ is the action space defined as the hyper-
parameters, $R$ is the reward given by the tracking performance shown
in [16], $p_\pi$ is the unknown transition probability between
states following policy $\pi$, and $\gamma$ is the discount factor. Let
$\tau = (s_t, a_t, r_t, s_{t+1})$ define the transition from state $s_t$ at time
step $t$ to the next state $s_{t+1}$ following action $a_t$ with reward $r_t$.
In traditional RL, the goal is to maximize the sum of expected
rewards
$$\sum_{t} \mathbb{E}_{(s_t, a_t) \sim p_\pi}\left[ r(s_t, a_t) \right]. \tag{1}$$
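To ground this formalization, a minimal, gym-style environment interface for the radar-tracking MDP could look as follows (a sketch with illustrative names and stubbed internals, not the implementation of [16]):

```python
import numpy as np

class RadarTrackingEnv:
    """Sketch of the MDP (S, A, R, p, gamma) for tracking-hyperparameter tuning.

    States s_t are range-angle images (RAIs), actions a_t are hyperparameter
    vectors (e.g., gating threshold, noise covariances), and the reward r_t
    scores the resulting tracking performance. All internals are placeholders.
    """

    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma  # discount factor

    def reset(self) -> np.ndarray:
        return self._next_rai()  # initial state s_0

    def step(self, action: np.ndarray):
        # Apply the chosen hyperparameters to the tracking pipeline
        # (stubbed) and score the resulting tracks, e.g., via the NEES.
        reward = self._tracking_performance(action)
        next_state = self._next_rai()  # draw s_{t+1} from the unknown p_pi
        return next_state, reward, False  # (s_{t+1}, r_t, done)

    def _next_rai(self) -> np.ndarray:
        return np.zeros((64, 64))  # placeholder RAI

    def _tracking_performance(self, action: np.ndarray) -> float:
        return 0.0  # placeholder tracking score
```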
Maximizing this objective can be achieved by value iteration
methods [17]. There, we define a Q-value $Q(s_t, a_t)$ for each
state-action pair that estimates the expected return. Afterward, for each state