HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar

Shih-Po Lee†*, Niraj Prakash Kini†, Wen-Hsiao Peng†, Ching-Wen Ma†, Jenq-Neng Hwang‡

†National Yang Ming Chiao Tung University, Taiwan

‡University of Washington, USA

{mapl0756051.cs07g, nirajnctu.cs06g, machingwen}@nctu.edu.tw, wpeng@cs.nctu.edu.tw, hwang@uw.edu

Abstract

This paper introduces a novel human pose estimation benchmark, Human Pose with Millimeter Wave Radar (HuPR), that includes synchronized vision and radio signal components. The dataset is created using cross-calibrated mmWave radar sensors and a monocular RGB camera for cross-modality training of radar-based human pose estimation. Using mmWave radar for human pose estimation has two advantages. First, it is robust to dark and low-light conditions. Second, it is not visually perceivable by humans and can therefore be widely applied to applications with privacy concerns, e.g., surveillance systems in patient rooms. In addition to the benchmark, we propose a cross-modality training framework that leverages ground-truth 2D keypoints representing human body joints for training. These keypoints are systematically generated by a pre-trained 2D pose estimation network from a monocular camera input image, avoiding laborious manual label annotation. The framework consists of a new radar pre-processing method that better extracts the velocity information from radar data, a Cross- and Self-Attention Module (CSAM) to fuse multi-scale radar features, and Pose Refinement Graph Convolutional Networks (PRGCN) to refine the predicted keypoint confidence heatmaps. Our intensive experiments on the HuPR benchmark show that the proposed scheme achieves better human pose estimation performance with only radar data than traditional pre-processing solutions and previous radio-frequency-based methods. Our code is available online.1

1. Introduction

Human pose estimation (HPE) is a widely studied, traditional task in computer vision. Given RGB images from a single or multiple camera views, it predicts 2D/3D human skeletons by estimating human body keypoints. Though previous HPE solutions have demonstrated promising results, the natural properties of an RGB image constrain the advancement of HPE. In particular, RGB images captured in dark and low-light conditions can hardly show a person's pose, leading to inferior pose estimation quality. In addition, using such vision-based inputs raises personal privacy concerns. For example, surveillance systems installed in patient rooms monitor personal activities by analyzing poses, while at the same time the personal appearance is inevitably disclosed to the systems. Therefore, predicting human poses from vision-based input encounters adverse lighting and privacy invasion issues.

Figure 1: Illustration of our overall system setup. The upper branch represents mmWave data collection and pre-processing. The lower branch represents our proposed Cross- and Self-Attention Module and Pose Refinement Graph Convolutional Networks. H: Horizontal, V: Vertical.

*This work is supported by National Center for High-performance Computing, Taiwan.

1 https://github.com/robert80203/HuPR-A-Benchmark-for-Human-Pose-Estimation-Using-Millimeter-Wave-Radar

arXiv:2210.12564v1 [cs.CV] 22 Oct 2022

To address the above issues, a new type of HPE task has been proposed. Several radio frequency (RF) datasets [30, 19, 9] have been built to predict human skeletons. Such RF signals are robust to lighting conditions and barely visually perceivable by humans. RF signals can be categorized by frequency band, with each band having different characteristics. Zhao et al. [30] adopt Wi-Fi signals (2.4 GHz), which have the unique ability to capture a person's pose even when he or she is standing behind a wall. Despite showing excellent results on 2D pose estimation, the Wi-Fi sensors used by [30] are proprietary designs. On the other hand, Sengupta et al. [19] introduce another type of RF signal, Frequency Modulated Continuous Wave (FMCW) radar, whose frequency band changes periodically from 77 GHz to 81 GHz and which can precisely detect the depth (range) and the velocity of an object. Compared to the Wi-Fi sensor used by [30], the mmWave radar sensor is more economical and accessible, and is commercially available from many instrument providers [8]. The 3D HPE results shown in [19] seem promising; however, that work ignores human body keypoints with high uncertainty, such as wrists, due to their low prediction accuracy, which shows a limited capability of capturing human poses using radar. Most importantly, the datasets of both [30, 19] remain inaccessible to the public, restricting the further development of RF-based HPE.
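To make the range and velocity claims for FMCW radar concrete, the following is a minimal sketch of the textbook FMCW relations, not code from the HuPR pipeline. Only the 77 GHz to 81 GHz sweep comes from the text above; the chirp duration and the helper names (`range_resolution`, `range_from_beat`, `velocity_from_phase`) are assumptions chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz):
    """Smallest separable range difference for a sweep of the given bandwidth."""
    return C / (2.0 * bandwidth_hz)


def range_from_beat(beat_hz, slope_hz_per_s):
    """Target range implied by the beat frequency of a linear chirp."""
    return C * beat_hz / (2.0 * slope_hz_per_s)


def velocity_from_phase(delta_phi_rad, wavelength_m, chirp_period_s):
    """Radial velocity implied by the phase change between two consecutive chirps."""
    return wavelength_m * delta_phi_rad / (4.0 * math.pi * chirp_period_s)


# Sweep stated in the text: 77 GHz to 81 GHz.
bandwidth = 81e9 - 77e9

# Hypothetical chirp duration, assumed only for this example.
chirp_period = 50e-6
slope = bandwidth / chirp_period
wavelength = C / 77e9

print(f"range resolution: {range_resolution(bandwidth) * 100:.2f} cm")
print(f"range for a 1 MHz beat: {range_from_beat(1e6, slope):.3f} m")
print(f"velocity for a 0.1 rad shift: "
      f"{velocity_from_phase(0.1, wavelength, chirp_period):.3f} m/s")
```

Under these relations a 4 GHz sweep gives a range resolution of roughly 3.75 cm, which is one way to read the claim that this band can precisely measure depth.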

To overcome the challenging issues encountered in RGB-based and RF-based HPE, we introduce a new benchmark, Human Pose with mmWave Radar (HuPR). Unlike [19], we additionally incorporate velocity information in our dataset, since radar sensors can provide highly precise velocity information. Meanwhile, we propose a Cross- and Self-Attention Module (CSAM) to better fuse the multi-scale features from horizontal and vertical radars, and a 2D pose confidence refinement network based on Graph Convolutional Networks (PRGCN) to refine the confidence in the output pose heatmaps. Our framework consists of two cascading components: 1) Multi-Scale Spatio-Temporal Radar Feature Fusion with CSAM, which contains two branches that encode temporal range-azimuth and range-elevation information respectively, followed by a decoder that decodes the fused features at every scale and predicts 2D pose heatmaps; and 2) PRGCN, which is applied to the output heatmaps to refine the confidence of each keypoint based on a pre-defined graph of human skeletons. Our contributions are threefold:

• We introduce a novel RF-based HPE benchmark, HuPR, which features privacy-preserving data, economical and accessible radar sensors, and a handy hardware setup. The dataset and implementation code will be released upon paper acceptance.

• We propose a new radar pre-processing method that better extracts velocity information from radar signals to help RF-based HPE.

• We propose CSAM to relate the features from two different radars for better feature fusion and PRGCN to refine the confidence of each keypoint, especially to improve the precision of the faster-moving edge keypoints, such as wrists. Experimental results and ablation studies show that our proposed method makes significant improvement over RF-based 2D HPE methods and 3D pointcloud-based methods.

Figure 2: Examples of actions in our dataset, including standing with fixed actions, standing with waving hands, and walking with waving hands.
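As a rough illustration of what relating the features from two radars could look like, the sketch below applies generic scaled dot-product attention, with horizontal-radar features attending to vertical-radar features (cross-attention) and to themselves (self-attention). This is an assumption-laden toy and not the CSAM design, which this section does not specify: there are no learned projections, the residual sum in `fuse` is a hypothetical choice, and all function names are invented for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of values."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


def fuse(horizontal, vertical):
    """Hypothetical fusion: residual sum of self- and cross-attended features."""
    cross = attend(horizontal, vertical, vertical)        # horizontal looks at vertical
    own = attend(horizontal, horizontal, horizontal)      # horizontal looks at itself
    return [[h + c + s for h, c, s in zip(h_row, c_row, s_row)]
            for h_row, c_row, s_row in zip(horizontal, cross, own)]


# Toy feature vectors standing in for range-azimuth and range-elevation features.
horizontal = [[1.0, 0.0], [0.0, 1.0]]
vertical = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]
fused = fuse(horizontal, vertical)
print(fused)
```

The fused output keeps the shape of the horizontal features, so in a multi-scale setting the same operation could in principle be repeated per scale before decoding.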

2. Related Work

2.1. RGB-based HPE

There have been extensive studies on RGB-based HPE. In general, these works fall into two categories: regression-based methods and heatmap-based methods. Traditional regression-based methods [15, 27, 10] map input sources to the coordinates of body keypoints via an end-to-end neural network. Regression-based solutions are straightforward but less attractive, since it is more difficult for a neural network to map image features into just several keypoint coordinates. On the other hand, heatmap-based methods [22, 11, 14, 6, 3] generally outperform regression-based methods and dominate the field of HPE. Heatmap-based HPE methods produce a likelihood heatmap for each keypoint as the target of pose estimation.
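The heatmap formulation can be sketched in a few lines. The example below assumes the common convention of rendering each keypoint as a 2D Gaussian and decoding it with an argmax; the grid size, `sigma`, and the function names are illustrative assumptions rather than settings taken from any of the cited methods.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Render one keypoint at (cx, cy) as a 2D Gaussian likelihood map."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def decode_keypoint(heatmap):
    """Return (x, y, score) of the highest-scoring cell in the heatmap."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


# Encode a keypoint at column 5, row 2 of an 8x8 grid, then decode it again.
heatmap = gaussian_heatmap(8, 8, cx=5, cy=2)
x, y, score = decode_keypoint(heatmap)
print(x, y, score)
```

A regression-based method would instead output the pair (x, y) directly; the heatmap target gives the network a dense spatial signal to fit, and the peak value doubles as a per-keypoint confidence.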

2.2. RF-based HPE

RF-based data are often used for simpler human sensing tasks, such as activity recognition [20, 21], gesture recognition [24, 13] and human object detection [5, 23]. Channel State Information (CSI) data were the main RF sources in the early days, but they do not provide range or distance information. With the development of economical radio sensors, estimating range and angle of arrival has become feasible with affordable devices, allowing more detailed and complicated tasks like HPE to be conducted on RF-based data [30, 19, 18, 4]. Zhao et al. [30] utilize WiFi-ranged FMCW signals with the ability to generate the 2D
