HuPR A Benchmark for Human Pose Estimation Using Millimeter Wave Radar Shih-Po Lee Niraj Prakash Kini Wen-Hsiao Peng Ching-Wen Ma Jenq-Neng Hwang National Yang Ming Chiao Tung University Taiwan

HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar
Shih-Po Lee†*, Niraj Prakash Kini†, Wen-Hsiao Peng†, Ching-Wen Ma†, Jenq-Neng Hwang‡
†National Yang Ming Chiao Tung University, Taiwan
‡University of Washington, USA
{mapl0756051.cs07g, nirajnctu.cs06g, machingwen}@nctu.edu.tw, wpeng@cs.nctu.edu.tw, hwang@uw.edu
Abstract
This paper introduces a novel human pose estima-
tion benchmark, Human Pose with Millimeter Wave Radar
(HuPR), that includes synchronized vision and radio signal
components. This dataset is created using cross-calibrated
mmWave radar sensors and a monocular RGB camera for
cross-modality training of radar-based human pose estima-
tion. There are two advantages of using mmWave radar to
perform human pose estimation. First, it is robust to dark
and low-light conditions. Second, it is not visually perceiv-
able by humans and thus, can be widely applied to appli-
cations with privacy concerns, e.g., surveillance systems
in patient rooms. In addition to the benchmark, we pro-
pose a cross-modality training framework that leverages the
ground-truth 2D keypoints representing human body joints
for training, which are systematically generated from the
pre-trained 2D pose estimation network based on a monocular
camera input image, avoiding laborious manual label
annotation efforts. The framework consists of a new
radar pre-processing method that better extracts velocity
information from radar data, a Cross- and Self-Attention
Module (CSAM) that fuses multi-scale radar features, and
Pose Refinement Graph Convolutional Networks (PRGCN)
that refine the predicted keypoint confidence heatmaps. Our
extensive experiments on the HuPR benchmark show that
the proposed scheme achieves better human pose estimation
performance with only radar data than traditional
pre-processing solutions and previous radio-frequency-based
methods. Our code is available here.1
1. Introduction
Human pose estimation (HPE) is one of the most widely studied
traditional tasks in computer vision. Given RGB images from
a single or multiple camera views, it predicts 2D/3D human
skeletons by estimating human body keypoints. Though promising
results have been demonstrated by previous HPE solutions, the
natural properties of an RGB image undoubtedly constrain the
advancement of HPE. In particular, RGB images captured in dark
and low-light conditions can hardly show a person's pose,
leading to inferior pose estimation quality. In addition,
using such vision-based inputs raises personal privacy
concerns. For example, surveillance systems installed in
patient rooms monitor personal activities by analyzing
poses, while at the same time personal appearance is
inevitably disclosed to the systems. Therefore, predicting
human poses from vision-based input suffers from adverse
lighting and privacy invasion issues.
Figure 1: Illustration of our overall system setup. The upper
branch represents mmWave Data Collection and Pre-processing.
The lower branch represents our proposed Cross- and
Self-Attention Module and Pose Refinement Graph Convolutional
Networks. H: Horizontal, V: Vertical.
*This work is supported by National Center for High-performance
Computing, Taiwan.
1 https://github.com/robert80203/HuPR-A-Benchmark-for-Human-Pose-Estimation-Using-Millimeter-Wave-Radar
arXiv:2210.12564v1 [cs.CV] 22 Oct 2022
To address the above issues, a new type of HPE task
has been proposed. Several radio frequency (RF) datasets
[30, 19, 9] have been built to predict human skeletons. Such
RF signals are robust to lighting conditions and barely
visually perceivable by humans. RF signals can be categorized
by frequency band, each with different characteristics.
Zhao et al. [30] adopt Wi-Fi signals (2.4 GHz), which have
the unique ability to capture a person's pose even when
he or she is standing behind a wall. Despite showing
excellent results on 2D pose estimation, the Wi-Fi sensors
used by [30] are proprietary designs.
On the other hand, Sengupta et al. [19] introduce another
type of RF signal, Frequency Modulated Continuous Wave
(FMCW) radar, whose frequency sweeps periodically
from 77 GHz to 81 GHz and which can precisely detect the
depth (range) and the velocity of an object. Compared to
the Wi-Fi sensor used by [30], the mmWave radar sensor is
more economical and accessible, and is commercially
available from many instrument providers [8]. The 3D HPE
results shown in [19] seem promising; however, the method
ignores human body keypoints with high uncertainty, such as
wrists, due to their low prediction accuracy, which indicates
a limited capability of capturing human poses using radar.
Most importantly, the datasets of both [30] and [19] remain
inaccessible to the public, restricting the further development
of RF-based HPE.
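As a rough illustration of why a 77 GHz to 81 GHz sweep supports fine range measurement, the sketch below applies the standard FMCW relations: range resolution c / (2B) for a sweep bandwidth B, and range R = c f_b / (2S) for a beat frequency f_b and chirp slope S. Only the 4 GHz bandwidth comes from the text above; the 40 µs chirp duration and the 1 MHz beat frequency are hypothetical values chosen for illustration and are not the HuPR radar configuration.

```python
# Minimal sketch of standard FMCW range relations (not the HuPR pipeline).
C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz: float) -> float:
    """Smallest separable range difference for a given sweep bandwidth."""
    return C / (2.0 * bandwidth_hz)


def beat_to_range(beat_hz: float, slope_hz_per_s: float) -> float:
    """Range of a target whose echo produces the given beat frequency.

    The round-trip delay is 2R / c, and the beat frequency equals
    slope * delay, so R = c * beat / (2 * slope).
    """
    return C * beat_hz / (2.0 * slope_hz_per_s)


bandwidth = 81e9 - 77e9          # 4 GHz sweep, as stated in the text
chirp_duration = 40e-6           # ASSUMPTION: hypothetical 40 microsecond chirp
slope = bandwidth / chirp_duration

resolution_m = range_resolution(bandwidth)
target_range_m = beat_to_range(1e6, slope)  # ASSUMPTION: 1 MHz beat frequency

print(f"range resolution: {resolution_m * 100:.2f} cm")  # 3.75 cm
print(f"range at 1 MHz beat: {target_range_m:.3f} m")
```

Under these assumptions the bandwidth alone fixes the range resolution at a few centimeters; velocity resolution depends on frame timing, which the text does not specify, so it is left out here.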
To overcome the challenges encountered in RGB-based and
RF-based HPE, we introduce a new benchmark, Human Pose with
mmWave Radar (HuPR). Unlike [19], we additionally incorporate
velocity information in our dataset, since radar sensors
provide highly precise velocity measurements. Meanwhile, we
propose a Cross- and Self-Attention Module (CSAM) to better
fuse the multi-scale features from the horizontal and vertical
radars, and a 2D pose confidence refinement network based on
Graph Convolutional Networks (PRGCN) to refine the confidence
in the output pose heatmaps. Our framework consists of two
cascaded components: 1) Multi-Scale Spatio-Temporal Radar
Feature Fusion with CSAM, which contains two branches that
encode temporal range-azimuth and range-elevation information,
respectively, followed by a decoder that decodes the fused
features at every scale and predicts 2D pose heatmaps; and
2) PRGCN, which is applied to the output heatmaps to refine
the confidence of each keypoint based on a predefined graph
of the human skeleton. Our contributions are threefold:
• We introduce a novel RF-based HPE benchmark,
HuPR, which features privacy-preserving data, economical
and accessible radar sensors, and a handy hardware
setup. The dataset and implementation code will
be released upon paper acceptance.
• We propose a new radar pre-processing method that
better extracts velocity information from radar signals
to help RF-based HPE.
• We propose CSAM to relate the features from two
different radars for better feature fusion and PRGCN to
refine the confidence of each keypoint, especially to
improve the precision of the faster-moving edge keypoints,
such as wrists. Experimental results and ablation
studies show that our proposed method significantly
improves over RF-based 2D HPE methods
and 3D pointcloud-based methods.
Figure 2: Examples of actions in our dataset, including
standing with fixed actions, standing with waving hands,
and walking with waving hands.
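To make the idea of refining keypoint confidence over a predefined skeleton graph more concrete, the following is a minimal sketch of one generic graph-convolution step with a symmetrically normalized adjacency matrix. It is not the PRGCN architecture, whose layers are not described in this section. The four-joint arm chain, the confidence values, and the identity weight are all hypothetical and chosen only to show how a low-confidence wrist can borrow evidence from its neighbors.

```python
import math

# ASSUMPTION: toy skeleton (one arm chain), not the HuPR keypoint layout.
JOINTS = ["neck", "shoulder", "elbow", "wrist"]
EDGES = [(0, 1), (1, 2), (2, 3)]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def graph_conv(adj, features, weight):
    """One layer: ReLU(adj @ features @ weight) on dense lists."""
    n, c_in, c_out = len(adj), len(features[0]), len(weight[0])
    agg = [[sum(adj[i][k] * features[k][c] for k in range(n))
            for c in range(c_in)] for i in range(n)]
    out = [[sum(agg[i][c] * weight[c][o] for c in range(c_in))
            for o in range(c_out)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


adj = normalized_adjacency(len(JOINTS), EDGES)
# ASSUMPTION: hypothetical per-joint confidences; the wrist is uncertain.
features = [[0.9], [0.8], [0.7], [0.1]]
weight = [[1.0]]  # untrained identity weight, for illustration only
refined = graph_conv(adj, features, weight)

for name, before, after in zip(JOINTS, features, refined):
    print(f"{name}: {before[0]:.2f} -> {after[0]:.2f}")
```

With the identity weight, each joint's value is mixed with those of its graph neighbors, so the wrist's value rises because the elbow is confident. A trained network would learn the weights rather than fix them.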
2. Related Work
2.1. RGB-based HPE
There have been extensive studies on RGB-based HPE.
In general, these works can be split into two categories:
regression-based methods and heatmap-based methods.
Traditional regression-based methods [15, 27, 10] map input
sources to the coordinates of body keypoints via an
end-to-end neural network. The regression-based solutions are
straightforward but less attractive, since it is more difficult
for a neural network to map image features into just several
keypoint coordinates. On the other hand, heatmap-based
methods [22, 11, 14, 6, 3] generally outperform
regression-based methods and dominate the field of HPE.
Heatmap-based methods produce a likelihood heatmap for each
keypoint as the target of pose estimation.
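The heatmap formulation can be sketched as follows: a training target is built by placing a 2D Gaussian at the annotated keypoint, and a prediction is decoded by taking the location of the heatmap peak. This is a common construction in heatmap-based HPE and is shown only as an illustration; the 64x64 grid, the sigma value, and the function names are assumptions, not settings taken from HuPR or the cited works.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Build a height x width likelihood map peaking at keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def decode_keypoint(heatmap):
    """Return (x, y, confidence) of the heatmap peak."""
    best_x, best_y, best_val = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_x, best_y, best_val = x, y, val
    return best_x, best_y, best_val


# ASSUMPTION: hypothetical 64x64 heatmap with a keypoint at (20, 41).
heatmap = gaussian_heatmap(width=64, height=64, cx=20, cy=41)
x, y, conf = decode_keypoint(heatmap)
print(x, y, conf)  # 20 41 1.0
```

The peak value doubles as a per-keypoint confidence score, and that score is the quantity a refinement stage would adjust.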
2.2. RF-based HPE
RF-based data are often used for simpler human
sensing tasks, such as activity recognition [20, 21],
gesture recognition [24, 13], and human object detection
[5, 23]. Channel State Information (CSI) data were the main
RF source in the early days, but they do not provide range or
distance information. With the development of economical
radio sensors, estimating range and angle of arrival
has become feasible with affordable devices, allowing more
detailed and complicated tasks like HPE to be conducted on
RF-based data [30, 19, 18, 4]. Zhao et al. [30] utilize Wi-Fi-range
FMCW signals with the ability to generate the 2D