
mm-Wave Radar Hand Shape Classification Using
Deformable Transformers
Athmanarayanan Lakshmi Narayanan#1, Asma Beevi K. T.#2, Haoyang Wu*#3, Jingyi Ma*#4, W. Margaret Huang#5,
#Intel Labs, Santa Clara CA, USA
*Intel Labs China, China
{1athma.lakshmi.narayanan, 2asma.kuriparambil.thekkumpate, 3haoyang.wu, 4jingyi.ma, 5margaret.huang}@intel.com
Abstract — A novel, real-time, mm-Wave radar-based static
hand shape classification algorithm and implementation are
proposed. The method has several applications in low-cost,
privacy-sensitive touchless control technology using a 60 GHz
radar as the sensor input. In contrast to prior Range-Doppler
image based 2D classification solutions, our method converts raw
radar data to sparse 3D Cartesian point clouds. The demonstrated
3D radar neural network model using deformable transformers
significantly surpasses the performance results set by prior
methods, which either use custom signal processing or apply
generic convolutional techniques to Range-Doppler FFT images.
Experiments are performed on an internally collected dataset
using an off-the-shelf radar sensor.
Keywords — radar, point cloud, classification, deep learning,
transformer.
I. INTRODUCTION
Users have been demanding more and more touchless
interfaces and controls, especially since the onset of the
COVID-19 pandemic. Using mm-wave radar for non-verbal
gesture interaction ([1], [2]) has the advantage of low-cost
implementation without privacy concerns. Radar also
helps to detect objects under occlusion or variable lighting
conditions. Radar signal processing extracts the range, speed,
and angle information of surrounding moving targets. Most
radar-based indoor solutions focus mainly on human action
or gesture classification. While static gesture (hand shape/pose)
recognition has steadily garnered attention, several questions
remain open.
A prior solution for static hand shape recognition [3] uses
Region-Of-Interest (ROI) Range-Doppler (RD) data with a
custom Constant False Alarm Rate (CFAR) algorithm to
reduce background noise, followed by 2-dimensional (2D)
convolutional neural networks (CNNs) to classify the hand
shape. However, this solution can only detect hand shapes
at a fixed range (30-60 cm) and orientation (hand facing the
radar). Challenges of false positives (such as a face classified
as a hand) and limited shape options (finger, palm, fist) also
need to be addressed. Moreover, such solutions are difficult to
extend to additional complex hand shapes and to similar but
distinct hand poses.
As a step closer towards a complete touchless gesture
solution, a flexible hand shape/pose recognition method using
a 60 GHz FMCW radar is proposed, without the need for
adaptive CFAR or background clutter removal. Our solution
converts RD maps to a 3-dimensional (3D) point cloud (PC)
representation and uses 3D deep neural networks (DNNs) to
classify the shape from a sparse set of points. Such a point cloud
representation allows for:
• better utilization of 3D geometry for complex shapes;
• leveraging mature techniques and breakthroughs from
the much larger field of 3D computer vision;
• rotation-invariant augmentation during training.
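The third point above can be illustrated with a minimal sketch: because the hand is represented as Cartesian points rather than an RD image, a random rotation can be applied directly during training. The choice of the vertical (z) axis and the angle range here are assumptions for illustration, not the paper's exact augmentation policy.

```python
import numpy as np

def rotate_points_z(points, angle_rad):
    """Rotate an (N, 3) Cartesian point cloud about the z (vertical) axis.

    Sketch of the rotation augmentation enabled by the point cloud
    representation; the rotation axis and angle range are assumptions.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T  # apply the rotation to every point

# Example: augment a toy cloud with a random rotation during training.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 3))           # stand-in for a radar point cloud
aug = rotate_points_z(cloud, rng.uniform(0.0, 2.0 * np.pi))
```

Such an augmentation is cheap and, unlike warping an RD image, preserves the geometry of the hand exactly.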
Furthermore, we apply state-of-the-art deep learning
architectures, such as deformable vision transformers, to
the problem of radar point cloud classification, enabling
self-attention and increasing the scope of re-usability.
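For readers unfamiliar with the primitive involved, the sketch below shows plain single-head self-attention over a sparse set of point features. This is only the building block that deformable transformer architectures refine (deformable attention samples a small learned subset of keys instead of attending to all points); the feature and weight shapes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feats, wq, wk, wv):
    """Single-head self-attention over a set of per-point features.

    feats: (N, D) features, one row per radar point.
    wq/wk/wv: (D, D) projection matrices (assumed shapes).
    """
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (N, N) attention weights
    return scores @ v                                 # attended point features

rng = np.random.default_rng(1)
n, d = 32, 16
feats = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(feats, wq, wk, wv)
```

Because attention is permutation-equivariant over the point set, it is a natural fit for unordered radar point clouds.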
This paper is organized as follows: the problem
description and the dataset details are given in Section
II. In Section III, three different approaches to radar
shape classification are presented. The model performance
and experimental results are summarized and benchmarked in
Section IV. Finally, Section V concludes this work.
Fig. 1. (left) Radar data collection setup and (right) hand pose shapes.
II. PROBLEM DESCRIPTION
A. Hardware setup and data collection
A low-cost, small-form-factor, off-the-shelf radar sensor board
(an FMCW 60 GHz radar sensing solution with 1 TX and 3 RX
antennas) was used to capture the radar scenes. A radar
bandwidth of 5 GHz and an ADC sampling rate of 2 MHz were
used to capture the RD information, with a maximum range and
maximum velocity of about 2 m and 17 m/s, respectively. The radar
scenes were captured in a room using both the left and right hands
of the subjects, within 20-95 cm from the radar sensor board.
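The quoted maxima can be sanity-checked from standard FMCW relations. The chirp duration is not stated in the text, so the value below is an assumption chosen for illustration, as is the use of real-valued ADC sampling (usable beat bandwidth of half the sampling rate).

```python
# Back-of-the-envelope consistency check for the quoted radar parameters.
# T_CHIRP is an assumed value (not given in the text); real-valued ADC
# sampling is also an assumption.
C = 3e8          # speed of light, m/s
FC = 60e9        # carrier frequency, Hz
BANDWIDTH = 5e9  # chirp sweep bandwidth, Hz
FS = 2e6         # ADC sampling rate, Hz
T_CHIRP = 70e-6  # assumed chirp duration, s

range_res = C / (2 * BANDWIDTH)         # range resolution set by bandwidth
slope = BANDWIDTH / T_CHIRP             # chirp frequency slope, Hz/s
max_range = (FS / 2) * C / (2 * slope)  # max beat frequency -> max range
max_velocity = (C / FC) / (4 * T_CHIRP) # unambiguous Doppler velocity

print(f"range resolution ~ {range_res:.3f} m")      # ~0.030 m
print(f"max range        ~ {max_range:.1f} m")      # ~2.1 m
print(f"max velocity     ~ {max_velocity:.1f} m/s") # ~17.9 m/s
```

Under these assumptions the computed values land close to the reported ~2 m maximum range and ~17 m/s maximum velocity.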
The radar hand shapes dataset consists of 5 different hand
shapes, which are palm, fist, finger, ‘C’-shape and ‘Yolo’ hand
arXiv:2210.13079v1 [cs.CV] 24 Oct 2022