ACRNet: Attention Cube Regression Network for Multi-view Real-time
3D Human Pose Estimation in Telemedicine
Boce Hu, Chenfei Zhu, Xupeng Ai and Sunil K. Agrawal*
Abstract: Human pose estimation (HPE) for 3D skeleton
reconstruction in telemedicine has long received attention.
Although the development of deep learning has made HPE
methods in telemedicine simpler and easier to use, addressing
low accuracy and high latency remains a big challenge. In
this paper, we propose a novel multi-view Attention Cube
Regression Network (ACRNet), which regresses the 3D position
of joints in real time by aggregating informative attention points
on each cube surface. More specifically, a cube, each surface of
which contains uniformly distributed attention points with specific
coordinate values, is first created to wrap the target from
the main view. Then, our network regresses the 3D position
of each joint by weighting the coordinates of the attention
points on each surface and then summing and averaging them. To
verify our method, we first tested ACRNet on the open-
source ITOP dataset; meanwhile, we collected a new multi-view
upper body movement dataset (UBM) on the trunk support
trainer (TruST) to validate the capability of our model in
real rehabilitation scenarios. Experimental results demonstrate
the superiority of ACRNet compared with other state-of-the-
art methods. We also validate the efficacy of each module in
ACRNet. Furthermore, we analyze the performance of
ACRNet under the medical monitoring indicator. Because of
the high accuracy and running speed, our model is suitable
for real-time telemedicine settings. The source code is available
at: https://github.com/BoceHu/ACRNet
I. INTRODUCTION
Telemedicine is an emerging and rapidly growing treatment
approach in the medical field because of its high efficiency,
cost-effectiveness, and safety. Compared with traditional
medical treatment, telemedicine improves treatment
efficiency through timely feedback between doctors and patients.
It also leverages technologies such as computer-aided pose
assessment to provide accurate and objective assessments of
patient conditions. This reduces the time therapists spend on
supervision and evaluation as well as the number of face-to-face
diagnosis sessions, thus significantly lowering the
cost of rehabilitation. Meanwhile, telemedicine offers new
possibilities for patients with reduced mobility or disabilities
to be treated at home, effectively preventing infection caused
by exposure to unsanitary conditions. In practice, delivering
such a service remotely requires satisfying several constraints:
running on the limited computing power of personal computers,
high precision, and real-time performance.
Boce Hu, Chenfei Zhu, and Xupeng Ai are with the Department of
Mechanical Engineering, Columbia University, New York, NY, 10027 USA
(e-mail: bh2770@columbia.edu).
* Sunil K. Agrawal is with the Department of Mechanical Engineering,
Department of Rehabilitation and Regenerative Medicine, Columbia Uni-
versity, New York, NY 10027 USA (e-mail: sunil.agrawal@columbia.edu).
This work involved human subjects in its research. Approval of all ethical
and experimental procedures was granted by the Institutional Review Board
of Columbia University under Protocol No. AAAQ7781.
Fig. 1. An attention cube is introduced to wrap the target from the main
view, and evenly distributed gray points stand for attention points on each
surface. ACRNet calculates the point-wise weight on each surface to find
the informative attention points for regressing the 3D position of joints. In
the figure, the darker the point’s color, the higher its weight.
Telemedicine has been widely used in three medical appli-
cation areas: prediction of movement disorders, diagnosis of
movement disorders, and sports rehabilitation training [1]–
[3]. One of the most significant technologies for realizing
them is human pose estimation (HPE), which is used to
reconstruct the 3D human body skeleton. Considering the actual
implementation requirements in telemedicine, researchers have
proposed sensor-based and learning-based methods to estimate
human pose for 3D reconstruction. However, sensor-based
methods rely on devices (e.g., wearable equipment) that must be
attached to the patient’s body, which affects patient movement
and leads to inaccurate diagnoses. Moreover, appropriately adjusting
devices on wearable equipment, such as inertial measurement
units (IMUs) and gyroscopes, requires professional skills.
Therefore, the drawbacks of sensor-based methods seriously
hinder their further development in telemedicine.
arXiv:2210.05130v1 [cs.CV] 11 Oct 2022
Benefiting from advances in deep learning and computer
vision, learning-based HPE technology frees telemedicine
from its reliance on sensor-based methods by offering a
non-contact and easily calibrated alternative. Nevertheless, these
methods still suffer from low accuracy and high latency.
Therefore, to meet the multiple requirements of telemedicine, we
propose a novel Attention Cube Regression Network (ACRNet),
a unified and effective network with fully differentiable
end-to-end training ability that performs estimation based
on multi-view depth images. ACRNet introduces an attention
cube to wrap the object and aggregates information from
each surface of the cube to estimate the 3D position of human
joints, as shown in Fig. 1. More specifically, a fixed-size cube
wrapping the human body from the main view is created
first, with a fixed number of points distributed uniformly
on each surface. Points on the same surface constitute an
attention matrix. Then, our network fuses the feature
information from all views to calculate the weight matrices of all
attention matrices with respect to each joint. Finally, joint
positions are computed as the sum of the element-wise products of
all the attention matrices and their corresponding weight matrices.
Within the model, feature maps are extracted from depth images
by a two-phase backbone network; after that, a multi-view
fusion module integrates feature maps from different views
using dynamic weights according to a cross-similarity
mechanism. Next, a weight distribution module simultaneously
computes the weight matrix corresponding to the attention matrix
on each surface for the final regression. For each joint,
the contributions of different attention points are not equal;
hence, each joint has its own informative points (points with
high weight), which are used to regress the position, and
non-informative points (points with low weight), which are discarded.
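The regression step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`make_attention_cube`, `regress_joints`), the use of all six cube faces, the grid size, and the single softmax over all attention points are assumptions made for illustration. In the actual network the raw scores would come from the weight distribution module; here they are plain arrays.

```python
import numpy as np

def make_attention_cube(k=4, half_size=1.0):
    """Return attention points of shape (6, k, k, 3): a k x k grid per face.

    Hypothetical helper. Points sit at cell centers so that neighboring
    faces do not share points along the cube edges.
    """
    ticks = ((np.arange(k) + 0.5) / k * 2.0 - 1.0) * half_size
    u, v = np.meshgrid(ticks, ticks, indexing="ij")
    faces = []
    for axis in range(3):                  # face normal along x, y, or z
        others = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):           # the two opposite faces
            face = np.empty((k, k, 3))
            face[..., axis] = sign * half_size
            face[..., others[0]] = u
            face[..., others[1]] = v
            faces.append(face)
    return np.stack(faces)

def regress_joints(cube, scores):
    """Regress joints as weighted sums of attention-point coordinates.

    cube:   (6, k, k, 3) attention-point coordinates.
    scores: (J, 6, k, k) raw per-joint scores (assumed network output).
    Returns joint positions (J, 3) and normalized weights (J, 6, k, k).
    """
    num_joints = scores.shape[0]
    points = cube.reshape(-1, 3)                    # (6*k*k, 3)
    flat = scores.reshape(num_joints, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)   # assumed normalization
    joints = weights @ points                       # sum of weight * coordinate
    return joints, weights.reshape(scores.shape)

cube = make_attention_cube(k=4, half_size=1.0)
scores = np.zeros((2, 6, 4, 4))      # two joints, uniform scores
scores[0, 3, 1, 2] = 50.0            # make one point dominant for joint 0
joints, weights = regress_joints(cube, scores)
```

With uniform scores the estimate falls at the cube center by symmetry, and a single dominant score pulls the estimate onto that attention point, which is the sense in which high-weight points are "informative".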
To validate our method, ACRNet is first tested on the ITOP
dataset. The results demonstrate that our method outperforms
the state-of-the-art methods in the front-view setting while being
on par with the best state-of-the-art method in the top-view setting.
Moreover, ACRNet runs at 92.3 FPS
on a single NVIDIA Tesla V100 GPU, enabling it to work in
a real-time environment. Furthermore, to verify the capability
of our model in real rehabilitation scenarios, thereby
providing a technical foundation for the telemedicine platform,
we collect a new medical multi-view upper body movement
dataset (UBM) from 16 healthy subjects on the trunk support
trainer (TruST) [4], labeled by a Vicon infrared system.
Our model consistently outperforms the baseline [5] on this
dataset. Overall, the contributions of this manuscript are:
• ACRNet: A fully differentiable multi-view regression
  network based on depth images to estimate 3D human
  joint positions for telemedicine use.
• A new backbone structure and a dynamic multi-view
  fusion module are proposed. Both of them improve the
  representation ability of our model.
• UBM: A Vicon-labeled multi-view upper body movement
  dataset for rehabilitation use, consisting of depth
  images collected from 16 healthy subjects.
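The dynamic multi-view fusion module named in the contributions is characterized in this section only as using dynamic weights derived from cross similarity. As a rough illustration of that idea, and not the paper's module, the sketch below blends two feature maps with weights driven by their cosine similarity. The function name `fuse_two_views`, the global cosine similarity, and the two-way softmax are all assumptions.

```python
import numpy as np

def fuse_two_views(main_feat, aux_feat, eps=1e-8):
    """Blend two same-shaped feature maps with similarity-driven weights.

    Hypothetical helper: the similarity measure and the softmax weighting
    are illustrative assumptions, not the published fusion module.
    """
    a = main_feat.ravel()
    b = aux_feat.ravel()
    # Global cosine similarity between the two views, in [-1, 1].
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
    # The main view scores 1.0 (its similarity with itself); the auxiliary
    # view scores its similarity to the main view.
    scores = np.array([1.0, sim])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = weights[0] * main_feat + weights[1] * aux_feat
    return fused, weights

rng = np.random.default_rng(0)
main_view = rng.normal(size=(8, 16, 16))   # (channels, height, width)
side_view = rng.normal(size=(8, 16, 16))
fused, weights = fuse_two_views(main_view, side_view)
```

The weights are "dynamic" in the sense that they are recomputed for every input pair: the less the auxiliary view agrees with the main view, the smaller its share in the fused feature map.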
II. RELATED WORKS
A. 3D HPE with Sensor-based Methods
Currently, clinical diagnosis and treatment using motion
capture and pose estimation depend on Vicon because of its
precision, but this system is unsuitable for telemedicine
because of its expensive components and the difficulty of
moving it. Thus, sensor-based wearable equipment is used in
telemedicine to capture patients’ motion data. Li et al. [6]
use multiple inertial sensors attached to the lower limbs of
children with cerebral palsy to evaluate their motor abilities
and validate therapy effectiveness. Sarker et al. [7] infer the
complete upper body kinematics for rehabilitation applica-
tions based on three standalone IMUs mounted on wrists
and pelvis. Nguyen et al. [8] propose using optical linear
encoders and accelerometers to capture the goniometric
data of limb joints. Because these methods affect patients’
movement and some components are hard to calibrate,
they can lead to inaccurate diagnoses, weakening their
application value in telemedicine.
B. 3D HPE with Learning-based Methods
Learning-based HPE methods can be divided into machine
learning and deep learning methods. The former [9]–[12] usually
transform the estimation problem into a classification
problem by calculating the probability of the location of each
joint. A serious drawback of these methods is their limited
representation ability when the estimation task
is complex. As a result, deep learning methods utilizing
RGB or depth images have become mainstream in this field.
RGB-image-based methods [5], [13]–[15] are intuitive and
convenient. Nevertheless, the accuracy of those methods
is relatively low due to the lack of spatial information.
With the popularity of depth cameras, depth-image-based
methods address this shortcoming. Guo et al. [16] propose
a tree-structured Region Ensemble Network to aggregate the
depth information. Kim et al. [17] estimate human pose by
projecting the depth and ridge data in various directions.
Qiu et al. [18] tackle the core problems of monocular HPE,
like self-occlusion and joint ambiguity, by an embedded
fusion layer that merges features from different views. He
et al. [19] extend this method with the Transformer to
match the given view with neighboring views along the
epipolar line by calculating feature similarity to obtain the
final 3D features. Further, Moon et al. [20] and Zhou et
al. [21] take advantage of point clouds with more intuitive
information transformed from depth images to acquire an
exact 3D position of the human body. Although point-cloud-based
methods are accurate enough, they generate
a large number of parameters during execution, consuming more
time and memory, which prevents them from working in
real time. Consequently, considering the pros and cons of
different data types, our work directly adopts the depth
map as the model’s input.
Inspired by the work in [22], which exploits the global-local
spatial information from 2D anchor points, we introduce the
3D attention cube. Our attention points are created following
their method; however, we enhance the correlation between the three
principal directions by facilitating the interaction of different
cube surfaces to eliminate the estimation bias of each surface.
This mutually constrained property improves the robustness
of our method.
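A short geometric note may help explain why interaction between surfaces matters. The notation is assumed, not taken from the paper: consider one cube surface lying in the plane z = c_s, with attention points p_i = (x_i, y_i, c_s) and weights w_i that are non-negative and sum to one. The weighted estimate from that surface alone is then:

```latex
% Assumed notation: surface s lies in the plane z = c_s,
% attention points p_i = (x_i, y_i, c_s), weights w_i >= 0 with \sum_i w_i = 1.
\hat{\mathbf{p}}_s
  = \sum_i w_i \,\mathbf{p}_i
  = \Bigl( \sum_i w_i x_i,\; \sum_i w_i y_i,\; c_s \sum_i w_i \Bigr)
  = \Bigl( \sum_i w_i x_i,\; \sum_i w_i y_i,\; c_s \Bigr)
```

Whatever the weights are, the third coordinate stays pinned to the surface's own plane. Under this assumption a single surface can only constrain its two in-plane coordinates, so recovering all three coordinates requires combining surfaces that face different principal directions.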
III. METHODOLOGY
The workflow of our ACRNet is shown in Fig. 2. Given
images captured by two depth cameras simultaneously, ACR-
Net first extracts the feature map of each view by the
backbone network and then merges feature maps from two