Towards Holographic Video Communications: A
Promising AI-driven Solution
Yakun Huang, Yuanwei Zhu, Xiuquan Qiao, Xiang Su, Member, IEEE,
Schahram Dustdar, Fellow, IEEE, Ping Zhang, Fellow, IEEE
Abstract—Real-time holographic video communications enable
immersive experiences for next-generation video services in the
future metaverse era. However, high-fidelity holographic videos
require high bandwidth and significant computation resources,
which exceed the transmission and computing capacity of 5G
networks. This article reviews state-of-the-art holographic point
cloud video (PCV) transmission techniques and highlights the
critical challenges of delivering such immersive services. We
further implement a preliminary prototype of an AI-driven
holographic video communication system and present key
experimental results to evaluate its performance. Finally, we
identify future research directions and discuss potential solutions
for providing real-time and high-quality holographic experiences.
I. INTRODUCTION
Holographic video provides users with a more immersive six-degrees-of-freedom (6-DoF) viewing experience than traditional virtual reality (VR), 360-degree, and other 3-DoF videos [1]. 6-DoF videos carry depth information for each frame and support three translational degrees of freedom (X, Y, Z) and three rotational degrees of freedom (yaw, pitch, roll). They allow users to walk around an object and view it from above and below. Point cloud video (PCV), a representative holographic 6-DoF video service, describes objects using a set of disordered 3D points with coordinates and colors. Figure 1 compares PCV transmission with other video services. A PCV stream is highly time- and resource-consuming to encode and decode: capturing one second of raw PCV with one depth camera at 30 frames per second (FPS) already produces 2.06 Gb of data, and encoding it takes at least an hour on a common computer, far longer than for 3-DoF videos.
More importantly, PCV transmission requires more than 1 Gbps of bandwidth, far beyond the current transmission capacity of 5G networks. Holographic video therefore imposes requirements on network bandwidth, transmission latency, and computational complexity that far exceed those of traditional video streaming services.
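As a rough illustration of why raw PCV reaches the Gbps level, the uncompressed bitrate is simply points per frame × bits per point × frame rate × number of views. The sketch below assumes a hypothetical per-point layout of three float32 coordinates plus three 8-bit color channels (120 bits per point) and a hypothetical 300,000-point frame; these values are illustrative and are not meant to reproduce the 2.06 Gb figure above, which depends on the authors' capture setup.

```python
# Hypothetical per-point layout (an assumption, not from the article):
# XYZ as three float32 values (96 bits) plus RGB as three uint8 values (24 bits).
BITS_PER_POINT = 3 * 32 + 3 * 8  # 120 bits


def raw_pcv_bitrate_bps(points_per_frame: int, fps: int,
                        bits_per_point: int = BITS_PER_POINT,
                        num_views: int = 1) -> int:
    """Uncompressed point cloud video bitrate in bits per second.

    Assumes every camera view contributes the same number of points per frame.
    """
    return points_per_frame * bits_per_point * fps * num_views


if __name__ == "__main__":
    # Hypothetical frame size: 300,000 points per frame at 30 FPS from one view.
    rate = raw_pcv_bitrate_bps(300_000, 30)
    print(f"{rate / 1e9:.2f} Gbps")
```

Under these assumptions a single view already needs 1.08 Gbps before compression, and the rate grows linearly with the number of capturing views.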
We investigate the transmission techniques for PCV, including point cloud compression and video streaming optimization.
Y. Huang, Y. Zhu, X. Qiao and P. Zhang are with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China. E-mail: {ykhuang, zhuyw, qiaoxq, pzhang}@bupt.edu.cn. (X. Qiao is the corresponding author.)
X. Su is with the Department of Computer Science, Norwegian University of Science and Technology, 2815 Gjøvik, Norway and University of Oulu, 90570 Oulu, Finland. E-mail: xiang.su@ntnu.no.
S. Dustdar is with the Distributed Systems Group, Technische Universität Wien, 1040 Vienna, Austria. E-mail: dustdar@dsg.tuwien.ac.at.
For compression, traditional methods include k-d tree-based and octree-based solutions, such as the popular Point Cloud Library (PCL) [2] and Draco [3]. The ISO/IEC Moving Picture Experts Group (MPEG) has standardized Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC) for PCV. However, these methods require more computing resources and higher costs than 3-DoF video coding. Moreover, although some deep learning-based compression techniques offer lower accuracy loss and higher compression ratios [4], [5], their high computing overhead and inference latency restrict them to offline holographic video pre-processing. For video streaming optimization, most point cloud video streaming techniques extend 3-DoF video streaming methods such as tiling and view angle prediction. Because PCV adds three translational degrees of freedom, its streaming must also adapt to dynamic changes in the physical distance between the user and the scene, which requires more adaptive adjustment than 3-DoF streaming. Some research investigates the combination of point cloud compression and transmission optimization [6], [7]. For streaming quality of service (QoS) management, Zhang et al. [8] proposed a covering-based quality prediction method to predict QoS accurately, together with the query of quality correlation (Q2C) model [9] for QoS guarantees. However, these solutions cannot run in real time on mobile devices because of the heavy cost of compression and encoding/decoding.
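Octree-based coders (for example, PCL's octree compression and the octree geometry coding in G-PCC) share one core idea: recursively split the bounding cube into eight children and signal which children are occupied with one 8-bit code per occupied node. The following is a minimal sketch of that idea only, not the bitstream of any of these libraries. It assumes points are already quantized to integer coordinates in [0, 2^depth), visits nodes level by level in sorted coordinate order (a simplification), and omits entropy coding and color attributes; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]


def octree_encode(points: Iterable[Point], depth: int) -> List[int]:
    """Return breadth-first occupancy bytes for integer points in [0, 2**depth)^3."""
    pts = set(points)  # duplicate positions collapse into one occupied voxel
    codes: List[int] = []
    for level in range(depth):
        shift = depth - level - 1  # coordinate bit that selects the child at this level
        occupancy = {}
        for x, y, z in pts:
            parent = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
            child = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
            occupancy[parent] = occupancy.get(parent, 0) | (1 << child)
        # One byte per occupied node, in an order the decoder can reproduce.
        codes.extend(occupancy[p] for p in sorted(occupancy))
    return codes


def octree_decode(codes: List[int], depth: int) -> List[Point]:
    """Rebuild the occupied voxels (sorted) from the occupancy bytes."""
    nodes: List[Point] = [(0, 0, 0)]  # the root cube
    i = 0
    for _ in range(depth):
        children: List[Point] = []
        for px, py, pz in nodes:  # same sorted order the encoder used
            code = codes[i]
            i += 1
            for child in range(8):
                if (code >> child) & 1:
                    children.append(((px << 1) | ((child >> 2) & 1),
                                     (py << 1) | ((child >> 1) & 1),
                                     (pz << 1) | (child & 1)))
        nodes = sorted(children)
    return nodes


if __name__ == "__main__":
    cloud = [(0, 0, 0), (3, 3, 3), (1, 2, 3)]
    codes = octree_encode(cloud, depth=2)
    print(codes)
    print(octree_decode(codes, depth=2))
```

For the three-point example the encoder emits four bytes, `[137, 1, 32, 128]`, and decoding recovers the sorted voxels. Practical codecs typically compress these occupancy bytes further with context-adaptive arithmetic coding, which this sketch leaves out.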
We review related surveys, tutorials, and magazine publications on PCV, holographic video, and immersive video. Liu et al. [1] discuss the challenges and solutions of adaptive point cloud streaming and provide a prototype that extends MPEG Dynamic Adaptive Streaming over HTTP (DASH). Clemm et al. [10] articulate the networking challenges of enabling immersive holographic videos and propose a new network architecture for optimizing the coordination and synchronization of concurrent streams. Hooft et al. [11] present the status and challenges of 6-DoF media, and Taleb et al. [12] provide an overview of immersive services as well as the relevant industry and standardization activities. These works highlight the gap between existing streaming solutions and what PCV transmission requires. Most solutions extend 3-DoF video compression or adaptive streaming techniques and do not provide AI-native PCV streaming.
This article introduces the landscape and requirements of holographic PCV communication and analyzes the technical challenges of supporting PCV services. We propose an advanced AI-driven transmission solution as a prototype for preliminary exploration and evaluate its performance.
arXiv:2210.06794v1 [cs.MM] 13 Oct 2022
[Fig. 1 graphic: the pipeline Capturing → Encoding → Transmission → Decoding → Hologram experience, comparing capturing, encoding cost, transmission cost, and decoding cost for conventional video, multi-view video (View 1 to View n), and point cloud videos.]
Fig. 1. Comparison between holographic point cloud video and conventional videos.
Our contributions are 1) a novel transmission mechanism for holographic PCV; 2) an end-to-end network design that integrates the encoder and decoder; and 3) an adaptive streaming technique for the proposed AI-driven video transmission. We further discuss the proposed AI-driven communication technique and identify future directions for high-quality PCV services.
II. REQUIREMENTS AND CHALLENGES
A. Requirements
Holographic PCV streaming places significant demands on network transmission infrastructure in terms of ultra-low-delay and reliable networks, heavy computation, and device mobility and portability.
1) Ultra-low delay and reliable networks: Because PCV supports 6-DoF movement and orientation, it is more sensitive to delay than 3-DoF video services. Its ideal delay requirement is less than 5 ms, more stringent than that of traditional 3-DoF videos (<20 ms) [13]. Since PCV requires many depth cameras to capture data, its data volume is further increased relative to other types of video. Therefore, continuous and reliable transmission of multi-view PCV streams requires a reliable network with lower jitter than 3-DoF video transmission.
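To see how the 5 ms target interacts with data volume, consider only the time needed to push one raw frame onto the link (serialization delay), ignoring propagation, queuing, capture, and processing. This sketch assumes a hypothetical 300,000-point frame at 120 bits per point (three float32 coordinates plus three 8-bit colors); the link rates and function names are illustrative assumptions, not values from the article.

```python
def frame_serialization_ms(points_per_frame: int, bits_per_point: int,
                           link_bps: int, num_views: int = 1) -> float:
    """Time in milliseconds to transmit one uncompressed frame over a link.

    Counts serialization delay only; propagation, queuing, and processing are ignored.
    """
    bits_per_frame = points_per_frame * bits_per_point * num_views
    return bits_per_frame * 1000 / link_bps


def within_budget(delay_ms: float, budget_ms: float = 5.0) -> bool:
    """Compare a delay with the 5 ms ideal 6-DoF target cited in the text."""
    return delay_ms <= budget_ms


if __name__ == "__main__":
    for link_bps in (1_000_000_000, 10_000_000_000):  # hypothetical 1 and 10 Gbps links
        d = frame_serialization_ms(300_000, 120, link_bps)
        print(f"{link_bps / 1e9:.0f} Gbps: {d:.1f} ms, within 5 ms: {within_budget(d)}")
```

With these assumptions a 1 Gbps link needs 36 ms per raw frame, far above the 5 ms target, and even a 10 Gbps link spends 3.6 ms on serialization alone before any other delay is counted; adding views multiplies these figures.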
2) Heavy computation: Encoding and decoding PCV with the MPEG standards is computationally intensive, even if we ignore the computation used for capturing. For example, encoding a one-second video from the longdress dataset with lossy compression takes 11 to 42 minutes using MPEG V-PCC on a generic computer [14]. Although high-performance GPU servers can accelerate PCV encoding at the sender, the computing capability of mobile devices, such as AR/VR glasses, is insufficient for real-time decoding. Thus, the massive computation required for encoding and decoding is one of the primary obstacles to providing a 6-DoF experience on mobile devices.
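The cited V-PCC encoding times can be turned into a real-time gap with simple arithmetic. This is an illustrative sketch; the 30 FPS capture rate used for the per-frame figure is an assumption added here and is not stated for the longdress sequence in the text, and the function names are hypothetical.

```python
def encoding_slowdown(encode_seconds: float, content_seconds: float) -> float:
    """How many times slower than real time an encoder runs."""
    return encode_seconds / content_seconds


def seconds_per_frame(encode_seconds: float, content_seconds: float, fps: float) -> float:
    """Average encoding time per frame; fps is an assumed capture rate."""
    return encode_seconds / (content_seconds * fps)


if __name__ == "__main__":
    # 11 to 42 minutes to encode one second of longdress with V-PCC, as cited above [14].
    for minutes in (11, 42):
        enc = minutes * 60
        print(f"{minutes} min: {encoding_slowdown(enc, 1):.0f}x slower than real time, "
              f"{seconds_per_frame(enc, 1, 30):.0f} s per frame at an assumed 30 FPS")
```

Under these assumptions V-PCC runs 660 to 2,520 times slower than real time, or 22 to 84 s per frame, against a real-time budget of roughly 33 ms per frame at 30 FPS.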
3) Device mobility and portability: Holographic PCV places higher demands on device mobility and portability. Cable-connected VR terminals or large display screens can enable immersive experiences for panoramic and 360-degree VR videos, and they are currently among the primary means of immersive interaction. However, the 6-DoF experience of PCV degrades significantly if users cannot move and interact freely. Therefore, interactive devices for holographic PCV need to support free movement, and portable devices are crucial to a satisfying holographic PCV experience.
B. Challenges
1) Disordered points and massive computing demands challenge the traditional streaming pipeline: Point clouds are represented by massive numbers of disordered 3D points (X, Y, Z) with colors (R, G, B). Hundreds of thousands of points are required to represent 6-DoF content clearly, which makes the data volume of PCV much larger than that of 3-DoF videos. Beyond this obvious increase in data volume, real-time PCV transmission must also address the challenges of compression, encoding, and decoding. However, existing encoding and decoding methods that extend the MPEG standards mainly target offline video services and cannot provide real-time decoding on mobile devices. Although some AI-based compression techniques extract point cloud features and achieve better compression ratios than traditional methods, they require extensive GPU resources. They also have to train another heavy neural network to reconstruct the original point cloud, which likewise cannot provide real-time decoding for PCV transmission. Hence, no existing lightweight end-to-end AI network is designed for point cloud transmission, from the original point cloud to the final rendered point cloud.
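Because a point cloud is an unordered set, the same frame can be stored in many different byte orders, unlike a pixel grid. One common technique in point cloud coding is to impose a canonical order by interleaving coordinate bits into a Morton (Z-order) key. The sketch below illustrates that general technique only; it is not the AI-driven pipeline proposed in this article, and it assumes a hypothetical (x, y, z, r, g, b) tuple with non-negative integer coordinates of at most `bits` bits each.

```python
from typing import List, Tuple

# Hypothetical point layout: (x, y, z, r, g, b) with integer coordinates.
Point = Tuple[int, int, int, int, int, int]


def morton3d(x: int, y: int, z: int, bits: int) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key (x is most significant)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b + 2)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b)
    return code


def canonical_order(points: List[Point], bits: int) -> List[Point]:
    """Sort points by Morton key; the full tuple breaks ties between duplicate positions."""
    return sorted(points, key=lambda p: (morton3d(p[0], p[1], p[2], bits), p))


if __name__ == "__main__":
    frame = [(3, 1, 0, 255, 0, 0), (0, 0, 1, 0, 255, 0), (1, 1, 1, 0, 0, 255)]
    shuffled = [frame[2], frame[0], frame[1]]
    # Two storage orders of the same unordered set map to one canonical sequence.
    print(canonical_order(frame, 2) == canonical_order(shuffled, 2))
```

Sorting by this key also keeps points that share an octree node next to each other, since Z-order matches a depth-first traversal of the octree.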
2) Intensive 6-DoF point cloud decoding challenges resource-constrained mobile devices: Compared with mature 3-DoF VR and 360-degree video, 6-DoF PCV lacks efficient decoding algorithms. The computing capability