Towards Holographic Video Communications: A
Promising AI-driven Solution
Yakun Huang, Yuanwei Zhu, Xiuquan Qiao, Xiang Su, Member, IEEE,
Schahram Dustdar, Fellow, IEEE, Ping Zhang, Fellow, IEEE
Abstract—Real-time holographic video communications enable
immersive experiences for next-generation video services in the
future metaverse era. However, high-fidelity holographic videos
require high bandwidth and significant computation resources,
which exceed the transmission and computing capacity of 5G
networks. This article reviews state-of-the-art holographic point
cloud video (PCV) transmission techniques and highlights the
critical challenges of delivering such immersive services. We
further implement a preliminary prototype of an AI-driven
holographic video communication system and present critical
experimental results to evaluate its performance. Finally, we
identify future research directions and discuss potential solutions
for providing real-time and high-quality holographic experiences.
I. INTRODUCTION
Holographic video provides users with a more immersive
six degrees of freedom (6-DoF) viewing experience than
traditional virtual reality (VR), 360-degree, and other 3-DoF
videos [1]. 6-DoF videos are characterized by having depth
information for each frame, providing 3-DoF of translational
movement (X, Y, Z) and 3-DoF of rotational movement (yaw,
pitch, roll). 6-DoF videos allow users to walk around an object
in a circle and view it from the top and the bottom. Point cloud
video (PCV), as a representative holographic 6-DoF video
service, describes the objects using a set of disordered 3D
points with coordinates and color. Figure 1 compares the PCV
transmission with different video services. A PCV stream is
highly time- and resource-consuming to encode and decode:
capturing one second of raw PCV with one depth camera at 30
frames per second (FPS) produces 2.06 Gb of data, and processing
it requires at least an hour on a common computer, far longer
than for 3-DoF videos.
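To make the representation concrete, the following minimal sketch models a PCV frame and a 6-DoF viewpoint. The per-point layout (three float32 coordinates plus 8-bit RGB) and all class and field names are illustrative assumptions, not a format from the article:

```python
import struct

# Hypothetical minimal PCV frame and 6-DoF viewpoint; the per-point layout
# (three float32 coordinates plus 8-bit RGB) is an illustrative assumption.
class PointCloudFrame:
    def __init__(self, points):
        # points: iterable of (x, y, z, r, g, b); order is irrelevant,
        # since a point cloud is a disordered set of points
        self.points = list(points)

    def raw_bytes(self) -> bytes:
        # 12 bytes of geometry + 3 bytes of color per point
        return b"".join(struct.pack("<fffBBB", *p) for p in self.points)

class Viewpoint:
    """6-DoF pose: translation (X, Y, Z) plus rotation (yaw, pitch, roll)."""
    def __init__(self, x, y, z, yaw, pitch, roll):
        self.position = (x, y, z)
        self.orientation = (yaw, pitch, roll)

frame = PointCloudFrame([(0.1, 0.2, 0.3, 255, 0, 0), (0.4, 0.5, 0.6, 0, 255, 0)])
print(len(frame.raw_bytes()))  # 2 points * 15 bytes = 30
```

Under this layout, every point costs 15 bytes before compression, which is what makes raw PCV streams so large.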
More importantly, PCV transmission requires bandwidth at
the Gbps level and beyond, far exceeding the current
transmission capacity of 5G networks. Undoubtedly, holographic
video introduces requirements far beyond those of traditional
video streaming services in terms of network bandwidth,
transmission latency, and computing complexity.
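The bandwidth claim can be checked with a back-of-envelope calculation. The per-point layout (three float32 coordinates plus RGB) and the point density per frame below are illustrative assumptions chosen to match the 2.06 Gb figure, not measured values:

```python
# Back-of-envelope raw PCV bitrate. The per-point layout and the point
# density are illustrative assumptions, not figures from the article.
BYTES_PER_POINT = 3 * 4 + 3     # three float32 coordinates + 8-bit RGB = 15 B
POINTS_PER_FRAME = 573_000      # assumed density for one depth camera
FPS = 30

bits_per_second = POINTS_PER_FRAME * BYTES_PER_POINT * 8 * FPS
print(f"{bits_per_second / 1e9:.2f} Gbps")  # ~2.06 Gbps of raw data
```

Even halving the assumed point density leaves the stream above 1 Gbps, well beyond what a current 5G link sustains for a single user.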
We investigate the transmission techniques for PCV, includ-
ing point cloud compression and video streaming optimization.
Y. Huang, Y. Zhu, X. Qiao and P. Zhang are with the State Key Labora-
tory of Networking and Switching Technology, Beijing University of Posts
and Telecommunications, Beijing 100876, China. E-mail: {ykhuang, zhuyw,
qiaoxq, pzhang}@bupt.edu.cn. (X. Qiao is the corresponding author.)
X. Su is with the Department of Computer Science, Norwegian University
of Science and Technology, 2815 Gjøvik, Norway, and the University of Oulu,
90570 Oulu, Finland. E-mail: xiang.su@ntnu.no.
S. Dustdar is with the Distributed Systems Group, Technische Universität
Wien, 1040 Vienna, Austria. E-mail: dustdar@dsg.tuwien.ac.at.
For compression, traditional methods include Kdtree-based
and Octree-based solutions, such as the popular Point Cloud
Library (PCL) [2] and Draco [3]. The ISO/IEC Moving Picture
Experts Group (MPEG) has standardized Video-based Point Cloud
Compression (V-PCC) and Geometry-based Point Cloud Compression
(G-PCC) for PCV. However, these methods require far higher
computing resources and costs than 3-DoF video codecs.
Moreover, although some deep learning-based compression
techniques achieve lower accuracy loss and higher compression
ratios [4], [5], they are only suitable for offline holographic
video pre-processing due to their high computing overhead and
inference latency. For video streaming optimization, most point
cloud video streaming techniques extend 3-DoF streaming
methods such as tiling and view-angle prediction. Since
PCV adds three extra degrees of freedom, streaming must be
adapted not only to the viewing angle but also to dynamic
changes in the physical distance between the user and the
scene. Some research investigates the combination of point
cloud compression and transmission optimization [6], [7]. For
streaming quality of service (QoS) management, Zhang et
al. [8] proposed a covering-based quality prediction method
to accurately predict the QoS, along with the query of quality
correlation (Q2C) model [9] for QoS guarantees. However,
these solutions cannot run in real time on mobile devices
due to the massive cost of compression and codec operations.
We review related surveys, tutorials, and magazine pub-
lications on PCV, holographic video, and immersive video.
Liu et al. [1] discuss the challenges and solutions to adaptive
point cloud streaming and provide a prototype of extending
MPEG Dynamic Adaptive Streaming over HTTP (DASH).
Clemm et al. [10] articulate the networking challenges of
enabling immersive holographic videos and propose a new
network architecture for optimizing the coordination and
synchronization of concurrent streams. Hooft et al. [11] present the
status and challenges of 6-DoF media and Taleb et al. [12]
provide an overview of immersive services as well as the
relevant industry and standardization activities. These works
highlight the gap between existing streaming solutions and
practical PCV transmission: most solutions extend 3-DoF video
compression or adaptive streaming techniques and fail to
provide an AI-native PCV streaming design.
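As a concrete illustration of the tiling and distance adaptation that these 3-DoF-derived solutions build on, the sketch below selects a quality level per PCV tile; the function name and the distance thresholds are hypothetical:

```python
# Hypothetical distance-adaptive quality selection for PCV tiles: tiles
# outside the predicted view are skipped, closer tiles get denser points.
def select_quality(distance_m: float, in_viewport: bool,
                   levels=(("high", 2.0), ("medium", 5.0), ("low", float("inf")))):
    if not in_viewport:
        return "skip"          # cull tiles outside the predicted view angle
    for name, max_dist in levels:
        if distance_m <= max_dist:
            return name

print(select_quality(1.5, True))    # "high"
print(select_quality(8.0, True))    # "low"
print(select_quality(1.0, False))   # "skip"
```

It is this distance ladder that PCV-specific schemes must refine, since the user's physical distance to the scene changes continuously in 6-DoF viewing.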
This article introduces the landscape and requirements of
holographic PCV communication and analyses the technical
challenges associated with supporting PCV services. We propose
an AI-driven transmission solution, implement it as a preliminary
prototype for exploration, and verify its performance.
arXiv:2210.06794v1 [cs.MM] 13 Oct 2022