Towards Holographic Video Communications: A
Promising AI-driven Solution
Yakun Huang, Yuanwei Zhu, Xiuquan Qiao, Xiang Su, Member, IEEE,
Schahram Dustdar, Fellow, IEEE, Ping Zhang, Fellow, IEEE
Abstract—Real-time holographic video communications enable
immersive experiences for next-generation video services in the
future metaverse era. However, high-fidelity holographic videos
require high bandwidth and significant computation resources,
which exceed the transmission and computing capacity of 5G
networks. This article reviews state-of-the-art holographic point
cloud video (PCV) transmission techniques and highlights the
critical challenges of delivering such immersive services. We
further implement a preliminary prototype of an AI-driven
holographic video communication system and present critical
experimental results to evaluate its performance. Finally, we
identify future research directions and discuss potential solutions
for providing real-time and high-quality holographic experiences.
I. INTRODUCTION
Holographic video provides users with a more immersive
six degrees of freedom (6-DoF) viewing experience than
traditional virtual reality (VR), 360-degree, and other 3-DoF
videos [1]. 6-DoF videos are characterized by having depth
information for each frame, providing 3-DoF of translational
movement (X, Y, Z) and 3-DoF of rotational movement (yaw,
pitch, roll). 6-DoF videos allow users to walk around an object
in a circle and view it from the top and the bottom. Point cloud
video (PCV), as a representative holographic 6-DoF video
service, describes the objects using a set of disordered 3D
points with coordinates and color. Figure 1 compares the PCV
transmission with different video services. A PCV stream is
highly time- and resource-consuming to encode and decode:
capturing one second of raw PCV with one depth camera at 30
frames per second (FPS) produces 2.06 Gb of data, and processing
it requires at least an hour on a common computer, far longer
than for 3-DoF videos.
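To make the representation concrete, the following minimal sketch models a PCV frame and a 6-DoF viewpoint. The per-point layout (three float32 coordinates plus 8-bit RGB) and all class and field names are illustrative assumptions, not a format from the article:

```python
import struct

# Hypothetical minimal PCV frame and 6-DoF viewpoint; the per-point layout
# (three float32 coordinates plus 8-bit RGB) is an illustrative assumption.
class PointCloudFrame:
    def __init__(self, points):
        # points: iterable of (x, y, z, r, g, b); order is irrelevant,
        # since a point cloud is a disordered set of points
        self.points = list(points)

    def raw_bytes(self) -> bytes:
        # 12 bytes of geometry + 3 bytes of color per point
        return b"".join(struct.pack("<fffBBB", *p) for p in self.points)

class Viewpoint:
    """6-DoF pose: translation (X, Y, Z) plus rotation (yaw, pitch, roll)."""
    def __init__(self, x, y, z, yaw, pitch, roll):
        self.position = (x, y, z)
        self.orientation = (yaw, pitch, roll)

frame = PointCloudFrame([(0.1, 0.2, 0.3, 255, 0, 0), (0.4, 0.5, 0.6, 0, 255, 0)])
print(len(frame.raw_bytes()))  # 2 points * 15 bytes = 30
```

Under this layout, every point costs 15 bytes before compression, which is what makes raw PCV streams so large.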
More importantly, PCV transmission requires bandwidth at
the Gbps level and beyond, far exceeding the current
transmission capacity of 5G networks. Undoubtedly, holographic
video introduces requirements far beyond those of traditional
video streaming services in terms of network bandwidth,
transmission latency, and computing complexity.
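The bandwidth claim can be checked with a back-of-envelope calculation. The per-point layout (three float32 coordinates plus RGB) and the point density per frame below are illustrative assumptions chosen to match the 2.06 Gb figure, not measured values:

```python
# Back-of-envelope raw PCV bitrate. The per-point layout and the point
# density are illustrative assumptions, not figures from the article.
BYTES_PER_POINT = 3 * 4 + 3     # three float32 coordinates + 8-bit RGB = 15 B
POINTS_PER_FRAME = 573_000      # assumed density for one depth camera
FPS = 30

bits_per_second = POINTS_PER_FRAME * BYTES_PER_POINT * 8 * FPS
print(f"{bits_per_second / 1e9:.2f} Gbps")  # ~2.06 Gbps of raw data
```

Even halving the assumed point density leaves the stream above 1 Gbps, well beyond what a current 5G link sustains for a single user.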
We investigate the transmission techniques for PCV, includ-
ing point cloud compression and video streaming optimization.
Y. Huang, Y. Zhu, X. Qiao and P. Zhang are with the State Key Labora-
tory of Networking and Switching Technology, Beijing University of Posts
and Telecommunications, Beijing 100876, China. E-mail: {ykhuang, zhuyw,
qiaoxq, pzhang}@bupt.edu.cn. (X. Qiao is the corresponding author.)
X. Su is with the Department of Computer Science, Norwegian University
of Science and Technology, 2815 Gjøvik, Norway, and the University of Oulu,
90570 Oulu, Finland. E-mail: xiang.su@ntnu.no.
S. Dustdar is with the Distributed Systems Group, Technische Universität
Wien, 1040 Vienna, Austria. E-mail: dustdar@dsg.tuwien.ac.at.
For compression, traditional methods include Kdtree-based
and Octree-based solutions, such as the popular Point Cloud
Library (PCL) [2] and Draco [3]. The ISO/IEC Moving Picture
Experts Group (MPEG) has standardized Video-based Point Cloud
Compression (V-PCC) and Geometry-based Point Cloud Compression
(G-PCC) for PCV. However, these methods require far higher
computing resources and costs than 3-DoF video codecs.
Moreover, although some deep learning-based compression
techniques achieve lower accuracy loss and higher compression
ratios [4], [5], they are only suitable for offline holographic
video pre-processing due to their high computing overhead and
inference latency. For video streaming optimization, most point
cloud video streaming techniques extend 3-DoF streaming
methods such as tiling and view-angle prediction. Since
PCV adds three extra degrees of freedom, streaming must be
adapted not only to the viewing angle but also to dynamic
changes in the physical distance between the user and the
scene. Some research investigates the combination of point
cloud compression and transmission optimization [6], [7]. For
streaming quality of service (QoS) management, Zhang et
al. [8] proposed a covering-based quality prediction method
to accurately predict the QoS, along with the query of quality
correlation (Q2C) model [9] for QoS guarantees. However,
these solutions cannot run in real time on mobile devices
due to the massive cost of compression and codec operations.
We review related surveys, tutorials, and magazine pub-
lications on PCV, holographic video, and immersive video.
Liu et al. [1] discuss the challenges and solutions to adaptive
point cloud streaming and provide a prototype of extending
MPEG Dynamic Adaptive Streaming over HTTP (DASH).
Clemm et al. [10] articulate the networking challenges of
enabling immersive holographic videos and propose a new
network architecture for optimizing the coordination and
synchronization of concurrent streams. Hooft et al. [11] present the
status and challenges of 6-DoF media and Taleb et al. [12]
provide an overview of immersive services as well as the
relevant industry and standardization activities. These works
highlight the gap between existing streaming solutions and
practical PCV transmission: most solutions extend 3-DoF video
compression or adaptive streaming techniques and fail to
provide an AI-native PCV streaming design.
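As a concrete illustration of the tiling and distance adaptation that these 3-DoF-derived solutions build on, the sketch below selects a quality level per PCV tile; the function name and the distance thresholds are hypothetical:

```python
# Hypothetical distance-adaptive quality selection for PCV tiles: tiles
# outside the predicted view are skipped, closer tiles get denser points.
def select_quality(distance_m: float, in_viewport: bool,
                   levels=(("high", 2.0), ("medium", 5.0), ("low", float("inf")))):
    if not in_viewport:
        return "skip"          # cull tiles outside the predicted view angle
    for name, max_dist in levels:
        if distance_m <= max_dist:
            return name

print(select_quality(1.5, True))    # "high"
print(select_quality(8.0, True))    # "low"
print(select_quality(1.0, False))   # "skip"
```

It is this distance ladder that PCV-specific schemes must refine, since the user's physical distance to the scene changes continuously in 6-DoF viewing.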
This article introduces the landscape and requirements of
holographic PCV communication and analyses the technical
challenges associated with supporting PCV services. We propose
an AI-driven transmission solution, implement it as a preliminary
prototype for exploration, and verify its performance.
arXiv:2210.06794v1 [cs.MM] 13 Oct 2022