
As an exemplifying case study, we consider a multi-access
PT system consisting of a radio access network (RAN) similar
to that studied in [8]–[10]. It is emphasized that, unlike [8]–
[10], our goal here is not to address a particular task via
MARL, but rather to introduce a general framework supporting
the implementation of multiple functionalities at the DT, including control via MARL and monitoring, despite the limited
data transfer from the PT to the DT.
B. Related Work
DT-aided control of PT systems is often formulated as a
model-based reinforcement learning (RL) problem in which
the DT is leveraged as simulation platform to optimize the PT
policy [11]–[13]. In [14], a Bayesian model is deployed by the DT to enable the quantification of epistemic uncertainty. Existing
work on DT platforms for wireless systems has investigated
mechanisms for DT-PT synchronization [15] and DT-aided
network optimization and monitoring [16], as well as DT-
based control for computation offloading via model-based RL
[11]. To the best of our knowledge, applications of DT relying
on Bayesian learning for communication systems have not
been reported in the literature.
C. Main Contributions
The main contributions of this paper are as follows:
• We propose a Bayesian framework to control and monitor an AI-native wireless system, using a multi-access RAN as an exemplifying case study [8]–[10]. In the proposed approach, as illustrated in Fig. 1, a Bayesian model learning phase is followed by policy optimization and monitoring phases, which leverage the uncertainty quantification capacity of Bayesian models.
• A key challenge in the definition of a Bayesian model
is the choice of a domain-specific factorization of the joint
distribution of all variables of interest. This paper elucidates
this design choice for a multi-access system consisting of
sensing devices with correlated packet arrivals reporting to
a common receiver through a shared multi-packet reception
(MPR) channel [17]. This case study is relevant for Internet-
of-Things (IoT) and machine-type communications.
• Experimental results confirm the advantages of the proposed Bayesian framework as compared to conventional frequentist model-based approaches in terms of metrics such as throughput and buffer overflow, as well as the area under the receiver operating characteristic (ROC) curve for anomaly detection at the DT.
II. BAYESIAN DT FRAMEWORK
In this section, we formally define a PT system comprising multiple network elements, such as mobile devices or infrastructure nodes, referred to as agents. We then present a Bayesian DT framework to estimate the PT dynamics, optimize the agents' decisions, and monitor possible anomalies in the PT.
Fig. 2: Example of factorization in (1), excluding the variables
corresponding to previous time steps t−1.
A. Multi-Agent PT System
The PT system of interest consists of $K$ agents, e.g., $K$ sensing devices, indexed by $k \in \mathcal{K} = \{1, \dots, K\}$, that operate over a discrete time index $t = 1, 2, \dots$, e.g., over time slots. At each time $t$, each agent takes an action $a_t^k$, e.g., a decision on whether to transmit a packet from its queue or not. The action is selected by following a policy that leverages information collected by the agent regarding the current state $s_t$ of the overall system, which may include, e.g., packet queue lengths. The state $s_t$ evolves according to some ground-truth transition probability $T(s_{t+1} \mid s_t, a_t)$, such that the probability distribution of the next state $s_{t+1} \sim T(s_{t+1} \mid s_t, a_t)$ depends on the current state $s_t$ and the joint action $a_t = (a_t^1, \dots, a_t^K)$.

We restrict our framework to the case of jointly observable states [18], in which the state $s_t$ can be identified if one has access to all observations $o_t^k$ made by all agents $k \in \mathcal{K}$ at time $t$, i.e., in which the state is a function of the collection $o_t = (o_t^1, \dots, o_t^K)$ for all times $t$.
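As a toy illustration of these definitions, the following sketch shows a ground-truth transition kernel $T(s_{t+1} \mid s_t, a_t)$ and a jointly observable state. The two-agent queueing model, the arrival probability, and all function names are illustrative assumptions, not the paper's case study:

```python
import random

random.seed(0)

K = 2  # number of agents (illustrative assumption)

def transition(state, joint_action):
    """Sample s_{t+1} ~ T(. | s_t, a_t) for a toy model: agent k's queue
    loses one packet when the agent transmits (a_t^k = 1) and gains one
    arrival with probability 0.5 (hypothetical arrival process)."""
    next_state = []
    for k in range(K):
        departures = joint_action[k] if state[k] > 0 else 0
        arrivals = 1 if random.random() < 0.5 else 0
        next_state.append(state[k] - departures + arrivals)
    return tuple(next_state)

def observe(state, k):
    """Each agent observes its own queue length; joint observability
    means s_t is recoverable from the collection o_t = (o_t^1, ..., o_t^K)."""
    return state[k]

state = (3, 1)
observations = tuple(observe(state, k) for k in range(K))
state = transition(state, joint_action=(1, 0))
```

Here the state is exactly the tuple of per-agent observations, which is the strongest form of joint observability; in general the state need only be some function of $o_t$.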
Agents in the PT cannot communicate, and hence the overall information available at agent $k$ up to time $t$ is contained in its action-observation history $h_t^k = (o_1^k, a_1^k, o_2^k, \dots, a_{t-1}^k, o_t^k)$. Accordingly, the behaviour of agent $k$ is defined by its policy $\pi^k(a_t^k \mid h_t^k)$, which defines the probability of each possible action $a_t^k$ based on the available information $h_t^k$.
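A decentralized policy $\pi^k(a_t^k \mid h_t^k)$ maps the history to an action distribution; the deterministic threshold rule below is a minimal hypothetical example of such a mapping, not the policy studied in the paper:

```python
def policy(history):
    """history is h_t^k = [o_1, a_1, o_2, a_2, ..., o_t], so the last
    entry is the latest observation o_t. This illustrative rule
    transmits (action 1) only when the observed queue is non-empty."""
    latest_obs = history[-1]
    return 1 if latest_obs > 0 else 0

h = [2, 1, 2, 1, 1]  # o_1, a_1, o_2, a_2, o_3 for one agent
action = policy(h)
```

Because agents cannot communicate, each agent applies its policy to its own history only, even though the system state depends on all agents' observations.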
The state of a communication system typically comprises several substates, describing, e.g., the current traffic conditions or the quality of the wireless channel. As a result, one can typically partition the state variables $s_t$ into $M$ operationally distinct subsets $\{s_t^i\}_{i=1}^{M}$ indexed by $i$. To describe the interactions among these subsets, we introduce a Bayesian network defined by a directed acyclic graph, as illustrated in Fig. 2, in which each subset $s_t^i$ is directly affected by the subset of "parent" variables $s_t^{P(i)} \subseteq s_t$ (see, e.g., [19]). Accordingly, the transition probability is assumed to factorize as
$$T(s_{t+1} \mid s_t, a_t) = \prod_{i=1}^{M} T_i\big(s_{t+1}^i \,\big|\, s_{t+1}^{P(i)}, s_t, a_t\big), \qquad (1)$$
where the conditional distribution $T_i(s_{t+1}^i \mid s_{t+1}^{P(i)}, s_t, a_t)$ describes the evolution of the next state variables $s_{t+1}^i$ given the current state $s_t$, action $a_t$, and parent variables $s_{t+1}^{P(i)}$. In general, the distribution $T_i(s_{t+1}^i \mid s_{t+1}^{P(i)}, s_t, a_t)$ depends on some sufficient statistic of the variables $s_t$ and $a_t$, which may be a function of subsets of such variables. We refer to Sec. III-A for an instance of model (1).
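Sampling from a factorization of the form (1) amounts to visiting the substates in a topological order of the DAG, so that each factor $T_i$ conditions on parent values already drawn. The sketch below uses $M = 2$ hypothetical substates (a channel and a queue) with parent set $P(2) = \{1\}$; the kernels and probabilities are our assumptions, not the paper's model:

```python
import random

random.seed(0)

def sample_channel(state, action):
    # T_1(s_{t+1}^1 | s_t, a_t): the channel substate has no parents
    # among the time-(t+1) variables (illustrative two-state channel)
    return "good" if random.random() < 0.7 else "bad"

def sample_queue(channel_next, state, action):
    # T_2(s_{t+1}^2 | s_{t+1}^{P(2)}, s_t, a_t): the queue depends on
    # the *new* channel substate, since a transmitted packet departs
    # only when the channel is good
    queue = state["queue"]
    if action == 1 and channel_next == "good":
        queue = max(queue - 1, 0)
    return queue

def transition(state, action):
    """Sample s_{t+1} ~ prod_i T_i(s_{t+1}^i | s_{t+1}^{P(i)}, s_t, a_t)
    by drawing the substates in an order consistent with the DAG."""
    channel_next = sample_channel(state, action)
    queue_next = sample_queue(channel_next, state, action)
    return {"channel": channel_next, "queue": queue_next}

s_next = transition({"channel": "good", "queue": 2}, action=1)
```

The benefit of the factorization is that each factor $T_i$ can be learned and parameterized separately, conditioning only on the typically small set of variables that actually influence substate $i$.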
B. Model Learning
The goal of the model learning phase (phase 1 in Fig. 1) at the DT is to obtain an estimate of the PT system dynamics