Digital Twin-Based Multiple Access Optimization and Monitoring via Model-Driven Bayesian Learning

Clement Ruah, Student Member, IEEE, Osvaldo Simeone, Fellow, IEEE, and Bashir Al-Hashimi, Fellow, IEEE
Department of Engineering, King’s College London, London, UK
Abstract—Commonly adopted in the manufacturing and
aerospace sectors, digital twin (DT) platforms are increasingly
seen as a promising paradigm to control and monitor software-
based, “open”, communication systems, which play the role of the
physical twin (PT). In the general framework presented in this
work, the DT builds a Bayesian model of the communication
system, which is leveraged to enable core DT functionalities
such as control via multi-agent reinforcement learning (MARL)
and monitoring of the PT for anomaly detection. We specifically
investigate the application of the proposed framework to a simple
case-study system encompassing multiple sensing devices that
report to a common receiver. The Bayesian model trained at the
DT has the key advantage of capturing epistemic uncertainty
regarding the communication system, e.g., regarding current traffic
conditions, which arises from limited PT-to-DT data transfer.
Experimental results validate the effectiveness of the proposed
Bayesian framework as compared to standard frequentist model-
based solutions.
Index Terms—Digital Twin, 6G, Reinforcement Learning,
Bayesian Learning, Model-based Learning
I. INTRODUCTION
A. Context and Motivation
A digital twin (DT) platform can be viewed as a cyber-physical
system in which a physical entity, referred to
as the physical twin (PT), and a virtual model, known as the
DT, interact based on an automated bi-directional flow of
information [1], [2]. Based on data received from the PT, the
DT maintains an up-to-date model of the PT [3], which is
leveraged to control, monitor, and analyze the operation of the
PT [4]. DT platforms are increasingly regarded as an enabling
technology for wireless cellular systems built on the open
networking principles of disaggregation and virtualization [5],
which are expected to be central to 6G [6].
C. Ruah and O. Simeone are with King’s Communications, Learning & Information Processing (KCLIP) Lab. The work of O. Simeone was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement No. 725732) and by an Open Fellowship of the EPSRC. The work of C. Ruah was supported by the Faculty of Natural, Mathematical, and Engineering Sciences at King’s College London.

Fig. 1: The digital twin (DT) platform for the control and analysis of the communication system studied in this work. The physical twin (PT) consists of a group of $K$ devices receiving correlated data and communicating over a shared multi-packet reception (MPR) channel. The DT platform operates along the phases of model learning (1) and policy optimization (2), while also enabling functionalities such as prediction, counterfactual analysis, and monitoring (3).
[Block diagram labels: Physical Twin; Device; Data generation; MPR Channel; Base Station; Environment; Digital Twin; Model Learning; Policy Optimization; Virtual rollouts; Analysis; Actions, Observations; Decentralized Policies; Control, Diagnostics.]

This paper introduces a general framework based on Bayesian learning for the design of a DT platform implementing the functions of control and monitoring of a wireless system (see Fig. 1). In the proposed framework, the DT builds a Bayesian model of the system dynamics based on data received from the PT. The model is then leveraged to optimize transmission policies via Bayesian multi-agent reinforcement learning (MARL), while also enabling monitoring for anomaly detection, without the need to integrate additional data from the PT. Unlike standard frequentist models, Bayesian models have the key property of providing an accurate quantification of epistemic uncertainty arising from limited PT-to-DT communication, and hence limited data at the DT. As a possible embodiment of the proposed approach, the DT platform may be implemented as an xApp, or as a collection of connected xApps, that run in the near-real-time RAN Intelligent Controller (RIC) of an Open-RAN (O-RAN) architecture [7].
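The contrast between frequentist and Bayesian models under limited data can be illustrated with a minimal sketch. It assumes a single Bernoulli packet-arrival probability and a uniform Beta(1, 1) prior; both are illustrative assumptions, not the model used in this work, and the function names are hypothetical. The frequentist point estimate is identical for 5 and 500 observed slots, whereas the Bayesian posterior variance shrinks as more data reaches the DT.

```python
def frequentist_estimate(arrivals, slots):
    """Maximum-likelihood point estimate of the arrival probability."""
    return arrivals / slots


def beta_posterior(arrivals, slots, alpha0=1.0, beta0=1.0):
    """Mean and variance of the Beta posterior under a Beta(alpha0, beta0) prior."""
    alpha = alpha0 + arrivals
    beta = beta0 + (slots - arrivals)
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, variance


# Same empirical arrival rate (0.6), observed over few and over many slots.
for arrivals, slots in [(3, 5), (300, 500)]:
    point = frequentist_estimate(arrivals, slots)
    mean, variance = beta_posterior(arrivals, slots)
    print(f"slots={slots}: MLE={point:.3f}, posterior mean={mean:.3f}, "
          f"posterior std={variance ** 0.5:.3f}")
```

The posterior standard deviation is the quantity a point estimate cannot provide: it tells the DT how much the limited PT-to-DT data actually constrains the model.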
arXiv:2210.05582v3 [eess.SP] 27 Jan 2023
As an exemplifying case study, we consider a multi-access
PT system consisting of a radio access network (RAN) similar
to that studied in [8]–[10]. It is emphasized that, unlike [8]–
[10], our goal here is not to address a particular task via
MARL, but rather to introduce a general framework supporting
the implementation of multiple functionalities at the DT, in-
cluding control via MARL and monitoring, despite the limited
data transfer from the PT to the DT.
B. Related Work
DT-aided control of PT systems is often formulated as a
model-based reinforcement learning (RL) problem in which
the DT is leveraged as a simulation platform to optimize the PT
policy [11]–[13]. In [14], a Bayesian model is deployed by the
DT to enable the quantification of epistemic uncertainty. Existing
work on DT platforms for wireless systems has investigated
mechanisms for DT-PT synchronization [15] and DT-aided
network optimization and monitoring [16], as well as DT-
based control for computation offloading via model-based RL
[11]. To the best of our knowledge, applications of DT relying
on Bayesian learning for communication systems have not
been reported in the literature.
C. Main Contributions
The main contributions of this paper are as follows:
• We propose a Bayesian framework to control and monitor an AI-native wireless system, using a multi-access RAN as an exemplifying case study [8]–[10]. In the proposed approach, as illustrated in Fig. 1, a Bayesian model learning phase is followed by policy optimization and monitoring phases, which leverage the uncertainty quantification capacity of Bayesian models.
• A key challenge in the definition of a Bayesian model is the choice of a domain-specific factorization of the joint distribution of all variables of interest. This paper elucidates this design choice for a multi-access system consisting of sensing devices with correlated packet arrivals reporting to a common receiver through a shared multi-packet reception (MPR) channel [17]. This case study is relevant for Internet-of-Things (IoT) and machine-type communications.
• Experimental results confirm the advantages of the proposed Bayesian framework as compared to conventional frequentist model-based approaches in terms of metrics such as throughput and buffer overflow, as well as area under the receiver operating characteristic (ROC) curve for anomaly detection at the DT.
II. BAYESIAN DT FRAMEWORK
In this section, we formally define a PT system compris-
ing multiple network elements, such as mobile devices or
infrastructure nodes, referred to as agents. We then present
a Bayesian DT framework to estimate the PT dynamics,
optimize the agents' decisions, and monitor possible anomalies
in the PT.
Fig. 2: Example of factorization in (1), excluding the variables corresponding to the previous time step $t-1$.
A. Multi-Agent PT System
The PT system of interest consists of $K$ agents, e.g., $K$ sensing devices, indexed by $k \in \mathcal{K} = \{1, \ldots, K\}$, that operate over a discrete time index $t = 1, 2, \ldots$, e.g., over time slots. At each time $t$, each agent $k$ takes an action $a_t^k$, e.g., a decision on whether to transmit a packet from its queue or not. The action is selected by following a policy that leverages information collected by the agent regarding the current state $s_t$ of the overall system, which may include, e.g., packet queue lengths. The state $s_t$ evolves according to some ground-truth transition probability $T(s_{t+1} \mid s_t, a_t)$, such that the probability distribution of the next state $s_{t+1} \sim T(s_{t+1} \mid s_t, a_t)$ depends on the current state $s_t$ and joint action $a_t = (a_t^1, \ldots, a_t^K)$.
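As a concrete instance of sampling $s_{t+1} \sim T(s_{t+1} \mid s_t, a_t)$ from a state and a joint action, the following is a minimal sketch of a toy system. Everything specific in it is an assumption made for illustration: single-packet buffers, independent Bernoulli arrivals, a collision channel in which a packet is delivered only when exactly one device transmits, and the names `step` and `arrival_prob`. It is not the MPR channel or the correlated traffic model considered in this work.

```python
import random


def step(state, actions, arrival_prob, rng):
    """Sample s_{t+1} ~ T(. | s_t, a_t) for a toy K-device system.

    state:   tuple of K buffer flags (1 = a packet is waiting)
    actions: tuple of K transmit decisions (1 = transmit), i.e. the joint action a_t
    """
    transmitters = [k for k, (s, a) in enumerate(zip(state, actions)) if s == 1 and a == 1]
    # Assumed collision channel: delivery only if exactly one device transmits.
    delivered = transmitters[0] if len(transmitters) == 1 else None
    next_state = []
    for k, s in enumerate(state):
        if k == delivered:
            s = 0                                   # packet leaves the buffer
        if s == 0 and rng.random() < arrival_prob:
            s = 1                                   # new packet fills an empty buffer
        next_state.append(s)
    return tuple(next_state)


rng = random.Random(0)
state = (1, 0, 1)
for t in range(3):
    joint_action = tuple(rng.randint(0, 1) for _ in state)  # placeholder random policy
    state = step(state, joint_action, arrival_prob=0.2, rng=rng)
    print(t, joint_action, state)
```

The next state depends on the joint action, not on any single agent's decision, which is what makes the control problem a multi-agent one.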
We restrict our framework to the case of jointly observable states [18], in which the state $s_t$ can be identified if one has access to all observations $o_t^k$ made by all agents $k \in \mathcal{K}$ at time $t$, i.e., in which the state is a function of the collection $o_t = (o_t^1, \ldots, o_t^K)$ for all times $t$.
Agents in the PT cannot communicate, and hence the overall information available at agent $k$ up to time $t$ is contained in its action-observation history $h_t^k = (o_1^k, a_1^k, o_2^k, \ldots, a_{t-1}^k, o_t^k)$. Accordingly, the behaviour of agent $k$ is defined by its policy $\pi^k(a_t^k \mid h_t^k)$, which defines the probability of each possible action $a_t^k$ based on the available information $h_t^k$.
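The way a decentralized policy consumes an action-observation history can be sketched as follows. This is a minimal illustration under assumptions: the observation is taken to be a local buffer flag, and `act` and `buffer_policy` are hypothetical names with an arbitrary transmit probability, not policies optimized in this work.

```python
import random


def act(prefix, observation, policy, rng):
    """Append o_t to the stored prefix, then sample a_t ~ pi(. | h_t).

    Returns the action and the prefix (h_t, a_t) to be extended at time t+1.
    """
    history = prefix + (observation,)      # h_t = (o_1, a_1, ..., a_{t-1}, o_t)
    probs = policy(history)                # dict: action -> probability
    actions = list(probs)
    action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
    return action, history + (action,)


def buffer_policy(history):
    # Hypothetical policy: stay silent when the local buffer is empty,
    # otherwise transmit with probability 0.5.
    if history[-1] == 0:
        return {0: 1.0, 1: 0.0}
    return {0: 0.5, 1: 0.5}


rng = random.Random(0)
prefix = ()
for observation in [0, 1, 1]:
    action, prefix = act(prefix, observation, buffer_policy, rng)
print(prefix)  # alternating observations and actions: (o_1, a_1, o_2, a_2, o_3, a_3)
```

Each agent calls `act` using only its own prefix, which reflects the absence of communication among agents: no other agent's observations or actions enter the history.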
The state of a communication system typically comprises several substates, describing, e.g., the current traffic conditions or the quality of the wireless channel. As a result, one can typically partition the state variables $s_t$ into $M$ operationally distinct subsets $\{s_t^i\}_{i=1}^{M}$ indexed by $i$. To describe the interactions among these subsets, we introduce a Bayesian network defined by a directed acyclic graph, as illustrated in Fig. 2, in which each subset $s_t^i$ is directly affected by the subset of “parent” variables $s_t^{P(i)} \subseteq s_t$ (see, e.g., [19]). Accordingly, the transition probability is assumed to factorize as

$$
T(s_{t+1} \mid s_t, a_t) = \prod_{i=1}^{M} T_i\big(s_{t+1}^{i} \,\big|\, s_{t+1}^{P(i)}, s_t, a_t\big), \tag{1}
$$

where the conditional distribution $T_i(s_{t+1}^{i} \mid s_{t+1}^{P(i)}, s_t, a_t)$ describes the evolution of the next-state variables $s_{t+1}^{i}$ given the current state $s_t$, action $a_t$, and parent variables $s_{t+1}^{P(i)}$. In general, the distribution $T_i(s_{t+1}^{i} \mid s_{t+1}^{P(i)}, s_t, a_t)$ depends on some sufficient statistic of variables $s_t$ and $a_t$, which may be a function of subsets of such variables. We refer to Sec. III-A for an instance of model (1).
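Sampling from a factorized transition of the form (1) amounts to ancestral sampling: each subset $s_{t+1}^{i}$ is drawn after its parents $s_{t+1}^{P(i)}$. The sketch below illustrates this with two hypothetical substates, an `arrival` variable with no parents and a `buffer` variable whose parent is the arrival at time $t+1$. The substates, the buffer capacity of 2, the single-device action, and all function names are assumptions for illustration, not the case-study model of Sec. III-A.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sample_next_state(factors, state, action, rng):
    """Sample s_{t+1} factor by factor, visiting parents before children.

    factors maps a substate name i to (parents P(i), sampler for T_i).
    """
    graph = {name: parents for name, (parents, _) in factors.items()}
    next_state = {}
    for name in TopologicalSorter(graph).static_order():
        parents, sampler = factors[name]
        parent_values = {p: next_state[p] for p in parents}
        next_state[name] = sampler(parent_values, state, action, rng)
    return next_state


def make_arrival_sampler(arrival_prob):
    # Hypothetical traffic substate: no parents, Bernoulli arrivals.
    def sampler(parent_values, state, action, rng):
        return 1 if rng.random() < arrival_prob else 0
    return sampler


def buffer_sampler(parent_values, state, action, rng):
    # Hypothetical buffer substate: its parent is the arrival at time t+1.
    served = 1 if action == 1 and state["buffer"] > 0 else 0
    return min(state["buffer"] - served + parent_values["arrival"], 2)


factors = {
    "buffer": (["arrival"], buffer_sampler),
    "arrival": ([], make_arrival_sampler(0.3)),
}
state = {"arrival": 0, "buffer": 1}
print(sample_next_state(factors, state, action=1, rng=random.Random(0)))
```

The topological ordering guarantees that every factor sees already-sampled parent values, regardless of the order in which the factors are declared.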
B. Model Learning
The goal of the model learning phase (phase 1 in Fig. 1) at the DT is to obtain an estimate of the PT system dynamics