power allocation problems of WNCSs were investigated for
achieving the minimum overall transmission power consump-
tion while guaranteeing certain control performance. In [8],
a control-aware scheduler design problem was considered
based on the communications protocol of IEEE 802.15.4.
In [9], a communication protocol with variable packet length
was proposed and optimized for achieving the best control
performance. In [10], a transmission power allocation problem
of a WNCS with a coding-free communication protocol was
investigated, aiming to achieve optimal overall control perfor-
mance. In [11], a novel framework was developed for jointly
optimizing the communication design parameters to achieve
the best control performance. Note that all those works are
restricted to linear dynamical systems with linear control laws.
For remote state estimation of linear WNCSs, transmission
scheduling problems have drawn significant attention. In [12]–
[17], optimal scheduling policies were investigated for various
system setups to minimize average estimation errors.
In Stream 2, both the control and communication policies
are jointly optimized to achieve the best overall control performance.
Stream 2 is more challenging because the
joint policy has very large combined state and action spaces
when taking into account both control and communications
domains. In a nutshell, most co-design problems can be formulated
as dynamic decision-making problems. However, given the
large state and action spaces, conventional Markov decision
process solutions cannot be applied due to the
curse of dimensionality. To address this issue, most works in this
stream rely on deep-learning (DL) approaches with artificial
neural networks (NNs) for function approximations. In [18], a
deep reinforcement learning (DRL) approach was adopted to
learn both the control and the transmission scheduling signals.
In particular, DRL combines artificial NNs with a framework
of reinforcement learning that helps software agents learn
how to solve decision-making problems and reach their goals.
In [19], both the control policy and the dynamic transmission
power allocation policy were jointly optimized based on DRL.
It is worth noting that those DRL-based algorithms are model-free
and thus apply to practical WNCS scenarios in which
accurate knowledge of the nonlinear system (plant) models
is not available, whereas the conventional solutions are purely
model-based.
There are still many open problems in the area of control-
communications policy co-design with unknown nonlinear
system models. Many existing works, such as [18], [19],
assume that the sensor measurements are perfect and that the
(uplink) communications from the sensor to the controller are
error-free. Under such an assumption, the controller has an accurate
plant state in real time for generating control signals. When
considering a practical uplink channel, the controller does not
always know the plant state and thus needs state estimation.
This requires estimation-control co-design. A key aspect is
that the estimation quality depends significantly on the
age-of-information (AoI) of the sensor data available to the estimator,
which measures the time elapsed since the controller last received
a packet from the sensor. Due to system dynamics and uncertainties,
a larger AoI of the sensor data
indicates a less reliable state estimate. For real-time control
applications, an estimate with a small AoI is more important
than one with a large AoI. Such information about the data
importance needs to be taken into account for the controller’s
training. We note that the analysis and optimization of AoI
in different communication networks have drawn significant
attention during the past five years [20]. However, how to
leverage the AoI of sensor data for effectively training
a controller has not been considered before. Furthermore,
when considering DL-based estimator-control-communication
co-design, one needs to systematically design a joint train-
ing algorithm for achieving time and performance efficiency,
rather than training the three modules one by one. Otherwise,
the resulting estimation, control and communication policies
may not converge to desired ones, leading to poor overall
control performance of the WNCS. Due to the aforementioned
difficulties, joint estimator-control-communication policy
learning for WNCSs has not been investigated in the open
literature.
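
As a minimal illustration of the AoI notion described above, the following Python sketch tracks the AoI at the controller over a sequence of packet outcomes. It assumes a discrete-time convention that is not stated in this section: the AoI resets to 1 in a time slot where a sensor packet is received and otherwise grows by one per slot. The names `update_aoi` and `aoi_trace` and the reset value are hypothetical.

```python
def update_aoi(aoi: int, received: bool) -> int:
    """Return the next AoI.

    Assumed convention (not from the paper): reset to 1 when a sensor
    packet is received in this slot, otherwise age by one slot.
    """
    return 1 if received else aoi + 1


def aoi_trace(outcomes, initial_aoi=1):
    """Return the AoI after each slot for a list of reception outcomes."""
    trace = []
    aoi = initial_aoi
    for received in outcomes:
        aoi = update_aoi(aoi, received)
        trace.append(aoi)
    return trace


# True = packet received over the uplink, False = packet dropout.
outcomes = [True, False, False, True, False]
print(aoi_trace(outcomes))  # [1, 2, 3, 1, 2]
```

Under this convention, a run of consecutive dropouts produces a steadily growing AoI, which is the situation in which the state estimate becomes less reliable.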
In this work, we systematically investigate a DL-based
estimator-control-scheduler co-design framework for a model-unknown
WNCS with nonlinear dynamic systems. We consider
fading channels on both the sensor-controller and the
controller-actuator links. The major contributions are summarized as follows.
• We propose a novel DL-based WNCS over fading channels
with time correlations. In particular, the AoI states of
the sensor’s information are utilized in the three modules
of estimator, controller, and scheduler; both the controller
and the scheduler leverage the fading channel states for
decision-making. The instantaneous and historical states
are utilized in each module. Co-design frameworks for
WNCSs with the awareness of AoI and channel states
have not been considered in the open literature.
• We develop a joint estimator-controller-scheduler training
algorithm. In particular, we propose a DRL-based
algorithm for controller and scheduler optimization that
utilizes both the model-free data received directly from
the sensor and the model-based data
generated by the estimator when packet dropout occurs.
An AoI-based importance sampling algorithm that takes
into account the data accuracy is proposed for enhancing
learning efficiency. Moreover, we develop novel schemes
for enhancing the stability of joint training.
• Extensive experiments built on the OpenAI Gym
platform demonstrate that the proposed joint training
algorithm can effectively solve the estimation-control-scheduling
co-design problem in various scenarios. Remarkable
performance gains are achieved compared
with the separate design and some benchmark
policies.
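
The AoI-based importance sampling idea mentioned in the second contribution can be illustrated with a small replay-buffer sketch. This is not the algorithm developed in this paper; it is a simplified stand-in that assumes each stored transition carries the AoI of the state it was built from, and that the sampling probability is proportional to `decay ** (aoi - 1)`, so that fresher (more accurate) data are drawn more often. The names `aoi_sampling_probs`, `sample_batch`, and `decay`, and the exponential weighting itself, are hypothetical.

```python
import random


def aoi_sampling_probs(aois, decay=0.5):
    """Map AoI values to sampling probabilities (assumed exponential decay)."""
    weights = [decay ** (a - 1) for a in aois]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(buffer, batch_size, decay=0.5, rng=None):
    """Draw a training batch, favoring transitions with a small AoI."""
    rng = rng or random.Random()
    probs = aoi_sampling_probs([t["aoi"] for t in buffer], decay)
    # Sampling is with replacement, as in random.choices.
    return rng.choices(buffer, weights=probs, k=batch_size)


# Toy buffer: AoI 1 stands for data received directly from the sensor,
# larger AoI for states produced by the estimator after dropouts.
buffer = [
    {"state": 0.10, "aoi": 1},
    {"state": 0.12, "aoi": 2},
    {"state": 0.15, "aoi": 3},
]
probs = aoi_sampling_probs([t["aoi"] for t in buffer])
batch = sample_batch(buffer, batch_size=4, rng=random.Random(0))
```

With `decay=0.5`, each extra slot of age halves the relative chance of a transition being selected; a full importance sampling scheme would typically also correct the learning update for this non-uniform sampling, which the sketch omits.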
Outline: The system model of a general WNCS over fading
channels is described in Section II. The estimation and control
co-design problems of a low-mobility and a high-mobility
WNCS are investigated in Sections III and IV, respectively.
The numerical results are demonstrated and discussed in
Section V, followed by conclusions in Section VI.