Safe and Efficient Switching Mechanism Design for
Uncertified Linear Controller
Yiwen Lu and Yilin Mo
Abstract—Sustained research efforts have been devoted to learning
optimal controllers for linear stochastic dynamical systems with unknown
parameters, but because the data are corrupted by noise, learned controllers are
usually uncertified in the sense that they may destabilize the system.
To address this potential instability, we propose a “plug-and-play”
modification to the uncertified controller which falls back to a known
stabilizing controller when the norm of the difference between the
uncertified and the fall-back control input exceeds a certain threshold.
We show that the switching strategy is both safe and efficient, in the
sense that: 1) the linear-quadratic cost of the system is always bounded
even if the original uncertified controller is destabilizing; 2) in case the
uncertified controller is stabilizing, the performance loss caused by
switching converges super-exponentially to 0 for Gaussian noise, while
it converges only polynomially for general heavy-tailed noise. Finally, we
demonstrate the effectiveness of the proposed switching strategy via
numerical simulation on the Tennessee Eastman Process.
I. INTRODUCTION
Learning a controller from noisy data for an unknown system
has been a central topic to adaptive control and reinforcement
learning [1], [2], [3], [4] for the past decades. A main challenge
to directly applying the learned controllers to the system is that
they are usually uncertified, in the sense that it can be very difficult
to guarantee the stability of such controllers due to process and
measurement noise. One way to address this challenge is to deploy an
additional safeguard mechanism. In particular, assuming the existence
of a known stabilizing controller, empirically the safeguard may be
implemented by falling back to the stabilizing controller from the
uncertified controller, when a potential safety breach is detected.
Motivated by the above intuition, this paper proposes such a
switching strategy, provides a formal safety guarantee and quan-
tifies the performance loss incurred by the safeguard mechanism,
for discrete-time Linear-Quadratic Regulation (LQR) setting with
independent and identically distributed process noise with bounded
fourth-order moment. We assume the existence of a known stabilizing
linear feedback control law u = K0 x, which can be achieved either
when the system is known to be open-loop stable (in which case
K0 = 0), or through adaptive stabilization methods [5], [6]. Given
an uncertified linear feedback control gain K1, a modification to the
control law u = K1 x is proposed: the controller normally applies
u = K1 x, but falls back to u = K0 x for t consecutive steps once
‖(K1 − K0)x‖ exceeds a threshold M. The proposed strategy is
analyzed from both stability and optimality aspects. In particular, the
main results include:
1) We prove the LQ cost of the proposed controller is always
bounded, even if K1 is destabilizing. This fact implies that
the proposed strategy enhances the safety of the uncertified
controller by preventing the system from being catastrophically
destabilized.
2) Provided K1 is stabilizing and M, t are chosen properly, we
compare the LQ cost of the proposed strategy with that of the
linear feedback control law u = K1 x, and quantify the maximum
increase in LQ cost caused by switching w.r.t. the strategy
hyper-parameters M, t as merely O(t^{1/4} exp(−constant · M^2))
in the case of Gaussian process noise, which decays super-exponentially
as the switching threshold M tends to infinity. We also discuss the
extension to general noise distributions with bounded fourth-order
moments, where the above asymptotic performance gap becomes
O(t^{1/4} M^{−1}).

This work is supported by the National Key Research and Development
Program of China under Grant 2018AAA0101601. The authors are
with the Department of Automation and BNRist, Tsinghua University,
Beijing, P.R. China. Emails: luyw20@mails.tsinghua.edu.cn,
ylmo@tsinghua.edu.cn.
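As a concrete illustration, the switching rule described above can be sketched in a few lines of Python. The system matrices, gains, noise level, and hyper-parameters below are hypothetical placeholders chosen for demonstration, not the parameters of the paper's Tennessee Eastman example.

```python
import numpy as np

# Hypothetical 2-D system for illustration only.
rng = np.random.default_rng(0)
A = np.array([[1.1, 0.2], [0.0, 0.9]])      # open-loop unstable plant
B = np.eye(2)
K0 = np.array([[-0.5, -0.2], [0.0, -0.3]])  # assumed known stabilizing gain
K1 = np.array([[0.3, 0.0], [0.0, 0.2]])     # uncertified gain (here destabilizing)

M, t_fallback = 5.0, 10  # switching threshold and fallback duration

x = np.zeros(2)
fallback_timer = 0
costs = []
for _ in range(500):
    if fallback_timer > 0:
        u = K0 @ x                      # still inside a fallback window
        fallback_timer -= 1
    elif np.linalg.norm((K1 - K0) @ x) > M:
        u = K0 @ x                      # trigger: fall back to K0 for t steps
        fallback_timer = t_fallback - 1
    else:
        u = K1 @ x                      # normal mode: uncertified controller
    costs.append(x @ x + u @ u)         # LQ stage cost with Q = R = I
    x = A @ x + B @ u + 0.1 * rng.standard_normal(2)

print(f"average LQ cost: {np.mean(costs):.3f}")
```

Even though the closed loop under K1 alone is unstable in this toy setup, the trajectory remains bounded because every threshold crossing forces t consecutive steps under the stabilizing gain K0, consistent with the safety claim in result 1).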
The performance of the proposed switching scheme is further vali-
dated by simulation on the Tennessee Eastman Process example. We
envision that the switching framework could be potentially applicable
in a wider range of learning-based control settings, since it may
combine the good empirical performance of learned policies and
the stability guarantees of classical controllers, and the “plug-and-
play” nature of the switching logic may minimize the required
modifications to existing learning schemes.
A preliminary version of this paper [7] has been submitted to IEEE
CDC 2022. The main contributions of the current manuscript over
the conference submission are: i) the switching scheme has been
redesigned, such that the upper bound on LQ cost (Theorem 2) no
longer depends on K1; ii) the conclusions have been extended to
noise distributions with bounded fourth-order moments; iii) proofs
of all theoretical results are included in the current version of the
manuscript.
Related Works
Switched control systems: Supervisory algorithms have been
developed to stabilize switched linear systems [8], [9], [10], and other
nonlinear systems that are difficult to stabilize globally with a single
controller [11], [12], [13]. However, most of these works focus on the
stability of the switched system, while the (near-)optimality of the
controllers is less discussed. Building upon this vein of literature,
the idea of switching between certified and uncertified controllers to
improve performance was proposed in [14], whose scheme guarantees
global stability for general nonlinear systems under mild assumptions.
However, no quantitative analysis of the performance under switching
is provided. In contrast, we specialize our results for linear systems
and prove that switching may induce only negligible performance
loss while ensuring safety.
Adaptive LQR: Adaptive and learned LQR has drawn significant
research attention in recent years, for which high-probability estima-
tion error and regret bounds have been proved for methods including
optimism-in-face-of-uncertainty [15], [16], Thompson sampling [17],
policy gradient [18], robust control based on coarse identification [19]
and certainty equivalence [20], [21], [22], [23]. All the above
approaches, however, involve applying a linear controller learned
from finite noise-corrupted data, which has a nonzero probability
of being destabilizing. Furthermore, given a fixed length of data, the
failure probabilities of the aforementioned methods depend on either
unknown system parameters or statistics of online data, which implies
the failure probability cannot be determined a priori, and hence it
can be challenging to design an algorithm that strictly satisfies a pre-
defined specification of safety. In [24], a “cutoff” method similar to
arXiv:2210.14595v1 [eess.SY] 26 Oct 2022