Safe and Efficient Switching Mechanism Design for Uncertified Linear Controller Yiwen Lu and Yilin Mo

Abstract—Sustained research efforts have been devoted to learning
optimal controllers for linear stochastic dynamical systems with unknown
parameters, but due to the corruption of noise, learned controllers are
usually uncertified in the sense that they may destabilize the system.
To address this potential instability, we propose a “plug-and-play”
modification to the uncertified controller which falls back to a known
stabilizing controller when the norm of the difference between the
uncertified and the fall-back control input exceeds a certain threshold.
We show that the switching strategy is both safe and efficient, in the
sense that: 1) the linear-quadratic cost of the system is always bounded
even if the original uncertified controller is destabilizing; 2) in case the
uncertified controller is stabilizing, the performance loss caused by
switching converges super-exponentially to 0 for Gaussian noise, while
converging polynomially for general heavy-tailed noise. Finally, we
demonstrate the effectiveness of the proposed switching strategy via
numerical simulation on the Tennessee Eastman Process.
I. INTRODUCTION
Learning a controller from noisy data for an unknown system
has been a central topic in adaptive control and reinforcement
learning [1], [2], [3], [4] for the past decades. A main challenge
to directly applying the learned controllers to the system is that
they are usually uncertified, in the sense that it can be very difficult
to guarantee the stability of such controllers due to process and
measurement noise. One way to address this challenge is to deploy an
additional safeguard mechanism. In particular, assuming the existence
of a known stabilizing controller, empirically the safeguard may be
implemented by falling back to the stabilizing controller from the
uncertified controller when a potential safety breach is detected.
Motivated by the above intuition, this paper proposes such a
switching strategy, provides a formal safety guarantee and quantifies
the performance loss incurred by the safeguard mechanism,
for the discrete-time Linear-Quadratic Regulation (LQR) setting with
independent and identically distributed process noise with bounded
fourth-order moment. We assume the existence of a known stabilizing
linear feedback control law $u = K_0 x$, which can be achieved either
when the system is known to be open-loop stable (in which case
$K_0 = 0$), or through adaptive stabilization methods [5], [6]. Given
an uncertified linear feedback control gain $K_1$, a modification to the
control law $u = K_1 x$ is proposed: the controller normally applies
$u = K_1 x$, but falls back to $u = K_0 x$ for $t$ consecutive steps once
$\|(K_1 - K_0)x\|$ exceeds a threshold $M$. The proposed strategy is
analyzed from both stability and optimality aspects. In particular, the
main results include:
1) We prove the LQ cost of the proposed controller is always
bounded, even if $K_1$ is destabilizing. This fact implies that
the proposed strategy enhances the safety of the uncertified
controller by preventing the system from being catastrophically
destabilized.
2) Provided $K_1$ is stabilizing, and $M, t$ are chosen properly, we
compare the LQ cost of the proposed strategy with that of the
linear feedback control law $u = K_1 x$, and quantify the maximum
increase in LQ cost caused by switching w.r.t. the strategy
hyper-parameters $M, t$ as merely $O(t^{1/4} \exp(-\mathrm{constant} \cdot M^2))$
in the case of Gaussian process noise, which decays super-exponentially
as the switching threshold $M$ tends to infinity. We
also discuss the extension to general noise distributions with
bounded fourth-order moments, where the above asymptotic
performance gap becomes $O(t^{1/4} M^{-1})$.
This work is supported by the National Key Research and Development
Program of China under Grant 2018AAA0101601. The authors are
with the Department of Automation and BNRist, Tsinghua University,
Beijing, P.R. China. Emails: luyw20@mails.tsinghua.edu.cn,
ylmo@tsinghua.edu.cn.
The performance of the proposed switching scheme is further validated
by simulation on the Tennessee Eastman Process example. We
envision that the switching framework could be potentially applicable
in a wider range of learning-based control settings, since it may
combine the good empirical performance of learned policies and
the stability guarantees of classical controllers, and the “plug-and-
play” nature of the switching logic may minimize the required
modifications to existing learning schemes.
A preliminary version of this paper [7] has been submitted to IEEE
CDC 2022. The main contributions of the current manuscript over
the conference submission are: i) the switching scheme has been
redesigned, such that the upper bound on LQ cost (Theorem 2) no
longer depends on K1; ii) the conclusions have been extended to
noise distributions with bounded fourth-order moments; iii) proofs
of all theoretical results are included in the current version of the
manuscript.
Related Works
Switched control systems: Supervisory algorithms have been
developed to stabilize switched linear systems [8], [9], [10], and other
nonlinear systems that are difficult to stabilize globally with a single
controller [11], [12], [13]. However, most of these works focus on the
stability of the switched system, while the (near-)optimality of the
controllers is less discussed. Building upon this vein of literature,
the idea of switching between certified and uncertified controllers to
improve performance was proposed in [14], whose scheme guarantees
global stability for general nonlinear systems under mild assumptions.
However, no quantitative analysis of the performance under switching
is provided. In contrast, we specialize our results for linear systems
and prove that switching may induce only negligible performance
loss while ensuring safety.
Adaptive LQR: Adaptive and learned LQR has drawn significant
research attention in recent years, for which high-probability estimation
error and regret bounds have been proved for methods including
optimism-in-the-face-of-uncertainty [15], [16], Thompson sampling [17],
policy gradient [18], robust control based on coarse identification [19]
and certainty equivalence [20], [21], [22], [23]. All the above
approaches, however, involve applying a linear controller learned
from finite noise-corrupted data, which has a nonzero probability
of being destabilizing. Furthermore, given a fixed length of data, the
failure probabilities of the aforementioned methods depend on either
unknown system parameters or statistics of online data, which implies
the failure probability cannot be determined a priori, and hence it
can be challenging to design an algorithm that strictly satisfies a pre-
defined specification of safety. In [24], a “cutoff” method similar to
the switching strategy described in the present paper is applied in an
attempt to establish almost sure guarantees for adaptive LQR, which
are nevertheless asymptotic in nature, and the extra cost caused by
switching is not analyzed. By contrast, this manuscript provides both
non-asymptotic and asymptotic bounds for the switching strategy.
arXiv:2210.14595v1 [eess.SY] 26 Oct 2022
Nonlinear controller for LQR: Nonlinearity in the control of
linear systems has been studied mainly due to practical concerns such
as saturating actuators. The performance of LQR under saturation
nonlinearity has been studied in [25], [26], [27], which are all based
on stochastic linearization, a heuristic that replaces the nonlinearity with
an approximately equivalent gain and bias. By contrast, the present paper
treats nonlinearity as a design choice rather than a physical constraint,
and provides rigorous performance bounds without resorting to any
heuristics.
Outline
The remainder of this paper is organized as follows: Section II
introduces the problem setting and describes the proposed switching
strategy. The main results are provided in Section III and Section IV
for Gaussian process noise and noise with bounded fourth-order
moments respectively. Section V validates the performance of the
proposed strategy with an industrial process example. Finally, Section VI
concludes the paper.
Notations
The set of nonnegative integers is denoted by $\mathbb{N}$, and the set
of positive integers is denoted by $\mathbb{N}^*$. For a square matrix $M$,
$\rho(M)$ denotes the spectral radius of $M$, and $\mathrm{tr}(M)$ denotes the
trace of $M$. For a real symmetric matrix $M$, $M \succ 0$ denotes
that $M$ is positive definite. $\|v\|$ denotes the 2-norm of a vector $v$
and $\|M\|$ is the induced 2-norm of the matrix $M$, i.e., its largest
singular value. For $P \succ 0$, $\langle v, w \rangle_P = v^T P w$ is the $P$-inner
product of vectors $v, w$, and $\|v\|_P = \|P^{1/2} v\|$ is the $P$-norm of
a vector $v$. For two positive semidefinite matrices $P \succ 0$, $Q \succeq 0$,
$\|Q\|_P = \|P^{-1/2} Q P^{-1/2}\| = \sup_{\|v\|_P = 1} \|v\|_Q^2$. For a random
vector $X$, $X \sim \mathcal{N}(\mu, \Sigma)$ denotes that $X$ is Gaussian distributed with
mean $\mu$ and covariance $\Sigma$. $\mathbb{P}(\cdot)$ denotes the probability operator, $\mathbb{E}(\cdot)$
denotes the expectation operator, and $\mathbf{1}_E$ is the indicator function
of the random event $E$. For functions $f(x), g(x)$ with non-negative
values, $f(x) = O(g(x))$ means $\limsup_{x \to \infty} f(x)/g(x) < \infty$, and
$f(x) = \Theta(g(x))$ means $f(x) = O(g(x))$ and $g(x) = O(f(x))$.
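The identity for the $P$-norm of a matrix follows from a change of variables. A short derivation is sketched below, assuming $P \succ 0$ (so that $P^{-1/2}$ exists) and $Q \succeq 0$; the auxiliary vector $y$ is introduced here only for illustration.

```latex
\begin{aligned}
\sup_{\|v\|_P = 1} \|v\|_Q^2
  &= \sup_{v^T P v = 1} v^T Q v \\
  &= \sup_{\|y\| = 1} y^T P^{-1/2} Q P^{-1/2} y
     \qquad (y = P^{1/2} v) \\
  &= \lambda_{\max}\bigl(P^{-1/2} Q P^{-1/2}\bigr)
   = \bigl\| P^{-1/2} Q P^{-1/2} \bigr\| .
\end{aligned}
```

The last equality uses the fact that $P^{-1/2} Q P^{-1/2}$ is symmetric positive semidefinite, so its largest eigenvalue coincides with its largest singular value.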
II. PROBLEM FORMULATION AND PROPOSED SWITCHING
STRATEGY
Consider the following discrete-time linear plant:
$$x_{k+1} = A x_k + B u_k + w_k, \qquad (1)$$
where $k \in \mathbb{N}$ is the time index, $x_k \in \mathbb{R}^n$ is the state vector, $u_k \in \mathbb{R}^m$
is the input vector, and $w_k \in \mathbb{R}^n$ is the process noise. Without loss
of generality, the system is assumed to be controllable. We further
assume that the initial state $x_0 = 0$, and that $\{w_k\}$ are independent
and identically distributed with covariance matrix $W \succ 0$.
We measure the performance of a controller $u_k(x_{0:k})$ in terms of
the infinite-horizon quadratic cost defined as:
$$J = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[ \sum_{k=0}^{T-1} x_k^T Q x_k + u_k^T R u_k \right], \qquad (2)$$
where $Q \succ 0, R \succ 0$ are fixed weight matrices specified by the
system operator. It is well known that the optimal controller is the
linear feedback controller of the form $u(x) = K^* x$, where the
optimal gain $K^*$ can be determined by solving the discrete-time
algebraic Riccati equation.
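As a minimal sketch of how the optimal gain could be obtained when the model is known, the scalar case ($n = m = 1$) of the Riccati equation can be solved by fixed-point iteration. The function name `scalar_lqr`, the iteration count and the numerical values are illustrative assumptions and do not come from the paper; the sign convention $u = kx$ with closed-loop dynamics $a + bk$ matches the one used in the text.

```python
def scalar_lqr(a, b, q, r, iters=200):
    """Iterate the scalar discrete-time Riccati recursion.

    Returns (p, k): p approximates the Riccati solution and k is the gain
    in u = k * x, so that the closed loop is x+ = (a + b * k) * x + w.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = -a * b * p / (r + b * b * p)
    return p, k


# Hypothetical plant and weights, chosen only for illustration.
p, k = scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0)
print("Riccati solution:", p)
print("gain:", k)
print("closed loop stable:", abs(1.0 + k) < 1.0)
```

For these particular values the fixed point satisfies $p^2 = p + 1$, which gives a quick sanity check on the iteration. In the setting of this paper the computation is not available to the operator, since $A, B$ are unknown.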
In this paper, we assume that the system and input matrices $A, B$
are unavailable to the system operator, and hence she cannot determine
the optimal feedback gain $K^*$. Instead, she has the following
two feedback gains:
• Primary gain $K_1$, typically learned from data, which can be
close to $K^*$ but does not have stability guarantees;
• Fallback gain $K_0$, which is typically conservative but always
guaranteed to be stabilizing, i.e., $\rho(A + BK_0) < 1$.
Ideally, the system operator would want to use $K_1$ as much as
possible, as it usually admits a better performance. However, since
$K_1$ is not necessarily stabilizing, a switching strategy is deployed in
pursuit of both safety and performance of the system. The block
diagram of the closed-loop system under the proposed switching
strategy is shown in Fig. 1, and the switching logic is described in
Algorithm 1. In plain words, the proposed switched control strategy
normally applies $u = K_1 x$, while falling back to $u = K_0 x$ for
$t$ consecutive steps once $\|(K_1 - K_0)x\|$ exceeds a threshold $M$.
[Figure: switching logic selecting between the gains $K_0$ (when $\xi > 0$) and $K_1$ (when $\xi = 0$), in feedback with the plant of eq. (1)]
Fig. 1: Block diagram of the closed-loop system under the proposed
switching strategy. The controller selects $u = K_1 x$ when $\xi = 0$ and
$u = K_0 x$ when $\xi > 0$, where $\xi$ is an internal counter determined by
the switching logic.
Algorithm 1 Proposed switched control strategy
Input: Current state $x$, primary gain $K_1$, fallback gain $K_0$, current
counter value $\xi$, switching threshold $M$, dwell time $t$
Output: Control input $u$, next counter value $\xi'$
if $\xi > 0$ then
    $u \leftarrow K_0 x$
else
    if $\|(K_1 - K_0)x\| \geq M$ then
        $\xi \leftarrow t$, $u \leftarrow K_0 x$
    else
        $u \leftarrow K_1 x$
$\xi' \leftarrow \max\{\xi - 1, 0\}$
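The logic of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative transcription under stated assumptions: the function names `matvec` and `switched_control`, the representation of gains as lists of rows, and the numerical values in the usage lines are hypothetical and not part of the paper.

```python
import math


def matvec(K, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]


def switched_control(x, K1, K0, xi, M, t):
    """One step of the switching logic; returns (u, next counter value)."""
    u0 = matvec(K0, x)  # fallback input K0 x
    u1 = matvec(K1, x)  # uncertified input K1 x
    if xi > 0:
        u = u0  # still inside a fallback period
    else:
        # ||(K1 - K0) x|| computed as the norm of the input difference
        gap = math.sqrt(sum((p - f) ** 2 for p, f in zip(u1, u0)))
        if gap >= M:
            xi, u = t, u0  # trigger: start t consecutive fallback steps
        else:
            u = u1
    return u, max(xi - 1, 0)


# Hypothetical gains for a two-state, single-input system.
K1, K0 = [[-0.5, 0.0]], [[0.0, 0.0]]
u, xi = switched_control([1.0, 2.0], K1, K0, xi=0, M=1.0, t=3)   # gap 0.5 < M
u, xi = switched_control([4.0, 2.0], K1, K0, xi=xi, M=1.0, t=3)  # gap 2.0 >= M
```

Because the counter is decremented in the same step in which it is set, the fallback gain is applied for exactly $t$ consecutive steps, counting the step that triggers the switch.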
III. MAIN THEORETICAL RESULTS
This section is devoted to proving the stability of the proposed
switching strategy as well as quantifying the performance loss it incurs.
It is assumed throughout this section that the process noise obeys a
Gaussian distribution, i.e., $w_k \sim \mathcal{N}(0, W)$.
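Before turning to the formal statements, the effect of switching under Gaussian noise can be explored informally with a scalar Monte Carlo sketch. Everything below is an assumption made for illustration: the function name `average_cost`, the plant ($a = 0.5$, $b = 1$, so $K_0 = 0$ is stabilizing), the destabilizing gain $K_1 = 0.8$, the weights $Q = R = 1$, unit noise variance and the values of $M$ and $t$. It is not the paper's Tennessee Eastman experiment, and a finite-horizon empirical average is only a rough proxy for the cost (2).

```python
import random


def average_cost(a, b, k1, k0, M, t, steps, switching, seed=0):
    """Empirical average of x^2 + u^2 for a scalar plant with N(0, 1) noise."""
    rng = random.Random(seed)
    x, xi, total = 0.0, 0, 0.0
    for _ in range(steps):
        if switching and xi == 0 and abs((k1 - k0) * x) >= M:
            xi = t  # trigger: start t consecutive fallback steps
        u = k0 * x if xi > 0 else k1 * x
        xi = max(xi - 1, 0)
        total += x * x + u * u
        x = a * x + b * u + rng.gauss(0.0, 1.0)
    return total / steps


# Hypothetical example: a + b*k1 = 1.3, so the uncertified gain destabilizes.
J_switched = average_cost(0.5, 1.0, 0.8, 0.0, M=2.0, t=5, steps=5000, switching=True)
J_unswitched = average_cost(0.5, 1.0, 0.8, 0.0, M=2.0, t=5, steps=200, switching=False)
print("with switching:", J_switched)
print("without switching:", J_unswitched)
```

The unswitched run is kept short because the state grows geometrically under the destabilizing gain, whereas the switched run is expected to remain moderate, since the fallback is triggered as soon as $|x| \geq M / |K_1 - K_0|$.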