Optimal Weight Adaptation of Model Predictive Control for Connected
and Automated Vehicles in Mixed Traffic with Bayesian Optimization
Viet-Anh Le, IEEE Student Member, Andreas A. Malikopoulos, IEEE Senior Member

This work was supported by NSF under Grants CNS-2149520 and CMMI-2219761.
The authors are with the Department of Mechanical Engineering, University of Delaware, Newark, DE 19716 USA. E-mail: vietale@udel.edu, andreas@udel.edu.
Abstract—In this paper, we develop an optimal weight adaptation strategy of model predictive control (MPC) for connected and automated vehicles (CAVs) in mixed traffic. We model the interaction between a CAV and a human-driven vehicle (HDV) as a simultaneous game and formulate a game-theoretic MPC problem to find a Nash equilibrium of the game. In the MPC problem, the weights in the HDV's objective function can be learned online using moving horizon inverse reinforcement learning. Using Bayesian optimization, we propose a strategy to optimally adapt the weights in the CAV's objective function so that the expected true cost when using MPC in simulations can be minimized. We validate the effectiveness of the optimal strategy by numerical simulations of a vehicle crossing example at an unsignalized intersection.
I. INTRODUCTION
Recent advancements in connected and automated vehicles (CAVs) offer promising opportunities to reduce both energy consumption and travel delay [1], [2]. In our previous work [3]–[5], we addressed coordination and routing problems for CAVs under the assumption of full CAV penetration. However, CAVs will penetrate the market gradually and co-exist with human-driven vehicles (HDVs) for decades to come. Therefore, safe and efficient motion planning and control for CAVs in mixed traffic, accounting for diverse human driving styles, is highly important. Several control approaches have been proposed in the literature, such as model predictive control [6], [7], learning-based control [8], [9], game-theoretic control [10], and socially-compatible control [11], [12].
Among these control approaches, model predictive control (MPC) has received significant attention since (1) it can be integrated with other methods such as learning-based control or socially-compatible control, and (2) it can handle multiple objectives and constraints concurrently. However, as in many MPC approaches for dynamical systems, some objectives, constraints, or system dynamics in motion planning and control for CAVs are usually simplified or approximated so that the resulting MPC problems can be solved in real time. In addition, the objective function in MPC is generally formed as a linear combination of multiple features whose weights are chosen empirically. As a result, if the weights are chosen inappropriately, the true cost may not be optimized, leading to performance degradation. An efficient technique to overcome these difficulties in practice is automatic weight tuning [13], which aims to derive a strategy to tune the MPC weights so that the best true cost can be achieved. Marco et al. [14] used Bayesian optimization to optimize the weights of a cost function to compensate for the discrepancy between the true dynamics and a linearized model. Gros and Zanon [15] utilized reinforcement learning for parameter adaptation in nonlinear MPC. Jain et al. [16] focused on finding an MPC rollout with a low true cost using the covariance matrix adaptation evolution strategy.
Furthermore, in control applications involving human decisions, e.g., CAVs interacting with HDVs in mixed traffic, the controller must address the stochasticity and diversity of human behavior. In general, MPC with fixed weights cannot be guaranteed to work well in such applications. For example, overly weighting the safety objective in the MPC design while encountering a conservative HDV may cause unnecessary travel delay. In contrast, if both the CAV and the HDV behave aggressively, unsafe situations may occur. Therefore, the weights of the MPC problem need to be adapted online depending on the human driving model.
In a recent research effort [17], we developed a control framework to address the motion planning problem for CAVs in mixed traffic. We modeled the interaction between a CAV and an HDV as a simultaneous game and proposed an MPC objective function to find a Nash equilibrium of the game. The weights in the objective function are parameterized by social value orientation (SVO), and depending on the online estimate of the SVO of the HDV, the MPC weights are adapted heuristically. In this paper, we propose a method for optimal weight adaptation of MPC for CAVs in mixed traffic based on Bayesian optimization. Using the proposed method, we can derive offline the optimal weight adaptation strategy for the MPC with respect to the HDV's objective weights so that the true desired performance can be achieved. Then, by learning online the objective weights that best describe the human driving behavior from real-time data using the moving horizon inverse reinforcement learning (IRL) technique [18], the MPC weights are adapted accordingly. We demonstrate the proposed method with a vehicle crossing example at an unsignalized intersection and show its benefits by comparing it with the heuristic method in [17].
The remainder of this paper is structured as follows.
Section II presents the game-theoretic MPC formulation and
the moving horizon IRL technique. In Section III, we develop
the method to derive the optimal weight adaptation strategy
with Bayesian optimization. In Section IV, we demonstrate
the proposed framework by an intersection crossing example,
while numerical simulation results are provided in Section V.
Finally, we conclude the paper in Section VI.
II. MOTION PLANNING FOR CAVS IN MIXED TRAFFIC
WITH MODEL PREDICTIVE CONTROL
In this section, we present a game-theoretic MPC formulation for motion planning of a CAV while interacting with an HDV, along with the moving horizon IRL technique to learn the objective weights of the HDV from real-time data.
A. Model Predictive Control for Motion Planning
We consider an interactive driving scenario including a CAV and an HDV whose indices are 1 and 2, respectively. The goal of the MPC motion planner is to generate the trajectory and control actions of CAV-1 while considering the real-time driving behavior of HDV-2. To guarantee that CAV-1 has data of HDV-2's real-time trajectories, we make the following assumption:

Assumption 1: A coordinator is available to collect trajectories of HDV-2 and transmit them to CAV-1 without any significant delay or error during the communication.
We formulate the problem in the discrete-time domain, in which the dynamic model of each vehicle $i$ is given by

$$\boldsymbol{x}_{i,k+1} = f_i(\boldsymbol{x}_{i,k}, \boldsymbol{u}_{i,k}), \quad (1)$$

where $\boldsymbol{x}_{i,k}$ and $\boldsymbol{u}_{i,k}$, $i = 1, 2$, are the vectors of states and control actions, respectively, at time step $k \in \mathbb{N}$. We utilize the control framework presented in [17], in which the interaction between CAV-1 and HDV-2 is modeled as a simultaneous game, i.e., a game without a leader-follower structure, in which the objective of each vehicle includes its individual objective and a shared objective.
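The specific form of $f_i$ is left general here. As a concrete placeholder only (an assumption made for illustration, not the model prescribed by the paper), the following minimal sketch uses a discrete-time double integrator along each vehicle's path:

```python
import numpy as np

DT = 0.1  # sampling time [s]; an assumed value, not specified in this section

def f_double_integrator(x, u, dt=DT):
    """One possible instance of the dynamics f_i in (1): a double integrator
    with state x = [path position, speed] and control u = [acceleration].
    Illustrative only; the paper does not fix f_i at this point."""
    p, v = x
    a = u[0]
    return np.array([p + v * dt + 0.5 * a * dt**2, v + a * dt])

# Example: propagate vehicle i one step forward from (0 m, 10 m/s) with a = 1 m/s^2.
x_next = f_double_integrator(np.array([0.0, 10.0]), np.array([1.0]))
```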
Let $l_1(\boldsymbol{x}_{1,k+1}, \boldsymbol{u}_{1,k})$ and $l_2(\boldsymbol{x}_{2,k+1}, \boldsymbol{u}_{2,k})$ be the individual objective functions of CAV-1 and HDV-2, respectively, and $l_{12}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k})$, where $\boldsymbol{x}_{12,k+1} = [\boldsymbol{x}_{1,k+1}^\top, \boldsymbol{x}_{2,k+1}^\top]^\top$ and $\boldsymbol{u}_{12,k} = [\boldsymbol{u}_{1,k}^\top, \boldsymbol{u}_{2,k}^\top]^\top$, be the cooperative term at time step $k$. We assume that CAV-1 and HDV-2 share the same cooperative objective, e.g., collision avoidance. Those objective functions are usually designed as weighted sums of some features as follows

$$l_i(\boldsymbol{x}_{i,k+1}, \boldsymbol{u}_{i,k}) = \boldsymbol{\omega}_i^\top \boldsymbol{\phi}_i(\boldsymbol{x}_{i,k+1}, \boldsymbol{u}_{i,k}), \quad i = 1, 2, \quad (2)$$
$$l_{12}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}) = \boldsymbol{\omega}_{12}^\top \boldsymbol{\phi}_{12}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}), \quad (3)$$

where $\boldsymbol{\phi}_i$, $\boldsymbol{\phi}_{12}$ are vectors of features and $\boldsymbol{\omega}_i \in \mathcal{W}_i$, $\boldsymbol{\omega}_{12} \in \mathcal{W}_{12}$ are the corresponding vectors of weights, where $\mathcal{W}_i$ and $\mathcal{W}_{12}$ are the sets of feasible values. For ease of notation, we define $-i$ for each $i \in \{1, 2\}$ as the other vehicle than vehicle $i$. We consider that, given any control actions $\boldsymbol{u}_{-i,k}$ of the other vehicle, each vehicle $i$ applies the control actions $\boldsymbol{u}^*_{i,k}$ that minimize the sum of its individual objective and the shared objective, i.e.,

$$\boldsymbol{u}^*_{i,k} = \arg\min_{\boldsymbol{u}_{i,k}} \; l_i(\boldsymbol{x}_{i,k+1}, \boldsymbol{u}_{i,k}) + l_{12}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}), \quad \forall \boldsymbol{u}_{-i,k}. \quad (4)$$
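To make the weighted-sum structure of (2) and (3) concrete, the sketch below builds hypothetical feature vectors (control effort, speed tracking, and an inverse-distance collision-risk term). The features, the desired speed, and the weight values are illustrative assumptions; the actual features are not specified in this section.

```python
import numpy as np

def phi_individual(x_next, u, v_des=12.0):
    """Hypothetical individual feature vector phi_i: control effort and squared
    deviation from an assumed desired speed v_des."""
    _, v = x_next
    return np.array([u[0] ** 2, (v - v_des) ** 2])

def phi_shared(x12_next):
    """Hypothetical shared feature phi_12: an inverse-squared-distance
    collision-risk term between the two vehicles' path positions."""
    p1, p2 = x12_next[0], x12_next[2]
    return np.array([1.0 / ((p1 - p2) ** 2 + 1e-3)])

# Weighted-sum objectives as in (2)-(3); weight values are placeholders.
omega_1, omega_12 = np.array([0.1, 1.0]), np.array([5.0])
x12_next = np.array([30.0, 11.0, 33.0, 9.0])   # stacked [p1, v1, p2, v2]
l_1 = omega_1 @ phi_individual(x12_next[:2], np.array([0.5]))
l_12 = omega_12 @ phi_shared(x12_next)
```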
Next, we formulate an MPC problem with a control horizon of length $H \in \mathbb{N}$. Let $t$ be the current time step and $\mathcal{I}_t = \{t, \dots, t + H - 1\}$ be the set of all time steps in the control horizon at time step $t$. We can recast the simultaneous game between CAV-1 and HDV-2 presented above as a potential game [19], i.e., a game in which all players minimize a single global function called the potential function. In the potential game, a Nash equilibrium can be found by minimizing the potential function. The potential function in this game at each time step $k$ is

$$l_{\mathrm{pot}}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}) = \sum_{i=1,2} l_i(\boldsymbol{x}_{i,k+1}, \boldsymbol{u}_{i,k}) + l_{12}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}). \quad (5)$$
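To see why a minimizer of the potential function yields a Nash equilibrium, one can verify the exact-potential property directly from the definitions above (a short check added here for clarity): if only vehicle $i$ changes its action $\boldsymbol{u}_{i,k}$ while $\boldsymbol{u}_{-i,k}$ is fixed, then $l_{-i}(\boldsymbol{x}_{-i,k+1}, \boldsymbol{u}_{-i,k})$ is unchanged, so

$$\Delta l_{\mathrm{pot}} = \Delta\big[\, l_i(\boldsymbol{x}_{i,k+1}, \boldsymbol{u}_{i,k}) + l_{12}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}) \,\big],$$

i.e., the change in the potential equals the change in the cost that vehicle $i$ minimizes in (4). Hence, at a minimizer of (5), no vehicle can reduce its own cost by a unilateral deviation.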
Therefore, we propose utilizing the cumulative sum of the potential function over the control horizon as the objective function in the MPC problem, which can be given by

$$J_{\mathrm{MPC}} = \sum_{k \in \mathcal{I}_t} l_{\mathrm{pot}}(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}). \quad (6)$$
Hence, the MPC problem for motion planning of CAV-1 is formulated as follows:

$$\begin{aligned}
\underset{\{\boldsymbol{u}_{12,k}\}_{k \in \mathcal{I}_t}}{\text{minimize}} \quad & J_{\mathrm{MPC}} && (7a) \\
\text{subject to:} \quad & (1), \; i = 1, 2, && (7b) \\
& g_j(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}) \le 0, \; j \in \mathcal{J}_{\mathrm{ieq}}, && (7c) \\
& h_j(\boldsymbol{x}_{12,k+1}, \boldsymbol{u}_{12,k}) = 0, \; j \in \mathcal{J}_{\mathrm{eq}}, && (7d)
\end{aligned}$$

where (7b)–(7d) hold for all $k \in \mathcal{I}_t$. The constraints (7c) and (7d) are inequality and equality constraints, where $\mathcal{J}_{\mathrm{ieq}}$ and $\mathcal{J}_{\mathrm{eq}}$ are the corresponding sets of indices.
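As a rough illustration of how a problem of the form (7) could be transcribed and solved numerically, the sketch below uses CasADi's Opti interface with IPOPT. The solver choice, the double-integrator dynamics, the features, the horizon length, and all weight values are assumptions made for this sketch; they are not taken from the paper.

```python
import casadi as ca

H, DT = 20, 0.1            # horizon length and sampling time (assumed values)
omega_1 = [0.1, 1.0]       # CAV weights; these are what the adaptation strategy tunes
omega_2 = [0.2, 0.8]       # HDV weights; estimated online via IRL in practice
omega_12 = 5.0             # shared (collision-avoidance) weight

opti = ca.Opti()
X = opti.variable(4, H + 1)    # stacked state x_12 = [p1, v1, p2, v2] over the horizon
U = opti.variable(2, H)        # stacked control u_12 = [a1, a2] over the horizon
x0 = opti.parameter(4)         # current measured state

cost = 0
for k in range(H):
    p1, v1, p2, v2 = X[0, k + 1], X[1, k + 1], X[2, k + 1], X[3, k + 1]
    a1, a2 = U[0, k], U[1, k]
    # (7b): forward-Euler double-integrator dynamics for both vehicles
    opti.subject_to(X[:, k + 1] == X[:, k] + DT * ca.vertcat(X[1, k], a1, X[3, k], a2))
    # (5)-(6): individual terms (2) plus the shared term (3), summed over the horizon
    cost += omega_1[0] * a1**2 + omega_1[1] * (v1 - 12.0)**2
    cost += omega_2[0] * a2**2 + omega_2[1] * (v2 - 10.0)**2
    cost += omega_12 / ((p1 - p2)**2 + 1e-3)
    # (7c): illustrative inequality constraints, here just acceleration bounds
    opti.subject_to(opti.bounded(-3.0, a1, 3.0))
    opti.subject_to(opti.bounded(-3.0, a2, 3.0))

opti.subject_to(X[:, 0] == x0)   # (7d): equality constraint fixing the initial state
opti.minimize(cost)              # (7a)
opti.solver("ipopt")
opti.set_value(x0, [0.0, 10.0, -5.0, 9.0])
sol = opti.solve()
u_cav_now = sol.value(U[0, 0])   # first CAV action, applied in receding-horizon fashion
```

In a receding-horizon implementation, only the first CAV control action is applied, the HDV's actual behavior is observed, and the problem is re-solved at the next step with updated weight estimates.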
In the objective function of the MPC problem (7), we assume that the features $\boldsymbol{\phi}_i$, $i = 1, 2$, and $\boldsymbol{\phi}_{12}$ can be pre-defined. Thus, if we learn online the weights $\boldsymbol{\omega}_2$ and $\boldsymbol{\omega}_{12}$ that best describe the human driving behavior, the CAV's objective weights $\boldsymbol{\omega}_1$ can be adapted to achieve the desired performance. The optimal strategy for adapting $\boldsymbol{\omega}_1$ can be derived offline using Bayesian optimization, as presented in Section III.
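For intuition on this offline step, the sketch below uses scikit-optimize's gp_minimize as one possible Bayesian-optimization routine (an assumed tool; the actual procedure is developed in Section III). The true-cost evaluation is a stand-in for running the closed-loop MPC simulation against an HDV with given objective weights.

```python
from skopt import gp_minimize

def true_cost(omega_1, omega_hdv):
    """Stand-in for the closed-loop evaluation: in practice, simulate the MPC (7)
    with CAV weights omega_1 against an HDV whose weights are omega_hdv, and
    return the true cost (e.g., travel delay plus a safety penalty). A synthetic
    quadratic bowl is used here only so the sketch runs end-to-end."""
    return sum((w - h) ** 2 for w, h in zip(omega_1, omega_hdv)) + 0.1

omega_hdv = [0.2, 0.8]   # one sampled HDV weight vector from W_2 x W_12

result = gp_minimize(
    func=lambda w: true_cost(w, omega_hdv),   # expected true cost for candidate omega_1
    dimensions=[(0.0, 1.0), (0.0, 1.0)],      # assumed feasible set W_1
    n_calls=30,
    random_state=0,
)
best_omega_1 = result.x   # CAV weights found for this HDV weight vector
```

Repeating such a search over a set of HDV weight vectors mirrors the idea of deriving, offline, an adaptation strategy from the HDV's objective weights to the CAV's objective weights.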
B. Moving Horizon Inverse Reinforcement Learning
To identify the weights $\boldsymbol{\omega}_2$ and $\boldsymbol{\omega}_{12}$ in the individual objective function of HDV-2 and the shared objective, we utilize the feature-based IRL approach [18], [20], a machine learning technique developed to learn the underlying objective or reward of an agent by observing its behavior. We define the vector of all features and the vector of all corresponding weights in HDV-2's objective function as $\boldsymbol{f} = [\boldsymbol{\phi}_2^\top, \boldsymbol{\phi}_{12}^\top]^\top$ and $\boldsymbol{\theta} = [\boldsymbol{\omega}_2^\top, \boldsymbol{\omega}_{12}^\top]^\top$, respectively. Let $\tilde{\boldsymbol{f}}$ be the vector of average observed feature values computed from data and $\mathbb{E}_p[\boldsymbol{f}]$ be the expected feature values under a given probability distribution $p$ over trajectories. With feature-based IRL, the goal is to learn the weight vector $\boldsymbol{\theta} \in \Omega$, where $\Omega = \mathcal{W}_2 \times \mathcal{W}_{12}$, so that the expected feature values match the observed feature values.
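As a generic illustration of this feature-matching principle (an illustrative update rule added here, not necessarily the exact update of the moving horizon IRL method in [18]), a gradient-style step can adjust the weight estimate until expected and observed features agree:

```python
import numpy as np

def feature_matching_step(theta, f_observed, expected_features, lr=0.05):
    """One update that reduces the mismatch between E_p[f] under the current
    weights and the observed averages f_tilde. `expected_features(theta)` must
    return E_p[f] for the trajectory distribution induced by theta (e.g., via
    sampling); the sign convention here is illustrative."""
    mismatch = expected_features(theta) - f_observed
    theta_new = theta - lr * mismatch
    return np.clip(theta_new, 0.0, None)   # keep weights in a nonnegative feasible set

# Toy usage with a stand-in expectation model (not the paper's model):
f_tilde = np.array([0.4, 0.9, 0.1])        # average observed feature values
theta = np.ones(3)                         # initial estimate of [omega_2, omega_12]
for _ in range(200):
    theta = feature_matching_step(theta, f_tilde, lambda th: 0.5 * th)
```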
In moving horizon IRL, at each time step, we utilize the $L \in \mathbb{N}$ most recent trajectory segments to update the weight estimate, where $L$ is the estimation horizon length. Let $t$ be the current time step and $\mathcal{R}_t = \{r_m\}_{m=1,\dots,L}$ be the set of $L$ sample trajectory segments collected over the estimation horizon at time $t$, in which $r_m = (\boldsymbol{x}_{12,t-m}, \boldsymbol{x}_{12,t-m+1}, \boldsymbol{u}_{12,t-m})$, for $m = 1, \dots, L$, is the