Federated Reinforcement Learning for Real-Time
Electric Vehicle Charging and Discharging Control
Zixuan Zhang†, Yuning Jiang§, Yuanming Shi†, Ye Shi†, and Wei Chen‡
†School of Information Science and Technology (SIST), ShanghaiTech University, China
§Automatic Control Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
‡Department of Electronic Engineering, Tsinghua University, China
Email: zxzhang@gmail.com, yuning.jiang@ieee.org, {shiym,shiye}@shanghaitech.edu.cn, wchen@tsinghua.edu.cn
Abstract—With the recent advances in mobile energy storage technologies, electric vehicles (EVs) have become a crucial part of smart grids. When EVs participate in the demand response program, the charging cost can be significantly reduced by taking full advantage of real-time pricing signals. However, many stochastic factors in the dynamic environment pose significant challenges to the design of an optimal charging/discharging control strategy. This paper develops an optimal EV charging/discharging control strategy for different EV users under dynamic environments to maximize EV users' benefits. We first formulate this problem as a Markov decision process (MDP). Then we consider EV users with different behaviors as agents in different environments. Furthermore, a horizontal federated reinforcement learning (HFRL)-based method is proposed to fit various users' behaviors and dynamic environments. This approach can learn an optimal charging/discharging control strategy without sharing users' profiles. Simulation results illustrate that the proposed real-time EV charging/discharging control strategy performs well in the presence of various stochastic factors.
I. INTRODUCTION
In the past decades, the advent of electric vehicles (EVs) has significantly mitigated air pollution and fossil energy depletion [1]. When EVs are connected to the power grid, they can serve in the discharging mode as vehicle-to-grid (V2G) devices or in the charging mode as grid-to-vehicle (G2V) devices [2]. As a new type of mobile and adjustable load, a fleet of grid-connected EVs can alternate between the two working modes, operating in the G2V mode at valley time to achieve valley filling and in the V2G mode at peak time to achieve peak shaving [3]. In addition, users' charging costs can be reduced by responding to electricity price signals and changing the working pattern in time [4]. The V2G concept and its benefits are shown in Fig. 1.
With the aim of maximizing users' benefits, an EV charging/discharging control strategy [5] is expected to coordinate the charging/discharging action, including the charging/discharging decision and the charging/discharging rate. However, due to the many stochastic factors in the dynamic environment [6], such as time-varying electricity prices and uncertain behaviors of a variety of users, it is challenging to design an optimal charging/discharging control strategy that accounts for many kinds of EV users.
Many day-ahead approaches, such as robust optimization (RO) [7] and stochastic optimization (SO) [8], have been proposed to handle price uncertainty. Although these methods have achieved great success in day-ahead charging/discharging control, they can hardly capture complex real-time scenarios with more uncertain factors. Generally, the real-time charging/discharging control problem considering uncertain electricity prices and users' demand satisfaction can be formulated as an optimization problem with a known state transition. It can then be solved by model-based approaches such as dynamic programming [9], model predictive control (MPC) [10], and model-based RL [11]. Nevertheless, it is difficult to establish an accurate system model or to estimate the state transition when considering various EV users' indeterminate charging/discharging behaviors.
As a technique that can directly learn optimal policies without establishing or estimating an environment model [12], model-free RL has been applied to numerous smart grid issues [13], [14] and has obtained good control performance. The model-free RL approach in [15] avoids grid congestion by coordinating EV charging/discharging. The work in [16] took the dynamic electricity price, non-EV residential load consumption, and drivers' behaviors into consideration when constructing the dynamic environment. However, these papers assume that agents' state transitions follow the same distribution, i.e., that the environments of different agents are IID. In actual scenarios, the situations faced by EV users may differ slightly, resulting in non-IID environments and different state transitions.
As a novel type of distributed machine learning, federated learning (FL) [17], [18], [19] has received considerable interest from academia and industry. FL allows the use of isolated data from multiple devices without violating privacy protection policies [20], [21], and it has been applied in many areas [22], [23], [24]. Recently, an emerging field called federated reinforcement learning (FRL) [25] combines the advantages of both FL and RL. It not only provides agents with the experience to learn to make good decisions in unknown and dynamic environments but also trains a global model collaboratively without sharing the agents' own experiences. As a branch of FRL, horizontal federated reinforcement learning (HFRL) fits well for agents that are likely to be isolated from each other but face similar decision-making issues and have few interactions [26].
This paper considers EV users with different behaviors as
agents in different environments. Motivated by [27], we aim
to collaboratively learn a real-time EV charging/discharging control strategy that performs uniformly well in different kinds of environments. We first formulate this problem as a Markov decision process (MDP); then an HFRL-based approach is proposed to deal with the dynamic charging/discharging environments and the users' various behaviors. In our approach, the Soft Actor-Critic (SAC) algorithm [28], used as the local training method, can alleviate the sample-efficiency problem in RL, and the FedAvg algorithm [29] is utilized for global aggregation to help each agent quickly learn the optimal policy while preserving privacy. Moreover, our simulation results demonstrate that the proposed real-time EV charging/discharging control strategy makes a good trade-off between dynamic electricity prices and uncertain behaviors of different EV users.
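To make the aggregation step concrete, the following is a minimal sketch, not the authors' released implementation, of how FedAvg-style averaging of locally trained SAC policy parameters could be organized; the names fedavg, federated_round, and agent.local_sac_update are illustrative assumptions.

```python
import copy
from typing import List, Optional

import torch.nn as nn


def fedavg(local_models: List[nn.Module],
           weights: Optional[List[float]] = None) -> dict:
    """FedAvg aggregation: weighted average of the local models' parameters.

    weights: per-agent aggregation weights (e.g., proportional to the amount
    of local experience); defaults to a uniform average over agents.
    """
    n = len(local_models)
    if weights is None:
        weights = [1.0 / n] * n
    states = [m.state_dict() for m in local_models]
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for w, s in zip(weights, states))
    return avg


def federated_round(global_policy: nn.Module, agents: list) -> None:
    """One communication round of the HFRL scheme described above.

    Each EV agent copies the global SAC policy, trains it locally on its own
    (non-IID) charging/discharging environment, and only the resulting
    parameters are sent back for averaging; raw user transitions stay local.
    """
    local_policies = []
    for agent in agents:
        local_policy = copy.deepcopy(global_policy)
        agent.local_sac_update(local_policy)   # hypothetical local SAC training call
        local_policies.append(local_policy)
    global_policy.load_state_dict(fedavg(local_policies))
```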
Fig. 1. The V2G concept and its values: an EV charging station connects EVs to the smart grid through a bidirectional power converter, transformer, and control units. Short-term values include peak shaving and valley filling, reduced charging cost, and discharging profits; long-term values include EV users participating in electricity transactions and benefiting from power dispatch and auxiliary modes.
II. SYSTEM MODEL AND PROBLEM FORMULATION
This section first introduces a model to describe the dynamic changes of EVs' batteries. Then, the EV charging/discharging problem is formulated as a Markov decision process (MDP). Finally, we formulate the objective of the optimal charging/discharging control policy.
A. EV Battery Model
In this paper, we consider $N$ EVs equipped with the same batteries, indexed by $i \in \{1, \ldots, N\}$, and we assume that the charging infrastructures are the same for each EV. We denote by $t_a^i$ the time at which EV $i$ arrives at the charging station and by $t_d^i$ the time at which it departs from the station. If the State of Charge (SoC) of EV $i$ at times $t$ and $t+1$ is denoted by $\mathrm{SoC}_t^i$ and $\mathrm{SoC}_{t+1}^i$, respectively, then the dynamics of the $i$-th EV's battery between time instants $t$ and $t+1$ can be modeled as
$$
\mathrm{SoC}_{t+1}^i =
\begin{cases}
\mathrm{SoC}_t^i, & t < t_a^i, \ t \geq t_d^i, \\
\mathrm{SoC}_t^i + \eta \cdot a_t^i, & t_a^i \leq t < t_d^i,
\end{cases} \tag{1}
$$
where $a_t^i$ is the $i$-th EV's total charging/discharging rate during the time interval $[t, t+1)$. We assume the EV is under either the G2V mode (charging mode, $a_t^i \geq 0$) or the V2G mode (discharging mode, $a_t^i \leq 0$). Here, we also assume charging and discharging have the same efficiency $\eta \in (0, 1]$ in this paper. Besides, $\mathrm{SoC}_t^i$ satisfies $\mathrm{SoC}_t^i \in [0, 1]$ for all $i$ and $t$, and the input $a_t^i$ is constrained by the charging infrastructure.
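As a quick illustration of the battery dynamics in (1), here is a minimal sketch of one SoC update step with the infrastructure constraint applied; the efficiency and rate limits used as defaults are assumed values, not taken from the paper.

```python
def soc_step(soc: float, a: float, t: int, t_a: int, t_d: int,
             eta: float = 0.95, a_min: float = -0.25, a_max: float = 0.25) -> float:
    """One step of the battery dynamics in Eq. (1).

    soc          : current state of charge, in [0, 1]
    a            : charging (a > 0) / discharging (a < 0) rate for [t, t+1)
    eta          : charging/discharging efficiency, in (0, 1] (assumed value)
    a_min, a_max : infrastructure limits on the rate, cf. Eq. (3) (assumed values)
    """
    if t < t_a or t >= t_d:
        return soc                                 # EV not connected: SoC unchanged
    a = max(a_min, min(a_max, a))                  # enforce the rate constraint
    return max(0.0, min(1.0, soc + eta * a))       # keep SoC within [0, 1]
```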
B. MDP Formulation
The EV charging/discharging control problem has the same form as a sequential decision-making problem, so it can be regarded as a Markov decision process (MDP) with discrete steps. Consider EV users having different charging/discharging behaviors, such that $N$ agents interact with $N$ independent environments, respectively. The environments have different state transitions $\{\mathcal{P}_i\}_{i=1}^{N}$ but share the same state space $\mathcal{S}$, action space $\mathcal{A}$, reward function $\mathcal{R}$, and discount factor $\gamma$. Then the MDP of this problem can be denoted by $\mathcal{M}_i = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}_i, \mathcal{R}, \gamma \rangle$, for all $i \in \{1, 2, \cdots, N\}$.
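For exposition only, the per-agent MDPs could be represented as in the small sketch below (the class and field names are illustrative assumptions): every agent shares the state space, action space, reward function, and discount factor, while each agent $i$ carries its own transition dynamics $\mathcal{P}_i$.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentMDP:
    """M_i = <S, A, P_i, R, gamma>: only the transition kernel differs across agents."""
    n_state_dim: int          # dimension of the shared state space S
    a_min: float              # shared action space A = [a_min, a_max]
    a_max: float
    transition: Callable      # P_i: agent-specific (non-IID) dynamics
    reward: Callable          # shared reward function R
    gamma: float = 0.99       # shared discount factor (assumed value)
```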
1) State: As the input of the charging/discharging control strategy, the environment state is used to generate a real-time charging/discharging action. For agent $i$, the state $s_t^i \in \mathbb{R}^{n+6}$ at time $t$ includes the current and the past $n$ hours' electricity prices $(\psi_{t-n}, \psi_{t-n+1}, \cdots, \psi_t) \in \mathbb{R}^{n+1}$, the departure time $t_d^i$, the anxious time $t_x^i$, the current $\mathrm{SoC}_t^i$, the expected SoC at the anxious time $\mathrm{SoC}_x^i$, and the expected SoC at the departure time $\mathrm{SoC}_d^i$, that is,
$$
s_t^i = \{\psi_{t-n}, \psi_{t-n+1}, \ldots, \psi_t, \, t_d^i, \, t_x^i, \, \mathrm{SoC}_t^i, \, \mathrm{SoC}_x^i, \, \mathrm{SoC}_d^i\}. \tag{2}
$$
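For illustration, a minimal sketch of assembling the state vector in (2) might look as follows; the price-history length n and the function name build_state are assumptions made purely for exposition.

```python
import numpy as np


def build_state(price_history, t_d, t_x, soc_t, soc_x, soc_d, n: int = 23):
    """Assemble the (n+6)-dimensional state of Eq. (2).

    price_history : sequence containing at least the current and past n hourly
                    prices psi_{t-n}, ..., psi_t
    t_d, t_x      : departure time and anxious time
    soc_t         : current state of charge
    soc_x, soc_d  : expected SoC at the anxious time and at departure
    """
    prices = np.asarray(price_history[-(n + 1):], dtype=np.float32)  # n+1 prices
    extras = np.array([t_d, t_x, soc_t, soc_x, soc_d], dtype=np.float32)
    return np.concatenate([prices, extras])
```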
2) Action: The action $a_t^i$ denotes the charging/discharging rate of EV $i$'s battery during the state transition step $[t, t+1)$ given the state $s_t^i$. Due to the limitation of the charging infrastructure, the action is restricted as follows:
$$
\underline{a} \leq a_t^i \leq \bar{a}, \tag{3}
$$
where $\underline{a}$ and $\bar{a}$ are the minimum and maximum rates for this EV charging/discharging problem.
3) Reward: The reward represents the immediate system feedback after the state $s_t^i$ changes to $s_{t+1}^i$ under action $a_t^i$. We propose a reward settlement scheme that integrates the EV's demand response factor into the reward function. A mathematical model from [30] is used to quantify the effect of anxiety on SoC, and the anxious time $t_x^i$ is defined in this model. The model can be written as
$$
\mathrm{SoC}_x^i = d_1^i \, \frac{e^{d_2^i (t - t_a^i)/(t_d^i - t_a^i)} - 1}{e^{d_2^i} - 1}, \tag{4}
$$
where $t$ satisfies $t \in [t_x^i, t_d^i)$. It maps the user's anxiety to the expected SoC properly. $d_1^i \in [0, 1]$ and $d_2^i \in (-\infty, 0) \cup (0, \infty)$ are both shape parameters of the SoC curve. A larger $d_1^i$ leads to a higher SoC at $t_d^i$, and a larger $d_2^i$ determines a higher SoC during the charging/discharging duration.

We assume that the selling and purchasing prices of electricity are the same; the reward can then be defined as
$$
r_t(s_t^i, a_t^i) :=
\begin{cases}
-\sigma_p \cdot \psi_t \cdot a_t^i, & t_a^i \leq t < t_x^i, \\
-\sigma_p \cdot \psi_t \cdot a_t^i - \sigma_x \cdot \max(\mathrm{SoC}_x^i - \mathrm{SoC}_t^i, 0), & t_x^i \leq t < t_d^i, \\
-\sigma_d \cdot \max(\mathrm{SoC}_d^i - \mathrm{SoC}_t^i, 0), & t = t_d^i.
\end{cases}
$$
Here, the factors $\sigma_p$, $\sigma_x$, and $\sigma_d$ describe the user's sensitivity to price, anxiety, and demand response, respectively. When $t_a^i \leq t < t_x^i$, the reward fully considers the influence of electricity price fluctuations on the EV charging/discharging decision. The EV user's anxiety is then taken into consideration during $t_x^i \leq t < t_d^i$. As the EV leaves the charging station, i.e., $t = t_d^i$, the reward accounts for the gap between the expected and the current SoC so as to meet the EV user's demand as well as possible.
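To make the reward settlement concrete, below is a minimal sketch of the anxiety curve in (4) and the piecewise reward defined above; the default shape parameters and sensitivity factors are assumed values, and the function names are not from the paper.

```python
import math


def expected_soc_anxiety(t: float, t_a: float, t_d: float,
                         d1: float = 0.9, d2: float = 4.0) -> float:
    """Anxiety-to-SoC mapping of Eq. (4); d1, d2 are shape parameters (assumed values)."""
    return d1 * (math.exp(d2 * (t - t_a) / (t_d - t_a)) - 1.0) / (math.exp(d2) - 1.0)


def reward(t, a, psi_t, soc_t, t_a, t_x, t_d, soc_d,
           sigma_p=1.0, sigma_x=1.0, sigma_d=1.0, d1=0.9, d2=4.0):
    """Piecewise reward: electricity cost, anxiety penalty, and departure demand penalty."""
    if t_a <= t < t_x:                       # only the electricity cost matters
        return -sigma_p * psi_t * a
    if t_x <= t < t_d:                       # add the anxiety penalty
        soc_x = expected_soc_anxiety(t, t_a, t_d, d1, d2)
        return -sigma_p * psi_t * a - sigma_x * max(soc_x - soc_t, 0.0)
    if t == t_d:                             # departure: penalize unmet SoC demand
        return -sigma_d * max(soc_d - soc_t, 0.0)
    return 0.0                               # EV not connected
```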