Federated Reinforcement Learning for Real-Time
Electric Vehicle Charging and Discharging Control
Zixuan Zhang†, Yuning Jiang§, Yuanming Shi†, Ye Shi†, and Wei Chen‡
†School of Information Science and Technology (SIST), ShanghaiTech University, China
§Automatic Control Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
‡Department of Electronic Engineering, Tsinghua University, China
Email: zxzhang@gmail.com, yuning.jiang@ieee.org, {shiym,shiye}@shanghaitech.edu.cn, wchen@tsinghua.edu.cn
Abstract—With the recent advances in mobile energy storage
technologies, electric vehicles (EVs) have become a crucial part
of smart grids. When EVs participate in the demand response
program, the charging cost can be significantly reduced by taking
full advantage of real-time pricing signals. However, the dynamic environment involves many stochastic factors, making it challenging to design an optimal charging/discharging control strategy. This paper develops an optimal EV charging/discharging control strategy for different EV users in dynamic environments to maximize their benefits. We first
formulate this problem as a Markov decision process (MDP).
Then we consider EV users with different behaviors as agents
in different environments. Furthermore, a horizontal federated
reinforcement learning (HFRL)-based method is proposed to
fit various users’ behaviors and dynamic environments. This
approach can learn an optimal charging/discharging control
strategy without sharing users' profiles. Simulation results illustrate that the proposed real-time EV charging/discharging control strategy performs well under various stochastic factors.
I. INTRODUCTION
Over the past decades, the advent of electric vehicles (EVs) has significantly mitigated air pollution and fossil fuel depletion [1]. When EVs are connected to the power grid, they can operate in discharging mode as vehicle-to-grid (V2G) devices or in charging mode as grid-to-vehicle (G2V) devices [2].
As a new type of mobile, adjustable load, a fleet of grid-connected EVs can alternate between these two modes, working in G2V mode during valley hours to achieve valley filling and in V2G mode during peak hours to achieve peak shaving [3]. In addition, users' charging costs can be reduced by responding to electricity price signals and adjusting the charging pattern accordingly [4]. The
V2G concept and its benefits are shown in Fig. 1.
To maximize users' benefits, an EV charging/discharging control strategy [5] must coordinate the charging/discharging action, i.e., both the charging/discharging decision and the charging/discharging rate. However, the dynamic environment contains numerous stochastic factors [6], such as time-varying electricity prices and the uncertain behaviors of diverse users, which makes it challenging to design an optimal control strategy that accounts for many kinds of EV users.
Many day-ahead approaches, such as robust optimization (RO) [7] and stochastic optimization (SO) [8], have been proposed to handle price uncertainty. Although these methods have achieved great success in day-ahead charging/discharging control, they can hardly capture complex real-time scenarios with additional uncertain factors. Generally, real-time charging/discharging control under uncertain electricity prices and users' demand satisfaction can be formulated as an optimization problem with a known state transition, which can then be solved by model-based approaches such as dynamic programming [9], model predictive control (MPC) [10], and model-based RL [11]. Nevertheless, it is difficult to establish an accurate system model or estimate the state transition when considering various EV users' indeterminate charging/discharging behaviors.
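To make the model-based view concrete, the following sketch solves a toy version of this problem by backward dynamic programming, assuming the price transition probabilities are known. All quantities here (price levels, transition matrix, horizon, state-of-charge grid, departure penalty) are illustrative assumptions, not values from this paper.

```python
# Toy model-based formulation: backward dynamic programming over
# (time, state of charge, price) under a *known* price transition model.
import numpy as np

T = 24                                       # horizon: 24 hourly slots
n_soc = 11                                   # battery state of charge, 0..10 units
prices = np.array([0.2, 0.5, 0.9])           # assumed electricity price levels
P = np.array([[0.7, 0.2, 0.1],               # assumed (known) price transition
              [0.3, 0.4, 0.3],               # probabilities between levels
              [0.1, 0.3, 0.6]])
actions = (-1, 0, 1)                         # discharge / idle / charge one unit

# V[t, s, p] = minimal expected cost-to-go from slot t in state (s, p).
V = np.zeros((T + 1, n_soc, len(prices)))
V[T] = 5.0 * (n_soc - 1 - np.arange(n_soc))[:, None]   # penalty if not full

for t in range(T - 1, -1, -1):
    for s in range(n_soc):
        for p in range(len(prices)):
            costs = []
            for a in actions:
                s_next = s + a
                if s_next < 0 or s_next >= n_soc:
                    continue                 # infeasible SoC transition
                stage = prices[p] * a        # pay to charge, earn to discharge
                costs.append(stage + P[p] @ V[t + 1, s_next])
            V[t, s, p] = min(costs)

print("Expected cost from empty battery at low price:", V[0, 0, 0])
```

When the state transition cannot be specified this way, the Bellman backup above is no longer computable, which is exactly the limitation that motivates the model-free methods discussed next.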
As a technique that directly learns optimal policies without establishing or estimating an environment model [12], model-free RL has been applied to numerous smart grid issues [13], [14] and achieves good control performance. The model-free RL approach in [15] avoids grid congestion by coordinating EV charging/discharging, and [16] constructed a dynamic environment accounting for the dynamic electricity price, non-EV residential load consumption, and drivers' behaviors. These works, however, assume that agents' state transitions follow the same distribution, i.e., that the environments of different agents are IID. In actual scenarios, the situations faced by EV users may differ, resulting in non-IID environments and different state transitions.
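As a concrete illustration of the model-free idea, the sketch below runs tabular Q-learning for a single EV agent, learning purely from sampled transitions without any transition model. The environment here (random-walk price process, unit charging rate, depletion penalty) is an assumed toy setup, not the simulation environment of this paper.

```python
# Tabular Q-learning for one EV agent: learn from sampled transitions only.
import numpy as np

rng = np.random.default_rng(0)
n_soc, n_price, n_act = 11, 3, 3             # state: (SoC, price); 3 actions
price_vals = np.array([0.2, 0.5, 0.9])
Q = np.zeros((n_soc, n_price, n_act))
alpha, gamma, eps = 0.1, 0.95, 0.1           # step size, discount, exploration

def step(soc, price, act):
    """Assumed toy dynamics: random-walk price, one SoC unit per slot."""
    power = act - 1                          # map action index to -1/0/+1
    soc2 = int(np.clip(soc + power, 0, n_soc - 1))
    price2 = int(np.clip(price + rng.integers(-1, 2), 0, n_price - 1))
    reward = -price_vals[price] * power      # pay to charge, earn to discharge
    if soc2 == 0:
        reward -= 1.0                        # penalty for deep depletion
    return soc2, price2, reward

for episode in range(2000):
    soc, price = 5, 1
    for t in range(24):                      # one day of hourly decisions
        if rng.random() < eps:               # epsilon-greedy exploration
            act = int(rng.integers(n_act))
        else:
            act = int(np.argmax(Q[soc, price]))
        soc2, price2, r = step(soc, price, act)
        # Standard Q-learning update toward the bootstrapped target.
        Q[soc, price, act] += alpha * (r + gamma * Q[soc2, price2].max()
                                       - Q[soc, price, act])
        soc, price = soc2, price2

print("Greedy action at mid SoC, high price:", int(np.argmax(Q[5, 2])) - 1)
```

Note that the learned Q-table is tied to one agent's transition distribution; when users' environments are non-IID, each agent would learn a different table, which motivates the federated setting discussed next.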
As a novel type of distributed machine learning, federated
learning (FL) [17], [18], [19] has received considerable interest
from academia and industry. FL allows the use of isolated data
from multiple devices without violating privacy protection policies [20], [21], and it has been applied in many areas [22], [23], [24]. Recently, the emerging field of federated reinforcement learning (FRL) [25] combines the advantages of both FL and RL: it not only enables agents to learn to make good decisions in unknown, dynamic environments but also lets them train a global model collaboratively without sharing their own experiences. As a branch of FRL, horizontal federated reinforcement learning (HFRL) suits agents that are isolated from each other but face similar decision-making problems and interact with each other infrequently [26].
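The sketch below shows one way such horizontal federated training could proceed, in a FedAvg style: each agent refines a copy of the global model on its own private experience, and a server averages the returned parameters weighted by data volume, so raw experiences never leave the agents. Reusing the Q-table from the previous sketch as the shared model, and the placeholder local update, are illustrative assumptions rather than this paper's algorithm.

```python
# FedAvg-style horizontal federated aggregation over agents' local models.
import numpy as np

def local_update(Q_global, env_seed, n_steps=500):
    """Each agent refines a copy of the global model on its own private
    experience; a distinct seed stands in for each user's non-IID environment.
    The noise below is a placeholder for local Q-learning as sketched above."""
    rng = np.random.default_rng(env_seed)
    Q = Q_global.copy()
    Q += 0.01 * rng.standard_normal(Q.shape)  # placeholder local learning
    return Q, n_steps                         # updated model + local data volume

def fed_avg(updates):
    """Server aggregates parameters weighted by each agent's data volume;
    only model parameters, never experiences or profiles, are exchanged."""
    total = sum(n for _, n in updates)
    return sum((n / total) * Q for Q, n in updates)

Q_global = np.zeros((11, 3, 3))               # shared model: the Q-table shape
for rnd in range(10):                         # federated communication rounds
    updates = [local_update(Q_global, seed) for seed in range(4)]
    Q_global = fed_avg(updates)
```

Because each agent trains on its own trajectories and only parameters are exchanged, the users' charging profiles stay local, matching the privacy property claimed above.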
This paper considers EV users with different behaviors as
agents in different environments. Motivated by [27], we aim to learn an optimal real-time charging/discharging control strategy that fits various users' behaviors and dynamic environments without sharing users' profiles.