a mismatch when the environment changes and cannot make
the right decision rapidly in a new environment.
In this work, we propose to improve the efficiency of the
RL learning process in EMS with meta-learning. Meta-learning
is the method of systematically observing how different
learning approaches perform on a wide range of learning
tasks, and then exploiting this experience to learn new
tasks much faster. Successful applications have been demonstrated
in areas spanning few-shot image recognition, unsupervised
learning, and data-efficient RL (Finn, Abbeel, and
Levine 2017; Nagabandi, Finn, and Levine 2018; Vanschoren
2018; Han et al. 2022; Li et al. 2022). These methods learn a
well-generalized initialization that can be quickly adapted to
a new scenario with a few gradient steps.
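As an illustration of this idea, the inner/outer loop of such methods can be sketched on a toy regression family (a minimal sketch with hypothetical tasks of the form y = a*x; function names, learning rates, and the first-order MAML approximation are our own choices, not details from the cited works):

```python
import numpy as np

def loss_grad(theta, x, y):
    # Gradient of the mean squared error for a toy scalar model y_hat = theta * x.
    return np.mean(2 * (theta * x - y) * x)

def maml_meta_train(tasks, theta0=0.0, inner_lr=0.01, outer_lr=0.1, steps=200):
    """Meta-learn an initialization that adapts to any task in one gradient step.

    Uses the first-order MAML approximation: the outer gradient is evaluated
    at the inner-adapted parameters, ignoring second derivatives.
    """
    theta = theta0
    for _ in range(steps):
        meta_grad = 0.0
        for x, y in tasks:
            # Inner loop: one task-specific adaptation step.
            adapted = theta - inner_lr * loss_grad(theta, x, y)
            # Outer loop (first-order): accumulate the gradient at the adapted point.
            meta_grad += loss_grad(adapted, x, y)
        theta -= outer_lr * meta_grad / len(tasks)
    return theta

def adapt(theta, x, y, inner_lr=0.5, steps=5):
    # Fast adaptation to a new task: a few gradient steps from the initialization.
    for _ in range(steps):
        theta = theta - inner_lr * loss_grad(theta, x, y)
    return theta
```

Meta-trained on tasks with slopes 2 and 3, the initialization lands between them, and `adapt` reaches an unseen slope (e.g., 2.8) within a handful of gradient steps.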
Moreover, we investigate an efficient energy optimization
learning problem for EMS with ESU, HVAC systems, renewable
energy sources, and non-shiftable loads (e.g., televisions)
in the absence of a building dynamics model. To be spe-
cific, our objective is to quickly minimize the energy cost
of the EMS during a time horizon with the consideration of
shaping load profiles to improve system reliability. However,
it is very challenging to achieve the above aims by simply
applying meta-RL methods to EMS control for the following
reasons. Firstly, it is often intractable to obtain accurate
dynamics of different load demands and buildings, which
can be affected by many factors. Secondly, it is difficult to
know the statistical distributions of all combinations of ran-
dom system parameters (e.g., renewable generation output,
power demand of non-shiftable loads, outdoor temperature,
and electricity price). Thirdly, there are temporally-coupled
operational constraints associated with ESU and HVAC sys-
tems in different environments, which means that the current
action would affect future decisions.
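To make the temporal coupling concrete, consider (as an illustration in our own notation, not this paper's formulation) the state-of-charge dynamics of an ESU with charging/discharging efficiencies $\eta_c, \eta_d$:

```latex
% Illustrative ESU state-of-charge dynamics (notation ours):
% b^c_t and b^d_t are the charging and discharging power at slot t.
SoC_{t+1} = SoC_t + \eta_c\, b^c_t\, \Delta t - \frac{b^d_t\, \Delta t}{\eta_d},
\qquad \underline{SoC} \le SoC_t \le \overline{SoC},
\quad 0 \le b^c_t \le \overline{b}^c,\; 0 \le b^d_t \le \overline{b}^d .
```

Because $SoC_{t+1}$ depends on the current action, a large discharge now shrinks the set of feasible future actions, which is what makes the control problem temporally coupled.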
To address the above challenges, we propose a meta-RL
framework for building control in EMS (MetaEMS), which is
built upon actor-critic-based meta-RL methods. To the best of
our knowledge, this is the first work to introduce the meta-RL
paradigm into building control. In MetaEMS, we learn a well-
generalized initialization from various building control tasks.
Given a new building scenario with a limited learning period,
the learned initialization can be quickly adapted with a few
generated samples without knowing the building dynamics.
We further propose two types of adaptation mechanisms to
enhance the data efficiency in MetaEMS: group-level adap-
tation and building-level adaptation. The former is a step-
by-step optimization process on each task and the latter is a
periodic synchronous updating process on a batch of sampled
tasks. Each task inherits a group-shared initialization of pa-
rameters, then performs building-level adaptation and finally
contributes to group-level adaptation. Our experimental results
show that the proposed method is more robust, learns
faster, and generalizes well across different building environment
dynamics. In summary, this paper has the following
key contributions:
• In this work, MetaEMS, a meta-RL framework consisting
of group-level and building-level adaptation, is proposed
to deal with building energy management control.
• Empirically, we demonstrate the effectiveness and efficiency
of our proposed model on the newly released
real-world CityLearn environment datasets.
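The group-shared initialization and the two-level adaptation described above can be sketched as follows (a minimal sketch under our own assumptions: the per-building loss is a hypothetical stand-in for real EMS dynamics, and the group-level aggregation uses a Reptile-style update as one concrete choice; the actual MetaEMS update rules are not specified here):

```python
import numpy as np

def task_grad(theta, a):
    # Gradient of a toy per-building loss (theta - a)^2, where "a" is a
    # building-specific optimum (hypothetical stand-in for real EMS dynamics).
    return 2 * (theta - a)

def building_level_adapt(theta, a, lr=0.1, steps=5):
    # Building-level adaptation: a step-by-step gradient process on one
    # building's own experience, starting from the group-shared parameters.
    for _ in range(steps):
        theta = theta - lr * task_grad(theta, a)
    return theta

def group_level_train(building_optima, meta_lr=0.5, meta_steps=100, seed=0):
    # Group-level adaptation: each sampled building inherits the group-shared
    # initialization, adapts locally, then contributes to a periodic synchronous
    # update of the shared parameters (Reptile-style aggregation).
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(meta_steps):
        batch = rng.choice(building_optima, size=2, replace=False)
        adapted = [building_level_adapt(theta, a) for a in batch]
        theta += meta_lr * (np.mean(adapted) - theta)
    return theta
```

With building optima at 1.0, 2.0, and 3.0, the shared initialization settles near their center, so any single building then needs only a few local gradient steps to specialize.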
Related Work
Traditional Control Methods for EMS
The traditional ways of building control can be sorted
into RBC and MPC methods (Zhang et al. 2022; Mariano-Hernández
et al. 2021). The basic idea of RBC techniques is
that adjustment is based on manually designed set points.
For example, cooling control is applied when the measured
temperature exceeds a pre-defined threshold. The MPC
techniques merge principles of feedback control and numerical
optimization in EMS. The system response models of
MPC are based on physical principles that capture the thermal
dynamics and energy behaviour of the whole building
(Camponogara et al. 2021; Serale et al. 2018; Sturzenegger
et al. 2014). Another trend of MPC is to combine various ma-
chine learning tools with classical MPC to design data-driven
MPC strategies that preserve the reliability of classical MPC.
In (Eini and Abdelwahed 2020), MPC combined with a neural
network model is used for lighting and thermal comfort
optimization. A nonlinear autoregressive exogenous model
with parallel architecture is used to train the networks that
estimate the comfort specifications, environmental conditions
and power consumption. However, there exist some limita-
tions to such methods in solving control problems in EMS.
First, they require substantial precise domain knowledge and
building information to manually design the model, which is
hard to obtain and results in limited commercial implementation
(Zhao et al. 2022; Bünning et al. 2020). Second, the
iterative algorithms designed by traditional optimization
methods cannot make fast decisions for building control
in a dynamic building environment (Drgoňa et al. 2018;
Chen, Cai, and Bergés 2019), since such algorithms require
iterative calculations for each building dynamics model.
Reinforcement Learning for EMS
RL-based EMS control has attracted wide attention from
both academia and industry in recent decades (Forootani,
Rastegar, and Jooshaki 2022; Yu et al. 2021). Traditional
RL methods in EMS are limited to tabular Q-learning and a
discrete state representation (Wen, O’Neill, and Maei 2015).
Recently, researchers have studied deep RL methods in EMS
(Ren et al. 2022; Wei, Wang, and Zhu 2017), which can deal
with problems with large action spaces and state spaces. The
authors in (Ren et al. 2022) adopt a forecasting-based dueling
deep Q-learning to optimize and dispatch a featured home
EMS, where a generalized correntropy-assisted long short-
term memory neural network is adopted to predict outdoor
temperature. (Huang et al. 2022) uses a mixed deep RL to
deal with discrete-continuous hybrid action space in EMS.
To jointly optimize the schedules of all kinds of appliances, a
deep RL approach based on trust region policy optimization
is proposed in (Li, Wan, and He 2020).
Some works also point out that it is impractical to let the
deep RL agent explore the state space fully in a real building
environment, because an unacceptably high economic cost
may be incurred during the long training process (Camponogara