
mization required by MPC. For a more in-depth review on us-
ing RL for DR, refer to [18]. The use of model-free RL comes
at a price, however: training can require large amounts of data
and time, and it can be challenging to enforce safety or physical
constraints. The primary approach for the latter has been to add soft penalties for constraint violations, which only encourage but do not strictly enforce the constraints. While recent research has attempted to address some of these challenges for building control using transfer learning [19,20,21] and safe RL [22], this remains an active area of research.
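As a minimal illustration of the soft-penalty idea, a Python sketch of a penalized RL reward is shown below; the function name, comfort band, and penalty weight are hypothetical and are not taken from this paper's implementation.

# Hypothetical sketch of a soft-penalty RL reward: a comfort-band violation is
# penalized in the reward rather than enforced as a hard constraint, so the
# agent is only encouraged, not forced, to respect the constraint.
def penalized_reward(energy_cost, zone_temp, temp_min=20.0, temp_max=24.0,
                     penalty_weight=10.0):
    violation = max(0.0, zone_temp - temp_max) + max(0.0, temp_min - zone_temp)
    return -energy_cost - penalty_weight * violation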
Recently, hybrids of MPC and model-free RL have shown
promise for building control. Intuitively, a model-based method
such as MPC embodies physical domain knowledge and thus
can guide a data-driven learning process, while a learning-based method such as RL can adapt the model to a specific system. Based on this intuition, hybrid control methods should train offline faster than RL, run online faster than MPC, and identify control actions at least as good as those of either method alone. Several hy-
brid methods have been proposed in recent years, notably MPC
with a learned terminal cost (RL-MPC) [23,24], approximate
MPC (aMPC) based on imitation learning [25,26], and differ-
entiable predictive control (DPC) [27] based on differentiable
programming [28,29]. aMPC and DPC use a similar approach
and have comparable control performance, but DPC is simpler
to implement and has lower training cost [27].
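To make the hybrid idea concrete, MPC with a learned terminal cost (RL-MPC) can be sketched in generic form using the notation from the nomenclature; the stage cost \ell, dynamics f, and learned terminal value V_\theta are placeholders, and the exact formulations used in this work appear in Section 3:

\min_{u_0, \dots, u_{K-1}} \; \sum_{k=0}^{K-1} \ell(x_k, u_k) + V_\theta(x_K)
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k, w_k), \quad k = 0, \dots, K-1,

where V_\theta approximates the cost-to-go beyond the K-step lookahead horizon and is learned offline, and only the first optimized action u_0 is applied at each control step.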
These works include some comparisons of hybrid methods
to MPC and/or RL for building control. For single building
control, [24] compares RL-MPC to MPC and RL and [27] com-
pares DPC to MPC and RL. For multi-zone building control,
[25] compares aMPC to MPC. These studies each compare
one hybrid controller to one or two baseline controllers rather
than comparing hybrid controllers to each other, and they make
these comparisons on a limited set of metrics, such as only
control performance and/or online evaluation time. Also, they
only consider baseline building energy control rather than grid-
interactive control. In addition, some of the hybrid methods are
applied to building control in a limited capacity: RL-MPC ei-
ther uses a restrictive form for the terminal cost [23] or uses
a computationally expensive sampling method [24], and DPC
has only been trained using a linearized building model [27].
Finally, the above comparisons use a mix of building models
and learning frameworks, some of which are not released open-
source, making it challenging to reproduce, build upon, or compare across these works.
To address the knowledge gaps described above, we present the first comprehensive comparison of model-based, learning-based, and hybrid methods for building control.
The novel contributions of this study are that we:
1. Compare a full spectrum of state-of-the-art control meth-
ods from model-based to model-free and from learning-based
to learning-free, as shown in Figure 1.
2. Evaluate control methods on a thorough set of met-
rics: training performance, control performance, generalizabil-
ity, and online computational time. Based on our experiments,
we summarize the pros and cons of each method and the test cases for which each is suitable.
3. Include DR programs in the analysis of building controllers and extend these methods to grid-interactive control.
4. Use robust, general forms of hybrid methods to ensure a fair comparison between methods. Unlike previous works, we apply RL-MPC using a general form for the cost function and a computationally efficient training approach, and we train DPC on a nonlinear building model.
5. Implement seven control methods, two building models, and three DR programs in a modular, documented, open-source code base [30] (URL: https://github.com/NREL/learning-building-control) to enable future comparisons to and extensions of this work.

Figure 1: A quadrant illustration of the control methods studied in this paper based on two feature axes: (1) use of learning and (2) explicit use of a model. See the nomenclature section for controllers' full names and Section 3 for detailed formulations.

Nomenclature

Controller Abbreviations
OPT      Optimal open-loop control
MPC      Nonconvex model predictive control
MPC-C    Convex model predictive control
MPC-CL   Convex model predictive control with learned terminal cost
DPC      Differentiable predictive control
RLC      Reinforcement learning control
RBC      Rule-based control

Indices and Related Constants
τ        Duration of control step
k; K     Lookahead step index; total lookahead steps
t; N     Control step index; total control steps
z        Number of thermal zones in the building

Variables
u        Control action
w        Exogenous inputs
x        State variable in building dynamics
The rest of the paper is arranged as follows: Section 2 covers the problem formulation by describing our building model, cost function, DR programs, general building control problem, and measured signals. Section 3 explains the control methods compared in this work. Training methods for the data-driven methods are described in Section 4. Experimental details and results are presented in Section 5, and conclusions in Section 6.
2. Model and Control Problem Formulation
To investigate building control under demand response
(DR), we focus on controlling the heating, ventilation and air-