Preprint submitted to Elsevier
arXiv:2210.10203v1 [eess.SY] 18 Oct 2022
From Model-Based to Model-Free: Learning Building Control for Demand Response
David Biagionia,c, Xiangyu Zhanga,, Christiane Adcocka,b, Michael Sinnera, Peter Grafa, Jennifer Kinga
aNational Renewable Energy Laboratory, Golden, CO 80401, U.S.A.
bStanford University, Stanford, CA 94305, U.S.A.
cMaplewell Energy, Broomfield, CO 80021, U.S.A.
Abstract
Grid-interactive building control is a challenging and important problem for reducing carbon emissions, increasing energy efficiency, and supporting the electric power grid. Currently, researchers and practitioners are confronted with a choice of control
strategies ranging from model-free (purely data-driven) to model-based (directly incorporating physical knowledge) to hybrid
methods that combine data and models. In this work, we identify state-of-the-art methods that span this methodological spec-
trum and evaluate their performance for multi-zone building HVAC control in the context of three demand response programs. We
demonstrate, in this context, that hybrid methods offer many benefits over both purely model-free and model-based methods as long as certain requirements are met. In particular, hybrid controllers are relatively sample efficient, fast online, and highly accurate so
long as the test case falls within the distribution of training data. Like all data-driven methods, hybrid controllers are still subject
to generalization errors when applied to out-of-sample scenarios. Key takeaways for control strategies are summarized and the
developed software framework is open-sourced.
Keywords: Grid-interactive building, demand response, differentiable optimization layers, differentiable programming, hybrid control, reinforcement learning, model predictive control, differentiable predictive control
1. Introduction
The increasingly frequent occurrence of extreme weather
events and their severe social and economic consequences make
climate change one of the most pressing problems currently
faced by human society [1]. To address this, many coun-
tries have proposed quantitative targets to reverse the outcome
caused by human-induced climate change and guide the transi-
tion to a decarbonized society. For example, the U.S. govern-
ment aims to decarbonize the power sector by 2035 and reach
net zero emissions by 2050 [2]. Similarly, China’s “30-60” plan
targets to hit “carbon peak” and “carbon neutral” by 2030 and
2060, respectively [3]. A key part of reaching these goals will
be reducing building energy consumption, which in the U.S.,
for example, accounts for 70% of national electricity use [4]
and 30% of national carbon emissions [5]. One effective set of tools for reducing building energy consumption is technologies for grid-interactive efficient buildings (GEBs). According to
[6], adopting these technologies across the U.S. would reduce
power sector emissions by 6% by 2030 and save $100-200 bil-
lion by 2040. To achieve these reductions, it is urgent and nec-
essary to develop technologies that enable buildings to be grid-
interactive, such as through demand response (DR) programs.
Corresponding author
Email addresses: dave@maplewelleng.com (David Biagioni),
Xiangyu.Zhang@nrel.gov (Xiangyu Zhang), janiad@stanford.edu
(Christiane Adcock), Michael.Sinner@nrel.gov (Michael Sinner),
Peter.Graf@nrel.gov (Peter Graf), Jennifer.King@nrel.gov (Jennifer
King)
One successful method for building-grid interaction through
DR is model predictive control (MPC). MPC has been widely
studied for optimizing building-side control objectives [7,8];
see [9] for a comprehensive review. Many recent works add
grid service objectives to the MPC formulation to deliver control that balances the trade-offs between thermal comfort, energy cost, and DR requirements. For example, Tang et al. [10]
propose leveraging thermal energy storage in buildings to pro-
vide fast DR while maintaining indoor comfort and Hu et al.
[11] design a mixed integer linear programming-based MPC
controller to make building floor heating systems grid respon-
sive. However, as pointed out by [9], while MPC controllers
have been researched for more than a decade, they have not yet
been widely adopted in buildings owing to challenges in cre-
ating accurate yet simple predictive models and solving poten-
tially compute-intensive optimization problems during online
control.
To circumvent the difficulties with MPC, researchers have
tried purely data-driven methods, such as model-free reinforce-
ment learning (RL). Building simulators such as EnergyPlus
or thermal dynamics models learned from smart thermostat
data can be used for RL controller training, as demonstrated
in [12,13,14,15]. As RL has less stringent requirements for
building models than MPC, it is easier to obtain models for new
buildings, although several reviews highlight the need for more
research on transfer learning to further improve generalization
[16,17]. RL also has the benefit that online operation only re-
quires a single forward pass through a policy, such as a neural
network, which is generally much faster than the online optimization required by MPC. For a more in-depth review on using RL for DR, refer to [18]. The use of model-free RL comes
at a price, however: training can require large amounts of data
and time, and it can be challenging to enforce safety or physical
constraints. The primary approach for the latter has been adding soft penalties for constraint violation, which only encourages but does not strictly enforce the constraints. While recent research has attempted to address some of these challenges for building control using transfer learning [19,20,21] and safe RL [22], these challenges remain an active area of research.
Recently, hybrids of MPC and model-free RL have shown
promise for building control. Intuitively, a model-based method
such as MPC embodies physical domain knowledge and thus
can guide a data-driven learning process while a learning-based
method such as RL can adapt the model to a specific system.
Based on this intuition, hybrid control methods should train of-
fline faster than RL, run online faster than MPC, and identify
at least as good control actions as either method. Several hy-
brid methods have been proposed in recent years, notably MPC
with a learned terminal cost (RL-MPC) [23,24], approximate
MPC (aMPC) based on imitation learning [25,26], and differentiable predictive control (DPC) [27] based on differentiable programming [28,29]. aMPC and DPC use a similar approach and have comparable control performance, but DPC is simpler to implement and has lower training cost [27].
These works include some comparisons of hybrid methods
to MPC and/or RL for building control. For single building
control, [24] compares RL-MPC to MPC and RL and [27] com-
pares DPC to MPC and RL. For multi-zone building control,
[25] compares aMPC to MPC. These studies each compare
one hybrid controller to one or two baseline controllers rather
than comparing hybrid controllers to each other and they make
these comparisons on a limited set of metrics, such as only
control performance and/or online evaluation time. Also, they
only consider baseline building energy control rather than grid-
interactive control. In addition, some of the hybrid methods are
applied to building control in a limited capacity: RL-MPC ei-
ther uses a restrictive form for the terminal cost [23] or uses
a computationally expensive sampling method [24] and DPC
has only been trained using a linearized building model [27].
Finally, the above comparisons use a mix of building models
and learning frameworks, some of which are not released open-
source, making it challenging to reproduce, build upon, or com-
pare between these works.
To address the knowledge gaps described above, we present
the first paper providing a comprehensive comparison of model-
based, learning-based, and hybrid methods for building control.
The novel contributions of this study are that we:
1. Compare a full spectrum of state-of-the-art control meth-
ods from model-based to model-free and from learning-based
to learning-free, as shown in Figure 1.
2. Evaluate control methods on a thorough set of met-
rics: training performance, control performance, generalizabil-
ity, and online computational time. Based on our experiments,
we summarize the pros and cons of each method and for which test cases each method is suitable.
3. Include DR programs in the analysis of building controllers and extend these methods to grid-interactive control.
4. Use robust, general forms of hybrid methods to ensure a fair comparison between methods. Unlike previous works, we apply RL-MPC using a general form for the cost function and a computationally efficient training approach, and we train DPC on a nonlinear building model.
5. Implement seven control methods, two building models, and three DR programs in a modular, documented, open-source code base [30] (URL: https://github.com/NREL/learning-building-control) to enable future comparisons to and extensions of this work.

Figure 1: A quadrant illustration of the control methods studied in this paper based on two feature axes: (1) use of learning and (2) explicit use of a model. See the nomenclature section for controllers' full names and Section 3 for detailed formulations.

Nomenclature

Controller Abbreviations
OPT      Optimal open-loop control
MPC      Nonconvex model predictive control
MPC-C    Convex model predictive control
MPC-CL   Convex model predictive control with learned terminal cost
DPC      Differentiable predictive control
RLC      Reinforcement learning control
RBC      Rule-based control

Indices and Related Constants
τ        Duration of control step
k; K     Lookahead step index; total lookahead steps
t; N     Control step index; total control steps
z        Number of thermal zones in the building

Variables
u        Control action
w        Exogenous inputs
x        State variable in building dynamics
The rest of the paper is arranged as follows: Section 2 covers the problem formulation by describing our building model, cost function, DR programs, general building control problem, and measured signals. Section 3 explains the control methods compared in this work. Training methods for the data-driven methods are described in Section 4. Experimental details and results are presented in Section 5 and conclusions in Section 6.
2. Model and Control Problem Formulation
To investigate building control under demand response
(DR), we focus on controlling the heating, ventilation and air-conditioning (HVAC) system during the summer when only
cooling is necessary; this work naturally extends to heating. In
this section we present the high-level HVAC model and con-
trol problem formulation. Details for setting up experiments
are presented in Section 5.
2.1. Building Thermal Dynamics
Studying building HVAC control requires a building thermal
dynamics model for 1) applying model-based control methods,
2) generating data to train learning-based control methods, and
3) evaluating any control method. We choose the discrete-time
model for a z-zone building,

$T(t+1) = A\,T(t) + B\,\mathrm{diag}(\dot{m}(t))\left(\mathbf{1}_z T_{\mathrm{supply}}(t) - T(t)\right) + G\,w(t)$,   (1)

where $t$ is the control step index, $T$ is the indoor temperature in each zone $i \in \{1, \dots, z\}$: $T = [T_1, \dots, T_z]^{\top}$, $\dot{m}$ is the mass flow rate of air to each zone: $\dot{m} = [\dot{m}_1, \dots, \dot{m}_z]^{\top}$, and $T_{\mathrm{supply}}$ is the temperature of the air supplied to all zones. The control action $u \in \mathbb{R}^{z+1}$ consists of $\dot{m}$ and $T_{\mathrm{supply}}$. The exogenous input $w \in \mathbb{R}^{z+1}$ consists of the outdoor temperature and per-zone solar heat gain. The time-invariant system matrices $A \in \mathbb{R}^{z \times z}$, $B \in \mathbb{R}^{z \times z}$, and $G \in \mathbb{R}^{z \times (z+1)}$ are identified from a higher-fidelity EnergyPlus [31] model, as done in [32]. Frequently used terms are defined again in the nomenclature table for easy referencing.
The term $\mathrm{diag}(\dot{m}(t))\mathbf{1}_z T_{\mathrm{supply}}$ is a product among elements of $u$, so the building model is bilinear and thus nonlinear and non-convex¹. To simplify notation, we subsequently refer to the dynamics as $F$, where

$x(t+1) = F(x(t), u(t), w(t))$,   (2)

and $x(t)$ is the building state, obtained via a linear transformation of $T$.
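For intuition, the bilinear update (1) can be simulated directly. The sketch below uses small, hypothetical system matrices purely to show the structure of the model; the paper's actual A, B, and G are identified from an EnergyPlus model and are not reproduced here.

```python
import numpy as np

def step_dynamics(T, m_dot, T_supply, w, A, B, G):
    """One step of the bilinear zone model, eq. (1):
    T(t+1) = A T(t) + B diag(m_dot(t)) (1_z T_supply(t) - T(t)) + G w(t)."""
    ones_z = np.ones(T.shape[0])
    return A @ T + B @ (np.diag(m_dot) @ (ones_z * T_supply - T)) + G @ w

# Hypothetical 2-zone system; illustrative values only.
z = 2
A = 0.9 * np.eye(z)             # passive thermal dynamics
B = 0.05 * np.eye(z)            # supply-air coupling
G = 0.01 * np.ones((z, z + 1))  # outdoor temperature and per-zone solar gain
T = np.array([26.0, 27.0])      # zone temperatures [deg C]
w = np.array([32.0, 0.5, 0.4])  # [outdoor temp, solar gain zone 1, zone 2]

T_cooled = step_dynamics(T, np.array([0.5, 0.5]), 12.0, w, A, B, G)
T_idle = step_dynamics(T, np.array([0.0, 0.0]), 12.0, w, A, B, G)
print(T_cooled < T_idle)  # supplying 12 deg C air lowers both zone temperatures
```

With zero mass flow the update reduces to the linear part $A\,T + G\,w$, which makes the bilinear coupling between $\dot{m}$ and $T_{\mathrm{supply}}$ easy to see.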
2.2. Power Consumption and Problem Cost
The HVAC power consumption is the sum of chiller and fan power,

$P_{\mathrm{total}}(t) = P_{\mathrm{chiller}}(t) + P_{\mathrm{fan}}(t)$,   (3)

$P_{\mathrm{chiller}}(t) = \frac{\mathbf{1}_z^{\top} \dot{m}(t)}{\mathrm{COP}} \left(T_{\mathrm{out}}(t) - T_{\mathrm{supply}}(t)\right)$,   (4)

$P_{\mathrm{fan}}(t) = k_1 \left(\mathbf{1}_z^{\top} \dot{m}(t)\right)^3 + k_2$,   (5)

where COP is the chiller's coefficient of performance, $T_{\mathrm{out}}$ is the outdoor temperature, and $k_1$ and $k_2$ are known fan parameters.
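Equations (3)-(5) translate directly to code. In the minimal sketch below, the values of COP, $k_1$, and $k_2$ are illustrative placeholders, not the identified parameters used in the paper.

```python
import numpy as np

def hvac_power(m_dot, T_out, T_supply, COP=3.0, k1=0.08, k2=0.5):
    """Total HVAC power, eqs. (3)-(5). COP, k1, k2 are placeholder values."""
    flow = float(np.sum(m_dot))                  # 1_z^T m_dot
    p_chiller = flow / COP * (T_out - T_supply)  # chiller power, eq. (4)
    p_fan = k1 * flow ** 3 + k2                  # cubic fan law, eq. (5)
    return p_chiller + p_fan                     # eq. (3)

p = hvac_power(np.array([0.5, 0.5]), T_out=32.0, T_supply=12.0)
print(round(p, 3))  # 7.247
```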
The cost $c$ of one control step $t$ is the weighted sum of the electricity cost and a penalty for violating the thermal comfort of building residents,

$c(x(t), u(t), w(t)) = \lambda(t)\, P_{\mathrm{total}}(t)\, \tau + \mu \sum_{i=1}^{z} P_{[\underline{T}(t), \overline{T}(t)]}\left(T_i(t)\right)$,   (6)

where $\lambda$ is the price of electricity discussed in Section 2.3 and $\tau$ is the duration of a control step. The penalty uses the band deviation function which, for a range of desired values $\langle \underline{x}, \overline{x} \rangle$, is

$P_{[\underline{x}, \overline{x}]}(x) := \Big( \underbrace{\max(0, \underline{x} - x)}_{\text{lower constraint violation}} + \underbrace{\max(0, x - \overline{x})}_{\text{upper constraint violation}} \Big)^2$.   (7)

The thermal comfort band for our control problem, $\langle \underline{T}(t), \overline{T}(t) \rangle$, varies in time. Deviations from the band are penalized at a fixed per-unit price $\mu$.

¹ $\mathrm{diag}(\dot{m}(t))$ denotes a diagonal matrix with the elements of $\dot{m}(t)$ along the diagonal and $\mathbf{1}_z$ denotes a $z \times 1$ vector of ones.
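The band-deviation penalty (7) and step cost (6) can be sketched as follows; the values of $\tau$ and $\mu$ below are illustrative placeholders rather than the paper's experimental settings.

```python
import numpy as np

def band_deviation(x, lo, hi):
    """Squared band-deviation penalty, eq. (7): zero inside [lo, hi]."""
    return (np.maximum(0.0, lo - x) + np.maximum(0.0, x - hi)) ** 2

def step_cost(p_total, price, T, T_lo, T_hi, tau=0.25, mu=50.0):
    """Step cost, eq. (6): energy cost plus comfort penalty.
    tau (hours per step) and mu are illustrative placeholder values."""
    comfort = float(np.sum(band_deviation(T, T_lo, T_hi)))
    return price * p_total * tau + mu * comfort

print(band_deviation(24.0, 22.0, 26.0))  # 0.0 inside the comfort band
print(band_deviation(27.5, 22.0, 26.0))  # 2.25 = (27.5 - 26)^2
c = step_cost(p_total=10.0, price=0.2, T=np.array([24.0, 27.0]),
              T_lo=22.0, T_hi=26.0)
print(c)  # 50.5: 0.5 of energy cost plus 50 * 1.0 of comfort penalty
```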
2.3. Demand Response (DR) Programs
To thoroughly compare controller performance for building
control under DR, we consider three common DR programs:
time-of-use pricing (TOU), real-time pricing (RTP), and power-constrained (PC). Each program results in a different price for
electricity λ(t). TOU uses a preset price for each time of day,
RTP uses a dynamically-varying price that depends on the hour-
by-hour wholesale electricity market, and PC uses a constant
price but includes an extra term in (6) to penalize consump-
tion above a given power limit during a power-constraint event.
Details of each DR program are presented in Section 5.1.1.
2.4. Building Control Problem Formulation
Building control minimizes the cost of keeping the zone temperatures within a comfort band over an N-step horizon while satisfying the building thermal dynamics $F$ and constraints on HVAC actuation:

$\min_{u_0, \dots, u_{N-1}} \quad C = \sum_{t=0}^{N-1} c(x_t, u_t, w_t)$   (8a)

s.t. $\; x_{t+1} = F(x_t, u_t, w_t)$,   (8b)

$\quad\;\; x_t \in \mathcal{X}, \; u_t \in \mathcal{U}$,   (8c)

$\quad\;\; x_0 = x(0)$.   (8d)

The initial building state is $x(0)$. In general, we constrain $x$ and $u$ to lie in sets $\mathcal{X}$ and $\mathcal{U}$, respectively. In this work, $x$ is unconstrained, $\mathcal{X} = \mathbb{R}^z$, and each element of $u$ is constrained by upper and lower bounds, making $\mathcal{U}$ a box polyhedron in $\mathbb{R}^{z+1}$.
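For intuition, a toy single-zone instance of problem (8) can be solved by single shooting: roll the dynamics forward through the cost and optimize the control sequence directly. The sketch below uses scipy's bounded quasi-Newton solver as a stand-in for the IPOPT solver used in the paper, and every coefficient (dynamics, prices, comfort band) is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

N = 8
a, b, g = 0.95, 0.08, 0.045        # scalar analogues of A, B, G (hypothetical)
w = np.full(N, 30.0)               # outdoor-temperature forecast
T0, T_lo, T_hi = 27.0, 22.0, 26.0  # initial state and comfort band
price, tau, mu = 0.2, 0.25, 10.0
T_sup = 12.0                       # fixed supply-air temperature

def total_cost(m):                 # decision variable: mass flow at each step
    T, cost = T0, 0.0
    for t in range(N):
        p = m[t] / 3.0 * (w[t] - T_sup) + 0.1 * m[t] ** 3    # eqs. (4)-(5)
        viol = max(0.0, T_lo - T) + max(0.0, T - T_hi)
        cost += price * p * tau + mu * viol ** 2             # eq. (6)
        T = a * T + b * m[t] * (T_sup - T) + g * w[t]        # scalar eq. (1)
    return cost

# Box bounds on u implement the polyhedral set U; x stays unconstrained.
res = minimize(total_cost, x0=np.full(N, 0.3), bounds=[(0.0, 1.0)] * N)
print(res.fun <= total_cost(np.full(N, 0.3)))  # no worse than the initial guess
```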
As the first step for comparing a spectrum of building controllers and to focus on their core capabilities, we assume:

A.1 There is no error in the building model. This translates to using the same model in our controllers and building simulation.

A.2 Forecasts are accurate up to a K-step lookahead. This means at step t the disturbances $\{w(t), w(t+1), \dots, w(t+K-1)\}$ are accurately known. We investigate how the choice of K affects each controller.
Given our baseline comparison, future work can investigate the effect of model and forecast error, such as by integrating [33]. By first comparing the control methods without these errors, we make it possible for future work to isolate to what extent one method outperforms another due to differences in baseline performance, sensitivity to model error, and sensitivity to forecast error.
2.5. Measured signals
All controllers in this study rely on signals from the building and environment. To enable a fair comparison, at step t, all controllers use the following signals for decision-making:

• the current zone temperature $T(t)$ or system state $x(t)$,
• the current time step $t$,
• K-step forecasts of disturbances from the current time, $w_K(t) = \{w(t), w(t+1), \dots, w(t+K-1)\}$,
• DR-specific information $\Lambda$, as detailed in Section 5.

As will be presented in Section 3, some control methods use these signals directly in an optimization formulation, while others transform them into a feature vector, $s(t) = g(x(t), w_K(t), \Lambda(t), t)$. Details of the mapping function $g(\cdot)$ are given in Section 5.1.2.
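The paper's exact mapping $g(\cdot)$ is given in its Section 5.1.2; as a purely hypothetical illustration of its role, one could concatenate the state, the flattened forecast, the DR signal, and a normalized time index:

```python
import numpy as np

def g(x, w_forecast, dr_info, t, N):
    """Hypothetical feature map s(t) = g(x(t), w_K(t), Lambda(t), t); the
    paper's actual mapping may differ."""
    return np.concatenate([
        np.atleast_1d(x),                 # current building state
        np.asarray(w_forecast).ravel(),   # K-step disturbance lookahead
        np.atleast_1d(dr_info),           # e.g. upcoming prices or power limit
        [t / N],                          # normalized control step index
    ])

s = g(x=np.array([24.0, 25.0]), w_forecast=np.ones((3, 3)),
      dr_info=np.array([0.2, 0.2]), t=12, N=96)
print(s.shape)  # (14,)
```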
3. Control Methods
3.1. Open-loop Optimal Control Baseline (OPT)
The control problem (8) can be solved in an open-loop man-
ner with no feedback by solving the problem once at time step
t=0 and then applying the resulting control actions to the sys-
tem — we do so using the numerical solver IPOPT [34]. Since
we assume there is no error in the building model and forecasts
of exogenous inputs (recall assumptions A.1 and A.2), this ap-
proach gives the optimal control actions. As such, we refer to
it as the optimal control baseline (OPT) and use it to evaluate
other control methods, which are approximations to OPT.
On a real system with modeling and forecast errors, solving
the open-loop problem (8) would work poorly, due to accumu-
lation of errors. The subsequent sections present the controllers
studied in this paper, which approximate OPT as shown in Fig-
ure 2. The approximation methods vary, in particular in their
use of the building model and learning.
3.2. Model Predictive Control (MPC)
Where OPT solves the control problem (8) once over the full
control horizon, model predictive control (MPC) instead solves
an analogous problem at every control step index $t$ over a lookahead horizon $K \leq N$. This updates the optimal control action online to account for any modeling mismatches and respond to updated disturbance forecasts. The MPC problem is
$\min_{u_t, \dots, u_{t+K-1}} \quad \sum_{k=t}^{t+K-1} c(x_k, u_k, w_k)$   (9a)

s.t. $\; x_{k+1} = F(x_k, u_k, w_k)$,   (9b)

$\quad\;\; x_k \in \mathcal{X}, \; u_k \in \mathcal{U}$,   (9c)

$\quad\;\; x_t = x(t)$,   (9d)

where $k$ is the lookahead step index and $t$ is the control step index. MPC solves problem (9) once for each value of $t$ from 0 to $N-1$. The non-convex optimization problem (9) is solved online using IPOPT; once the solution, $\{u^*_t, \dots, u^*_{t+K-1}\}$, is found, the first action is applied to the system ($u(t) = u^*_t$). Then the new state, $x(t+1)$, is observed and problem (9) is solved again, starting from time $t+1$.
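The receding-horizon logic just described is independent of the particular solver. A minimal skeleton might look like the following, where a trivial proportional rule stands in for the IPOPT horizon solve and the scalar system is entirely hypothetical:

```python
import numpy as np

def mpc_loop(x0, N, K, solve_horizon, dynamics, forecast):
    """Receding-horizon skeleton of problem (9): at each control step t,
    solve over a K-step lookahead, apply only the first planned action,
    then re-solve from the newly observed state."""
    x, actions = x0, []
    for t in range(N):
        w_K = forecast(t, K)           # K-step disturbance forecast
        u_plan = solve_horizon(x, w_K) # returns u*_t, ..., u*_{t+K-1}
        u = u_plan[0]                  # apply only the first action
        actions.append(u)
        x = dynamics(x, u, w_K[0])     # observe the next state
    return np.array(actions), x

# Hypothetical scalar system exercising the skeleton.
acts, xN = mpc_loop(
    x0=27.0, N=4, K=3,
    solve_horizon=lambda x, w: [0.1 * max(0.0, x - 24.0)] * len(w),
    dynamics=lambda x, u, w: 0.9 * x - u + 0.02 * w,
    forecast=lambda t, K: [30.0] * K,
)
print(len(acts))  # 4: one applied action per control step
```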
MPC does not give the same control actions as OPT, because
the objective functions in problems (8) and (9) are not the same;
MPC is an approximation to OPT, meaning it gives suboptimal
control performance. Choosing the total number of lookahead
steps K requires balancing control performance and computational cost. When we neglect error, a larger K improves the approximation to OPT and thus improves control performance but also increases computational cost. Note that when error is included, increasing K does not necessarily improve control performance, since it improves the approximation to OPT but also increases the accumulation of error.
Beyond the choice of K, some MPC implementations aim to
improve control performance by adding a terminal cost function, $c_{t+K}(x_{t+K})$, which models the cost of ending in state $x_{t+K}$. The optimal terminal cost can in theory be found through dynamic programming (DP) [35], where a sequence of one-step control problems is solved backwards in time (starting with $t = N-1$) using the Bellman equation. However, even for discrete-time finite-horizon problems such as (8), DP requires optimizing over the space of functions and is thus an infinite-dimensional problem. This makes it challenging to determine an appropriate $c_{t+K}(x_{t+K})$; for simplicity, we do not include a terminal cost in our MPC implementation. However, we explore using learning to find the terminal cost in Section 3.4.
3.3. Convex Model Predictive Control (MPC-C)
Besides choosing a small lookahead horizon K, one can re-
duce the computational cost of MPC by using a simpler model,
such as a convex one. We denote MPC with a convex building
model as MPC-C. The convex model is the same as that de-
scribed in Section 2.1 except that the non-convex bilinear terms
in $c$ and $F$ are linearized around the system's current operating
point. The MPC-C problem is
$\min_{u_t, \dots, u_{t+K-1}} \quad \sum_{k=t}^{t+K-1} \tilde{c}(x_k, u_k, w_k)$   (10a)

s.t. $\; x_{k+1} = \tilde{F}(x_k, u_k, w_k)$,   (10b)

$\quad\;\; x_k \in \mathcal{X}, \; u_k \in \mathcal{U}$,   (10c)

$\quad\;\; x_t = x(t)$,   (10d)

where $\tilde{c}$ and $\tilde{F}$ are the convex cost and dynamics.
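As one way to see where $\tilde{F}$ might come from, the per-zone bilinear term $\dot{m}_i (T_{\mathrm{supply}} - T_i)$ can be replaced by its first-order Taylor expansion around the current operating point. This is a sketch of the idea only; the paper's exact linearization may differ.

```python
def linearize_bilinear(m0, Ts0, T0):
    """First-order Taylor expansion of b(m, Ts, T) = m * (Ts - T) around
    an operating point (m0, Ts0, T0), yielding a term that is linear in
    each decision variable, as used to build a convex surrogate model."""
    def b_lin(m, Ts, T):
        return (m0 * (Ts0 - T0)          # value at the operating point
                + (Ts0 - T0) * (m - m0)  # d/dm
                + m0 * (Ts - Ts0)        # d/dTs
                - m0 * (T - T0))         # d/dT
    return b_lin

b_lin = linearize_bilinear(m0=0.4, Ts0=12.0, T0=25.0)
err = abs(b_lin(0.41, 12.1, 25.2) - 0.41 * (12.1 - 25.2))
print(err < 1e-2)  # True: the approximation is tight near the operating point
```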
3.4. Convex Model Predictive Control with Learned Terminal
Cost (MPC-CL)
Reinforced MPC (RL-MPC) is a hybrid control approach that addresses the challenge of determining an appropriate terminal cost function, $c_{t+K}(x_{t+K})$; this challenge was described in Section 3.2. RL-MPC includes the terminal cost in MPC by parameterizing the terminal cost, $c_{t+K}(x_{t+K}; \theta)$, and learning the optimal parameters $\theta$ offline through training. During training, for a batch of data and a control timestep, one solves the MPC problem over the lookahead horizon, evaluates the actual cost