The Pump Scheduling Problem:
A Real-World Scenario for Reinforcement Learning
Henrique Donâncio
Normandie Université, INSA Rouen
LITIS
Rouen, France
henrique.donancio@insa-rouen.fr
Laurent Vercouter
Normandie Université, INSA Rouen
LITIS
Rouen, France
laurent.vercouter@insa-rouen.fr
Harald Roclawski
Technical University of Kaiserslautern
SAM
Kaiserslautern, Germany
roclawsk@mv.uni-kl.de
Abstract
Deep Reinforcement Learning (DRL) has achieved remarkable success in scenarios such as games and has emerged as a potential solution for control tasks, owing to its scalability and its ability to handle complex dynamics. However, few works have targeted environments grounded in real-world settings. Indeed, real-world scenarios can be challenging, especially when faced with a high-dimensional state space and an unknown reward function. To facilitate research, we release a testbed consisting of an environment simulator and demonstrations of human operation for the pump scheduling of a real-world water distribution facility. The pump scheduling problem can be viewed as a decision process in which an agent decides when to operate pumps so as to supply water while limiting electricity consumption and meeting system constraints. To provide a starting point, we release a well-documented codebase, present an overview of some challenges that can be addressed, and provide a baseline representation of the problem. The code and dataset are available at https://gitlab.com/hdonancio/pumpscheduling.
1 Introduction
The pump scheduling problem is a decision process to decide when to operate pumps where the
objective is to supply water while limiting electricity consumption and meeting safety constraints.
The strategies adopted can vary according to the particularities of the water distribution system. For
example, in some locations, the price of electricity may have different tariffs throughout the day. In
this case, storage tanks can supply water while the pumps operate in off-peak time windows to reduce costs [1]. Moreover, the scheduling strategies must avoid switching the pumps too frequently, in order to protect the assets, and must provide water exchange in the tanks to preserve water quality. Works that address the pump scheduling problem use methods such as linear optimization, branch-and-bound, and genetic algorithms [2–4, 1, 5, 6]. However, some of these methods are limited to small water networks due to their computational complexity. In other cases, the schedule looks many steps ahead, with decisions based on observed patterns, and is therefore unable to handle unexpected situations. To overcome this, in previous work [7] we have shown that Deep Reinforcement Learning (DRL) has the potential to provide a data-driven solution that is robust and scalable.
Reinforcement Learning (RL) [8] is a decision-making framework in which an agent learns to interact with an environment so as to maximize returns. Recent works include applications in autonomous driving [9], robotics [10, 11], video games [12–15], and dialogue assistance [16], among others [17]. Currently, most research relies on virtual environments such as the Arcade Learning Environment (ALE) [18] and the DeepMind Control Suite [19], where a reward function and a state representation are typically available. In real-world problems, however, the agent usually has no indication of how effective the actions it performs in the environment are. In other cases, rewards are sparse and binary, perceived only when the agent accomplishes the task [20]. Moreover, most real-world scenarios have complex dynamics, with tasks consisting of multiple steps and a high-dimensional state space [11]. Nevertheless, few works have focused on problems that are grounded in real-world settings and can be shared and reproduced by the community. Some examples include robotics applications, where the limitation is access to the physical assets.
In this work, we release a simulator and a dataset related to pump operation in a real-world water distribution system. The dataset was gathered through sensors over three years, at 1-minute timesteps, from the water distribution facility while it was controlled by human operators. Moreover, we detail the operational constraints given by specialists and discuss the strategy currently applied to the system. Finally, we point out some of the challenges that can be addressed and establish a baseline for the state representation and reward function engineering. Our goal is for this testbed to serve as a benchmark with grounded real-world settings for RL branches such as Learning from Demonstrations (LfD), (safe) exploration, inverse RL, and state representation learning, among others. We summarize the main contributions of this work below:
• We release an RL testbed grounded in real-world settings, based on a water distribution facility, containing a simulator of the system and a dataset of human demonstrations collected over three years at 1-minute timesteps (a loading sketch follows after this list).
• We point out the characteristics of the water system, such as its constraints and the scheduling strategy currently applied to control it, and provide a baseline representation of the problem as a Partially Observable Markov Decision Process (POMDP).
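To illustrate how such demonstration data might be consumed, the sketch below groups a sensor log into one-day episodes. The file name and column handling are hypothetical placeholders, not the released dataset's actual schema.

```python
# Minimal sketch of loading demonstration data; the file name
# ("pump_log.csv") and the "timestamp" column are hypothetical
# placeholders, not the schema of the released dataset.
import pandas as pd

log = pd.read_csv("pump_log.csv", parse_dates=["timestamp"])
log = log.set_index("timestamp").sort_index()

# The data is recorded at 1-minute resolution over three years; one
# episode corresponds to one day, i.e., 1440 consecutive timesteps.
episodes = [day for _, day in log.groupby(log.index.date) if len(day) == 1440]
print(f"{len(episodes)} complete one-day episodes")
```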
2 Water Distribution System
Figure 1: Water distribution system overview. The system has four pumps with fixed speed (ON/OFF)
and two elevated tanks.
Figure 1 shows an overview of the water distribution system considered in this work. The system collects and treats raw water from wells before storing it in a reservoir. In the water utility, four distribution pumps (NP1 to NP4) of different sizes are available to pump the water through the network into two storage tanks. The pumps are operated by start/stop control; speed or throttle control is not used. At most one pump runs at a time; parallel operation is not usually applied, although it is possible in case of exceptionally high water demand. A decision process has to choose the most suitable pump operation with respect to the water demand forecast, energy consumption, water quality, security of supply, and operational reliability. The tanks are located approximately 47 m above the pump station. Both storage containers have identical dimensions and can be treated as a single tank with a storage volume of 16000 m³. The maximum water level is 10 m. Thus, the geodetic height between the pumps and the tank lies in [47, 57] m.
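As a worked example of these figures (and assuming a constant tank cross-section, which the text does not state explicitly), the level-to-volume and level-to-head mappings are:

```python
# Worked example of the tank geometry given in the text: 16000 m^3
# total volume at a 10 m maximum level, with the tank bottom 47 m
# above the pump station, so geodetic head = 47 m + current level.
# A constant cross-section is assumed here for illustration.
TOTAL_VOLUME_M3 = 16_000.0
MAX_LEVEL_M = 10.0
TANK_ELEVATION_M = 47.0

def stored_volume(level_m: float) -> float:
    """Stored water volume (m^3) at a given level."""
    return TOTAL_VOLUME_M3 * level_m / MAX_LEVEL_M

def geodetic_head(level_m: float) -> float:
    """Height difference (m) the pumps must overcome at this level."""
    return TANK_ELEVATION_M + level_m

print(stored_volume(3.0))   # safety minimum of 3 m -> 4800 m^3
print(geodetic_head(3.0))   # -> 50 m, within the [47, 57] m range
```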
In the water distribution system, the human operation (the behavioral policy) follows a strategy for handling the pump schedule. Figure 2 shows the effects of this behavioral policy. As shown in Figures 2(d), 2(e), 2(f), the operators fill the tank to a high level before the peak in water consumption (see Figures 2(a), 2(b), 2(c)) and then let it decrease over the day to provide water exchange and preserve water quality. For safe operation, the tank must be kept at least 3 m full, since this allows the system operators to handle unexpected situations. Figures 2(g), 2(h), 2(i) show the average daily pump switches per month. Each switch, either ON to OFF or OFF to ON, counts +1. Thus, we could say that the current operation generally uses each pump at most once a day. Although it is difficult to measure the impact of a strategy on preserving the system's assets, the idea is to minimize the amount of switching and to distribute usage across the pumps. Finally, Figures 2(j), 2(k), 2(l) show the electricity consumption, where, given the pump settings, we can see that the use of pump NP2 is prioritized.
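The switch count described above can be recovered directly from logged ON/OFF states. A minimal sketch, assuming a hypothetical DataFrame with one boolean column per pump:

```python
# Count pump switches as described above: each ON->OFF or OFF->ON
# transition contributes +1. The DataFrame layout (one boolean
# column per pump) is a hypothetical placeholder.
import pandas as pd

def switch_count(states: pd.DataFrame) -> pd.Series:
    """states: 1-min ON/OFF (1/0) series, one column per pump (NP1..NP4)."""
    # diff() != 0 marks a transition in either direction.
    transitions = states.astype(int).diff().abs().fillna(0)
    return transitions.sum()  # switches per pump over the period

day = pd.DataFrame({"NP2": [0, 0, 1, 1, 1, 0]})  # toy trace: one ON, one OFF
print(switch_count(day))  # NP2 -> 2 switches
```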
3 A Partially Observable Markov Decision Process for the Pump Scheduling
Problem
The RL problem statement can be formalized as a POMDP [8], defined by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \Omega, \mathcal{O}, \gamma)$, where:

• $\mathcal{S}$ is the state space;
• $\mathcal{A}$ is the action space;
• $\mathcal{P}: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \mapsto [0, 1]$ is the probability of being in some state $s \in \mathcal{S}$, performing an action $a \in \mathcal{A}$, and reaching a state $s' \in \mathcal{S}$;
• $\mathcal{R}: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R}$ is the reward function;
• $\mathcal{O}: \mathcal{S} \times \mathcal{A} \times \Omega \mapsto [0, 1]$ is the probability of receiving an observation $o \in \Omega$ about the next state $s'$;
• $\Omega$ denotes the observation space;
• $\gamma \in [0, 1]$ is the discount factor that balances the relevance of immediate rewards over rewards in the future.
The objective in the (PO)MDP is to learn a policy $\pi: \mathcal{S} \mapsto \mathcal{A}$ that maximizes the expected discounted reward $J(\pi) = \mathbb{E}\left[\sum_{t=1}^{T} \gamma^t r(s_t, a_t)\right]$.
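As a minimal worked instance of the objective above, the discounted return of a single 1440-step episode can be accumulated as follows (the reward values here are placeholders):

```python
# Minimal sketch of the discounted return J(pi) defined above for one
# episode of T = 1440 one-minute timesteps; rewards are placeholders.
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    # Matches J(pi) = sum_{t=1}^{T} gamma^t * r(s_t, a_t).
    return sum(gamma**t * r for t, r in enumerate(rewards, start=1))

episode_rewards = [-0.1] * 1440  # e.g., a small electricity cost per step
print(discounted_return(episode_rewards))
```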
In previous work [7], we cast the pump scheduling problem for a water system as an episodic POMDP. We use the daily cycle observed in the water consumption to define the episode length: each episode lasts 1440 timesteps, i.e., one day of operation. The state/observation space and the reward were partially based on data that sensors could gather, such as the tank level, water consumption, flow rate $Q$, and hydraulic head $H$. The action space $\mathcal{A}$ is the discrete set of pumps (NP1 to NP4) plus the option of turning all of them off (NOP). The pumps have different flow rates $Q$, which leads to distinct electricity consumption (in kW). Thus, the decision process consists of defining a policy that meets the water demand while limiting electricity consumption and satisfying safety constraints. Below, we review the state-action space and the proposed reward function.
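The action space and episode structure suggest a Gym-style interface. The sketch below only illustrates that structure; it is not the API of the released simulator, and the class name, observation layout, and dynamics are hypothetical.

```python
# Hedged sketch of a Gym-style interface matching the decision process
# described above. This is NOT the released simulator's API; all names
# and the placeholder dynamics are hypothetical.
import gymnasium as gym
import numpy as np

ACTIONS = ["NOP", "NP1", "NP2", "NP3", "NP4"]  # all pumps off, or run one pump

class PumpSchedulingEnv(gym.Env):
    """Illustrative skeleton: one episode = one day = 1440 one-minute steps."""

    def __init__(self):
        self.action_space = gym.spaces.Discrete(len(ACTIONS))
        # 7 features: tank level, consumption, time of day, month,
        # previous action, cumulative running time, water-quality flag.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(7,))
        self.t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return np.zeros(7, dtype=np.float32), {}

    def step(self, action):
        self.t += 1
        obs = np.zeros(7, dtype=np.float32)  # placeholder dynamics
        reward = 0.0                         # placeholder reward
        truncated = self.t >= 1440           # end of the one-day episode
        return obs, reward, False, truncated, {}
```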
State/Observation: The state/observation is represented by the tank level$(t) \in [47, 57]$ m and the water consumption$(t)$ at a given time of day $t$, the respective month, the action $\in \mathcal{A}$ taken at $t-1$, the cumulative running time $\in [0, 1440]$ of the pumps along the episode, and a binary value, called water quality, that indicates whether the tank level dropped below 53 m during the episode.
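A minimal sketch of assembling this observation vector; the feature order and encodings are our own placeholders, not necessarily those of the released baseline.

```python
# Sketch of the observation described above; the field order and the
# encoding choices are illustrative placeholders, not the paper's.
import numpy as np

def make_observation(tank_level, consumption, minute_of_day, month,
                     prev_action, time_running, water_quality_flag):
    return np.array([
        tank_level,          # in [47, 57] m (geodetic head)
        consumption,         # water consumption at time t
        minute_of_day,       # time of day t in [0, 1440)
        month,               # 1..12
        prev_action,         # action in A at t-1 (0=NOP, 1..4=NP1..NP4)
        time_running,        # cumulative pump running time in [0, 1440]
        water_quality_flag,  # 1 if level dropped below 53 m this episode
    ], dtype=np.float32)
```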
The state representation is responsible for providing the information needed to achieve the desired behavior through the reward function presented next. Thus, the features were selected aiming to:

• Provide information regarding the system's current state, such as tank level and water consumption;