to decide on the operation of the most suitable pump regarding the water demand forecast, energy
consumption, water quality, security of supply, and operational reliability. The tanks are located
approximately 47m above the pump station. Both storage containers have identical dimensions and
can be treated as a single tank with a storage volume of 16000m³. The maximum water level is 10m.
Thus, the geodetic height between pumps and tank lies in the range [47, 57]m, i.e., the 47m elevation plus a water level between 0m and 10m.
In the water distribution system, the human operation (behavioral policy) follows a strategy to
handle the pump schedule. Figure 2 shows the effects of this behavioral policy. As shown in
Figures 2(d), 2(e), 2(f), the operators fill the tank to a high level before the peak in water consumption
(see Figures 2(a), 2(b), 2(c)) and then let it decrease over the day to provide water exchange
and preserve water quality. A safe operation must keep the tank level at least 3m high, since this
allows the system operators to handle unexpected situations. Figures 2(g), 2(h), 2(i) show
the average daily pump switches per month. Each transition, either from ON to OFF or from OFF to ON,
counts as one switch (+1). Thus, we can say that the current pump operation generally uses each pump
at most once a day. Although it is difficult to measure the impact of a strategy on preserving the
system's assets, the idea is to minimize the amount of switching and to distribute the usage across the
pumps. Finally, Figures 2(j), 2(k), 2(l) show the electricity consumption, where, given the pump
settings, we can see the prioritization of pump NP2.
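As an aside, the switch-counting convention above can be made concrete with a short sketch; the function name and the per-minute 0/1 schedule encoding are assumptions for illustration, not the utility's actual tooling.

```python
# Illustrative sketch: count daily switches for one pump from a per-minute
# ON/OFF schedule, where every ON->OFF or OFF->ON transition counts as +1.
def count_switches(schedule):
    """schedule: sequence of 0/1 values, one per timestep, for a single pump."""
    return sum(1 for prev, curr in zip(schedule, schedule[1:]) if prev != curr)

# A pump that is turned on once and off once during the day yields 2 switches.
daily_schedule = [0] * 300 + [1] * 600 + [0] * 540  # 1440 one-minute timesteps
assert count_switches(daily_schedule) == 2
```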
3 A Partially Observable Markov Decision Process for the Pump Scheduling
Problem
The RL problem statement can be formalized as a POMDP [8], defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma)$, where:
• $\mathcal{S}$ is the state space;
• $\mathcal{A}$ is the action space;
• $P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \mapsto [0,1]$ is the transition probability of being in some state $s \in \mathcal{S}$, performing an action $a \in \mathcal{A}$, and reaching a state $s' \in \mathcal{S}$;
• $R: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R}$ is the reward function;
• $O: \mathcal{S} \times \mathcal{A} \times \Omega \mapsto [0,1]$ is the probability of receiving an observation $o \in \Omega$ about the next state $s'$;
• $\Omega$ denotes the observation space;
• $\gamma \in [0,1]$ is the discount factor that balances the relevance of immediate rewards over rewards in the future.
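For readers who prefer code, the tuple can be mirrored as a simple container; this is only a structural sketch, and the field names are assumptions rather than part of the formalism in [8].

```python
# Structural sketch of the POMDP tuple (S, A, P, R, Omega, O, gamma).
# Field names are illustrative; probabilities are represented as callables.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class POMDP:
    states: Sequence                         # S: state space
    actions: Sequence                        # A: action space
    transition: Callable[..., float]         # P(s, a, s') in [0, 1]
    reward: Callable[..., float]             # R(s, a)
    observation_prob: Callable[..., float]   # O(s', a, o) in [0, 1]
    observations: Sequence                   # Omega: observation space
    gamma: float                             # discount factor in [0, 1]
```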
The objective in the (PO)MDP is to learn a policy $\pi: \mathcal{S} \mapsto \mathcal{A}$ that maximizes the expected discounted
return $J(\pi) = \mathbb{E}\left[\sum_{t=1}^{T} \gamma^{t} r(s_t, a_t)\right]$. In previous work [7], we cast the pump scheduling problem
for a water system as an episodic POMDP. We consider the daily cycle pattern observed in the water
consumption to define the episode length. Thus, each episode has a length of 1440 timesteps, or one
day of operation. The state/observation space and reward were partially based on data that sensors
could gather, such as the tank level, water consumption, flow rate $Q$, and hydraulic head $H$. The
action space $\mathcal{A}$ represents the discrete set of pumps (NP1 to NP4) and the option to turn all of them
off (NOP). These pumps have different flow rates $Q$, which lead to distinct electricity consumption
(kW). Thus, the decision process consists of defining a policy that meets the water demand while
limiting electricity consumption and satisfying safety constraints. Below, we review the state-action
space and the proposed reward function.
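Before that, as a concrete illustration of the episodic setting, the sketch below rolls out one simulated day and computes the discounted return $J$; the environment interface (reset/step), the policy signature, and the discount value are assumptions, not the implementation used in [7].

```python
# Sketch of one episode of pump scheduling: 1440 one-minute timesteps and
# five discrete actions (run NP1..NP4, or NOP = all pumps off).
ACTIONS = ["NP1", "NP2", "NP3", "NP4", "NOP"]
EPISODE_LENGTH = 1440  # one day of operation

def discounted_return(rewards, gamma=0.99):
    """Compute sum_t gamma^t * r_t over one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def run_episode(env, policy, gamma=0.99):
    """Roll out one day with a hypothetical reset/step environment interface."""
    obs = env.reset()
    rewards = []
    for _ in range(EPISODE_LENGTH):
        action = policy(obs)              # maps the observation to an index into ACTIONS
        obs, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return discounted_return(rewards, gamma)
```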
State/Observation: The state/observation is represented by the tank level$(t) \in [47, 57]$ and a given
water consumption$(t)$ for a time of day $(t)$, the respective month, the action $\in \mathcal{A}$ taken at $t-1$, the
cumulative time running $\in [0, 1440]$ of each pump along the episode, and a binary value called
water quality that indicates whether the tank level dropped below 53m during the episode.
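To make this observation vector concrete, a minimal sketch follows; field names, types, and encodings are assumptions for illustration and not the exact representation used in [7].

```python
# Illustrative sketch of one observation/state at timestep t.
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    tank_level: float         # in [47, 57] m
    water_consumption: float  # demand at time t
    time_of_day: int          # timestep within the episode, 0..1439
    month: int                # 1..12
    last_action: int          # index of the action taken at t-1 (NP1..NP4, NOP)
    time_running: List[int]   # cumulative running time per pump, each in [0, 1440]
    water_quality: bool       # True if the tank level dropped below 53 m this episode
```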
The state representation is responsible for providing the information needed to achieve the desired behavior
through the reward function presented next. Thus, the features were selected aiming to:
• Provide information regarding the system's current state, such as tank level and water consumption;