to decide on the operation of the most suitable pump regarding the water demand forecast, energy
consumption, water quality, security of supply, and operational reliability. The tanks are located
approximately 47m above the pump station. Both storage containers have identical dimensions and
can be treated as a single tank with a storage volume of 16000m³. The maximum water level is 10m.
Thus, the geodetic height between pumps and tank lies in the range [47, 57]m, i.e., the 47m elevation plus a water level between 0m and 10m.
In the water distribution system, the human operation (behavioral policy) follows a strategy to
handle the pump schedule. Figure 2 shows the effects of this behavioral policy. As shown in
Figures 2(d), 2(e), 2(f), the operators fill the tank to a high level before the peak in water consumption
(see Figures 2(a), 2(b), 2(c)) and then let it decrease over the day to provide water exchange
and preserve water quality. A safe operation must keep the tank level at least 3m high, since this
allows the system operators to handle unexpected situations. Figures 2(g), 2(h), 2(i) show
the average daily pump switches per month. Each transition, either from ON to OFF or from OFF to ON,
counts as one switch (+1). Thus, we can say that the current pump operation generally uses each pump
at most once a day. Although it is difficult to measure the impact of a strategy on preserving the
system's assets, the idea is to minimize the amount of switching and to distribute the usage across the
pumps. Finally, Figures 2(j), 2(k), 2(l) show the electricity consumption, where, given the pump
settings, we can see the prioritization of pump NP2.
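As an aside, the switch-counting convention above can be made concrete with a short sketch; the function name and the per-minute 0/1 schedule encoding are assumptions for illustration, not the utility's actual tooling.

```python
# Illustrative sketch: count daily switches for one pump from a per-minute
# ON/OFF schedule, where every ON->OFF or OFF->ON transition counts as +1.
def count_switches(schedule):
    """schedule: sequence of 0/1 values, one per timestep, for a single pump."""
    return sum(1 for prev, curr in zip(schedule, schedule[1:]) if prev != curr)

# A pump that is turned on once and off once during the day yields 2 switches.
daily_schedule = [0] * 300 + [1] * 600 + [0] * 540  # 1440 one-minute timesteps
assert count_switches(daily_schedule) == 2
```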
3 A Partially Observable Markov Decision Process for the Pump Scheduling
Problem
The RL problem statement can be formalized as a POMDP [8], defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma)$, where:
• $\mathcal{S}$ is the state space;
• $\mathcal{A}$ is the action space;
• $P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \mapsto [0,1]$ is the transition probability of being in some state $s \in \mathcal{S}$, performing an action $a \in \mathcal{A}$, and reaching a state $s' \in \mathcal{S}$;
• $R: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R}$ is the reward function;
• $O: \mathcal{S} \times \mathcal{A} \times \Omega \mapsto [0,1]$ is the probability of receiving an observation $o \in \Omega$ about the next state $s'$;
• $\Omega$ denotes the observation space;
• $\gamma \in [0,1]$ is the discount factor that balances the relevance of immediate rewards over rewards in the future.
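For readers who prefer code, the tuple can be mirrored as a simple container; this is only a structural sketch, and the field names are assumptions rather than part of the formalism in [8].

```python
# Structural sketch of the POMDP tuple (S, A, P, R, Omega, O, gamma).
# Field names are illustrative; probabilities are represented as callables.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class POMDP:
    states: Sequence                         # S: state space
    actions: Sequence                        # A: action space
    transition: Callable[..., float]         # P(s, a, s') in [0, 1]
    reward: Callable[..., float]             # R(s, a)
    observation_prob: Callable[..., float]   # O(s', a, o) in [0, 1]
    observations: Sequence                   # Omega: observation space
    gamma: float                             # discount factor in [0, 1]
```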
The objective in the (PO)MDP is to learn a policy $\pi: \mathcal{S} \mapsto \mathcal{A}$ that maximizes the expected discounted
return $J(\pi) = \mathbb{E}\left[\sum_{t=1}^{T} \gamma^{t} r(s_t, a_t)\right]$. In previous work [7], we cast the pump scheduling problem
for a water system as an episodic POMDP. We consider the daily cycle pattern observed in the water
consumption to define the episode length. Thus, each episode has a length of 1440 timesteps, or one
day of operation. The state/observation space and reward were partially based on data that sensors
could gather, such as the tank level, water consumption, flow rate $Q$, and hydraulic head $H$. The
action space $\mathcal{A}$ represents the discrete set of pumps (NP1 to NP4) and the option to turn all of them
off (NOP). These pumps have different flow rates $Q$, which lead to distinct electricity consumption
(kW). Thus, the decision process consists of defining a policy that meets the water demand while
limiting electricity consumption and satisfying safety constraints. Below, we review the state-action
space and the proposed reward function.
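Before that, as a concrete illustration of the episodic setting, the sketch below rolls out one simulated day and computes the discounted return $J$; the environment interface (reset/step), the policy signature, and the discount value are assumptions, not the implementation used in [7].

```python
# Sketch of one episode of pump scheduling: 1440 one-minute timesteps and
# five discrete actions (run NP1..NP4, or NOP = all pumps off).
ACTIONS = ["NP1", "NP2", "NP3", "NP4", "NOP"]
EPISODE_LENGTH = 1440  # one day of operation

def discounted_return(rewards, gamma=0.99):
    """Compute sum_t gamma^t * r_t over one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def run_episode(env, policy, gamma=0.99):
    """Roll out one day with a hypothetical reset/step environment interface."""
    obs = env.reset()
    rewards = []
    for _ in range(EPISODE_LENGTH):
        action = policy(obs)              # maps the observation to an index into ACTIONS
        obs, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return discounted_return(rewards, gamma)
```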
State/Observation: The state/observation is represented by the tank level$(t) \in [47, 57]$ and a given
water consumption$(t)$ for a time of day $(t)$, the respective month, the action $\in \mathcal{A}$ taken at $t-1$, the
cumulative time running $\in [0, 1440]$ of each pump along the episode, and a binary value called
water quality that indicates whether the tank level dropped below 53m during the episode.
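To make this observation vector concrete, a minimal sketch follows; field names, types, and encodings are assumptions for illustration and not the exact representation used in [7].

```python
# Illustrative sketch of one observation/state at timestep t.
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    tank_level: float         # in [47, 57] m
    water_consumption: float  # demand at time t
    time_of_day: int          # timestep within the episode, 0..1439
    month: int                # 1..12
    last_action: int          # index of the action taken at t-1 (NP1..NP4, NOP)
    time_running: List[int]   # cumulative running time per pump, each in [0, 1440]
    water_quality: bool       # True if the tank level dropped below 53 m this episode
```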
The state representation is responsible for providing the information needed to achieve the desired behavior
through the reward function presented next. Thus, the features were selected aiming to:
• Provide information regarding the system's current state, such as tank level and water consumption;