

Energy Pricing in P2P Energy Systems Using Reinforcement Learning

Nicolas Avila§, Shahad Hardan§, Elnura Zhalieva§, Moayad Aloqaily, Mohsen Guizani

Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE

E-mails: {nicolas.avila; shahad.hardan; elnura.zhalieva; moayad.aloqaily; mohsen.guizani}@mbzuai.ac.ae

Abstract—The increase in renewable energy on the consumer side gives rise to new dynamics in energy grids. Participants in a microgrid can produce energy and trade it with their peers (peer-to-peer, P2P) with the permission of the energy provider. In such a scenario, the stochastic nature of distributed renewable energy generators and of energy consumption makes it harder to define fair prices for buying and selling energy. In this study, we introduce a reinforcement learning framework that helps solve this issue by training an agent to set the prices that maximize the profit of all components in the microgrid, with the aim of facilitating the implementation of P2P grids in real-life scenarios. The microgrid comprises consumers, prosumers, the service provider, and a community battery. Experimental results on the Pymgrid dataset demonstrate successful price optimization for all components in the microgrid. The proposed framework is flexible enough to account for the interests of these components, as well as for the ratio of consumers to prosumers in the microgrid. The results also examine how changing the capacity of the community battery affects the profit of the system. The implementation code is available here.

Index Terms—Energy Price, Smart Microgrid, Reinforcement Learning, DQN

I. INTRODUCTION

The increasing demand for energy due to population growth, together with the devastating impact of conventional energy generation sources on global warming, has spurred a surge of interest in renewable energy sources (RES), mainly solar and wind energy. Growing worldwide investment in RES has encouraged consumers to install rooftop photovoltaic (PV) systems locally, both to lower their electricity bills and to profit by trading surplus energy within the community. This advancement has turned residential consumers into prosumers, who can both cover their own demand and supply electricity to other consumers locally.

The use of prosumers' energy reduced the dependency of the local energy market (LEM) on the utility grid, and the LEM grew into a peer-to-peer (P2P) energy marketplace. A P2P market is a platform where consumers and energy suppliers within or beyond the community microgrid transact energy at a desired price. According to [1], eight P2P energy trading pilots were reported around the world in 2020. These implementations demonstrate the potential benefits of deploying P2P platforms and clarify the requirements for their successful realization.

§These authors contributed equally to this work

According to the plan of the United Nations and the International Energy Agency [2], countries should aim to reduce their carbon emissions by 45% by 2030 and reach net-zero emissions by 2050. In this context, P2P energy markets could help accelerate the execution of the plan by integrating RES into the bulk power grid and steering the energy industry in a more sustainable direction. However, their implementation is challenging because energy-sector regulations differ from country to country. The stochastic nature of RES adds further complexity.

Nonetheless, the primary challenge in realizing P2P platforms lies in agreeing on a bidding strategy that facilitates the trading of energy from prosumers to consumers without harming the service provider's interests. Developing an optimal dynamic pricing mechanism for P2P energy trading could encourage more prosumers to emerge and, with them, more P2P energy markets.

We aim to tackle the challenges of dynamic pricing in P2P platforms by training a reinforcement learning (RL) agent that learns an optimal pricing policy in real time. We consider a single microgrid that consists of prosumers and consumers, an energy service provider, the legacy utility grid, and a community battery. The main contributions of this work are summarized as follows:

• We deal with various stochastic processes in the microgrid, such as customers' demand and generation, to provide a comprehensive problem formulation that depicts the dynamics of a microgrid as accurately as possible.

• We adopt the Deep Q-Network (DQN) method to solve the Markov Decision Process (MDP) problem. The goal is to decide the retail and purchase energy prices that minimize the operation cost.

• We consider the nature of the microgrid and the individual interests of its members to evaluate the sustainability and flexibility of our framework.
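The following is a minimal sketch of how a DQN agent could represent this pricing decision. It assumes that the retail and purchase prices are discretized into a small grid of levels; the levels below are hypothetical placeholders, not values from this work. Under that assumption, each network output corresponds to one (retail, purchase) price pair. The network is trained toward the standard DQN Bellman target, which is computed with a separate target network. The reward is assumed to be the negative operation cost. The neural network itself is omitted.

```python
import itertools
from typing import List, Sequence, Tuple

# Hypothetical price levels (currency per kWh); the real discretization is an assumption.
RETAIL_LEVELS = [0.20, 0.25, 0.30]
PURCHASE_LEVELS = [0.05, 0.10, 0.15]

# Joint discrete action space: output index k of the Q-network maps to ACTIONS[k].
ACTIONS: List[Tuple[float, float]] = list(itertools.product(RETAIL_LEVELS, PURCHASE_LEVELS))


def reward_from_cost(operation_cost: float) -> float:
    """DQN maximizes reward, so minimizing the operation cost means rewarding its negative."""
    return -operation_cost


def dqn_targets(
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Standard DQN targets y = r + gamma * max_a' Q_target(s', a'); no bootstrap at episode end."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets
```

A joint action space keeps the agent a single-output DQN. The cost is that the number of actions grows with the product of the two price grids.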

The rest of the paper is organized as follows. Section II introduces and compares related work on dynamic pricing mechanisms. Section III provides the MDP formulation of the pricing problem for P2P energy trading platforms. Section IV then details how the DQN algorithm is used to determine optimal retail and purchase prices. Finally, Section V presents the experiments and results.

arXiv:2210.13555v1 [cs.LG] 24 Oct 2022

II. RELATED WORK

In [3], Paudel and Gooi proposed a pricing strategy for a P2P energy trading platform using the alternating direction method of multipliers (ADMM). They introduced a P2P market operator that offers prices and provides a platform for energy trading from prosumers to consumers. Given the many factors that influence the energy trading price, the authors chose to focus on the social welfare of the community microgrid. They modeled the personal satisfaction of each household as a function of its consumption and used it to define a welfare function that depends on the per-unit price of energy. The optimization process determines the platform price that maximizes the aggregated welfare of the community microgrid. However, this work did not consider the dynamic nature of energy pricing and provided only a static solution that does not vary over time.
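The sketch below makes the idea of a welfare-maximizing platform price concrete. It assumes a quadratic satisfaction function $u_i(x) = a_i x - \tfrac{1}{2} b_i x^2$ for each household and a fixed prosumer supply. This is neither Paudel and Gooi's formulation nor ADMM itself. It uses plain dual ascent, the simpler method that ADMM extends. The platform price acts as the Lagrange multiplier of the supply-demand balance: it rises while demand exceeds supply and falls otherwise. All names and parameters are hypothetical.

```python
from typing import Sequence


def household_demand(price: float, a: float, b: float) -> float:
    """Demand that maximizes a*x - 0.5*b*x**2 - price*x over x >= 0 (assumed quadratic utility)."""
    return max(0.0, (a - price) / b)


def clearing_price(
    a: Sequence[float],
    b: Sequence[float],
    supply: float,
    step: float = 0.05,
    iters: int = 10_000,
    tol: float = 1e-9,
) -> float:
    """Dual ascent on the price: raise it when demand exceeds supply, lower it otherwise."""
    price = 0.0
    for _ in range(iters):
        demand = sum(household_demand(price, ai, bi) for ai, bi in zip(a, b))
        excess = demand - supply
        if abs(excess) < tol:
            break
        price = max(0.0, price + step * excess)  # projected gradient step on the dual variable
    return price
```

Consider two households with $a = (10, 8)$ and $b = (1, 2)$. Total demand at price $\lambda$ is $14 - 1.5\lambda$, so a supply of 5 balances at $\lambda = 6$, which allocates 4 and 1 units. Under these assumptions, this price maximizes aggregated welfare. The step size must be small relative to how sensitive demand is to price, or the iteration oscillates.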

On the other hand, Kim, Zhang, et al. [4] were among the first to apply RL to set an optimal retail price based on the dynamics of customer behavior and changes in electricity cost. More specifically, they formulated dynamic pricing as an MDP in which a service provider chooses a retail energy price at each time step t. They defined the cost as the weighted sum of the service provider's cost and the customers' cost at each time step, and they solved this MDP with a Q-learning algorithm that incorporates several proposed improvements. Beyond dynamic pricing, the authors also considered the case in which customers schedule their energy consumption based on the observed energy price to minimize their long-term cost, which turns the problem into a multi-agent learning setting. However, the authors did not consider prosumers' energy generation capability, which strongly influences smart-grid dynamics and affects the retail energy price. Furthermore, the Q-learning algorithm used in this work requires substantial memory to store the state-action values and converges slowly, making it inefficient for larger state spaces. Our formulation of the reward function is inspired by this work.
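The hedged sketch below illustrates two points from [4]: the weighted-sum cost and the memory issue of tabular Q-learning. The weight `rho`, the state and action encodings, and the learning parameters are illustrative assumptions, not values from [4]. The Q-table stores one entry per visited state-action pair. Its memory footprint can therefore grow with the product of the state-space and action-space sizes.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def weighted_cost(sp_cost: float, customer_cost: float, rho: float) -> float:
    """Weighted sum of service-provider and customer costs; rho in [0, 1] sets the trade-off."""
    return rho * sp_cost + (1.0 - rho) * customer_cost


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    cost: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """One tabular Q-learning step; the reward is the negative cost and unseen entries default to 0."""
    reward = -cost
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

Setting `rho` close to 1 favors the service provider, and setting it close to 0 favors customers.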

Likewise, the authors of [7] formulated multi-timescale dispatch and scheduling for a smart-grid model as an MDP, considering the uncertainty of wind generation and energy demand. Specifically, they proposed dispatching and pricing on two timescales: real-time and day-ahead scheduling. While the authors made a significant contribution to the integration of wind power into the bulk power grid, they did not consider customers who can generate wind power and actively trade energy with other customers within a smart grid.

Other approaches use statistical regression models, which identify the set of independent variables required for the complex process of forecasting electricity prices. The authors of [6] argue that no single set of variables fits all markets, and they therefore narrowed their scope by selecting 19 variables based on the characteristics of the UK energy market. They performed multivariable regression using gradient boosting, random forests, and XGBoost, with each model tasked with forecasting the electricity price 1 to 12 hours ahead.
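A multi-horizon forecasting task like the one in [6] can be framed as supervised learning. Each window of past observations is paired with the price h hours later. The sketch below simplifies heavily: it uses only lagged prices as features, not the 19 market variables selected in [6]. The resulting (X, y) pairs could then be fed to a regressor such as gradient boosting, random forests, or XGBoost, for example with one model per horizon.

```python
from typing import List, Sequence, Tuple


def make_horizon_dataset(
    prices: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of n_lags hourly prices with the price `horizon` hours after the window's last hour."""
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X, y = [], []
    # t indexes the first hour after the window; the target is prices[t - 1 + horizon].
    for t in range(n_lags, len(prices) - horizon + 1):
        X.append(list(prices[t - n_lags:t]))
        y.append(prices[t - 1 + horizon])
    return X, y
```

Longer horizons leave fewer usable samples at the end of the series. This is one reason long-horizon forecasts are harder to train and evaluate.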

Instead of focusing on maximizing social welfare, Joe-Wong, Sen, et al. [5] approached the problem of optimizing price offerings from the service provider's point of view, maximizing its revenue. By assessing consumers' device-specific scheduling flexibility and modeling their willingness to shift energy consumption to off-peak periods, the authors formulated an optimization problem to determine cost-minimizing prices for service providers. The authors also argue that real-time pricing is less customer-friendly than day-ahead price scheduling, because it does not allow customers to plan their activities in advance and thus creates more uncertainty.

Table I compares our proposed framework with previous studies on dynamic pricing mechanisms for smart grid scenarios.

III. PROBLEM FORMULATION

In this work, we define a microgrid composed of a service provider (SP), a set of prosumers $P$, a set of consumers $C$, and a community battery. We consider a temporally dynamic microgrid in which, at each time step $t$, the SP adopts a retail energy price $a_t: \mathbb{R}^+ \to \mathbb{R}^+$ and a purchase energy price $p_t: \mathbb{R}^+ \to \mathbb{R}^+$. The SP uses $a_t$ to charge both consumers and prosumers according to their total load demand, and it uses $p_t$ to calculate how much it must pay prosumers for their energy surplus. In other words, the SP regulates both the price at which it sells energy and the price at which it buys surplus energy from prosumers. The SP can also purchase the microgrid's energy requirements from the utility grid (UG) using a fixed cost function. We also consider a shared community battery that facilitates energy trading within the microgrid by storing surplus energy and partially covering customers' demands when requested.
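One time step of the market described above can be sketched as follows. This is a hypothetical settlement rule, not the paper's exact model, and it rests on several assumptions:

- Each customer is billed or paid on its net load (load minus generation).
- $a_t$, $p_t$, and the utility-grid cost are passed in as functions that map energy to money.
- Surplus first covers other customers' demand. Any leftover is stored in the battery up to its capacity, and the remainder is curtailed.
- Any shortfall is covered first by the battery and then by the utility grid.
- Battery losses and degradation are ignored.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

PriceFn = Callable[[float], float]  # maps energy (kWh) to a monetary amount


@dataclass
class Customer:
    load: float        # energy demanded in this time step (kWh)
    generation: float  # energy generated in this time step (kWh); 0 for pure consumers


def settle_step(
    customers: Sequence[Customer],
    retail_fn: PriceFn,      # a_t: charge for energy sold to a customer
    purchase_fn: PriceFn,    # p_t: payment for surplus bought from a prosumer
    grid_cost_fn: PriceFn,   # fixed cost function for energy bought from the utility grid
    battery_level: float,
    battery_capacity: float,
) -> Tuple[float, float, float]:
    """Return (SP profit, new battery level, energy bought from the grid) for one time step."""
    revenue = payments = 0.0
    demand = surplus = 0.0
    for c in customers:
        net = c.load - c.generation
        if net > 0:
            revenue += retail_fn(net)
            demand += net
        elif net < 0:
            payments += purchase_fn(-net)
            surplus += -net

    balance = surplus - demand
    from_grid = 0.0
    if balance >= 0:
        # Leftover surplus charges the battery; anything above capacity is curtailed.
        battery_level += min(balance, battery_capacity - battery_level)
    else:
        # Shortfall is served by the battery first, then by the utility grid.
        shortfall = -balance
        discharged = min(shortfall, battery_level)
        battery_level -= discharged
        from_grid = shortfall - discharged

    profit = revenue - payments - grid_cost_fn(from_grid)
    return profit, battery_level, from_grid
```

In this toy rule a larger battery reduces `from_grid` when surplus and shortfall alternate. This gives one intuition for why battery capacity can affect the system's profit.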

We assume that the coefficients of the retail pricing functions and of the purchase pricing functions are both

Table I: Summary of Related Work (X: considered, -: not considered)

Work     | Approach            | Price prediction | Real data | Prosumers' energy generation capabilities | Shared battery system
[3]      | Optimization (ADMM) | X                | -         | X                                         | -
[5]      | Optimization        | X                | X         | -                                         | -
[6]      | ML Regression       | X                | X         | X                                         | -
[7]      | MDP                 | X                | -         | -                                         | -
[4]      | RL (Q-Learning)     | X                | X         | -                                         | -
Our work | RL (DQN)            | X                | X         | X                                         | X
