Energy Pricing in P2P Energy Systems Using
Reinforcement Learning
Nicolas Avila§, Shahad Hardan§, Elnura Zhalieva§, Moayad Aloqaily, Mohsen Guizani
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE
E-mails: {nicolas.avila; shahad.hardan; elnura.zhalieva; moayad.aloqaily; mohsen.guizani}@mbzuai.ac.ae
Abstract—The increase in renewable energy on the consumer side gives rise to new dynamics in energy grids. Participants in a microgrid can produce energy and trade it with their peers (peer-to-peer) with the permission of the energy provider. In such a scenario, the stochastic nature of distributed renewable energy generators and of energy consumption increases the complexity of defining fair prices for buying and selling energy. In this study, we introduce a reinforcement learning framework to help solve this issue by training an agent to set the prices that maximize the profit of all components in the microgrid, aiming to facilitate the implementation of P2P grids in real-life scenarios. The microgrid considers consumers, prosumers, the service provider, and a community battery. Experimental results on the Pymgrid dataset show a successful approach to price optimization for all components in the microgrid. The proposed framework ensures flexibility to account for the interests of these components, as well as the ratio of consumers to prosumers in the microgrid. The results also examine the effect of changing the capacity of the community battery on the profit of the system. The implementation code is available here.
Index Terms—Energy Price, Smart Microgrid, Reinforcement Learning, DQN
I. INTRODUCTION
Increasing demand for energy due to population growth, together with the devastating impact of conventional energy generation sources on global warming, has spurred a surge of interest in renewable energy sources (RES), mainly solar and wind energy. Growing worldwide investments in RES have encouraged consumers to install rooftop photovoltaic (PV) systems locally to lower their electricity bills and make a profit by trading surplus energy within the community. This advancement has turned residential consumers into prosumers, who can both cover their own demand and provide electricity to other consumers locally.
The use of prosumers' energy alleviated the dependency of the local energy market (LEM) on the utility grid and grew into a peer-to-peer (P2P) energy marketplace. A P2P marketplace is a platform where consumers and energy suppliers within or beyond the community microgrid transact energy at their desired price. Eight P2P energy trading pilots were reported around the world in 2020, according to [1]. These implementations allow the world to recognize the potential benefits of deploying P2P platforms and to understand the requirements for their successful realization.
§These authors contributed equally to this work
According to the plan of the United Nations and the International Energy Agency [2], by 2030, countries should aim to reduce their carbon emissions by 45% and reach net-zero emissions by 2050. In such a situation, P2P energy markets could help accelerate the execution of this plan by integrating RES into the bulk power grid and steering the energy industry in a more sustainable direction. However, their implementation is challenging because energy-sector regulations differ from country to country. The stochastic nature of RES is another factor that adds complexity.
Nonetheless, the primary challenge behind the realization of P2P platforms lies in agreeing on a bidding strategy that facilitates the trading of energy from prosumers to consumers without affecting the service provider's interests. Developing an optimal dynamic pricing mechanism for P2P energy trading could promote the emergence of more prosumers and, with them, more P2P energy markets.
We aim to tackle the challenges of dynamic pricing in P2P platforms by training a reinforcement learning (RL) agent that learns an optimal pricing policy in real time. We consider a single microgrid that consists of prosumers and consumers, an energy service provider, the legacy utility grid, and a community battery. Our main contributions are summarized as follows:
• We deal with various stochastic processes in the microgrid, such as customer demand and generation, to provide a comprehensive problem formulation that depicts the dynamics of a microgrid as accurately as possible.
• We adopt the Deep Q-Networks (DQN) method to solve the resulting Markov Decision Process (MDP). The goal is to decide retail and purchase energy prices that minimize the operation cost (see the sketch after this list).
• We consider the nature of the microgrid and the individual interests of its members to evaluate the sustainability and flexibility of our framework.
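To make the DQN contribution concrete, the following is a minimal sketch of how such a pricing agent could be wired up. The state layout, the discretized price grids, the network sizes, and all hyperparameters are our own illustrative assumptions, not the paper's exact configuration; the actual design is given in Section IV.

```python
# Minimal DQN sketch for the pricing MDP; all sizes and hyperparameters are assumed.
import random
from collections import deque

import torch
import torch.nn as nn

N_RETAIL, N_PURCHASE = 5, 5          # discretized price levels (assumption)
N_ACTIONS = N_RETAIL * N_PURCHASE    # one action = a (retail, purchase) price pair
STATE_DIM = 4                        # e.g. demand, generation, battery level, hour

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)        # filled with (s, a, r, s') transitions
gamma, eps = 0.99, 0.1

def select_action(state):
    """Epsilon-greedy choice over the discretized (retail, purchase) grid."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=32):
    """One temporal-difference update on a minibatch from the replay buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The reward stored in the replay buffer would be the negative operation cost of the step, so that minimizing cost and maximizing return coincide.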
The paper follows this structure: Section II introduces and compares related works on dynamic pricing mechanisms. Section III provides the MDP formulation for the dynamics of the pricing problem on P2P energy trading platforms. After that, we present the details of using the DQN algorithm to determine optimal retail and purchase prices in Section IV. Lastly, in Section V, we show the experiments and results.
II. RELATED WORK
In [3], Paudel and Gooi proposed a pricing strategy for a P2P energy trading platform using the alternating direction method of multipliers (ADMM). They introduced a P2P market operator that offers prices and provides a platform for energy trading from prosumers to consumers. Given the many factors influencing the energy trading price, the authors chose to focus on the social welfare of the community microgrid. They modeled the personal satisfaction of each household as a function of its consumption and used it to define a welfare function that depends on the per-unit price of energy. The optimization process determines the platform price that maximizes the aggregated welfare of the community microgrid. However, this work did not consider the dynamic nature of energy pricing and provided only a static solution that does not vary over time.
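As a hedged illustration of this kind of welfare objective (not the exact model in [3]), the following sketch assumes a quadratic household satisfaction function and picks the per-unit price that maximizes community welfare; every coefficient below is invented for illustration.

```python
# Illustrative welfare-maximizing price in the spirit of [3]; the quadratic
# satisfaction model and all coefficients are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

alpha = np.array([1.0, 1.2, 0.9])     # per-household satisfaction slopes (assumed)
beta = np.array([0.05, 0.06, 0.04])   # satiation rates (assumed)
c = 0.5                               # marginal cost of supplying energy (assumed)

def welfare(price):
    """Community welfare = total household satisfaction minus supply cost,
    where each household consumes x* = (alpha - price) / beta, the maximizer
    of its quadratic utility u(x) = alpha*x - (beta/2)*x^2 minus its payment."""
    x = np.clip((alpha - price) / beta, 0.0, None)   # demand response to price
    satisfaction = np.sum(alpha * x - 0.5 * beta * x**2)
    return float(satisfaction - c * np.sum(x))

res = minimize_scalar(lambda p: -welfare(p), bounds=(0.0, 1.0), method="bounded")
print(f"welfare-maximizing price = {res.x:.3f}")   # approaches c in this toy model
```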
On the other hand, Kim, Zhang, et al. [4] were among the first to apply RL to set an optimal retail price based on the dynamics of customer behavior and the change in electricity cost. More specifically, they formulated the dynamic pricing problem as an MDP, where a service provider chooses a retail energy price at each time step t. They defined the cost as the weighted sum of the service provider's cost and the customers' cost at each time step. They solved this MDP by adopting a Q-learning algorithm with some proposed improvements. Apart from dynamic pricing, the authors also considered the case where customers can schedule their energy consumption based on the observed energy price to minimize their long-term cost, which turns the problem into a multi-agent learning case. However, the authors did not consider the prosumers' energy generation capability, which largely influences smart-grid dynamics and impacts the retail energy price. Furthermore, the tabular Q-learning algorithm used in this work requires a large amount of memory to store the state-action values and takes a long time to converge, making it inefficient for larger state spaces. Our formulation of the reward function is inspired by this work (see the sketch below).
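A minimal sketch of such a weighted-sum objective follows; the weight rho and the sign convention (reward = negative cost) are our assumptions about a typical setup, not the exact formulation of [4].

```python
def stage_cost(sp_cost: float, customer_cost: float, rho: float = 0.5) -> float:
    """Weighted sum of the service provider's cost and the aggregate customer
    cost at one time step; rho in [0, 1] balances the two interests (assumed)."""
    return rho * sp_cost + (1.0 - rho) * customer_cost

def reward(sp_cost: float, customer_cost: float, rho: float = 0.5) -> float:
    """RL reward as the negative stage cost, so that minimizing the weighted
    cost is equivalent to maximizing the agent's return."""
    return -stage_cost(sp_cost, customer_cost, rho)
```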
Likewise, the authors of [7] formulated multi-timescale dispatch and scheduling for a smart-grid model as an MDP, considering the uncertainty of wind generation and energy demand. Specifically, they proposed dispatching and pricing on two timescales: real-time and day-ahead scheduling. While the authors made a substantial contribution to the integration of wind power into the bulk power grid, they did not consider customers who can generate wind power and actively trade energy with other customers within a smart grid.
Other approaches propose statistical regression models, which identify the set of independent variables required for the complex process of forecasting the electricity price. The authors of [6] argue that there is no one-size-fits-all set of variables and hence narrowed their scope to 19 variables selected based on the characteristics of the UK energy market. They performed multivariable regression using gradient boosting, random forests, and XGBoost, where each model was tasked with forecasting the electricity price 1-12 hours ahead.
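To illustrate this direct multi-horizon strategy, here is a small sketch using scikit-learn's GradientBoostingRegressor on synthetic data; the random placeholder features stand in for the 19 UK-market variables, which are not enumerated here.

```python
# Direct multi-horizon electricity-price forecasting in the spirit of [6]:
# one gradient-boosting model per horizon (1-12 hours ahead). All data below
# is synthetic; the real work uses 19 UK-market explanatory variables.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 19))                     # placeholder features
price = X @ rng.normal(size=19) + rng.normal(scale=0.1, size=1000)

models = {}
for h in range(1, 13):
    y_h = price[h:]                                  # price h steps ahead
    models[h] = GradientBoostingRegressor().fit(X[:-h], y_h)

# Forecast 1-12 hours ahead from the most recent feature vector:
forecasts = {h: m.predict(X[-1:])[0] for h, m in models.items()}
```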
Instead of focusing on maximizing social welfare, Joe-Wong, Sen, et al. [5] approached the price-offering optimization problem from the service provider's point of view, maximizing its revenue. By assessing consumers' device-specific scheduling flexibility and modeling their willingness to shift energy consumption to off-peak periods, the authors formulated an optimization problem to determine cost-minimizing prices for service providers. They also argue that real-time pricing is less customer-friendly than day-ahead price scheduling, since it does not allow customers to plan their activities in advance and thus creates more uncertainty.
In Table I, we compare our proposed framework with previous studies on dynamic pricing mechanisms for smart-grid scenarios.
III. PROBLEM FORMULATION
In this work, we define a microgrid composed of a service provider (SP), a set of prosumers P, a set of consumers C, and a community battery. We consider a temporally dynamic microgrid, where at each time step t, the SP adopts a retail energy price $a_t : \mathbb{R}^+ \mapsto \mathbb{R}^+$ and a purchase energy price $p_t : \mathbb{R}^+ \mapsto \mathbb{R}^+$. The SP uses $a_t$ to charge both consumers and prosumers depending on their total load demand, and uses $p_t$ to calculate how much it has to pay the prosumers for their energy surplus. In other words, the SP regulates both the price at which it sells energy and the price at which it buys surplus energy from prosumers. Furthermore, the SP can also purchase the microgrid's energy requirements from the utility grid (UG) using a fixed cost function. We also consider a shared community battery that facilitates energy trading within the microgrid by storing surplus energy and partially covering the customers' demands when requested.
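To make these dynamics concrete, the following sketch settles one simplified time step of such a microgrid from the SP's perspective. The battery dispatch rule and every parameter value are our own illustrative assumptions, not the paper's formulation, which is developed in the rest of this section.

```python
# Simplified one-step settlement of the microgrid described above; the battery
# policy and all numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Battery:
    capacity: float      # kWh
    level: float = 0.0   # current state of charge, kWh

def settle_step(a_t, p_t, demand, generation, battery, ug_price=0.30):
    """Settle one time step: the SP charges the total load at the retail price
    a_t, buys the prosumers' surplus at the purchase price p_t, and covers any
    deficit first from the battery and then from the utility grid at the fixed
    ug_price. Returns the SP's profit for the step."""
    net = generation - demand                  # microgrid-wide net energy, kWh
    sp_revenue = a_t * demand
    sp_cost = p_t * max(net, 0.0)
    if net > 0:                                # surplus: store what fits
        battery.level += min(net, battery.capacity - battery.level)
        # (surplus beyond the battery's free capacity is ignored in this sketch)
    else:                                      # deficit: discharge, then import
        from_battery = min(-net, battery.level)
        battery.level -= from_battery
        sp_cost += ug_price * (-net - from_battery)
    return sp_revenue - sp_cost
```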
We assume that the set of retail pricing functions’ and
the set of purchase pricing functions’ coefficients are both
Table I: Summary of Related Work (X: considered, -: not considered)

Work     | Approach            | Price prediction | Real data | Prosumers' energy generation capabilities | Shared battery system
[3]      | Optimization (ADMM) | X                | -         | X                                         | -
[5]      | Optimization        | X                | X         | -                                         | -
[6]      | ML Regression       | X                | X         | X                                         | -
[7]      | MDP                 | X                | -         | -                                         | -
[4]      | RL (Q-Learning)     | X                | X         | -                                         | -
Our work | RL (DQN)            | X                | X         | X                                         | X