Energy suppliers need to maintain an equilibrium between supply and demand, since producing excessive amounts of energy results in waste, whereas failing to meet consumers' demand may force purchases of energy at higher rates or lead to frequent blackouts. Therefore, various load forecasting techniques have been considered for the efficient and reliable operation of electricity networks. Statistical forecasting methods, e.g., multiple linear regression (MLR), autoregressive (AR), and moving average (MA) techniques, were used to project past and present load profiles into future predictions.
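For instance, an autoregressive model of order $p$ expresses the current load as a linear combination of the $p$ most recent observations,
$$y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t,$$
where $\varepsilon_t$ is a noise term and the coefficients $\phi_i$ are estimated from the historical load profile; forecasts are obtained by evaluating this relation forward in time. MA models play the same role with past forecast errors as regressors, and MLR with exogenous explanatory variables.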
Later, the introduction of smart metering and the evolution of artificial intelligence (AI) technology paved the way for replacing traditional prediction techniques with various machine learning (ML) algorithms, owing to their ability to analyse large datasets in a short time while achieving impressive levels of accuracy [3]. Advanced metering infrastructure (AMI), a system of smart meters connected to a communication
network for two-way communications between customers and utility companies, is the first step toward smart
energy, as it enables the collection and analysis of smart-meter data. However, gathering consumers' load profiles at a central entity to conduct energy forecasting raises privacy concerns: individuals' load information could be misused, since it reveals consumer habits and household occupancy.
To address this issue, the ML community recently introduced a new learning paradigm termed federated learning (FL) [4]. FL is analogous to distributed learning in its ability to handle enormous datasets and build efficient, scalable systems. However, the goal of FL is to preserve data privacy: data is not collected in a central location; instead, the model is sent to the clients where the data is generated. The FL framework is orchestrated by a server placed at a central entity, i.e., an energy supplier, to collaboratively train and improve a shared model with many clients. Two typical FL
architectures exist based on the scale of the federation. The first is cross-device, where the number of clients
may be massive, for example, consumers' smart meters. The second is cross-silo, which involves a relatively small number of reliable clients, for example, substations. The FL process starts by initialising a global model
on the server and then sending it to the clients for local model training. Once completed, the clients send
back the model updates to the server, which will aggregate them, resulting in an updated model. Then, the
updated model is sent to the clients for another training round. This process is repeated until the limit of
communication rounds is reached, or the model achieves the desired accuracy.
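As a rough illustration of this loop (a minimal sketch of weighted model averaging, not the aggregation scheme of any particular cited work; the client data loaders, loss function, and learning rate below are placeholder assumptions), one communication round could look as follows in PyTorch:

import copy
import torch

def federated_round(global_model, clients, local_epochs=1, lr=0.01):
    # One FL communication round: broadcast the global model, let each client
    # train on its local data, then average the returned weights in proportion
    # to each client's dataset size (FedAvg-style aggregation).
    # `clients` is assumed to be a list of (dataloader, num_samples) pairs.
    states, sizes = [], []
    for dataloader, num_samples in clients:
        local_model = copy.deepcopy(global_model)  # client starts from the global model
        optimiser = torch.optim.SGD(local_model.parameters(), lr=lr)
        loss_fn = torch.nn.MSELoss()
        local_model.train()
        for _ in range(local_epochs):
            for x, y in dataloader:
                optimiser.zero_grad()
                loss_fn(local_model(x), y).backward()
                optimiser.step()
        states.append(local_model.state_dict())    # the "model update" sent to the server
        sizes.append(num_samples)
    # Server-side aggregation into an updated global model.
    total = float(sum(sizes))
    new_state = {
        key: sum(state[key] * (size / total) for state, size in zip(states, sizes))
        for key in states[0]
    }
    global_model.load_state_dict(new_state)
    return global_model

In the cross-silo setting considered here, the clients would correspond to the participating substations, and the round would be repeated for the allotted number of communication rounds.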
The use of FL in energy forecasting is still at a very early stage, and only a few studies have considered this approach [5, 6, 7, 8, 9]. These studies focus on long short-term memory (LSTM) architectures, a type of recurrent neural network (RNN) used in deep learning (DL), owing to their remarkable performance in predicting time-series data.
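For concreteness, the kind of single-model forecaster such studies federate can be sketched as follows (a minimal PyTorch illustration; the hidden size, number of layers, and single-step-ahead output are assumptions for the example, not parameters reported in [5, 6, 7, 8, 9]):

import torch.nn as nn

class LoadForecaster(nn.Module):
    # Minimal LSTM regressor: maps a window of past load readings to a
    # one-step-ahead load prediction.
    def __init__(self, n_features=1, hidden_size=64, n_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, n_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, window_length, n_features)
        out, _ = self.lstm(x)              # out: (batch, window_length, hidden_size)
        return self.head(out[:, -1, :])    # predict from the last time step

A model of this form is what the federated_round sketch above would broadcast and aggregate.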
However, these works overlook a critical issue: DL models are extremely resource-consuming (energy, memory, processor, etc.), and their lengthy and extensive underlying mathematical operations demand resource-rich hardware. Schemes that combine FL with LSTM models therefore require extended computation time to reach the desired precision, which impedes their scalability. Furthermore, the abovementioned studies use individual households as FL clients. Conversely, our study applies FL at the substation level, i.e., a part of the power