Non-intrusive Load Monitoring based on Self-
supervised Learning
Shuyi Chen, Student Member, IEEE, Bochao Zhao, Member, IEEE, Mingjun Zhong, Member, IEEE, Wenpeng
Luan*, Senior Member, IEEE, and Yixin Yu, Life Senior Member, IEEE
Abstract—Deep learning models for non-intrusive load
monitoring (NILM) tend to require a large amount of labeled data
for training. However, it is difficult to generalize the trained
models to unseen sites due to different load characteristics and
operating patterns of appliances between data sets. To address
such problems, self-supervised learning (SSL) is proposed in this
paper, requiring no labeled appliance-level data from the target
data set or house. Initially, only the aggregate power
readings from the target data set are required to pre-train a general
network via a self-supervised pretext task that maps aggregate power
sequences to derived representations. Then, supervised
downstream tasks are carried out for each appliance category to
fine-tune the pre-trained network, where the features learned in
the pretext task are transferred. Utilizing labeled source data sets
enables the downstream tasks to learn how each load is
disaggregated, by mapping the aggregate to labels. Finally, the
fine-tuned network is applied to load disaggregation for the target
sites. For validation, multiple experimental cases are designed
based on three publicly accessible REDD, UK-DALE, and REFIT
data sets. Besides, state-of-the-art neural networks are employed
to perform the NILM task in the experiments. Based on the NILM
results in various cases, SSL generally outperforms zero-shot
learning in improving load disaggregation performance without
any sub-metering data from the target data sets.
Index Terms—Non-intrusive load monitoring, deep neural
network, self-supervised learning, sequence-to-point learning.
I. INTRODUCTION
In recent years, energy shortage and environmental
pollution worldwide have become increasingly serious.
Therefore, approaches for efficient energy utilization
and carbon emission reduction are being explored [1], [2].
Meanwhile, with the global deployment of smart meters, benign
interaction between power suppliers and users has been
established for enhancing demand side management and
optimizing power grid operation [3]. As one of the energy
conservation applications, electricity consumption detail
monitoring has attracted extensive attention around the world
[4]. In general, load monitoring technology is mainly
categorized as intrusive or non-intrusive. Note that
intrusive load monitoring requires extra sensor installation for
sub-metering. Alternatively, the concept of non-intrusive load
monitoring (NILM) was proposed by Hart [5] in 1984 as
This work was supported in part by the Joint Funds of the National Natural
Science Foundation of China (No. U2066207) and the National Key Research
and Development Program of China (No. 2020YFB0905904). (Corresponding
author: W. Luan)
identifying power consumed by each individual appliance via
analyzing aggregate power readings using only software tools.
NILM offers appliance-level power consumption feedback to
both demand and supply sides economically and efficiently,
contributing to power system planning and operation [1],
energy bill savings [6], demand side management [7], energy
conservation and emission reduction [3], [6], [8], etc.
NILM is a single-channel blind source separation problem,
aiming to disaggregate the appliance-level energy consumption
from the aggregate measurements [9]. Combinatorial
optimization (CO) is initially applied to perform NILM in [5],
searching for the best combination of operational states of
individual appliances at each time instance. However, CO relies
on the power range of each operational state as prior
knowledge, making it inapplicable to newly added
appliances [10]. Benefiting from recent developments in big
data, artificial intelligence, and edge
computing, plenty of NILM approaches have been proposed
based on machine learning, mathematics, and signal processing
[8], [11]. Factorial hidden Markov model (FHMM) and its
variants [12]-[14] are popular in carrying out NILM. Given an
aggregate power signal as the observation, such FHMM-based
NILM methods estimate the hidden operational states of each
appliance considering their state continuity in time-series [15],
[16]. Thus, FHMM-based methods usually achieve good results
in disaggregating loads with periodic operation such as
refrigerators. However, their performance is limited for the
loads with short-lasting working cycles and the ones with less
frequent usage. Note that FHMM-based methods are regarded
as state-based NILM approaches, where the aggregate power
measurement at each time instance is assigned to each
operational state per appliance [17]. Alternatively, NILM
approaches can be event-based, where sudden changes in power
signals referring to turn-on, turn-off, and state transition events
are featured [17]. Such event-based NILM methods can be
carried out via subtractive clustering and the maximum
likelihood classifier [18]. Besides, graph signal processing
concepts are applied to perform NILM, mapping correlation
among samples to the underlying graph structure [19], [20].
Although such event-based NILM approaches can achieve high
load identification accuracy, they tend to suffer from
S. Chen, B. Zhao, W. Luan and Y. Yu are with the School of Electrical and
Information Engineering, Tianjin University, Tianjin 300072, China (e-mail:
wenpeng.luan@tju.edu.cn).
M. Zhong is with the Department of Computing Science, University of
Aberdeen, Aberdeen, UK (e-mail: mingjun.zhong@abdn.ac.uk).
measurement noises.
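The event-based idea above can be illustrated with a minimal edge-detection sketch. This is a deliberate simplification, not the subtractive-clustering or maximum-likelihood pipeline of [18]: an event is simply flagged wherever the step between consecutive aggregate samples exceeds a threshold (the 50 W threshold below is an illustrative assumption).

```python
import numpy as np

def detect_events(aggregate, threshold=50.0):
    """Flag indices where the aggregate power steps by more than
    `threshold` watts, a crude proxy for appliance on/off events."""
    deltas = np.diff(aggregate)
    idx = np.flatnonzero(np.abs(deltas) > threshold)
    # Report the index just after each step, with the signed step size.
    return [(int(i) + 1, float(deltas[i])) for i in idx]

# Toy aggregate: a 100 W appliance turns on at t=3 and off at t=7.
y = np.array([20.0, 20.0, 21.0, 121.0, 120.0, 121.0, 120.0, 20.0, 20.0])
events = detect_events(y, threshold=50.0)
# Two events: turn-on (+100 W) at t=3, turn-off (-100 W) at t=7.
```

Real event-based methods then classify these detected edges into appliance state transitions; the sketch only shows the detection step.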
Deep neural networks (DNN), performing well in computer
vision, speech recognition, and natural language processing,
have been employed in load disaggregation since 2015 [21].
Since then, DNN-based NILM approaches have become
increasingly popular, including long short-term memory (LSTM) [15],
[21], gated recurrent unit (GRU) [10], [22], denoising
autoencoder (dAE) [21], [23] and convolutional neural network
(CNN) [24], [25], etc., showing competitive performance
against traditional NILM methods. Although LSTM suits long
time-series tasks by avoiding the vanishing gradient problem,
it underperforms CNN in the NILM task [26], [27]. As a variant
of LSTM, GRU can also remember data patterns while containing
fewer parameters, and thus requires shorter training time,
making it suitable for online NILM applications [10], [28]. Note that the
bidirectional gated recurrent unit (Bi-GRU) is employed to
perform NILM in [10], where the network can be trained
simultaneously in positive and negative time directions.
Besides, dAE is applied to NILM by recovering the power
signal of target appliances (clean signal) from the aggregate
(noisy signal) [8], [21], where CNN layers are usually
embedded [21], [23]. A state-of-the-art CNN-based NILM
method, S2p, is proposed in [9] and claimed to outperform
benchmarks in the NILM task [9], [29], benefiting from meaningful
latent features learned from sub-metering data. Compared to
traditional NILM methods, advantages of DNN-based NILM
approaches include automatic feature extraction from power
readings and linearity between computational complexity and
appliance amount [4]. However, the promising performance of
the aforementioned DNN-based methods relies on a large
amount of sub-metering data from the target set for training [4].
Since such data collection may last for months or even years
[4], it is neither user-friendly nor economical in practice.
Alternatively, transfer learning concepts are proposed, where
transferable networks can be trained on a source (seen) data set
and applied to the load disaggregation task on a target (unseen)
data set [2]. Depending on whether network fine-tuning is
required, transfer learning can be classified as few-shot learning
(FSL) and zero-shot learning (ZSL) [30]. For fine-tuning in
FSL, a small amount of labeled data from the target set is still
required [30]. However, when labels are unable to be captured
from the target data sets, ZSL offers proper solutions. In [31],
ZSL suffers only a slight performance drop compared to the baseline
when it is employed in load disaggregation by both GRU and
CNN networks, showing transferability across data sets.
However, in ZSL, it is difficult to generalize networks between
data sets with different load characteristics and operating
patterns of appliances. Like ZSL, self-supervised
learning (SSL) requires no labeled data from the target set. SSL
is an efficient way to extract universal features from large-scale
unlabeled data, contributing to robustness enhancement [32],
thus it performs well in image processing and speech
recognition [33]. To the best of our knowledge, SSL has not
been used to solve the NILM problem.
Driven by such research gaps, in this paper, SSL is applied
to two state-of-the-art NILM algorithms based on CNN and
GRU, namely S2p [9] and Bi-GRU [10]. For performing NILM, a
self-supervised pretext task training is initially carried out for
learning features from the aggregate power readings from the
unlabeled data in the target set. Then the pre-trained network is
fine-tuned in the supervised downstream task training based on
the labeled data from the source set for transferring the pre-
learned knowledge to load disaggregation. After pre-training
and fine-tuning, the network can be applied to load
disaggregation for target sites. The proposed method is
validated on the real-world data sets at 1-min granularity, in the
scenarios designed for the same data set or across various data
sets. The contributions of this paper are clarified as follows:
• SSL is applied to deep-learning-based load disaggregation
without sub-metering on the target set, by setting a pretext
task for network pre-training on unlabeled data from the
target set, followed by fine-tuning;
• Experiments are carried out for all combinations of two
state-of-the-art DNN-based NILM methods (S2p [9] and
Bi-GRU [10]) and learning frameworks (SSL with various
fine-tuning schemes and ZSL), on three real-world data sets;
• Six cases differing in data selection are designed for
performance evaluation across houses or data sets, showing
that SSL generally outperforms in various metrics and energy
consumption estimation results, with comparable training
time cost.
The rest of this paper is organized as follows: in Section II,
the NILM formulation is clarified, followed by introducing the
preliminaries for NILM neural networks and SSL; The
methodology of SSL for NILM is explained in Section III;
Section IV contains data sets, evaluation metrics, and
experimental settings, followed by experimental results with
discussion presented in Section V; finally, the conclusion
is drawn and future work is outlined in Section VI.
II. PRELIMINARIES
In this section, we first formulate the NILM problem and
then clarify seq2seq and seq2point concepts, followed by an
introduction for two seq2point network architectures. Finally,
the overall structure of SSL is demonstrated.
A. NILM Problem Formulation
Assume the aggregate power reading measured in a household at
time index $t \in [1, T]$ is $y_t$, where $T$ refers to the total
number of samples. The simultaneous power consumed by appliance
$m$ to be disaggregated is denoted by $x_t^m$, and the measurement
noise is denoted by $e_t$, usually regarded as Gaussian
distributed [7]. Then the total load power for a household can be
expressed as:

$$y_t = \sum_{m=1}^{M} x_t^m + e_t \qquad (1)$$

where $M$ is the number of appliances. Thus, for each time index
$t$, the NILM problem is to estimate $x_t^m$, $\forall m$, given
the aggregate power $y_t$. When applying machine learning or deep
learning to NILM, it becomes a regression or classification
problem [7].
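The additive model above can be mimicked with synthetic data; the appliance traces and noise level below are illustrative toy values, not drawn from any of the data sets used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 8, 2  # number of samples and appliances (toy sizes)

# Per-appliance power x_t^m: a fridge-like cycle and a kettle-like burst.
x = np.zeros((M, T))
x[0] = [80, 80, 0, 0, 80, 80, 0, 0]      # periodic load
x[1] = [0, 0, 2000, 2000, 0, 0, 0, 0]    # short high-power burst

e = rng.normal(0.0, 5.0, size=T)          # Gaussian measurement noise e_t
y = x.sum(axis=0) + e                     # aggregate y_t = sum_m x_t^m + e_t

# NILM is the inverse problem: recover each x[m] given only y.
```

Only `y` is observed at the smart meter; the rows of `x` are exactly what a NILM algorithm must estimate.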
B. Sequence-to-sequence vs. Sequence-to-point NILM
Frameworks
NILM can be carried out via neural networks with a seq2seq
or seq2point framework [24]. In a seq2seq NILM solution, for
each appliance, a network learns the non-linear regression
between sequences with the same time stamps, referring to the
aggregate and appliance-level power. For an arbitrary aggregate
power sequence $\mathbf{y}$ covering time instance $t$, the power
$x_t^m$ consumed by appliance $m$ at $t$ is predicted by the
network; the final estimate is then the average of all such predictions
[9]. Unlike the seq2seq framework, the seq2point framework
predicts the appliance-level power consumed at only one point
of each sliding window iteratively. The inputs and outputs of
both seq2seq and seq2point frameworks in a NILM task are
illustrated in Fig.1.
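The seq2seq averaging step described above, where each time index is covered by several sliding windows and the final estimate is the mean of all window predictions [9], can be sketched as follows. The `halve` function is a hypothetical stand-in for a trained network, used only to exercise the averaging logic:

```python
import numpy as np

def seq2seq_disaggregate(y, predict, W):
    """Average overlapping per-window predictions into one estimate
    per time index, as in the seq2seq NILM framework."""
    T = len(y)
    acc = np.zeros(T)   # summed predictions per time index
    cnt = np.zeros(T)   # number of windows covering each index
    for t in range(T - W + 1):
        acc[t:t + W] += predict(y[t:t + W])
        cnt[t:t + W] += 1
    return acc / cnt

# Stand-in "network": predicts half the aggregate window (hypothetical).
halve = lambda win: win / 2.0
y = np.arange(10, dtype=float)
x_hat = seq2seq_disaggregate(y, halve, W=4)
```

Because `halve` is elementwise, every window agrees and the average equals `y / 2`; with a real network, the windows disagree and the averaging smooths the disaggregated signal.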
Fig. 1. Examples for seq2seq and seq2point frameworks: (a) seq2seq,
mapping each sliding window of mains power to the corresponding
appliance-level power sequence; (b) S2p in [9], mapping each window
to the midpoint element of the corresponding sequence; (c) Bi-GRU
in [10], mapping each window to the ending element of the
corresponding sequence.
Note that the seq2point framework is demonstrated in Fig. 1
(b) and (c) on two architectures, S2p proposed in [9] and Bi-GRU
proposed in [10], respectively. Compared to the seq2seq
framework, the seq2point framework emphasizes the
representational power at one element and eases the prediction
task. Then, S2p and Bi-GRU are introduced in detail.
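The only difference between the two seq2point variants is which element of each window serves as the target: the midpoint for S2p [9] versus the last element for Bi-GRU [10]. A minimal sketch of pairing windows with targets (toy signals, not real data):

```python
import numpy as np

def make_pairs(y, x, W, target="midpoint"):
    """Slice aggregate y into length-W windows and pair each with one
    element of the appliance signal x: the window midpoint ('midpoint',
    as in S2p) or the window's last element ('last', as in Bi-GRU)."""
    offset = (W - 1) // 2 if target == "midpoint" else W - 1
    n = len(y) - W + 1  # number of sliding-window positions
    windows = np.stack([y[t:t + W] for t in range(n)])
    targets = np.array([x[t + offset] for t in range(n)])
    return windows, targets

y = np.arange(7, dtype=float)       # toy aggregate signal
x = 10 * np.arange(7, dtype=float)  # toy appliance signal
Wm, tm = make_pairs(y, x, W=5, target="midpoint")  # S2p-style targets
Wl, tl = make_pairs(y, x, W=5, target="last")      # Bi-GRU-style targets
```

The "last" variant never touches samples after the target index, which is what makes the Bi-GRU framing usable online.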
1) S2p: The utilization of S2p in NILM is based on the
assumption that the midpoint of each sliding window acts as its
non-linear regression representation. Namely, S2p makes full
use of the past and future information to infer the midpoint, as
shown in Fig. 1 (b).
For a defined neural network $f^m$, the input is a power
sequence denoted by $\mathbf{y}_{t:t+W-1}$, segmented by a sliding
window from the aggregate, where $t$ is the time index and the
window size $W$ is set to an odd number. Thus, by mapping each
sequence $\mathbf{y}_{t:t+W-1}$ to the power $x_\tau^m$ consumed
by appliance $m$ at the midpoint $\tau = t + (W-1)/2$, the entire
power signal $\mathbf{x}^m$ for appliance $m$ can be predicted.
Such a model can be formulated as:

$$x_\tau^m = f^m(\mathbf{y}_{t:t+W-1}) + \epsilon \qquad (2)$$

where $\epsilon$ is Gaussian random noise. Besides, the loss
function in the network training is formulated as:

$$L_p = \sum_{t=1}^{T-W+1} \log p\left(x_\tau^m \mid \mathbf{y}_{t:t+W-1}, \theta_p\right) \qquad (3)$$

where $\theta_p$ is a set of network parameters.
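With the noise term taken as Gaussian with a fixed variance, maximizing the log-likelihood objective above reduces to minimizing the squared error between predicted and measured midpoint powers. A numeric check with toy numbers and an assumed unit variance:

```python
import numpy as np

def gaussian_nll(pred, target, sigma=1.0):
    """Negative log-likelihood of targets under N(pred, sigma^2)."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + (target - pred) ** 2 / (2 * sigma**2))

pred = np.array([1.0, 2.0, 3.0])    # toy midpoint predictions
target = np.array([1.5, 2.0, 2.0])  # toy measured midpoints

# Up to an additive constant, the NLL with fixed sigma equals
# 0.5 * SSE, so training with either objective gives the same optimum.
sse = np.sum((target - pred) ** 2)
const = 3 * 0.5 * np.log(2 * np.pi)
```

This is why seq2point implementations are commonly trained with a plain mean-squared-error loss.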
The CNN-based architecture of S2p is illustrated in Fig. 2 (a),
containing five convolutional layers and one dense layer. In
each iteration, the input signal is a W-length sliding
window of aggregate measurements. Then, five convolutional
layers are employed for feature extraction through the ReLU
activation function. Eventually, the feature maps are
flattened and fed to a dense layer, yielding the appliance-level
power corresponding to the midpoint of the input window.
It is claimed in [29] that S2p achieves performance
improvement against the seq2seq framework on the same network
architecture.
2) Bi-GRU: Unlike S2p in Fig. 1 (b), in Bi-GRU the power
consumed by each appliance at time index $t+W-1$ is mapped from
the sequence $\mathbf{y}_{t:t+W-1}$ of historical aggregate
measurements, as shown in Fig. 1 (c). That is, as the window
slides, the power prediction per appliance is obtained from only
past information, which is applicable to real-time load
disaggregation. Moreover, GRU benefits from lower memory
occupancy and fewer parameters than other network
architectures such as LSTM [28]. The architecture of Bi-GRU
is demonstrated in Fig. 2 (b).
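Because the Bi-GRU variant needs only past samples, it suits an online setting: keep a rolling buffer of the last W aggregate readings and predict the current appliance power from it. A stdlib-only sketch, with `mean_model` as a hypothetical placeholder for a trained network:

```python
from collections import deque

def stream_disaggregate(readings, model, W):
    """Yield (t, estimate) once a full window of past samples exists.

    `readings` is any iterable of aggregate power samples arriving in
    time order; `model` maps a length-W window to one power estimate.
    """
    buf = deque(maxlen=W)  # automatically drops the oldest sample
    for t, y_t in enumerate(readings):
        buf.append(y_t)
        if len(buf) == W:
            yield t, model(list(buf))

# Placeholder "model": mean of the window (stands in for a trained net).
mean_model = lambda win: sum(win) / len(win)
out = list(stream_disaggregate([10, 20, 30, 40, 50], mean_model, W=3))
# The first estimate appears at t=2, once three samples have arrived.
```

The `deque(maxlen=W)` buffer mirrors the sliding window of Fig. 1 (c): each new meter reading displaces the oldest one, so the estimate at time t uses exactly the last W samples.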
As shown in Fig. 2 (b), after each aggregate power sequence
is input to the network, a convolutional layer is used for feature
extraction. Then two Bi-GRU layers are applied to enhance the
memory for the data patterns based on the extracted features,
followed by a dense layer as in the S2p network. Note that
dropout is applied to these layers to prevent overfitting.
Fig. 2. The architectures for S2p and Bi-GRU: (a) S2p, with five
convolutional layers followed by a flatten operation and a dense
layer; (b) Bi-GRU, with a convolutional layer, two Bi-GRU layers
with dropout, and a dense layer.