approximators [8]. A deep learning model replaces traditional handcrafted features with trainable layers, which leads
to better performance and avoids saturation when applied to large datasets [9].
Because supervised learning algorithms depend on labeled datasets, their capabilities are limited in the digital
domain, where plenty of unlabeled data is accessible. Meanwhile, unsupervised algorithms that train with unlabeled
data are more effective for shallow architectures [10].
Most industrial processes run smoothly, so faulty samples are rare in the field and normal samples far outnumber
them. This scarcity of faulty samples makes traditional fault diagnosis methods challenging to implement. In the
fault diagnosis domain, this is known as the class-imbalance problem, and it has been the subject of numerous
research studies.
Various sampling or generation procedures help balance the class distribution in the data preprocessing phase. In
general, undersampling [11] and oversampling [12] often lead to overfitting and underfitting when applied without
guidance. Data generation [13], a more recent data-level technique, can also be unreliable in practice. In addition,
several conditions must be met for cleaning-resampling [14]. Cost-sensitive learning can be applied in many domains
[15]; nevertheless, it requires domain experts to provide the cost matrix in the early stages, which is rarely
possible. Many recent algorithm-level approaches have been proposed by designing new loss functions, such as
FocalLoss [16]. Hybrid approaches, which combine data-level techniques with algorithm-level methods, can also
effectively address the class-imbalance problem in the fault diagnosis domain [17].
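As an illustration of such an algorithm-level approach, the sketch below shows a binary focal loss in the style of
[16]; this is a minimal example assuming a PyTorch setting, and the alpha and gamma values are only illustrative
defaults, not values used in this work.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy samples so training focuses on the rare (faulty) class."""
    # Per-sample binary cross-entropy, kept unreduced so it can be reweighted
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t is the predicted probability of the true class
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)
    # alpha_t balances the two classes; (1 - p_t)^gamma suppresses well-classified samples
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()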
These methods cannot treat every imbalanced industrial process dataset. They are also sensitive to outliers, which
makes their performance fluctuate, and their usability suffers because designing the cost matrix requires technical
expertise. In other words, these methods cannot be adapted to complex processes without expert knowledge and
assumptions, so they are neither universal nor adaptable.
A strategic approach is needed to mitigate the above problems. Furthermore, due to human error, the labels assigned
by experts to data from an actual steam turbine may not be reliable. Since the data labels are uncertain, an
algorithm that relies less on label knowledge is required. Reinforcement learning algorithms may solve this problem.
RL is a branch of machine learning based on reward-driven sequential decision making; it learns efficiently and
automatically, adapting to the environment to find the optimal response to any change. In RL-based recommendation
systems, the recommendation process is treated as a time-based dynamic interaction between the user and the
recommendation agent: as soon as the system recommends an item to a user, a positive reward is assigned if the user
expresses interest in it (through clicking or viewing, for example) [18].
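The sketch below illustrates this reward assignment with a small tabular Q-learning agent; it is an illustrative
example only (the state and item sets, reward values, and hyperparameters are assumptions, and a practical system
would typically use a deep Q-network rather than a table).

import numpy as np

n_states, n_items = 10, 5
Q = np.zeros((n_states, n_items))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def select_item(state):
    # Epsilon-greedy: occasionally explore, otherwise recommend the highest-valued item
    if np.random.rand() < epsilon:
        return np.random.randint(n_items)
    return int(np.argmax(Q[state]))

def update(state, item, user_interested, next_state):
    # Positive reward when the user shows interest (e.g., clicks or views), small penalty otherwise
    reward = 1.0 if user_interested else -0.1
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, item] += alpha * (td_target - Q[state, item])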
Supervised and unsupervised algorithms can expose the user to many problems; in particular, many of them struggle
with imbalanced data, and here the expert-provided labels are themselves uncertain. For these reasons, we designed a
reinforcement learning-based recommender system and update the reinforcement learning policy regularly to address
the imbalance and the uncertainty in the labels. Data from the steam turbine were collected on a daily schedule, and
the collection days were recorded. To reach an optimal policy, we begin by analyzing the labels in the first day's
data and then update the algorithm with each following day's data, continuing until the last day, after which we
draw our conclusions.
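A minimal skeleton of this day-by-day update scheme is sketched below; the agent interface and the load_day and
evaluate helpers are hypothetical placeholders, not the actual DQLAP implementation.

def run_daily_updates(agent, n_days, load_day, evaluate):
    # Iterate over the data-collection days, refining the policy each day
    for day in range(1, n_days + 1):
        samples = load_day(day)            # sensor readings and (possibly noisy) labels for this day
        agent.update_policy(samples)       # update the RL policy with the new day's data
        if day < n_days:
            next_day = load_day(day + 1)
            forecast = agent.predict(next_day)   # forecast for the upcoming day
            evaluate(forecast, next_day)         # compare against expert-declared labels
    return agent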
In this paper, we describe the main contributions of our framework, referred to as DQLAP:
• Building a recommender system using reinforcement learning. This method handles data imbalance and avoids several
of the problems noted for supervised and unsupervised methods. Our approach enables the expert to make informed
decisions without relying on feature engineering.
• Exploiting the transferability of the policy and its regular updates, the system achieves accurate performance
with less data and provides a forecast for the upcoming day.
• Analyzing the system's performance by comparing it to labels declared by an independent expert, since the original
declared labels cannot be relied upon.
2 Prior Art
Machine learning (ML) is an emerging artificial intelligence (AI) approach to fault diagnosis. In fault diagnosis,
artificial neural networks (ANN) [19, 20], support vector machines (SVM) [21], and extreme learning machines
(ELM) [22] have become widely used and effective. In terms of fault detection, these traditional approaches