DQLAP: DEEP Q-LEARNING RECOMMENDER ALGORITHM
WITH UPDATE POLICY FOR A REAL STEAM TURBINE SYSTEM
M.H. Modirrousta
Fault detection and Identification LAB (FDI)
K.N.Toosi University of Technology
Tehran, Iran
mohammadbc@email.kntu.ac.ir
M. Aliyari Shoorehdeli
Faculty of Electrical Engineering
K.N.Toosi University of Technology
Tehran, Iran
aliyari@kntu.ac.ir
M. Yari
Faculty of Mechatronic Engineering
K.N.Toosi University of Technology
Tehran, Iran
yari.mostafa@mapnaec.com
A. Ghahremani
Faculty of Mechanical Engineering
K.N.Toosi University of Technology
Tehran, Iran
ghahremani.arash@mapnaec.com
ABSTRACT
In modern industrial systems, diagnosing faults promptly and with the best available methods is increasingly crucial: faults that go undetected, or are detected late, can lead to system failure or wasted resources. Machine learning and deep learning offer a range of data-driven fault diagnosis methods, and we seek the most reliable and practical among them. This paper develops a framework based on deep learning and reinforcement learning for fault detection. By updating the reinforcement learning policy whenever new data arrives, we can increase accuracy, overcome data imbalance, and better predict future defects. Implementing this method yields a 3% increase in all evaluation metrics and an improvement in prediction speed, along with a 3%-4% improvement in all evaluation metrics compared to a typical backpropagation multi-layer neural network with similar parameters.
Keywords Deep learning · Reinforcement learning · Fault detection · Update policy
1 Introduction
Higher reliability is necessary as industrial systems become increasingly specialized and costly. Errors in detection and analysis may result in a decline in performance or even malfunction of the measuring equipment. The intelligent production of Industry 4.0 rests on the use of various new intelligent technologies [1], control [2], and quality prediction [3].
Data-driven fault analysis uses easy-to-measure variables to distinguish faults from normal process data. It has been extensively demonstrated that data-driven fault diagnosis methods offer flexibility, simplicity, and low cost. In recent years, various fault diagnosis methods based on machine learning have been developed [4], [5].
Modern systems rely heavily on data analysis and artificial intelligence (AI). Artificial neural networks (ANN), machine learning, deep learning, and fuzzy logic are widely used to analyze data for monitoring, fault detection, and other management functions, and several artificial intelligence systems have been used to detect and locate faults in industrial systems [6].
Classical ML algorithms have been shown to produce easy-to-understand models with substantial mappings; however, their performance saturates as dataset sizes grow. Digitalization and the speed of data generation, coupled with the limitations of ML algorithms on large datasets, led to the development of deep learning (DL) architectures [7]. A deep learning architecture is constructed from simple mappings that act as general approximators [8]. A deep learning model replaces traditional handcrafted features with trainable layers, which leads to better performance and avoids saturation when applied to large datasets [9].
Because supervised learning algorithms depend on labeled datasets, their capabilities are limited in the digital domain, where plenty of unlabeled data is accessible. Meanwhile, unsupervised algorithms that train on unlabeled data are more effective for shallow architectures [10].
Most industrial processes run smoothly, so faulty samples are rare in the field: normal samples far outnumber faulty ones. This scarcity of faulty samples makes it challenging to apply traditional fault diagnosis methods. In the fault diagnosis domain, such problems are known as class imbalance, and they have been the subject of numerous research studies.
Various sampling or data-generation procedures help balance the class distribution in the preprocessing phase. In general, undersampling [11] and oversampling [12] often lead to overfitting or underfitting when applied without guidance. Data generation [13], a newer data-level technique, can also be unreliable in practice. In addition, cleaning-resampling [14] requires several conditions to be met. Cost-sensitive learning can be applied in many domains [15]; nevertheless, it requires domain experts to provide the cost matrix at an early stage, which is rarely possible. Many recent algorithm-level approaches are built on new loss functions, such as focal loss [16]. The class-imbalance problem in the fault diagnosis domain can also be addressed effectively with hybrid approaches, which combine data-level techniques and algorithm-level methods [17].
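As a hedged illustration of such an algorithm-level approach, the sketch below implements the binary focal loss of [16], FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the alpha and gamma values are common defaults, not taken from this paper.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p : predicted probability of the fault class, shape (n,)
    y : ground-truth labels in {0, 1}, shape (n,)
    Down-weights easy, well-classified samples so the rare fault class
    contributes more to training than under plain cross-entropy.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class weighting
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

# Example: rare fault class (label 1) among mostly normal samples.
probs = np.array([0.9, 0.1, 0.8, 0.3])
labels = np.array([1, 0, 0, 1])
print(focal_loss(probs, labels))
```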
These methods cannot handle every imbalanced industrial process dataset. They are also sensitive to outliers, which makes their performance fluctuate, and their usability suffers from the technical expertise required to design the cost matrix. In other words, these methods cannot be adapted to complex processes without expert knowledge and assumptions, so they are neither universal nor adaptable.
A strategic approach is needed to mitigate the above problems. Furthermore, due to human error, the labels experts apply to data from an actual steam turbine may not be reliable, so an algorithm that depends less on label knowledge is required. Reinforcement learning algorithms may solve this problem.
Reinforcement learning (RL), built on a reward-based sequential decision-making process, is a branch of machine learning that efficiently and automatically learns and adapts to its environment to find the optimal response to any change. In RL-based recommendation systems, the recommendation process is treated as a time-based dynamic interaction between the user and the recommendation agent: as soon as the system recommends an item, a positive reward is assigned if the user expresses interest in it (through clicking or viewing, for example) [18].
Using supervised and unsupervised algorithms may expose the user to many problems; in particular, many of these algorithms struggle with imbalanced data. In addition, we are dealing with data whose expert-provided labels are uncertain. For all these reasons, we designed a reinforcement-learning-based recommender system, and we update the reinforcement learning policy to address the imbalance and the label uncertainty. Data from the steam turbine were collected on a daily schedule, with the collection days recorded. To reach an optimal policy, we begin by analyzing the labels from the first day's data and then update the algorithm with each following day's data, continuing until the last day, at which point we draw conclusions.
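The following is a minimal sketch of this daily update scheme; the agent interface and day-indexed dataset are hypothetical stand-ins, not the paper's actual implementation.

```python
def daily_policy_update(agent, days):
    """Fit the policy on day 1, then refine it with each following day.

    `days` is a list of (features, expert_labels) pairs, one per
    collection day; `agent` exposes update() and predict() methods
    (hypothetical interfaces, not the paper's actual API).
    """
    X0, y0 = days[0]
    agent.update(X0, y0)                    # initial policy from day-1 labels
    forecasts = []
    for X, y in days[1:]:
        forecasts.append(agent.predict(X))  # forecast before seeing labels
        agent.update(X, y)                  # then refine with the new labels
    return agent, forecasts
```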
In this paper, we describe the main contributions of our framework, referred to as DQLAP:
• Building a recommender system using reinforcement learning. This method handles imbalances in the data, and some of the problems mentioned for supervised and unsupervised methods do not arise. Our approach enables the expert to make informed decisions without relying on feature engineering.
• Thanks to transferability and the regularly updated policy, the framework can deliver accurate performance from less data and provide a forecast for the upcoming day.
• Analyzing the system's performance by comparing it against the labels declared by an independent expert, since the declared labels cannot be fully relied upon.
2 Prior Art
Machine learning (ML) is an emerging approach to fault diagnosis that draws on artificial intelligence (AI). In fault diagnosis, artificial neural networks (ANN) [19], [20], support vector machines (SVM) [21], and extreme learning machines (ELM) [22] have become widely used and effective. These traditional approaches nonetheless have limitations for fault detection. Establishing the statistical significance of a fault requires many examples, since a few new examples improve significance only marginally. A fault at an early stage can also be hard to characterize because accurate data are lacking, and collecting several valid fault samples within a short timeframe is difficult because faults are complex, unstable, and unpredictable. Finally, these methods rely on hand-crafted feature extractors [23] that derive time- and frequency-domain features.
In machine fault diagnosis, deep learning (DL) [7], with its strong ability to learn features, has recently received significant attention [24], [25]. When minority samples are scarce, some knowledge-driven methods can also be applied with limited data. Li et al. [26] propose a fast and accurate few-shot bearing fault diagnosis method (MLFD) based on meta-learning, and Zhuo et al. [27] present a generative model with a fault attribute space using an auxiliary triplet loss.
In deep learning, the model must be trained on a large amount of data to reach high accuracy; inaccurate or insufficient labeled data reduces the performance of supervised learning.
Deep reinforcement learning (DRL) integrates the capabilities of DL [7] with the decision-making abilities of reinforcement learning (RL) [28]. It has achieved great success in gaming, control, and interaction systems [29], [30], [31], [32], [33]. DRL is rarely used in classification tasks because it is designed for sequential decision problems. Using the classification Markov decision process (CMDP), Wiering et al. described a classification problem as a sequential decision-making process; the resulting MLP network outperformed a typical backpropagation MLP. With DRL, Lin et al. resolved imbalanced classification by converting it into a sequential decision-making problem [34]. Fan et al. designed the DiagSelect framework, which uses RL for intelligent imbalanced sample selection to obtain better diagnosis performance autonomously [35]. Wang et al. outlined a new methodology for fault diagnosis based on time-frequency representations (TFR) and dynamic response mappings (DRLs) [36].
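As a hedged sketch of the CMDP idea, the snippet below frames one pass over a training set as an episode in which each state is a sample, each action is a predicted label, and rewards are scaled by class rarity; the reward values are illustrative assumptions in the spirit of Lin et al. [34], not this paper's exact scheme.

```python
def cmdp_episode(X, y, predict, imbalance_ratio):
    """One CMDP-style episode: states are samples, actions are labels.

    Correct/incorrect minority-class (label 1) predictions earn +1/-1;
    majority-class rewards are scaled down by the imbalance ratio so the
    rare fault class dominates the return (illustrative values only).
    """
    total = 0.0
    for state, label in zip(X, y):
        action = predict(state)                      # agent picks a class
        correct = (action == label)
        scale = 1.0 if label == 1 else 1.0 / imbalance_ratio
        total += (1.0 if correct else -1.0) * scale  # per-step reward
    return total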
The insights from these successful experiments suggest investigating a comprehensive approach from the viewpoint of DRL to resolve the previously mentioned shortcomings of fault diagnosis methods.
3 Background
3.1 Reinforcement Learning
As a reward-based paradigm, reinforcement learning (RL) strives to maximize the rewards obtained from interactions between an agent and its environment [28]. In response to feedback from the environment, the agent learns about its behavior and then attempts to improve it through its actions. Such problems are solved by devising policies (i.e., mappings from states to actions) that maximize the accumulated reward. A reinforcement learning problem involves five important entities: the state, the action, the reward, the policy, and the value. Generally, reinforcement learning problems are modeled as Markov decision processes.
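A minimal sketch of this agent-environment loop follows; the reset/step interface mirrors common RL toolkits and is an assumption, not this paper's API.

```python
def run_episode(env, policy, gamma=0.99):
    """Roll out one episode and accumulate the discounted return.

    `env` exposes reset() -> state and step(action) -> (state, reward,
    done); `policy` maps a state to an action. Both are hypothetical
    interfaces in the style of common RL toolkits.
    """
    state, done, discount, ret = env.reset(), False, 1.0, 0.0
    while not done:
        action = policy(state)                   # policy: state -> action
        state, reward, done = env.step(action)   # environment feedback
        ret += discount * reward                 # accumulate reward
        discount *= gamma                        # apply the discount factor
    return ret
```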
3.2 Q-Learning
With the Q-learning algorithm [37], agents learn policies as they transition between states. To determine the optimal policy, all possible actions in the different states of the agent must be assessed. The algorithm continuously updates the Q-value according to the next state and the greedy action: the Q-function takes the current state, the chosen action, the received reward, and the learning rate, and discounts the estimated value of the next state by the discount factor. However, because the Q-table grows with the dimensionality of the problem, Q-learning-based systems do not perform well in large state spaces.
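For reference, the standard tabular update (a textbook formula, not specific to this paper) is Q(s,a) ← Q(s,a) + α [r + γ max_{a'} Q(s',a') − Q(s,a)], which the sketch below implements.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update (textbook form):
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    `Q` is a (num_states, num_actions) table.
    """
    td_target = r + gamma * np.max(Q[s_next])   # greedy next-state value
    Q[s, a] += alpha * (td_target - Q[s, a])    # move toward the target
    return Q

# Example: a tiny 4-state, 2-action table.
Q = np.zeros((4, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q)
```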
3.3 Deep Q Network (DQN)
Several techniques have been developed to deal with large state-space problems, including the Deep Q-Network (DQN), which combines Q-learning with a class of artificial neural networks known as deep Q networks. Depending on the outcome of an action, the environment grants a positive reward or a negative penalty; the reward is used to update the weights of the DNN and thereby improve the algorithm's performance. In practice, the DQN algorithm has achieved impressive results playing Atari games. Its purpose is to integrate DNNs with Q-learning to learn decision policies π that map states S to actions A such that A = π(S) [38].
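A minimal sketch of such a Q-network and its greedy policy, using PyTorch with illustrative layer sizes (assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value per action
    (illustrative sizes; not the paper's architecture)."""
    def __init__(self, state_dim=8, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def greedy_policy(qnet, state):
    """A = pi(S): pick the action with the highest predicted Q-value."""
    with torch.no_grad():
        return int(qnet(state).argmax())

qnet = QNetwork()
print(greedy_policy(qnet, torch.zeros(8)))
```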