approximators [8]. A deep learning model replaces traditional handcrafted features with trainable layers, which leads
to better performance and avoids saturation when applied to large datasets [9].
Because supervised learning algorithms depend on labeled datasets, their capabilities are limited in the digital
domain, where plenty of unlabeled data is accessible. Meanwhile, unsupervised algorithms that train with unlabeled
data are more effective for shallow architectures [10].
Most industrial processes run smoothly, so faulty samples are rare in the field and normal samples far outnumber
them. This scarcity of faulty samples makes traditional fault diagnosis methods challenging to implement. In the
fault diagnosis domain, this is known as the class-imbalance problem, and it has been the subject of numerous
research studies.
Various sampling or generation procedures help balance the class distribution in the data preprocessing phase. In
general, undersampling [11] and oversampling [12] often lead to overfitting and underfitting when applied without
guidance. Data generation [13], a more recent data-level technique, can also be unreliable in practice. In addition,
several conditions must be met for cleaning-resampling [14]. Cost-sensitive learning can be applied in many domains
[15]; nevertheless, it requires domain experts to provide the cost matrix in the early stages, which is rarely
possible. Many recent algorithm-level approaches have been proposed by designing new loss functions, such as
FocalLoss [16]. Hybrid approaches, which combine data-level techniques with algorithm-level methods, can also
effectively address the class-imbalance problem in the fault diagnosis domain [17].
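As an illustration of such an algorithm-level approach, the sketch below shows a binary focal loss in the style of
[16]; this is a minimal example assuming a PyTorch setting, and the alpha and gamma values are only illustrative
defaults, not values used in this work.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy samples so training focuses on the rare (faulty) class."""
    # Per-sample binary cross-entropy, kept unreduced so it can be reweighted
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t is the predicted probability of the true class
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)
    # alpha_t balances the two classes; (1 - p_t)^gamma suppresses well-classified samples
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()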
These methods cannot treat every imbalanced industrial process dataset. They are also sensitive to outliers, which
makes their performance fluctuate, and their usability suffers because designing the cost matrix requires technical
expertise. In other words, these methods cannot be adapted to complex processes without expert knowledge and
assumptions, so they are neither universal nor adaptable.
A strategic approach is needed to mitigate the above problems. Furthermore, due to human error, the labels assigned
by experts to data from an actual steam turbine may not be reliable. Since the data labels are uncertain, an
algorithm that relies less on label knowledge is required. Reinforcement learning algorithms may solve this problem.
RL is a branch of machine learning based on reward-driven sequential decision making; it learns efficiently and
automatically, adapting to the environment to find the optimal response to any change. In RL-based recommendation
systems, the recommendation process is treated as a time-based dynamic interaction between the user and the
recommendation agent: as soon as the system recommends an item to a user, a positive reward is assigned if the user
expresses interest in it (through clicking or viewing, for example) [18].
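The sketch below illustrates this reward assignment with a small tabular Q-learning agent; it is an illustrative
example only (the state and item sets, reward values, and hyperparameters are assumptions, and a practical system
would typically use a deep Q-network rather than a table).

import numpy as np

n_states, n_items = 10, 5
Q = np.zeros((n_states, n_items))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def select_item(state):
    # Epsilon-greedy: occasionally explore, otherwise recommend the highest-valued item
    if np.random.rand() < epsilon:
        return np.random.randint(n_items)
    return int(np.argmax(Q[state]))

def update(state, item, user_interested, next_state):
    # Positive reward when the user shows interest (e.g., clicks or views), small penalty otherwise
    reward = 1.0 if user_interested else -0.1
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, item] += alpha * (td_target - Q[state, item])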
Supervised and unsupervised algorithms can expose the user to many problems; in particular, many of them struggle
with imbalanced data, and here the expert-provided labels are themselves uncertain. For these reasons, we designed a
reinforcement learning-based recommender system and update the reinforcement learning policy regularly to address
the imbalance and the uncertainty in the labels. Data from the steam turbine were collected on a daily schedule, and
the collection days were recorded. To reach an optimal policy, we begin by analyzing the labels in the first day's
data and then update the algorithm with each following day's data, continuing until the last day, after which we
draw our conclusions.
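A minimal skeleton of this day-by-day update scheme is sketched below; the agent interface and the load_day and
evaluate helpers are hypothetical placeholders, not the actual DQLAP implementation.

def run_daily_updates(agent, n_days, load_day, evaluate):
    # Iterate over the data-collection days, refining the policy each day
    for day in range(1, n_days + 1):
        samples = load_day(day)            # sensor readings and (possibly noisy) labels for this day
        agent.update_policy(samples)       # update the RL policy with the new day's data
        if day < n_days:
            next_day = load_day(day + 1)
            forecast = agent.predict(next_day)   # forecast for the upcoming day
            evaluate(forecast, next_day)         # compare against expert-declared labels
    return agent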
In this paper, we describe the main contributions of our framework, referred to as DQLAP:
• Building a recommender system using reinforcement learning. This method handles data imbalance and avoids several
of the problems noted for supervised and unsupervised methods. Our approach enables the expert to make informed
decisions without relying on feature engineering.
• Exploiting the transferability of the policy and its regular updates, the system achieves accurate performance
with less data and provides a forecast for the upcoming day.
• Analyzing the system's performance by comparing it to labels declared by an independent expert, since the original
declared labels cannot be relied upon.
2 Prior Art
Machine learning (ML) is an emerging artificial intelligence (AI) approach to fault diagnosis. In fault diagnosis,
artificial neural networks (ANN) [19, 20], support vector machines (SVM) [21], and extreme learning machines
(ELM) [22] have become widely used and effective. In terms of fault detection, these traditional approaches