Improving Convolutional Neural Networks for Fault Diagnosis by
Assimilating Global Features
Saif S. S. Al-Wahaibi1 and Qiugang Lu1,†

*This work was supported by Texas Tech University.
1S. Al-Wahaibi and Q. Lu are with the Department of Chemical Engineering, Texas Tech University, Lubbock, TX 79409, USA. Email: Saif.Al-Wahaibi@ttu.edu; Jay.Lu@ttu.edu
†Corresponding author: Q. Lu
Abstract— Deep learning techniques have become prominent
in modern fault diagnosis for complex processes. In particular,
convolutional neural networks (CNNs) have shown an appealing
capacity to deal with multivariate time-series data by con-
verting them into images. However, existing CNN techniques
mainly focus on capturing local or multi-scale features from
input images. A deep CNN is thus often required to indirectly
extract global features, which are critical for describing the
images converted from multivariate dynamical data. This paper
proposes a novel local-global CNN (LG-CNN) architecture
that directly accounts for both local and global features for
fault diagnosis. Specifically, the local features are acquired by
traditional local kernels whereas global features are extracted
by using 1D tall and fat kernels that span the entire height
and width of the image. Both local and global features are
then merged for classification using fully-connected layers. The
proposed LG-CNN is validated on the benchmark Tennessee
Eastman process (TEP) dataset. Comparison with a traditional
CNN shows that the proposed LG-CNN can greatly improve the
fault diagnosis performance without significantly increasing the
model complexity. This improvement is attributed to the much
wider local receptive field of the LG-CNN compared with that of the CNN. The
proposed LG-CNN architecture can be easily extended to other
image processing and computer vision tasks.
I. INTRODUCTION
Deep learning (DL) has attracted increasing attention for
fault detection and diagnosis (FDD) over the last decade.
Primarily, the strength of DL lies in its ability to utilize the
extensive data present in industrial systems to establish com-
plex models for distinguishing anomalies, diagnosing faults,
and forecasting without needing much prior knowledge [1].
Among various DL methods for FDD, convolutional neural
networks (CNNs) have shown great promise due to their effi-
ciency in capturing spatiotemporal correlations and reduced
trainable parameters from weight sharing [2].
Originally developed for image classification, CNNs entail
neural networks consisting of convolutions with local kernels
and pooling operations to extract features from images [2],
[3]. They have also been used in FDD to handle time-series
data. Janssens et al. [4] made one of the earliest attempts at
using CNNs for fault diagnosis. The authors highlighted the
capability of CNNs to learn new features from input images
converted from time-series data to better classify faults in
rotating machinery. Further developments of CNNs for FDD
can be found in [5]–[10]. Note that, different from images
in computer vision, the images converted from time-series
data often possess strong non-localized features. To this end,
kernels of different sizes, i.e., multi-scale CNN, have been
used in [5] to cover local receptive fields (LRF) with varying
resolutions to improve the learned features. Other techniques
such as global average pooling have been employed in [10],
[11] to maintain the integrity of information pertaining to
global correlations. However, these approaches can either
only capture wider local features directly (e.g., multi-scale
CNN) or lack learnable parameters for acquiring global corre-
lations (e.g., global average pooling). Thus, they often need
to construct deep networks to capture the global features that
are crucial in multivariate time-series data for FDD [12]. In
addition, most of the studies mentioned above are concerned
with 1D or low-dimensional time-series data such as wheel
bearing data [6]. Research on extending CNNs to FDD for
high-dimensional multivariate time-series data, e.g., those
obtained from chemical processes, remains limited. One
exemplary work is reported in [7] where deep
CNNs are constructed to diagnose faults from the Tennessee
Eastman process (TEP). Gramian angular field is used in
[8] to convert multi-dimensional data into multi-channel 2D
images to apply CNN for fault diagnosis. Nevertheless, these
works still cannot directly extract global features from the
multivariate time-series data or equivalently, the formed 2D
images. Instead, they also rely on constructing deep CNNs to
expand the LRF to the entire image for capturing global cor-
relations. As a result, the number of trainable parameters can
easily go beyond several millions, causing significant training
complexity [7], [8]. Hence, there is a pressing demand
for developing a novel CNN-based FDD framework that
adequately extracts global spatiotemporal correlations while
maintaining a reasonable number of learnable parameters for
multivariate time-series datasets.
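As a back-of-the-envelope illustration of why depth is needed for global coverage (a standard receptive-field calculation, not a result from the cited works), consider a stack of $L$ convolutional layers with kernel sizes $k_l$ and strides $s_l$. The local receptive field of the top layer grows as
\[
r_L = 1 + \sum_{l=1}^{L} (k_l - 1) \prod_{j=1}^{l-1} s_j ,
\]
so with unit-stride $3\times 3$ kernels $r_L = 1 + 2L$, and covering, e.g., a $52\times 52$ image requires roughly 26 stacked layers (or aggressive pooling and striding), which is what inflates the parameter counts noted above.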
This paper proposes a novel local-global CNN (LG-CNN)
framework for fault diagnosis in complex dynamical pro-
cesses. The proposed framework converts multivariate time-
series data into images and simultaneously collects both
global and local features to classify faults. Local correlations
are captured using typical local square kernels, whereas
global correlations are integrated using 1D tall (temporal)
and fat (spatial) kernels that span the entire height and width
of the image. The spatial and temporal global features ex-
tracted by the tall and fat kernels are then combined
to acquire global spatiotemporal patterns in the images. Such
global spatiotemporal features are then concatenated with
local features extracted with the typical square kernels to
merge the information prior to fault diagnosis. The developed
LG-CNN is validated with the TEP data and simulation
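To make the idea concrete, the following is a minimal PyTorch sketch of the local-global structure described above: a local branch with small square kernels, a global branch with 1D tall (H×1) and fat (1×W) kernels that span the full image height and width, and a fully-connected classifier over the concatenated features. The channel counts, layer sizes, image dimensions, and the simple concatenation-based fusion are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn


class LGCNN(nn.Module):
    """Two-branch CNN: local square kernels plus global 1D tall/fat kernels (sketch)."""

    def __init__(self, height=52, width=52, n_classes=21):
        super().__init__()
        # Local branch: conventional small square kernels (narrow local receptive field).
        self.local = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Global branch: "tall" kernels span the full image height and "fat" kernels
        # span the full image width, so each output unit sees an entire column or
        # row of the image in a single layer.
        self.tall = nn.Sequential(nn.Conv2d(1, 8, kernel_size=(height, 1)), nn.ReLU())
        self.fat = nn.Sequential(nn.Conv2d(1, 8, kernel_size=(1, width)), nn.ReLU())
        # Classifier over the concatenated local and global feature vectors.
        local_dim = 8 * (height // 2) * (width // 2)
        global_dim = 8 * width + 8 * height
        self.fc = nn.Sequential(
            nn.Linear(local_dim + global_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        # x: (batch, 1, height, width) image formed from a window of
        # multivariate time-series data (image formation assumed, not shown).
        loc = self.local(x).flatten(1)   # local features
        tall = self.tall(x).flatten(1)   # global features along the height
        fat = self.fat(x).flatten(1)     # global features along the width
        return self.fc(torch.cat([loc, tall, fat], dim=1))


# Example: a batch of 4 single-channel 52x52 images, 21 fault classes (illustrative sizes).
logits = LGCNN()(torch.randn(4, 1, 52, 52))
print(logits.shape)  # torch.Size([4, 21])
```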