Convolutional Neural Networks:
Basic Concepts and Applications in Manufacturing
Shengli Jiang, Shiyi Qin, Joshua L. Pulsipher, and Victor M. Zavala*
Department of Chemical and Biological Engineering
University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA
Department of Chemical Engineering
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
*Corresponding Author: victor.zavala@wisc.edu
Abstract
We discuss basic concepts of convolutional neural networks (CNNs) and outline uses in manufacturing. We begin by discussing how different types of data objects commonly encountered in manufacturing (e.g., time series, images, micrographs, videos, spectra, molecular structures) can be represented in a flexible manner using tensors and graphs. We then discuss how CNNs use convolution operations to extract informative features (e.g., geometric patterns and textures) from such representations to predict emergent properties and phenomena and/or to identify anomalies. We also discuss how CNNs can exploit color as a key source of information, which enables the use of modern computer vision hardware (e.g., infrared, thermal, and hyperspectral cameras). We illustrate the concepts using diverse case studies arising in spectral analysis, molecule design, sensor design, image-based control, and multivariate process monitoring.
Keywords: computer vision, convolutional neural networks, manufacturing, images, graphs.
1 Introduction
Manufacturing is seeing an increasing use of real-time sensing and instrumentation technologies that generate data in the form of images/video (e.g., infrared and thermal), vibration/audio, and other complex data forms such as chemical spectra and geometrical structures (e.g., 3D printed objects, synthesized molecules, crystals). Manufacturing is also seeing the increasing use of automation systems that aim to exploit such data to make decisions (e.g., optimize production and detect anomalies). Moreover, modern automation systems are being designed to take instructions/targets in the form of complex data objects (e.g., voice, text, chemical spectra, and molecular structures).

Modern automation systems used in manufacturing embed highly sophisticated computing workflows that use tools from data science and machine learning (ML) to extract and interpret actionable information from complex data streams. Such workflows resemble those used in other advanced
technologies such as autonomous vehicles (e.g., aerial, terrestrial, and aquatic) and robotics. Moreover, such workflows begin to resemble human systems in which visual, auditory, tactile, and olfactory signals (data) are routinely used to make decisions. For instance, the human olfactory system generates signals when exposed to specific chemical structures, and such signals are processed and interpreted by the brain to detect anomalies. Similarly, the human visual system generates interpretable signals when exposed to objects with specific geometrical and color features, and our auditory system generates interpretable signals when exposed to specific frequencies. As such, from a conceptual standpoint, we can see that sensing and data science technologies are enabling increasing convergence between industrial (artificial) and human (natural) perception and decision-making. This opens new and exciting opportunities to synergize human and artificial intelligence with the ultimate goal of making manufacturing more efficient, safe, sustainable, and reliable.
In this chapter, we focus on ML technologies that enable information extraction from complex data sources commonly encountered in manufacturing. Specifically, we review basic concepts of convolutional neural networks (CNNs) and outline how these tools can be used to conduct diverse decision-making tasks of interest in manufacturing. At their core, CNNs use a powerful and flexible mathematical operation known as convolution to extract information from data objects that are represented in the form of regular grids (1D vectors, 2D matrices, and high-dimensional tensors) and irregular grids (2D graphs and high-dimensional hypergraphs). These data representations are flexible and can be used to encode a wide range of data objects such as audio signals, chemical spectra, molecules, images, and videos. Moreover, such representations can encode multi-channel data, which allows capturing color and multi-variate inputs (e.g., multi-variate time series and molecular graphs). CNNs use convolution operations to identify the features of the data that best explain an output; such features are obtained by learning optimal convolution filters or operators, which are the parameters of the CNN. The learning process of the operators requires sophisticated optimization algorithms and can be computationally expensive. CNNs are a class of models within an emergent ML field known as geometric deep learning, which leverages tools from geometry and topology to represent and process data.
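To make the convolution operation concrete, the following is a minimal NumPy sketch (not the implementation used in the case studies discussed later): a small 3×3 filter is slid over a 2D grid and a dot product is taken at each location, producing a feature map. The filter weights shown are illustrative edge-detection values; in a CNN they would be learned from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (implemented as cross-correlation, as in most
    CNN frameworks): slide the kernel over the image and take dot products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Small grayscale "image" (2D grid of light intensities) with a vertical edge
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Illustrative vertical-edge filter; in a CNN these weights are learned
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

features = conv2d(image, kernel)
print(features)  # large responses where the vertical edge is located
```

The same sliding-window idea generalizes to 1D signals (time series) and 3D grids (videos, density fields); only the dimensionality of the filter changes.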
The earliest version of a CNN was proposed in 1980 by Kunihiko Fukushima [21] and was used for pattern recognition. In the late 1980s, the LeNet model proposed by LeCun et al. introduced the concept of backward propagation, which streamlined learning computations using optimization techniques [45]. Although the LeNet model had a simple architecture, it was capable of recognizing hand-written digits with high accuracy. In 1998, Rowley et al. proposed a CNN model capable of performing face detection tasks (this work revolutionized object classification and detection) [75]. The complexity of CNN models (and their predictive power) has dramatically expanded with the advent of parallel computing architectures such as graphics processing units [65]. Modern CNN models for image recognition include SuperVision [44], GoogLeNet [86], VGG [82], and ResNet [29]. New models are currently being developed to perform diverse computer vision tasks such as object detection [69], semantic segmentation [49], action recognition [81], and 3D analysis [37]. Nowadays, CNNs are routinely used in smartphones (e.g., the unlock feature based on face recognition) [64].
While CNNs were originally developed for computer vision, the grid data representation used by CNNs is flexible and can be used to process datasets arising in many different applications. For instance, in the field of chemistry, Hirohara and co-workers proposed a matrix representation of SMILES strings (which encode molecular topology) by using a technique known as one-hot encoding [30]. The authors used this representation to train a CNN that could predict the toxicity of chemicals; it was shown that the CNN outperformed traditional models based on fingerprints (an alternative molecular representation). In biology, Xie and co-workers applied CNNs to count and detect cells from micrographs [92]. More recently, CNNs have been expanded to process graph data representations, which has greatly expanded their application scope. These types of CNNs (known as graph neural networks) have been widely used in the context of molecular property predictions [17, 26].
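As a rough illustration of the one-hot encoding idea (a sketch under assumed conventions, not the specific encoding of Hirohara et al. [30]), a SMILES string can be mapped to a binary matrix in which each row marks one character; the character vocabulary and padding length below are hypothetical:

```python
import numpy as np

# Illustrative character vocabulary; real SMILES encoders use a larger set
VOCAB = ["C", "O", "N", "(", ")", "=", "1"]
CHAR_TO_IDX = {c: i for i, c in enumerate(VOCAB)}

def one_hot_smiles(smiles, max_len=12):
    """Encode a SMILES string as a (max_len x |VOCAB|) binary matrix.
    Row i has a single 1 in the column of the i-th character."""
    mat = np.zeros((max_len, len(VOCAB)), dtype=np.float32)
    for i, ch in enumerate(smiles[:max_len]):
        mat[i, CHAR_TO_IDX[ch]] = 1.0
    return mat

# Ethanol is CCO; the resulting matrix can be fed to a CNN like a one-channel "image"
print(one_hot_smiles("CCO"))
```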
Manufacturing covers a broad space of important products and processes that is virtually impossible to enumerate; in this chapter, we focus our attention on applications of CNNs to examples of potential relevance to chemical and biological manufacturing (which covers domains such as pharmaceuticals, agricultural products, food products, consumer products, petrochemicals, and materials). We also highlight that these manufacturing sectors are seeing an emergent use of autonomous platforms that enable flexible and high-throughput experimentation and/or on-demand production; as such, the concepts discussed can be applicable in such contexts. We provide specific case studies that we believe provide representative examples of how CNNs can be used to facilitate decision-making in manufacturing. Specifically, we show how to use CNNs to i) decode multivariate time series data; ii) decode complex signals generated from microscopy and flow cytometry to detect contaminants in air and solution; iii) decode real-time ATR-FTIR spectra to characterize plastic waste streams; iv) predict surfactant properties directly from their molecular structures; and v) map image data into signals for feedback control.
2 Data Objects and Mathematical Representations
A wide range of datasets encountered in manufacturing can be represented in the form of two fundamental mathematical objects: tensors and graphs. Such representations are so general that, in fact, it is difficult to imagine a dataset that cannot be represented in this way. The key distinction between a tensor and a graph is that a tensor is a regular object (e.g., a regular mesh) while a graph is not (e.g., an irregular mesh). Moreover, tensors implicitly encode positional context, while graphs might or might not encode positional context. Differences between tensor and graph representations play a key role in the way that CNN architectures are designed to extract information from data. Unfortunately, it is often not obvious what data representation is most suitable for a particular application, and sometimes the representation might not naturally emerge from the data. In fact, one can think of CNNs as a tool for representation learning, in the sense that they aim to learn the best way to represent the data to make a prediction.
2.1 Tensor Representations
Data objects are often attached to a grid; the most common example of this is a grayscale image, which can be represented as a 2D grid object. Here, every spatial grid point is a pixel and the data entry in such a pixel is the intensity of light. A grayscale video is simply a sequence of grayscale images and can be represented as a 3D grid object (two spatial dimensions plus time); here, every space-time grid point is a voxel and the data entry is the intensity of light. A common misconception of grid data is that it can only be used to represent images and videos, but its scope is much broader. For instance, 3D density fields of chemicals or velocities (such as those generated using molecular and fluid dynamics simulations) can be represented as a 3D grid; moreover, the thermal field of a surface can be represented as a 2D grid.
Tensors are mathematical objects used to represent grid data. A tensor is a generalization of vectors (1D tensors) and matrices (2D tensors) to higher dimensions [27]. A key property of a tensor is that it implicitly encodes positional context and order (it lives in a Euclidean space); specifically, every entry of a tensor has an associated set of coordinates/indexes that uniquely specify the location of that entry in the tensor. Due to this positional context, the nature of the tensor can be altered by rotations; for instance, rotating an image (or transposing its associated matrix) distorts its properties.
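For illustration, the following minimal NumPy sketch (the array sizes are arbitrary assumptions) represents a grayscale image as a 2D tensor and a grayscale video as a 3D tensor, and shows that a rotation changes the object the tensor represents:

```python
import numpy as np

# Grayscale image: 2D tensor; entry [i, j] is the light intensity at pixel (i, j)
image = np.random.rand(10, 10)

# Grayscale video: 3D tensor; entry [t, i, j] is the intensity of voxel (i, j) at frame t
video = np.random.rand(30, 10, 10)

# Positional context matters: rotating the grid yields a different tensor
rotated = np.rot90(image)
print(image.shape, video.shape)
print(np.allclose(image, rotated))  # almost surely False for a generic image
```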
[Figure 1 graphic; the 10×10 pixel-intensity matrix shown in the middle panel is:]
0 0 0 1 1 1 1 0 0 0
0 1 1 2 2 2 2 1 1 0
0 1 2 3 4 4 3 2 1 0
1 2 3 5 6 6 5 3 2 1
1 2 4 6 8 8 6 4 2 1
1 2 4 6 8 8 6 4 2 1
1 2 3 5 6 6 5 3 2 1
0 1 2 3 4 4 3 2 1 0
0 1 1 2 2 2 2 1 1 0
0 0 0 1 1 1 1 0 0 0
Figure 1: Representation of a grayscale image (left) as a 2D grid object (middle). The grid is a matrix
in which each entry represents a pixel and the numerical value in the entry is the intensity of light.
Representation of the grayscale image as a manifold (right); this reveals geometrical patterns of the
image.
Tensors are flexible objects that can also be used to represent multi-attribute/multi-channel grid data. For example, color images can be represented as a superposition of three grids (red, green, and blue channels). Here, each channel is a 2D tensor (a matrix) and the stacking of these three channels is a 3D tensor. Channels can also be used to represent multi-variate data in each entry of a grid. For instance, an audio or sensor signal is a time series that can be represented as a one-channel vector, while a multivariate time series (e.g., one obtained from a collection of sensors in a manufacturing facility) can be represented as a multi-channel vector.
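A minimal sketch of these multi-channel layouts is shown below (assuming a channels-last convention for images and a channels-first convention for time series; such conventions vary across libraries):

```python
import numpy as np

# Color image: three stacked 2D channels (red, green, blue) -> 3D tensor
rgb_image = np.random.rand(64, 64, 3)   # height x width x channels

# Univariate sensor/audio signal: one-channel vector
audio = np.random.rand(1, 1000)         # 1 channel x 1000 time steps

# Multivariate time series from, say, 8 sensors: multi-channel vector
sensors = np.random.rand(8, 1000)       # 8 channels x 1000 time steps

print(rgb_image.shape, audio.shape, sensors.shape)
```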
It is important to highlight that there is an inherent duality between images (reality as perceived by human vision or an optical device) and tensors (their mathematical representation). Specifically, images are optical fields that our visual sensing system captures and processes to navigate the world and make decisions, while tensors are artificial mathematical representations used for computer processing. Making this distinction explicit is important, because humans typically excel at extracting information from visual data (without having any knowledge of mathematics) compared to numerical data (e.g., number sequences). As such, this raises the questions: Why do automation systems present data to human operators as numbers? What are the best visual representations that humans can use to interpret and analyze data more easily? These questions are at the core of human-computer interaction and highlight the relevance of data visualization and processing techniques.
It is also important to highlight that the human vision system and the brain have inherent limitations in sensing and interpreting optical fields. For instance, the human visual and auditory systems cannot capture all frequencies present in an optical field or an audio signal; as such, we need instrumentation (e.g., microscopes and nocturnal vision systems) that reveals/highlights information that cannot be captured with our limited senses. Moreover, the human brain often gets “confused” by distortions of optical fields and audio signals (e.g., rotations and deformations) and by noise (e.g., fog and white noise). These limitations can be overcome with the use of artificial intelligence tools such as CNNs. Here, sensing signals such as images and audio signals are represented mathematically as grid data to extract information. Unfortunately, grid data representations are limited in that they inherently represent regular objects and are susceptible to rotations and deformations.
2.2 Graph Representations
Graphs provide another flexible and powerful mathematical data representation. A graph is a topological object that consists of a set of nodes and a set of edges; each node is a point in a graph object and each edge connects a pair of nodes. The connectivity (topology) of a graph is represented as an adjacency matrix.
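A minimal sketch of this representation (using a hypothetical 4-node graph): the topology is stored as a symmetric adjacency matrix and the node attributes (e.g., pixel intensities, if each node represents a pixel) are stored in a separate array.

```python
import numpy as np

# A small undirected graph with 4 nodes; edges are node pairs
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Adjacency matrix: entry [i, j] = 1 if nodes i and j are connected
A = np.zeros((4, 4), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph: symmetric adjacency

# Node attributes (one value per node)
x = np.array([0.0, 1.0, 2.0, 1.0])

print(A)
print(x)
```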
[Figure 2 graphic: the same 10×10 pixel-intensity grid shown in Figure 1, mapped to a graph of pixel nodes.]
Figure 2: Representation of a grayscale image (left) as a graph (middle). Each node in the graph represents a pixel and the weight in the node encodes the intensity of light. The graph is a topologically invariant object that is not affected by deformations (right).