Convolutional Neural Networks:
Basic Concepts and Applications in Manufacturing
Shengli Jiang, Shiyi Qin, Joshua L. Pulsipher, and Victor M. Zavala*
Department of Chemical and Biological Engineering
University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA
Department of Chemical Engineering
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
*Corresponding Author: victor.zavala@wisc.edu
Abstract
We discuss basic concepts of convolutional neural networks (CNNs) and outline uses in manufacturing. We begin by discussing how different types of data objects commonly encountered in manufacturing (e.g., time series, images, micrographs, videos, spectra, molecular structures) can be represented in a flexible manner using tensors and graphs. We then discuss how CNNs use convolution operations to extract informative features (e.g., geometric patterns and textures) from such representations to predict emergent properties and phenomena and/or to identify anomalies. We also discuss how CNNs can exploit color as a key source of information, which enables the use of modern computer vision hardware (e.g., infrared, thermal, and hyperspectral cameras). We illustrate the concepts using diverse case studies arising in spectral analysis, molecule design, sensor design, image-based control, and multivariate process monitoring.
Keywords: computer vision, convolutional neural networks, manufacturing, images, graphs.
1 Introduction
Manufacturing is seeing an increasing use of real-time sensing and instrumentation technologies that generate data in the form of images/video (e.g., infrared and thermal), vibration/audio, and other complex data forms such as chemical spectra and geometrical structures (e.g., 3D printed objects, synthesized molecules, crystals). Manufacturing is also seeing the increasing use of automation systems that aim to exploit such data to make decisions (e.g., optimize production and detect anomalies). Moreover, modern automation systems are being designed to take instructions/targets in the form of complex data objects (e.g., voice, text, chemical spectra, and molecular structures).

Modern automation systems used in manufacturing embed highly sophisticated computing workflows that use tools from data science and machine learning (ML) to extract and interpret actionable information from complex data streams. Such workflows resemble those used in other advanced
technologies such as autonomous vehicles (e.g., aerial, terrestrial, and aquatic) and robotics. Moreover, such workflows begin to resemble human systems in which visual, auditory, tactile, and olfactory signals (data) are routinely used to make decisions. For instance, the human olfactory system generates signals when exposed to specific chemical structures, and such signals are processed and interpreted by the brain to detect anomalies. Similarly, the human visual system generates interpretable signals when exposed to objects with specific geometrical and color features, and our auditory system generates interpretable signals when exposed to specific frequencies. As such, from a conceptual standpoint, we can see that sensing and data science technologies are enabling increasing convergence between industrial (artificial) and human (natural) perception and decision-making. This opens new and exciting opportunities to synergize human and artificial intelligence with the ultimate goal of making manufacturing more efficient, safe, sustainable, and reliable.
In this chapter, we focus on ML technologies that enable information extraction from complex data sources commonly encountered in manufacturing. Specifically, we review basic concepts of convolutional neural networks (CNNs) and outline how these tools can be used to conduct diverse decision-making tasks of interest in manufacturing. At their core, CNNs use a powerful and flexible mathematical operation known as convolution to extract information from data objects that are represented in the form of regular grids (1D vectors, 2D matrices, and high-dimensional tensors) and irregular grids (2D graphs and high-dimensional hypergraphs). These data representations are flexible and can be used to encode a wide range of data objects such as audio signals, chemical spectra, molecules, images, and videos. Moreover, such representations can encode multi-channel data, which allows capturing color and multi-variate inputs (e.g., multi-variate time series and molecular graphs). CNNs use convolution operations to identify the features of the data that best explain an output; such features are obtained by learning optimal convolution filters or operators, which are the parameters of the CNN. The learning process of the operators requires sophisticated optimization algorithms and can be computationally expensive. CNNs are a class of models within an emergent ML field known as geometric deep learning, which leverages tools from geometry and topology to represent and process data.
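To make the convolution operation concrete, the following is a minimal NumPy sketch (not the implementation used in the case studies discussed later): a small 3×3 filter is slid over a 2D grid and a dot product is taken at each location, producing a feature map. The filter weights shown are illustrative edge-detection values; in a CNN they would be learned from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (implemented as cross-correlation, as in most
    CNN frameworks): slide the kernel over the image and take dot products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Small grayscale "image" (2D grid of light intensities) with a vertical edge
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Illustrative vertical-edge filter; in a CNN these weights are learned
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

features = conv2d(image, kernel)
print(features)  # large responses where the vertical edge is located
```

The same sliding-window idea generalizes to 1D signals (time series) and 3D grids (videos, density fields); only the dimensionality of the filter changes.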
The earliest version of a CNN was proposed in 1980 by Kunihiko Fukushima [21] and was used for pattern recognition. In the late 1980s, the LeNet model proposed by LeCun et al. introduced the concept of backward propagation, which streamlined learning computations using optimization techniques [45]. Although the LeNet model had a simple architecture, it was capable of recognizing hand-written digits with high accuracy. In 1998, Rowley et al. proposed a CNN model capable of performing face detection tasks (this work revolutionized object classification and detection) [75]. The complexity of CNN models (and their predictive power) has dramatically expanded with the advent of parallel computing architectures such as graphics processing units [65]. Modern CNN models for image recognition include SuperVision [44], GoogLeNet [86], VGG [82], and ResNet [29]. New models are currently being developed to perform diverse computer vision tasks such as object detection [69], semantic segmentation [49], action recognition [81], and 3D analysis [37]. Nowadays, CNNs are routinely used in smartphones (e.g., the unlock feature based on face recognition) [64].
While CNNs were originally developed for computer vision, the grid data representation used by CNNs is flexible and can be used to process datasets arising in many different applications. For instance, in the field of chemistry, Hirohara and co-workers proposed a matrix representation of SMILES strings (which encode molecular topology) by using a technique known as one-hot encoding [30]. The authors used this representation to train a CNN that could predict the toxicity of chemicals; it was shown that the CNN outperformed traditional models based on fingerprints (an alternative molecular representation). In biology, Xie and co-workers applied CNNs to count and detect cells from micrographs [92]. More recently, CNNs have been expanded to process graph data representations, which has greatly expanded their application scope. These types of CNNs (known as graph neural networks) have been widely used in the context of molecular property predictions [17, 26].
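As a rough illustration of the one-hot encoding idea (a sketch under assumed conventions, not the specific encoding of Hirohara et al. [30]), a SMILES string can be mapped to a binary matrix in which each row marks one character; the character vocabulary and padding length below are hypothetical:

```python
import numpy as np

# Illustrative character vocabulary; real SMILES encoders use a larger set
VOCAB = ["C", "O", "N", "(", ")", "=", "1"]
CHAR_TO_IDX = {c: i for i, c in enumerate(VOCAB)}

def one_hot_smiles(smiles, max_len=12):
    """Encode a SMILES string as a (max_len x |VOCAB|) binary matrix.
    Row i has a single 1 in the column of the i-th character."""
    mat = np.zeros((max_len, len(VOCAB)), dtype=np.float32)
    for i, ch in enumerate(smiles[:max_len]):
        mat[i, CHAR_TO_IDX[ch]] = 1.0
    return mat

# Ethanol is CCO; the resulting matrix can be fed to a CNN like a one-channel "image"
print(one_hot_smiles("CCO"))
```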
Manufacturing covers a broad space of important products and processes that is virtually impossible to enumerate; in this chapter, we focus our attention on applications of CNNs to examples of potential relevance to chemical and biological manufacturing (which covers domains such as pharmaceuticals, agricultural products, food products, consumer products, petrochemicals, and materials). We also highlight that these manufacturing sectors are seeing an emergent use of autonomous platforms that enable flexible and high-throughput experimentation and/or on-demand production; as such, the concepts discussed can be applicable in such contexts. We provide specific case studies that we believe provide representative examples of how CNNs can be used to facilitate decision-making in manufacturing. Specifically, we show how to use CNNs to i) decode multivariate time series data; ii) decode complex signals generated from microscopy and flow cytometry to detect contaminants in air and solution; iii) decode real-time ATR-FTIR spectra to characterize plastic waste streams; iv) predict surfactant properties directly from their molecular structures; and v) map image data into signals for feedback control.
2 Data Objects and Mathematical Representations
A wide range of datasets encountered in manufacturing can be represented in the form of two fundamental mathematical objects: tensors and graphs. Such representations are so general that, in fact, it is difficult to imagine a dataset that cannot be represented in this way. The key distinction between a tensor and a graph is that a tensor is a regular object (e.g., a regular mesh) while a graph is not (e.g., an irregular mesh). Moreover, tensors implicitly encode positional context, while graphs might or might not encode positional context. Differences between tensor and graph representations play a key role in the way that CNN architectures are designed to extract information from data. Unfortunately, it is often not obvious what data representation is most suitable for a particular application, and sometimes the representation might not naturally emerge from the data. In fact, one can think of CNNs as a tool for representation learning, in the sense that they aim to learn the best way to represent the data to make a prediction.
2.1 Tensor Representations
Data objects are often attached to a grid; the most common example of this is a grayscale image, which can be represented as a 2D grid object. Here, every spatial grid point is a pixel and the data entry in such a pixel is the intensity of light. A grayscale video is simply a sequence of grayscale images and can be represented as a 3D grid object (two spatial dimensions plus time); here, every space-time grid point is a voxel and the data entry is the intensity of light. A common misconception of grid data is that it can only be used to represent images and videos, but its scope is much broader. For instance, 3D density fields of chemicals or velocities (such as those generated using molecular and fluid dynamics simulations) can be represented as a 3D grid; moreover, the thermal field of a surface can be represented as a 2D grid.
Tensors are mathematical objects used to represent grid data. A tensor is a generalization of vectors (1D tensors) and matrices (2D tensors) to higher dimensions [27]. A key property of a tensor is that it implicitly encodes positional context and order (it lives in a Euclidean space); specifically, every entry of a tensor has an associated set of coordinates/indexes that uniquely specify the location of that entry in the tensor. Due to this positional context, the nature of the tensor can be altered by rotations; for instance, rotating an image (or transposing its associated matrix) distorts its properties.
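For illustration, the following minimal NumPy sketch (the array sizes are arbitrary assumptions) represents a grayscale image as a 2D tensor and a grayscale video as a 3D tensor, and shows that a rotation changes the object the tensor represents:

```python
import numpy as np

# Grayscale image: 2D tensor; entry [i, j] is the light intensity at pixel (i, j)
image = np.random.rand(10, 10)

# Grayscale video: 3D tensor; entry [t, i, j] is the intensity of voxel (i, j) at frame t
video = np.random.rand(30, 10, 10)

# Positional context matters: rotating the grid yields a different tensor
rotated = np.rot90(image)
print(image.shape, video.shape)
print(np.allclose(image, rotated))  # almost surely False for a generic image
```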
[Figure 1 graphic; the 10×10 pixel-intensity matrix shown in the middle panel is:]
0 0 0 1 1 1 1 0 0 0
0 1 1 2 2 2 2 1 1 0
0 1 2 3 4 4 3 2 1 0
1 2 3 5 6 6 5 3 2 1
1 2 4 6 8 8 6 4 2 1
1 2 4 6 8 8 6 4 2 1
1 2 3 5 6 6 5 3 2 1
0 1 2 3 4 4 3 2 1 0
0 1 1 2 2 2 2 1 1 0
0 0 0 1 1 1 1 0 0 0
Figure 1: Representation of a grayscale image (left) as a 2D grid object (middle). The grid is a matrix
in which each entry represents a pixel and the numerical value in the entry is the intensity of light.
Representation of the grayscale image as a manifold (right); this reveals geometrical patterns of the
image.
Tensors are flexible objects that can also be used to represent multi-attribute/multi-channel grid data. For example, color images can be represented as a superposition of three grids (red, green, and blue channels). Here, each channel is a 2D tensor (a matrix) and the stacking of these three channels is a 3D tensor. Channels can also be used to represent multi-variate data in each entry of a grid. For instance, an audio or sensor signal is a time series that can be represented as a one-channel vector, while a multivariate time series (e.g., one obtained from a collection of sensors in a manufacturing facility) can be represented as a multi-channel vector.
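A minimal sketch of these multi-channel layouts is shown below (assuming a channels-last convention for images and a channels-first convention for time series; such conventions vary across libraries):

```python
import numpy as np

# Color image: three stacked 2D channels (red, green, blue) -> 3D tensor
rgb_image = np.random.rand(64, 64, 3)   # height x width x channels

# Univariate sensor/audio signal: one-channel vector
audio = np.random.rand(1, 1000)         # 1 channel x 1000 time steps

# Multivariate time series from, say, 8 sensors: multi-channel vector
sensors = np.random.rand(8, 1000)       # 8 channels x 1000 time steps

print(rgb_image.shape, audio.shape, sensors.shape)
```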
It is important to highlight that there is an inherent duality between images (reality as perceived by human vision or an optical device) and tensors (their mathematical representation). Specifically, images are optical fields that our visual sensing system captures and processes to navigate the world and make decisions, while tensors are artificial mathematical representations used for computer processing. Making this distinction explicit is important, because humans typically excel at extracting information from visual data (without having any knowledge of mathematics) compared to numerical data (e.g., number sequences). As such, this raises the questions: Why do automation systems present data to human operators as numbers? What are the best visual representations that humans can use to interpret and analyze data more easily? These questions are at the core of human-computer interaction and highlight the relevance of data visualization and processing techniques.
It is also important to highlight that the human vision system and the brain have inherent limitations in sensing and interpreting optical fields. For instance, the human visual and auditory systems cannot capture all frequencies present in an optical field or an audio signal; as such, we need instrumentation (e.g., microscopes and nocturnal vision systems) that reveals/highlights information that cannot be captured with our limited senses. Moreover, the human brain often gets “confused” by distortions of optical fields and audio signals (e.g., rotations and deformations) and by noise (e.g., fog and white noise). These limitations can be overcome with the use of artificial intelligence tools such as CNNs. Here, sensing signals such as images and audio signals are represented mathematically as grid data to extract information. Unfortunately, grid data representations are limited in that they inherently represent regular objects and are susceptible to rotations and deformations.
2.2 Graph Representations
Graphs provide another flexible and powerful mathematical data representation. A graph is a topological object that consists of a set of nodes and a set of edges; each node is a point in a graph object and each edge connects a pair of nodes. The connectivity (topology) of a graph is represented as an adjacency matrix.
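A minimal sketch of this representation (using a hypothetical 4-node graph): the topology is stored as a symmetric adjacency matrix and the node attributes (e.g., pixel intensities, if each node represents a pixel) are stored in a separate array.

```python
import numpy as np

# A small undirected graph with 4 nodes; edges are node pairs
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Adjacency matrix: entry [i, j] = 1 if nodes i and j are connected
A = np.zeros((4, 4), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph: symmetric adjacency

# Node attributes (one value per node)
x = np.array([0.0, 1.0, 2.0, 1.0])

print(A)
print(x)
```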
[Figure 2 graphic: the same 10×10 pixel-intensity grid shown in Figure 1, mapped to a graph of pixel nodes.]
Figure 2: Representation of a grayscale image (left) as a graph (middle). Each node in the graph represents a pixel and the weight in the node encodes the intensity of light. The graph is a topologically invariant object that is not affected by deformations (right).