A Continuous Convolutional Trainable Filter for Modelling
Unstructured Data
Dario Coscia1, Laura Meneghetti1, Nicola Demo1, Giovanni Stabile2,1, and
Gianluigi Rozza1
1Mathematics Area, mathLab, SISSA, via Bonomea 265, I-34136, Trieste, Italy
2Department of Pure and Applied Sciences, Informatics and Mathematics Section,
University of Urbino Carlo Bo, Piazza della Repubblica 13, I-61029, Urbino, Italy
May 26, 2023
Abstract
The convolutional neural network (CNN) is one of the most important architectures in deep
learning. The fundamental building block of a CNN is a trainable filter, represented as a
discrete grid, used to perform convolution on discrete input data. In this work, we propose a
continuous version of a trainable convolutional filter that is able to work also with unstructured data.
This new framework allows exploring CNNs beyond discrete domains, extending the applicability of
this important learning technique to many more complex problems. Our experiments show
that the continuous filter can achieve a level of accuracy comparable to the state-of-the-art
discrete filter, and that it can be used in current deep learning architectures as a building
block to solve problems with unstructured domains as well.
1 Introduction
In the deep learning field, the convolutional neural network (CNN) [28] is one of the most important
architectures, widely used in academia and industrial research. For an overview of the topic,
the interested reader might refer to [30, 16, 2, 5, 52]. Despite their great success in many fields
including, but not limited to, computer vision [26, 40, 22] or natural language processing [50, 11],
current CNNs are constrained to structured data. Indeed, the basic building block of a CNN is
a trainable filter, represented by a discrete grid, which performs cross-correlation, also known as
convolution, on a discrete domain. Nevertheless, the idea behind convolution can be easily extended
mathematically to unstructured domains; for reference see [18]. One possible approach for this kind
of problem is graph neural networks (GNNs) [24, 49], where a graph is built starting from the
topology of the discretized space. This allows us to apply convolution even to unstructured data by
looking at the graph edges, bypassing in this way the limitations of the standard CNN approach.
However, GNNs typically require huge computational resources, due to their intrinsic complexity.

Instead, in this article we present a methodology to apply CNNs to unstructured data by introducing
a continuous extension of a convolutional filter, named continuous filter, without modeling
the data using a graph. The main idea, which is depicted graphically in Figure 1, relies on
approximating the continuous filter with a trainable function using a feed-forward neural network
and performing standard continuous convolution between the input data and the continuous filter.
Previous works have introduced different approaches to continuous convolution in various settings,
ranging from informatics and graph neural networks to physics and the modeling of quantum interactions;
see for example [39, 41, 4]. Even so, these approaches are difficult to generalize, and an analogy
with a discrete CNN filter is not straightforward.
Figure 1: Continuous convolutional filter process. The unstructured domain input points falling
into the filter are mapped into the filter domain (a). The filter values are approximated with an MLP
kernel (b). Finally, the convolution between the mapped values and the filter values is performed
(c).
To our knowledge, [48, 36] are the closest works in the literature to our approach, both approximating
the trainable filter function with a feed-forward neural network and performing continuous convolution.
However, [48] and [36] focus on filters with unbounded domains for convolution. In our work, we instead
fix the dimension of the filter, as in state-of-the-art discrete filters, and learn the approximating
function on the filter domain. This introduces a neat analogy to discrete CNN filters. Furthermore,
differently from [48, 36], we also cover important properties of convolution, such as transposed
convolution or different approaches to multichannel convolution. To summarize, in this work we aim
to reproduce as closely as possible a discrete CNN filter, but in a continuous, unstructured domain
setting, in order to exploit the main deep learning architectures based on CNNs to solve problems
on non-discrete domains. To the best of the authors' knowledge, our approach to continuous
convolution has not been explored in the literature yet.
The main novelties of this work are:
• Building a new framework, based on continuous filters, for working with unstructured data
(continuous filter).
• Defining a neat analogy between continuous (transposed) convolution and state-of-the-art
discrete (transposed) convolution in CNNs.
• Applying continuous convolutional layers in a CNN with partially-completed input.
• Exploiting general strategies to work with continuous convolutional autoencoders for
dimensionality reduction and system output predictions at unseen time steps.
All this, we highlight, is achieved while preserving the features of standard CNNs, which make such
an approach effective even when dealing with large datasets. The present contribution is organised
as follows: in Section 2, we briefly review the deep learning architectures useful for the later
analysis, and we introduce the continuous filter for one-dimensional and multi-dimensional channels.
In the same Section, we introduce the main idea to perform transposed continuous convolution.
Section 3 is focused on numerical results. First, we validate the proposed methodology on a discrete
domain problem using a continuous CNN and compare it with its discrete representation. Second, we
show that continuous convolution can also work with partially-completed images. Last, we present
different deep learning architectures using continuous filters to solve the step Navier-Stokes problem
and the multiphase problem. Finally, conclusions follow in Section 4.
2 Methodology
This Section focuses on the various methodologies we rely on for building the continuous filter, as
well as the introduction of the framework. First of all, we briefly describe the feed-forward neural
network and the discrete filter of a CNN in Section 2.1 and Section 2.2, respectively. As already
mentioned in Section 1, one of the main novelties of this work is the construction of a new framework
based on continuous convolution. Hence, Section 2.3 concerns the introduction of our framework in
different settings: single-channel, multi-channel and transposed convolution using the continuous filter.
2.1 Feed-Forward Neural Network
The feed-forward neural network, or multi-layer perceptron (MLP), is the most basic, yet one of the
most important, building blocks of most current deep learning architectures [16, 13, 5]. Widely
used in deep learning, MLPs have the ability to approximate any continuous function due to the
universal approximation theorem [20, 8, 29]. More technically, given an input vector $x \in \mathbb{R}^{n_{in}}$ and a
function to approximate $\phi: \mathbb{R}^{n_{in}} \to \mathbb{R}^{n_{out}}$, the MLP approximation is done using a parameterised
function class $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$, where $\theta$ are the trainable parameters of the network, belonging to the
parameters' space $\Theta$. An MLP can be represented as a directed acyclic graph, as depicted in Figure
2. In particular, it is composed of an input layer, an output layer and a certain number of hidden
layers, where the processing units of the network, called neurons, perform the computation. Each layer
$i$, with $i \in \{0, \dots, M\}$, can be thought of as a function $f^{(i)}$ belonging to $\mathcal{F}$, and the overall network
function is given by the layers' composition [9]:
$$f = f^{(M)} \circ f^{(M-1)} \circ \cdots \circ f^{(1)} \circ f^{(0)}. \qquad (2.1.1)$$

Figure 2: Schematic structure of a feed-forward neural network, with an input layer ($x_1, x_2, \dots, x_{n_{in}}$), hidden layers, and an output layer ($\hat{y}_1, \dots, \hat{y}_{n_{out}}$).
Hence, a single layer $i$ is a function $f^{(i)}: \mathbb{R}^{n_i} \to \mathbb{R}^{n_{i+1}}$, where $n_i$ represents the number of
neurons in layer $i$, with $n_0 = n_{in}$ and $n_{M+1} = n_{out}$. Each layer $i$ is composed of the parameters $\theta_i = (w^{(i)}, b^{(i)})$,
where $w^{(i)}$ is a real $n_{i+1} \times n_i$ matrix, called weight matrix, and $b^{(i)}$ is a real vector of
dimension $n_{i+1}$, called bias. The output vector $h^{(i+1)}$ of layer $i$, corresponding to the input vector of
layer $i+1$ (except for the output layer), is then calculated using:
$$h^{(i+1)} = f^{(i)}(h^{(i)} \mid \theta_i) = \delta^{(i)}(w^{(i)} \cdot h^{(i)} + b^{(i)}), \qquad (2.1.2)$$
where $h^{(0)} = x$, and $h^{(M+1)} = \hat{y}$ is the output of the network. The function $\delta^{(i)}: \mathbb{R}^{n_{i+1}} \to \mathbb{R}^{n_{i+1}}$ is
called activation, and introduces non-linearity through the network; common choices are the ReLU
function, the sigmoid (logistic) function, or radial activation functions. By
using Equation 2.1.1 and Equation 2.1.2, one can express an MLP architecture mathematically.
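As an illustration of Equations 2.1.1 and 2.1.2, the following minimal sketch builds an MLP as a composition of affine maps and activations. We assume PyTorch purely for illustration; the layer sizes and the tanh activation are arbitrary choices, not the ones used in this work.

```python
import torch


# Minimal MLP following Eq. (2.1.1)-(2.1.2): each layer applies an affine map
# followed by an activation; the network is the composition of all layers.
class MLP(torch.nn.Module):
    def __init__(self, sizes=(2, 16, 16, 1)):
        super().__init__()
        # One Linear module per layer i, holding the parameters (w^(i), b^(i)).
        self.layers = torch.nn.ModuleList(
            [torch.nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        )

    def forward(self, x):
        h = x
        for i, layer in enumerate(self.layers):
            h = layer(h)                      # w^(i) . h^(i) + b^(i)
            if i < len(self.layers) - 1:      # no activation on the output layer
                h = torch.tanh(h)             # delta^(i), here tanh as an example
        return h


x = torch.rand(8, 2)      # batch of 8 input vectors with n_in = 2
y_hat = MLP()(x)          # shape (8, 1), i.e. n_out = 1
```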
During the training process, in which a data-set $\mathcal{D} = \{(x_i, \phi(x)_i)\}_{i=1}^{n}$ composed of $n$ observations
is fed into the network, the MLP parameters $\theta$ are modified in order to minimize a loss function
$\mathcal{L}(\theta \mid \mathcal{D}, f)$. The choice of the loss function depends on the specific problem of application [16, 25,
5]. Hence, the learning phase can be summarised mathematically as:
$$\min_{\theta} \mathcal{L}(\theta \mid \mathcal{D}, f). \qquad (2.1.3)$$
In practice, to solve the minimization problem, different optimization algorithms based on back-propagation
can be used; see [35, 45, 51] for further reference. The optimization phase is carried out over
multiple training epochs, i.e. complete repetitions of the parameter update involving the whole
training data-set $\mathcal{D}$.
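As a concrete example of the minimization in Equation 2.1.3, the snippet below sketches a standard training loop; the target function, the mean squared error loss and the Adam optimizer are assumptions made only for illustration and are not prescribed by this work.

```python
import torch

# Toy data-set D = {(x_i, phi(x_i))}: here phi is assumed to be a sine for illustration.
x = torch.linspace(0, 1, 100).unsqueeze(-1)
y = torch.sin(2 * torch.pi * x)

model = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()   # one possible choice of L(theta | D, f)

for epoch in range(1000):      # each epoch is a full pass over D
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()            # back-propagation
    optimizer.step()           # parameter update, approximating min_theta L
```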
2.2 Discrete filter in Convolutional Neural Networks
Convolutional Neural Network (CNN) is a class of deep learning architectures, vastly applied in
computer vision [34, 26, 40, 22]. Over the past years, different CNN architectures have been
presented, for instance AlexNet [27], ResNet [17], Inception [46], VGGNet [42]. Differently from
MLPs, in which affine transformations are performed for learning, a convolutional layer actually
performs the convolution of the input data Iand the so called convolutive filter K, such that
(I ∗ K)(x) = Z
−∞
I(x+τ)K(τ)dτ.(2.2.1)
CNNs perform such convolution1in a discrete setting, using a tensorial representation of the two
functions Iand Kinstead of their continuous formulation. Thus, discrete correlation is computed
as (I ∗K)(x) = P
τ=−∞ I(x+τ)K(τ), with x,τZd(with ddimensions), where the latter infinite
summation can be truncated by discarding the null products. In this way, it is not necessary to
know the original function I, but its evaluation at discrete coordinates. In this context, the filter
Kcan be represented as the tensor KRN1×···×Ndsuch that the element Ki1,...,id≡ K(i1, . . . , id)
with ij∈ {1, . . . , Nj},j∈ {1, . . . , d}. Applying a similar representation also for the input, the
convolution results in the sum of the element-wise multiplication between input and filter, as
sketched in Figure 3. The convolution is of course repeated for all the input components, by
moving the filter across the input in a regularized fashion [16, 5].
Figure 3: Discrete convolution operation on a two-dimensional tensor: a 3 × 3 filter is slid over a 7 × 7 binary input, producing a 5 × 5 output whose entries are the sums of the element-wise products between the filter and the corresponding input patch.
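For concreteness, the discrete (cross-correlation style) convolution described above can be sketched as follows; the 7 × 7 binary input and the 3 × 3 filter reproduce the example of Figure 3, and PyTorch is assumed here only as an illustrative framework.

```python
import torch
import torch.nn.functional as F

# 7x7 binary input and 3x3 filter, as in the example of Figure 3.
image = torch.tensor([
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
], dtype=torch.float32)
kernel = torch.tensor([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]], dtype=torch.float32)

# conv2d in deep learning frameworks implements cross-correlation (see footnote 1):
# each output entry is the sum of element-wise products of the filter and a patch.
out = F.conv2d(image.view(1, 1, 7, 7), kernel.view(1, 1, 3, 3))
print(out.shape)   # torch.Size([1, 1, 5, 5]): a 5x5 output, as in Figure 3
```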
The filter components (the so-called weights) represent the trainable parameters of the convolutional
layer, which are tuned during the training phase. In general, convolution reduces the size
of a (multidimensional) array, performing downsampling. Conversely, the opposite transformation,
called upsampling, which is used by many deep learning architectures, e.g. autoencoders,
is obtained through transposed convolution. The interested reader might refer to [12, 52] for more
information regarding discrete (transposed) convolution.
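To illustrate the downsampling/upsampling relationship, the short sketch below (again in PyTorch, assumed only for illustration; the channel counts, kernel size and stride are arbitrary) shows how a convolution reduces the spatial size while its transposed counterpart restores it.

```python
import torch

x = torch.rand(1, 1, 28, 28)                                   # a single-channel input

down = torch.nn.Conv2d(1, 4, kernel_size=3, stride=2)          # convolution: downsampling
up = torch.nn.ConvTranspose2d(4, 1, kernel_size=3, stride=2,
                              output_padding=1)                # transposed convolution: upsampling

z = down(x)
x_rec = up(z)
print(z.shape, x_rec.shape)   # torch.Size([1, 4, 13, 13]) torch.Size([1, 1, 28, 28])
```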
2.3 Continuous filter
In contrast to discrete convolution as described in the previous Section, continuous two-dimensional
convolution is mathematically defined as:
$$I_{out}(x, y) = \int_X \int_Y I(x + \tau_x, y + \tau_y) \cdot \mathcal{K}(\tau_x, \tau_y)\, d\tau_x\, d\tau_y, \qquad (2.3.1)$$
where $\mathcal{K}: X \times Y \to \mathbb{R}$ is the continuous filter function, and $I: \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ is the input function.
The continuous filter function is approximated using an MLP, and is thus trainable during the training
phase. In order to maintain the parallelism with discrete convolution in CNNs, the definition
adopted for continuous convolution differs from the mathematical one, for which $X = Y = \mathbb{R}$. In
fact, the continuous filter presented here is defined on a closed domain, smaller than the input function
domain, as in the case of the discrete filter. The integral in Equation 2.3.1 can be evaluated
¹ In many deep learning implementations the term convolution indicates what is known in mathematics as cross-correlation [16]. In this text, the term convolution will be used to indicate cross-correlation, thus adapting to the deep learning community convention.
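To make the idea of Figure 1 concrete, the following sketch shows one possible way to realize a continuous filter: an MLP maps coordinates inside the fixed, closed filter domain to filter values, and the convolution with scattered input points is approximated by a sum of products, in the spirit of Equation 2.3.1. This is a minimal illustration under our own assumptions (PyTorch, a square filter window mapped to [0, 1]², a plain sum over the points falling inside the window); it is not claimed to be the exact implementation used in this work.

```python
import torch

# MLP kernel: maps a 2D coordinate in the filter domain to a filter value (Fig. 1b).
kernel = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def continuous_conv(points, values, center, width=0.1):
    """Approximate (I * K)(center) on unstructured data.

    points: (N, 2) coordinates of the unstructured input, values: (N,) input values.
    Points falling into the filter window are mapped to the filter domain (Fig. 1a)
    and combined with the MLP kernel values by a sum of products (Fig. 1c).
    """
    inside = ((points - center).abs() <= width / 2).all(dim=1)
    local = (points[inside] - center) / width + 0.5   # map to the [0, 1]^2 filter domain
    weights = kernel(local).squeeze(-1)               # trainable filter values
    return (weights * values[inside]).sum()

# Toy unstructured input: random points with values of an assumed function.
pts = torch.rand(500, 2)
vals = torch.sin(2 * torch.pi * pts[:, 0]) * pts[:, 1]
out = continuous_conv(pts, vals, center=torch.tensor([0.5, 0.5]))
```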