MULTIPOD CONVOLUTIONAL NETWORK
Hongyi Pan, Salih Atici, Ahmet Enis Cetin
Department of Electrical and Computer Engineering
University of Illinois Chicago
Chicago, Illinois, USA
{hpan21, satici2, aecyy}@uic.edu
ABSTRACT
In this paper, we introduce a convolutional network, which we call MultiPodNet, consisting of two or more convolutional networks that process the input image in parallel to achieve the same goal. The output feature maps of the parallel convolutional networks are fused at the fully connected layer of the network. We experimentally observed that three parallel pod networks (TripodNet) produce the best results on commonly used object recognition datasets. The baseline pod networks can be of any type. In this paper, we use ResNets as the baseline networks, and their inputs are augmented image patches. The number of parameters of the TripodNet is about three times that of a single ResNet. We train the TripodNet using standard backpropagation-type algorithms. The parameters of each individual ResNet are initialized with different random numbers during training. The TripodNet achieved state-of-the-art performance on the CIFAR-10 and ImageNet datasets. For example, it improved the accuracy of a single ResNet from 91.66% to 92.47% under the same training process on the CIFAR-10 dataset.
1 Introduction
In this article, we describe a novel convolutional network, which we call MultiPodNet, consisting of two or more convolutional networks that process the input image or data in parallel. The output feature maps of the parallel convolutional networks are fused at the fully connected layer of the network. In other words, a bank of two or more networks forms the main body of the MultiPodNet.
Recent studies involving convolutional neural networks and vision transformers have shown great success in image classification tasks and have created different perspectives. The machine learning revolution that started with deep convolutional networks continues with hierarchical vision transformers using shifted windows and next-generation convolutional neural networks [1], [2], [3]. The great success of transformers, especially Swin Transformers, on image classification tasks showed that increasing the number of network parameters and processing images as a series of patches help to increase the performance of the model [4]. The recently introduced ConvNeXt network [5], which is also a large convolutional network, is as successful as transformer-type networks. It also has a huge number of parameters, and it is trained using novel methods that improve on the classical convolutional models.
In this paper, we increase the number of parameters of the deep neural network using a bank of parallel networks working toward the same goal, such as object recognition. The MultiPod network can also process the input image as image patches in parallel, as in transformer networks. The original input image and/or its augmented versions are fed into the convolutional networks forming the MultiPod network, and the output feature maps of the parallel convolutional networks are combined before the fully connected dense layer. Since the convolutional neural networks are used in parallel, the network is called MultiPodNet. The difference between MultiPodNet and other concatenated networks is that the same input image is fed to each of the networks forming the MultiPodNet. We also use image patches and different augmentation techniques to create several instances of the original input. Depending on the image database and the object recognition problem, different augmentation methods and image patches can be used as input to the MultiPod network.
We initialize the parallel networks with different random numbers during the training process. This approach leads to different parameters in the parallel networks, and that is how we improve the recognition capability over a single network.
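The per-pod initialization can be sketched as follows; this is an illustrative NumPy sketch, not the authors' code, and the layer shape and He-style scaling are assumptions.

```python
import numpy as np

def init_pod_weights(seed, shape=(16, 3, 3, 3)):
    """Initialize one pod's first-layer conv weights from its own seed
    (He/Kaiming-style scaling; the shape is illustrative)."""
    rng = np.random.default_rng(seed)
    fan_in = np.prod(shape[1:])
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

# TripodNet: three pods, each started from a different random seed, so
# they converge to different parameters while sharing one architecture.
pods = [init_pod_weights(seed) for seed in (0, 1, 2)]
assert not np.allclose(pods[0], pods[1])  # same distribution, different draws
```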
arXiv:2210.00689v1 [cs.CV] 3 Oct 2022
Pod networks capture the fundamental and structural information in the image better than a single network. In this paper, we present experimental results and observe that the Tripod network with three parallel networks achieves the best results on the CIFAR-10 dataset. The Tripod network consisting of three ResNet-20s performs better than a single ResNet-20. The Tripod network achieves state-of-the-art performance on the CIFAR-10 dataset, obtaining 92.47% accuracy. This result is also better than the accuracy results of the ViT and Swin Transformer networks.
We first introduced the concept of two parallel networks working toward the same goal in [6]. There were only two networks in [6]: one network had binary weights and performed its convolutions in the image domain, while the second network transformed the data to the Hadamard transform domain and processed it there [6–9]. Other parallel networks include Siamese-type networks and non-contrastive learning methods [10, 11]. Xun Huang et al. proposed a DNN with two parallel branches, but one of the branches is trained with low-resolution images, and the goal of the structure is saliency map prediction [12].
The organization of the paper is as follows. We describe the architecture of MultiPod networks in Section 2 and present experimental results on the CIFAR-10 [13] and ImageNet-1K [14] databases in Section 3. Section 4 presents the conclusions and future work.
2 Methodology
In this section, we describe the structure of MultiPod networks along with the different training techniques used in this study. Each convolutional network is trained separately and, as a result, the networks have different convolutional weights. They produce different outputs for a given input image. The output feature maps of every network are combined and connected to fully connected dense layers in two different ways, as shown in Fig. 4. The network is used in image classification tasks and aims to achieve state-of-the-art performance.
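The two ways of combining the pods' feature maps before the dense layer can be sketched as follows; the feature dimension (64) and class count (10, as in CIFAR-10) are illustrative assumptions, and the features stand in for the pods' pooled outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the pooled output feature vectors of three trained pods.
f1, f2, f3 = (rng.normal(size=64) for _ in range(3))

# Option 1: concatenation -- preserves every pod's features but triples
# the fan-in of the fully connected layer.
fused_cat = np.concatenate([f1, f2, f3])   # shape (192,)

# Option 2: elementwise multiplication -- keeps the fully connected
# input size fixed at the cost of mixing the pods' features.
fused_mul = f1 * f2 * f3                   # shape (64,)

# A fully connected classification layer over the concatenated features.
W = rng.normal(size=(10, fused_cat.size))
logits = W @ fused_cat                     # shape (10,)
```

Concatenation is the default choice in the model description above; elementwise multiplication is the alternative combination studied in Section 2.2.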
It is also possible to feed the original image in the form of overlapping or non-overlapping patches and in augmented forms, as in Swin Transformer, BYOL, SimSiam, and ConvNeXt models.
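Extracting non-overlapping patches from an input image can be sketched as below; the 4x4 patch size is an illustrative assumption, not the paper's setting.

```python
import numpy as np

def to_patches(image, patch=4):
    """Split an (H, W, C) image into non-overlapping patches,
    returning an array of shape (num_patches, patch, patch, C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

img = np.zeros((32, 32, 3))   # CIFAR-10-sized input
patches = to_patches(img)     # 64 patches of shape 4x4x3
```

Each patch (or an augmented version of the image) can then be routed to a different pod of the network.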
2.1 Input to the Multipod Network
Data augmentation is known to boost the performance of convolutional networks and helps to avoid overfitting. Although many pre-trained networks are trained on huge datasets such as COCO or ImageNet, recent studies make use of advanced data augmentation techniques such as Mixup [15], CutMix [16], and RandAugment [17] to improve performance.
In this paper, data augmentation techniques are used to create image patches from the original input. We propose a series of input images, each of which is created using a different data augmentation technique. Some examples of the input images are shown in Figs. 1 and 2. We use Color Jitter, which randomly changes the brightness, contrast, and saturation of images, to generate a different input for each pod of the network. Sample Color Jitter results are shown in Fig. 3.
Figure 1: Sample images from CIFAR-10 database [13]: (a) airplane, (b) automobile, (c) bird, (d) cat, (e) deer, (f)
dog, (g) frog, (h) horse, (i) ship, and (j) truck.
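A simplified Color Jitter sketch is given below; the paper uses the standard transform, while the factor ranges and the operations here are illustrative assumptions of how brightness, contrast, and saturation jitter are typically realized.

```python
import numpy as np

def color_jitter(image, rng):
    """Randomly scale brightness, contrast, and saturation of an RGB
    image with float values in [0, 1] (factor ranges are illustrative)."""
    b = rng.uniform(0.6, 1.4)                     # brightness factor
    c = rng.uniform(0.6, 1.4)                     # contrast factor
    s = rng.uniform(0.6, 1.4)                     # saturation factor
    out = image * b                               # brightness: scale values
    mean = out.mean()
    out = (out - mean) * c + mean                 # contrast: stretch around mean
    gray = out.mean(axis=-1, keepdims=True)
    out = (out - gray) * s + gray                 # saturation: push from grayscale
    return np.clip(out, 0.0, 1.0)

# Generate a different jittered copy of one image for each of three pods.
rng = np.random.default_rng(0)
jittered = [color_jitter(np.full((32, 32, 3), 0.5), rng) for _ in range(3)]
```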
2.2 Model Structure
The proposed model consists of two or more convolutional baseline networks whose feature maps are concatenated before the fully connected layers. In this paper, we use ResNets as the baseline networks since they have proven to be among the most effective in image classification tasks. Their residual learning method allows us to build deeper convolutional stacks that capture better features. Different approaches, such as concatenation and elementwise multiplication, are also studied in this work to combine the output feature maps. We used two, three, and four ResNet networks for