MULTIPOD CONVOLUTIONAL NETWORK
Hongyi Pan, Salih Atici, Ahmet Enis Cetin
Department of Electrical and Computer Engineering
University of Illinois Chicago
Chicago, Illinois, USA
{hpan21, satici2, aecyy}@uic.edu
ABSTRACT
In this paper, we introduce a convolutional network, which we call MultiPodNet, consisting of two or more convolutional networks that process the input image in parallel to achieve the same goal. The output feature maps of the parallel convolutional networks are fused at the fully connected layer of the network. We experimentally observed that three parallel pod networks (TripodNet) produce the best results on commonly used object recognition datasets. The baseline pod networks can be of any type. In this paper, we use ResNets as the baseline networks, and their inputs are augmented image patches. The number of parameters of the TripodNet is about three times that of a single ResNet. We train the TripodNet using standard backpropagation-type algorithms. The parameters of each individual ResNet are initialized with different random numbers during training. The TripodNet achieved state-of-the-art performance on the CIFAR-10 and ImageNet datasets. For example, it improved the accuracy of a single ResNet from 91.66% to 92.47% under the same training process on the CIFAR-10 dataset.
1 Introduction
In this article, we describe a novel convolutional network, which we call MultiPodNet, consisting of two or more convolutional networks that process the input image or data in parallel. The output feature maps of the parallel convolutional networks are fused at the fully connected layer of the network. In other words, a bank of two or more networks forms the main body of the MultiPodNet.
Recent studies involving convolutional neural networks and vision transformers have shown great success in image classification tasks and have created different perspectives. The machine learning revolution that started with deep convolutional networks continues with hierarchical vision transformers using shifted windows and next-generation convolutional neural networks [1], [2], [3]. The great success of transformers, especially Swin Transformers, on image classification tasks showed that increasing the number of network parameters and processing images as a series of patches help to increase the performance of the model [4]. The recently introduced ConvNeXt network [5], which is also a large convolutional network, is as successful as transformer-type networks. It also has a huge number of parameters, and it is trained using novel methods that improve on the classical convolutional models.
In this paper, we increase the number of parameters of the deep neural network using a bank of parallel networks working toward the same goal, such as object recognition. The MultiPod network can also process the input image as image patches in parallel, as in transformer networks. The original input image and/or its augmented versions are fed into the convolutional networks forming the MultiPod network, and the output feature maps of the parallel convolutional networks are combined before the fully connected dense layer. Since the convolutional neural networks are used in parallel, the network is called MultiPodNet. The difference between MultiPodNet and other concatenated networks is that the same input image is fed to each of the networks forming the MultiPodNet. We also use image patches and different augmentation techniques to create several instances of the original input. Depending on the image database and the object recognition problem, different augmentation methods and image patches can be used as input to the MultiPod network.
We initialize the parallel networks with different random numbers during the training process. This approach leads to different parameters in the parallel networks, and that is how we improve the recognition capability over a single network.
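The per-pod initialization can be sketched as follows; this is an illustrative NumPy sketch, not the authors' code, and the layer shape and He-style scaling are assumptions.

```python
import numpy as np

def init_pod_weights(seed, shape=(16, 3, 3, 3)):
    """Initialize one pod's first-layer conv weights from its own seed
    (He/Kaiming-style scaling; the shape is illustrative)."""
    rng = np.random.default_rng(seed)
    fan_in = np.prod(shape[1:])
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

# TripodNet: three pods, each started from a different random seed, so
# they converge to different parameters while sharing one architecture.
pods = [init_pod_weights(seed) for seed in (0, 1, 2)]
assert not np.allclose(pods[0], pods[1])  # same distribution, different draws
```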
arXiv:2210.00689v1 [cs.CV] 3 Oct 2022
Pod networks capture the fundamental and structural information in the image better than a single network. In this paper, we present experimental results and observe that the Tripod network with three parallel networks achieves the best results on the CIFAR-10 dataset. The Tripod network consisting of three ResNet-20s performs better than a single ResNet-20. The Tripod network achieves state-of-the-art performance on the CIFAR-10 dataset, obtaining 92.47% accuracy. This result is also better than the accuracy results of the ViT and Swin Transformer networks.
We first introduced the concept of two parallel networks working toward the same goal in [6]. There were only two networks in [6]: one network had binary weights and performed its convolutions in the image domain, while the second network transformed the data to the Hadamard transform domain and processed it there [6–9]. Other parallel networks include Siamese-type networks and non-contrastive learning methods [10, 11]. Xun Huang et al. proposed a DNN with two parallel branches, but one of the branches is trained with low-resolution images, and the goal of the structure is saliency map prediction [12].
The organization of the paper is as follows. We describe the architecture of MultiPod networks in Section 2 and present experimental results on the CIFAR-10 [13] and ImageNet-1K [14] databases in Section 3. Section 4 presents the conclusions and future work.
2 Methodology
In this section, we describe the structure of MultiPod networks along with the different training techniques used in this study. Each convolutional network is trained separately and, as a result, the networks have different convolutional weights. They produce different outputs for a given input image. The output feature maps of every network are combined and connected to fully connected dense layers in two different ways, as shown in Fig. 4. The network is used in image classification tasks and aims to achieve state-of-the-art performance.
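The two ways of combining the pods' feature maps before the dense layer can be sketched as follows; the feature dimension (64) and class count (10, as in CIFAR-10) are illustrative assumptions, and the features stand in for the pods' pooled outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the pooled output feature vectors of three trained pods.
f1, f2, f3 = (rng.normal(size=64) for _ in range(3))

# Option 1: concatenation -- preserves every pod's features but triples
# the fan-in of the fully connected layer.
fused_cat = np.concatenate([f1, f2, f3])   # shape (192,)

# Option 2: elementwise multiplication -- keeps the fully connected
# input size fixed at the cost of mixing the pods' features.
fused_mul = f1 * f2 * f3                   # shape (64,)

# A fully connected classification layer over the concatenated features.
W = rng.normal(size=(10, fused_cat.size))
logits = W @ fused_cat                     # shape (10,)
```

Concatenation is the default choice in the model description above; elementwise multiplication is the alternative combination studied in Section 2.2.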
It is also possible to feed the original image in the form of overlapping or non-overlapping patches and in augmented forms, as in Swin Transformer, BYOL, SimSiam, and ConvNeXt models.
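Extracting non-overlapping patches from an input image can be sketched as below; the 4x4 patch size is an illustrative assumption, not the paper's setting.

```python
import numpy as np

def to_patches(image, patch=4):
    """Split an (H, W, C) image into non-overlapping patches,
    returning an array of shape (num_patches, patch, patch, C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

img = np.zeros((32, 32, 3))   # CIFAR-10-sized input
patches = to_patches(img)     # 64 patches of shape 4x4x3
```

Each patch (or an augmented version of the image) can then be routed to a different pod of the network.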
2.1 Input to the Multipod Network
Data augmentation is known to boost the performance of convolutional networks and helps to avoid overfitting. Although many pre-trained networks are trained on huge datasets such as COCO or ImageNet, recent studies make use of advanced data augmentation techniques such as Mixup [15], CutMix [16], and RandAugment [17] to improve performance.
In this paper, data augmentation techniques are used to create image patches from the original input. We propose a series of input images, each of which is created using a different data augmentation technique. Some examples of the input images are shown in Figs. 1 and 2. We use Color Jitter, which randomly changes the brightness, contrast, and saturation of images, to generate a different input for each pod of the network. Sample Color Jitter results are shown in Fig. 3.
Figure 1: Sample images from CIFAR-10 database [13]: (a) airplane, (b) automobile, (c) bird, (d) cat, (e) deer, (f)
dog, (g) frog, (h) horse, (i) ship, and (j) truck.
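A simplified Color Jitter sketch is given below; the paper uses the standard transform, while the factor ranges and the operations here are illustrative assumptions of how brightness, contrast, and saturation jitter are typically realized.

```python
import numpy as np

def color_jitter(image, rng):
    """Randomly scale brightness, contrast, and saturation of an RGB
    image with float values in [0, 1] (factor ranges are illustrative)."""
    b = rng.uniform(0.6, 1.4)                     # brightness factor
    c = rng.uniform(0.6, 1.4)                     # contrast factor
    s = rng.uniform(0.6, 1.4)                     # saturation factor
    out = image * b                               # brightness: scale values
    mean = out.mean()
    out = (out - mean) * c + mean                 # contrast: stretch around mean
    gray = out.mean(axis=-1, keepdims=True)
    out = (out - gray) * s + gray                 # saturation: push from grayscale
    return np.clip(out, 0.0, 1.0)

# Generate a different jittered copy of one image for each of three pods.
rng = np.random.default_rng(0)
jittered = [color_jitter(np.full((32, 32, 3), 0.5), rng) for _ in range(3)]
```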
2.2 Model Structure
The proposed model consists of two or more convolutional baseline networks whose feature maps are concatenated before the fully connected layers. In this paper, we use ResNets as the baseline networks since they have proven to be among the most effective in image classification tasks. Their residual learning method allows us to build deeper convolutional stacks that capture better features. Different approaches, such as concatenation and elementwise multiplication, are also studied in this work to combine the output feature maps. We used two, three, and four ResNet networks for