the network can be folded into layers with shared
parameters and deepened without increasing the
number of parameters.
When applied to CNNs, this technique amounts to sharing kernels between isomorphic convolutional layers, i.e., layers with the same configuration (kernel size and number of input and output channels). In traditional CNNs, weight sharing is limited to neurons within the same feature map. With kernel-sharing, a kernel can be reused in feature maps at different layers. To update a shared kernel with backpropagation, its gradients must be accumulated across all the feature maps it produces throughout the network.
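As a minimal illustration (a sketch in PyTorch, not the implementation used in our experiments), this accumulation happens automatically when the same convolution module is reused at several depths:

    import torch
    import torch.nn as nn

    shared = nn.Conv2d(16, 16, kernel_size=3, padding=1)   # one shared set of kernels

    x = torch.randn(8, 16, 32, 32)
    h = x
    for _ in range(4):                      # four isomorphic layers reuse the same module
        h = torch.relu(shared(h))

    loss = h.mean()
    loss.backward()                         # autograd sums the gradient contributions
                                            # of all four layers into shared.weight.grad
    print(shared.weight.grad.shape)         # torch.Size([16, 16, 3, 3])

The channel counts and depth above are arbitrary; the point is only that a single parameter tensor receives the summed gradient from every layer that uses it.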
With kernel-sharing, the learning algorithm does not search for layer-specific kernels; instead, it looks for shared kernels that can play different roles at different depths of the network. This can be viewed as a form of regularization of the network complexity. In common regularization techniques such as the L2-norm [13], the complexity of over-parameterized networks is controlled by pushing the majority of parameters toward near-zero values. With shared kernels, in contrast, the number of parameters is significantly reduced and the kernels are forced to maximally exploit their learning capacity.
We applied the proposed kernel-sharing to two different deep CNN architectures, ConvMixer [14] and SE-ResNet [15], on the CIFAR-10 and CIFAR-100 datasets [16]. The first layer of ConvMixer performs a convolution with large kernels and stride; it is followed by a cascade of isomorphic, interleaved depthwise and pointwise convolutional layers. The SE-ResNet architecture has several convolutional stages with squeeze-and-excitation modules [15] in between, where each stage consists of several isomorphic convolutional layers.
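For intuition, the ConvMixer layout can be sketched roughly as follows (a simplified PyTorch sketch with illustrative hyperparameters; the residual connections and normalization layers of [14] are omitted):

    import torch.nn as nn

    dim, depth = 256, 8                      # illustrative values, not those of [14]
    convmixer_like = nn.Sequential(
        nn.Conv2d(3, dim, kernel_size=7, stride=7),   # patch embedding: large kernel and stride
        nn.GELU(),
        *[nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=9, padding=4, groups=dim),  # depthwise
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=1),                         # pointwise
            nn.GELU(),
        ) for _ in range(depth)],            # isomorphic blocks with identical configuration
    )

All depth blocks have exactly the same configuration, which is what makes them candidates for kernel-sharing.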
In the extreme case of applying kernel-sharing to all the isomorphic layers, the classification accuracy of the ConvMixer model dropped by only 1.8% and 3.6% on the CIFAR-10 and CIFAR-100 datasets, respectively, while the number of its trainable parameters was reduced by 13.4 and 9.6 times with respect to the baseline model. Similarly, the accuracy drop in SE-ResNet with kernel-sharing was negligible, while the number of its trainable parameters was drastically smaller.
Our results also indicate that kernel-sharing can mitigate overfitting when the network depth or the number of trainable parameters is kept the same as in the baseline model.
2 Method
State-of-the-art CNNs are usually made of tens of convolutional layers stacked on top of each other, and some of these layers have similar configurations, such as the same number of channels and kernel size. In multi-stage deep CNNs, the model typically has several stages of consecutive convolutional layers with the same configuration (see Fig. 1a). In
other words, layer configurations are similar in each
stage and vary across stages. For example, the
ConvMixer model [14] has only one stage of convo-
lutional layers, and the SE-ResNet [15] model has
three or four stages depending on the given dataset.
Even in deep CNNs with multi-block stages, layers may have different configurations within the same block yet match the corresponding layers in other blocks (see Fig. 1b).
We introduce kernel-sharing, or inter-layer weight-sharing, for deep CNNs to reduce the number of trainable parameters and, consequently, the memory footprint, which is especially useful in memory-constrained situations. In the context of CNNs, weight-sharing is traditionally an intra-layer concept: the same kernel is applied at different spatial locations within a layer, while the kernels differ between layers. Here, we propose to go beyond this and train deep CNNs with shared kernels (i.e., kernel weights) among layers with the same configuration, which we hereafter call “isomorphic layers” (see Fig. 2).
We define a sharing group as a set of isomorphic layers that share their kernels and are located in the same stage (Fig. 1a) or in different stages (Fig. 1b). One might also partition a set of isomorphic layers into two or more sharing groups. For example, the n isomorphic layers in Fig. 1a could be divided into two sharing groups of n/2 isomorphic layers each.
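A hypothetical sketch of such a partition (in PyTorch, with arbitrary sizes) builds a stack of n isomorphic depthwise layers in which the first half reuses one convolution module and the second half reuses another:

    import torch.nn as nn
    import torch.nn.functional as F

    n, dim = 8, 256                                   # hypothetical depth and width
    group_a = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
    group_b = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    # the first n/2 layers draw their kernels from group_a, the rest from group_b
    layers = [group_a if i < n // 2 else group_b for i in range(n)]

    def forward(x):
        for conv in layers:              # each layer applies its group's shared kernels
            x = F.gelu(conv(x))
        return x

Only two kernel tensors are trained here, instead of n, while the effective depth of the stack is unchanged.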
During the forward pass, isomorphic layers in a
sharing group use the same set of trainable pa-
rameters (i.e., shared kernels). Hence, we should