Drastically Reducing the Number of Trainable Parameters in
Deep CNNs by Inter-layer Kernel-sharing
Alireza Azadbakht1, Saeed Reza Kheradpisheh1,*, Ismail Khalfaoui-Hassani2, and
Timothée Masquelier3
1Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
2Artificial and Natural Intelligence Toulouse Institute (ANITI), Toulouse, France
3CerCo UMR 5549, CNRS, Université Toulouse 3, Toulouse, France
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art (SOTA) approach for many computer vision tasks: image classification, object detection, semantic segmentation, etc. However, most SOTA networks are too large for edge computing. Here, we suggest a simple way to reduce the number of trainable parameters and thus the memory footprint: sharing kernels between multiple convolutional layers. Kernel-sharing is only possible between "isomorphic" layers, i.e. layers having the same kernel size and the same numbers of input and output channels. This is typically the case inside each stage of a DCNN. Our experiments on CIFAR-10 and CIFAR-100, using the ConvMixer and SE-ResNet architectures, show that the number of parameters of these models can be drastically reduced with minimal cost in accuracy. The resulting networks are appealing for edge computing applications subject to severe memory constraints, and even more so when leveraging "frozen weights" hardware accelerators. Kernel-sharing is also an efficient regularization method, which can reduce overfitting. The code is publicly available at https://github.com/AlirezaAzadbakht/kernel-sharing
*Corresponding author
Email addresses:
al.azadbakht@mail.sbu.ac.ir (AA),
s_kheradpisheh@sbu.ac.ir (SRK),
ismail.khalfaoui-hassani@univ-tlse3.fr (IKH),
timothee.masquelier@cnrs.fr (TM)
1 Introduction
Modern deep learning is pushing the boundaries of artificial intelligence with increasingly complex models. These, in turn, come at a skyrocketing cost in terms of data, energy, memory, computing power, and time to train and use them. Although empirical model-scaling laws exist, such as the one proposed in EfficientNet [1], the optimal architectures found by these laws still need tens of millions of parameters to reach reasonable accuracies on established tasks and benchmarks [2, 3, 4, 5, 6]. As a result, and despite the huge success of large deep models, they are still not easily usable in resource-constrained systems, such as edge devices and embedded systems [7].
A first solution for adapting large CNNs to resource-limited systems is to use compact architectures such as MobileNet [8], which are designed to minimize the number of computational operations and trainable parameters. Other techniques such as parameter quantization [9, 10] and pruning [11, 12] also reduce the memory footprint and computational demand of large deep models by reducing the model size.
To drastically reduce the number of trainable parameters, we propose sharing parameters between the network layers: the very same set of parameters is used in several layers, while playing a different role in each of them. This yields a smaller set of trainable parameters without down-scaling the network size. In other words,
the network can be folded into layers with shared parameters and deepened without increasing the number of parameters.
In CNNs, this amounts to sharing kernels between isomorphic convolutional layers, i.e. layers with the same configuration (kernel size and number of input and output channels). In traditional CNNs, weight sharing is limited to neurons within the same feature map. With kernel-sharing, a kernel can be reused in feature maps at different layers. To update a shared kernel with backpropagation, the gradients must be accumulated across its different feature maps throughout the network.
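As an illustration of this accumulation, consider the following PyTorch sketch (ours, not taken from the paper's released code; the channel count, input size, and two-layer depth are arbitrary choices). Reusing a single nn.Conv2d module at two depths gives one set of kernels, and autograd automatically sums the gradients coming from both uses into that shared weight tensor:

import torch
import torch.nn as nn

# One convolution module, reused at two depths: a single set of trainable kernels.
shared_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

x = torch.randn(1, 64, 32, 32)        # dummy input (illustrative sizes)
h = torch.relu(shared_conv(x))        # first layer using the shared kernels
y = torch.relu(shared_conv(h))        # second layer: same kernels, different role

# Backpropagation: autograd sums the gradient contributions of both uses
# into the single shared weight tensor.
y.mean().backward()
print(shared_conv.weight.shape)        # torch.Size([64, 64, 3, 3])
print(shared_conv.weight.grad.norm())  # gradient accumulated over both layers

The same mechanism extends unchanged to any number of layers sharing the module.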
With kernel-sharing, the learning algorithm does not search for layer-specific kernels; rather, it looks for shared kernels that can play different roles at different depths of the network. This can be seen as a form of regularization of the network complexity. In standard regularization techniques such as the L2-norm [13], the complexity of over-parameterized networks is controlled by pushing the majority of parameters towards near-zero values, whereas with shared kernels the number of parameters is significantly reduced and each kernel is forced to maximally exploit its learning capacity.
We applied the proposed kernel-sharing to two deep CNN architectures, ConvMixer [14] and SE-ResNet [15], on the CIFAR-10 and CIFAR-100 datasets [16]. The first layer of ConvMixer performs a convolution with large kernels and stride; it is followed by a cascade of isomorphic blocks of interleaved depthwise and pointwise convolutional layers. The SE-ResNet architecture has several convolutional stages with squeeze-and-excitation modules [15] in between, where each stage consists of several isomorphic convolutional layers.
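To make the ConvMixer case concrete, here is a hedged sketch (not the authors' implementation; the hidden dimension, depth, and kernel size are placeholders) of a stage whose blocks all reuse one depthwise and one pointwise convolution, while the inexpensive normalization layers remain block-specific:

import torch.nn as nn
import torch.nn.functional as F

class SharedConvMixerStage(nn.Module):
    """ConvMixer-like stage in which every block reuses the same depthwise and
    pointwise convolution kernels (illustrative hyper-parameters)."""

    def __init__(self, dim=256, depth=8, kernel_size=9):
        super().__init__()
        # One set of kernels for the whole stage.
        self.depthwise = nn.Conv2d(dim, dim, kernel_size, groups=dim,
                                   padding=kernel_size // 2)
        self.pointwise = nn.Conv2d(dim, dim, kernel_size=1)
        # Normalization stays layer-specific (a negligible parameter cost).
        self.dw_norms = nn.ModuleList([nn.BatchNorm2d(dim) for _ in range(depth)])
        self.pw_norms = nn.ModuleList([nn.BatchNorm2d(dim) for _ in range(depth)])
        self.depth = depth

    def forward(self, x):
        for i in range(self.depth):
            x = x + self.dw_norms[i](F.gelu(self.depthwise(x)))  # residual spatial mixing
            x = self.pw_norms[i](F.gelu(self.pointwise(x)))      # channel mixing
        return x

With this layout, the depthwise and pointwise kernels are counted once regardless of the stage depth, so deepening the stage leaves the trainable-parameter count almost unchanged.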
In the extreme case of applying kernel-sharing to all the isomorphic layers, the classification accuracy of the ConvMixer model dropped by only 1.8% and 3.6% on CIFAR-10 and CIFAR-100, respectively, while the number of its trainable parameters was reduced by 13.4 and 9.6 times with respect to the baseline model. Similarly, the accuracy drop in SE-ResNet with kernel-sharing was negligible, while the number of its trainable parameters was drastically smaller.
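A back-of-the-envelope accounting (our notation, not from the paper) explains the order of magnitude of these reductions. Writing P_layer for the kernel parameters of one of the n isomorphic layers in a sharing group, and P_other for all remaining parameters (stem, normalization, classifier head, etc.),

\[
P_{\text{baseline}} = n\,P_{\text{layer}} + P_{\text{other}},
\qquad
P_{\text{shared}} = P_{\text{layer}} + P_{\text{other}},
\qquad
\frac{P_{\text{baseline}}}{P_{\text{shared}}} \approx n \ \ \text{when } P_{\text{other}} \ll P_{\text{layer}} .
\]

The reduction factor thus approaches the number of layers in the sharing group; the exact 13.4x and 9.6x figures above also depend on the layer-specific components that remain unshared.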
Our results also indicate that kernel-sharing can mitigate overfitting when the network depth or the number of trainable parameters is kept the same as in the baseline model.
2 Method
State-of-the-art CNNs are usually made of tens of convolutional layers stacked on top of each other. Some of these layers often have similar configurations, such as the same number of channels and kernel size. In multi-stage deep CNNs, the model typically has several stages of consecutive convolutional layers with the same configuration (see Fig. 1a). In other words, layer configurations are similar within each stage and vary across stages. For example, the ConvMixer model [14] has only one stage of convolutional layers, while the SE-ResNet [15] model has three or four stages depending on the dataset. Even in deep CNNs with multi-block stages, layers may have different configurations within the same block but be similar to layers in other blocks (see Fig. 1b).
We introduce kernel-sharing, or inter-layer weight-sharing, for deep CNNs to reduce the number of trainable parameters and hence the memory footprint, which is especially useful in memory-constrained settings. In the context of CNNs, weight-sharing is traditionally an intra-layer concept: it refers to using the same kernel at different spatial locations, while the kernels differ between layers. Here, we propose to go further and train deep CNNs with kernels (i.e., kernel weights) shared among layers with the same configuration, which we call "isomorphic layers" hereafter (see Fig. 2).
We define a sharing group as a set of isomorphic layers sharing their kernels, located in the same stage (Fig. 1a) or in different stages (Fig. 1b). One may also partition a set of isomorphic layers into two or more sharing groups. For example, the n isomorphic layers in Fig. 1a could be divided into two sharing groups of n/2 layers each, as sketched below.
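A minimal sketch of such a partition (our illustration, not the released code; the layer count, group count, and layer configuration are placeholders) instantiates one convolution module per sharing group and maps each isomorphic layer to its group's module:

import torch.nn as nn

def build_sharing_groups(n_layers, n_groups, dim=256, kernel_size=3):
    """Map n_layers isomorphic layers onto n_groups shared convolution modules.
    n_groups == 1 reproduces full sharing; n_groups == n_layers is the baseline."""
    group_convs = nn.ModuleList([
        nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2)
        for _ in range(n_groups)
    ])
    layers_per_group = n_layers // n_groups
    # Layer i uses the conv of group i // layers_per_group; the last group absorbs any remainder.
    layer_to_conv = [group_convs[min(i // layers_per_group, n_groups - 1)]
                     for i in range(n_layers)]
    return group_convs, layer_to_conv

The trainable-parameter count of the stage then scales with the number of groups rather than with the number of layers, and choosing n_groups = 2 yields the two groups of n/2 layers mentioned above.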
During the forward pass, isomorphic layers in a sharing group use the same set of trainable parameters (i.e., shared kernels). Hence, during the backward pass, the gradients with respect to a shared kernel must be accumulated across all the layers of its sharing group.