
EFFICIENT SIMILARITY-BASED PASSIVE FILTER PRUNING FOR COMPRESSING CNNS
Arshdeep Singh, Mark D. Plumbley
Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, UK
Email: {arshdeep.singh, m.plumbley}@surrey.ac.uk
ABSTRACT
Convolutional neural networks (CNNs) have shown great success
in various applications. However, the computational complexity
and memory requirements of CNNs are a bottleneck for their deployment
on resource-constrained devices. Recent efforts towards reducing
the computational cost and the memory overhead of CNNs involve
similarity-based passive filter pruning methods. Similarity-based
passive filter pruning methods compute a pairwise similarity ma-
trix for the filters and eliminate a few similar filters to obtain a
small pruned CNN. However, the computational complexity of
computing the pairwise similarity matrix is high, particularly when
a convolutional layer has many filters. To reduce the computational
complexity of obtaining the pairwise similarity matrix, we propose
an efficient method in which the complete pairwise similarity
matrix is approximated from only a few of its columns using a
Nyström approximation method. The proposed efficient similarity-based
passive filter pruning method is 3 times faster than, and gives the same
accuracy at the same reduction in computations as, the similarity-based
pruning method that computes the complete pairwise similarity matrix.
Moreover, the proposed efficient similarity-based pruning method performs
similarly to or better than existing norm-based pruning methods. The
efficacy of the proposed pruning method is evaluated on CNNs such as the
DCASE 2021 Task 1A baseline network and a VGGish network designed for
acoustic scene classification.
Index Terms—Acoustic scene classification, pruning, VGGish,
DCASE.
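As a concrete illustration of the column-sampling idea described in the abstract, the sketch below approximates a pairwise filter-similarity matrix with the standard Nyström construction S ≈ C W⁺ Cᵀ. It is a minimal sketch under assumed choices (cosine similarity between flattened filters, uniformly sampled landmark columns), not the paper's implementation.

```python
# Assumed setup: cosine similarity between flattened filters; landmark
# columns sampled uniformly at random. Not the authors' implementation.
import numpy as np

def nystrom_similarity(filters, m, seed=0):
    """Approximate the n x n pairwise cosine-similarity matrix of
    `filters` (n x d, one flattened filter per row) from only m of
    its columns, via the Nystrom construction S ~ C W^+ C^T."""
    rng = np.random.default_rng(seed)
    n = filters.shape[0]
    normed = filters / np.linalg.norm(filters, axis=1, keepdims=True)
    idx = rng.choice(n, size=m, replace=False)   # landmark filter indices
    C = normed @ normed[idx].T                   # n x m sampled columns of S
    W = C[idx]                                   # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T           # n x n approximation

# Sanity check: with m = n the reconstruction is exact up to numerics.
F = np.random.default_rng(1).normal(size=(8, 27))   # 8 flattened 3x3x3 filters
N = F / np.linalg.norm(F, axis=1, keepdims=True)
S_full = N @ N.T
S_hat = nystrom_similarity(F, m=8)
assert np.allclose(S_full, S_hat, atol=1e-6)
```

With m much smaller than n, only m columns of the similarity matrix are ever computed, which is where the claimed speed-up over forming the full matrix comes from.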
1. INTRODUCTION
Although convolutional neural networks (CNNs) achieve state-of-the-art
performance in various applications [2], compressing them is crucial to
reduce their computational complexity and memory requirements for
efficient deployment on resource-constrained devices [1]. Typically,
CNNs have redundant parameters, such as weights or filters, which add
extra computation and storage without contributing much to the
performance of the underlying task [3, 4]. For example, Singh et
al. [5, 6] found that 73% of the filters in SoundNet do not provide
discriminative information across different acoustic scene classes, and
that eliminating such filters gives performance similar to that of using
all filters. Thus, the compression of CNNs has recently drawn
significant attention from the research community.
Recent efforts towards compressing CNNs involve filter pruning
methods [7, 8] that eliminate some of the filters in CNNs based on
their importance. The importance of the CNN filters is measured in
an active or a passive manner. Active filter pruning methods involve a dataset.

Fig. 1. An illustration of the outputs produced in a convolution layer by
three CNN filters, F1, F2 and F3, with a convolution operation on
randomly generated data points, X ∈ R^{2×1000}.

For example, some methods [9, 10, 11] use feature
maps, which are outputs produced by the filters corresponding to a
set of examples, and apply metrics such as entropy or the average
percentage of zeros on the feature maps to quantify the filter
importance. On the other hand, passive filter pruning methods [12, 13]
use only the parameters of the filters, such as the absolute sum of the
weights in a filter, to quantify the filter importance. Passive filter
pruning methods do not require a dataset to measure filter importance
and are therefore easier to apply than active filter pruning methods.
After eliminating filters from a CNN, the pruned network is fine-tuned
to regain some of the performance lost due to the filter elimination.
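The two families of importance score above can be made concrete with a small sketch. Both metrics here are assumed, simplified versions (an average-percentage-of-zeros active score and an l1-norm passive score), not the exact procedures of the cited works.

```python
# Illustrative, simplified versions of the two importance scores;
# the cited papers' exact recipes may differ.
import numpy as np

def average_percentage_of_zeros(feature_maps):
    """Active score: fraction of zeros per filter over post-ReLU feature
    maps of shape (batch, n_filters, H, W); requires data to compute.
    A high score marks a mostly inactive filter as a pruning candidate."""
    return (feature_maps == 0).mean(axis=(0, 2, 3))

def l1_importance(weights):
    """Passive score: absolute weight sum per filter for a weight tensor
    of shape (n_filters, in_ch, kH, kW); no data needed."""
    return np.abs(weights).sum(axis=(1, 2, 3))

rng = np.random.default_rng(0)
maps = np.maximum(rng.normal(size=(16, 4, 8, 8)), 0)  # ReLU activations
maps[:, 3] = 0.0                                      # filter 3 never fires
assert average_percentage_of_zeros(maps)[3] == 1.0    # flagged for pruning

w = rng.normal(size=(4, 3, 3, 3))
w[2] *= 10                                            # filter 2: largest norm
assert np.argmax(l1_importance(w)) == 2               # deemed most important
```

Note that only the active score touches feature maps, i.e. a dataset; the passive score is a function of the weights alone, which is what makes passive methods easier to apply.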
Previously, passive filter pruning methods used norm-based metrics to
quantify the importance of the filters, such as the l1-norm [12], which
is the sum of the absolute values of the weights in a filter, or the
l2-distance of a filter from the geometric median of all filters [13].
These norm-based methods use a “smaller-norm-less-important” criterion
to eliminate filters: for example, a filter having a relatively high
l1-norm is considered more important than others. However, in selecting
relatively high-norm filters as important, norm-based methods may
ignore the redundancy among the high-norm filters. To
illustrate this, we show the outputs produced by three filters in Figure 1.
Filters F1 and F3 have similar l1-norms and produce similar outputs.
However, when selecting two important filters out of the three shown
in Figure 1, the norm-based method selects filters F1 and F3 as
important due to their relatively high norms, despite their similar
outputs, while it eliminates filter F2, which produces a significantly
different output from the other filters. Thus the diversity learned in the
arXiv:2210.17416v1 [cs.CV] 27 Oct 2022