IMPROVED PROJECTION LEARNING FOR LOWER DIMENSIONAL FEATURE MAPS
Ilan Price
Mathematical Institute
University of Oxford
& The Alan Turing Institute
Jared Tanner
Mathematical Institute
University of Oxford
ABSTRACT
The requirement to repeatedly move large feature maps off-
and on-chip during inference with convolutional neural net-
works (CNNs) imposes high costs in terms of both energy
and time. In this work we explore an improved method for
compressing all feature maps of pre-trained CNNs to below a
specified limit. This is done by means of learned projections
trained via end-to-end finetuning, which can then be folded
and fused into the pre-trained network. We also introduce a
new ‘ceiling compression’ framework in which we evaluate such
techniques in view of the future goal of performing inference
fully on-chip.
Index Terms—efficient deep learning, convolutional
neural networks, feature map compression
1. INTRODUCTION
Modern neural network architectures can achieve high accu-
racy while possessing far fewer trainable parameters than had
traditionally been expected. Prototypical examples include
compact weights and shared parameters in convolutional and
recurrent neural networks respectively, as well as sparsifying
and quantizing the weights within such networks, see [1, 2]
and references therein. The efficiency of storing and transmitting these networks stands in stark contrast to their efficiency at inference time, which is determined not only by the size of the model itself (weights, biases, etc.), but, increasingly, by the intermediate feature maps (representations) generated as the outputs of successive layers and inputs to the following layers. For example, with ImageNet-resolution (224x224) inputs, and even without sparsifying the networks, the single largest feature map is 7% of the model size in ResNet18, 2.3% in VGG16, and 34% in MobileNetV2.
This disparity between model and feature map sizes can be dramatically exacerbated, as these networks can have their number of parameters reduced to only a tiny fraction of their original size without loss in classification accuracy [1]. This motivates a line of research to improve efficiency by compressing feature maps as well; see the related work in Section 2.
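To make this ratio concrete, it can be estimated directly with forward hooks. The sketch below is a rough illustration only (the function name, the restriction to leaf modules, and the use of a torchvision ResNet18 are assumptions for the example, not the paper's measurement protocol):

```python
import torch
import torchvision.models as models

def largest_feature_map_ratio(model, input_size=(1, 3, 224, 224)):
    """Size of the largest intermediate feature map relative to the
    total parameter count (both measured in number of elements)."""
    sizes = []

    def hook(module, inputs, output):
        if torch.is_tensor(output):
            sizes.append(output.numel())

    # Hook only leaf modules so each intermediate output is counted once.
    handles = [m.register_forward_hook(hook)
               for m in model.modules() if len(list(m.children())) == 0]
    with torch.no_grad():
        model(torch.randn(input_size))
    for h in handles:
        h.remove()

    n_params = sum(p.numel() for p in model.parameters())
    return max(sizes) / n_params

# Example; exact ratios depend on which activations are counted.
print(largest_feature_map_ratio(models.resnet18().eval()))
```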
Typically, however, not all feature maps need to be stored
simultaneously: once a feature map has been used as the in-
put for the following layer(s), it can be immediately deleted.
Herein we propose an improved method for learning low-rank
projections which can be incorporated into pre-trained CNNs
to reduce their maximal memory requirements. In doing so, this approach seeks both to reduce the memory requirements on a device and, ideally, to eliminate off-chip memory access mid-forward-pass, which can dominate power usage [3, 4], a goal which would enable lower-power deep networks deployed on edge devices.
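As a rough sketch of the kind of learned low-rank projection described above (this is an illustration, not the paper's exact construction; the module name, the choice of 1x1 convolutions, and the rank hyperparameter are assumptions), a feature map can be compressed to a small number of channels before being stored and expanded again before it feeds the next layer:

```python
import torch.nn as nn

class LowRankProjection(nn.Module):
    """Illustrative low-rank channel projection: compress a feature map
    to `rank` channels before it is written to memory, and expand it
    back to `channels` before the next layer consumes it. Both 1x1
    convolutions are linear, so after finetuning they can in principle
    be folded and fused into the neighbouring layers."""
    def __init__(self, channels, rank):
        super().__init__()
        self.down = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.up = nn.Conv2d(rank, channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.up(self.down(x))
```

Under such an arrangement, only the rank-channel tensor would need to be held between layers, which is what lowers the peak feature-map memory requirement.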
2. RELATED WORK
One strand of research on feature map compression makes use
of (and often tries to increase) the sparsity of post-activation
feature maps, which endows them with natural compressibility. Works
such as [5], [6], and [7, 8], develop accelerators which use
Zero Run-length encoding, compressed column storage, and
zero-value compression, respectively, to leverage the natu-
rally occurring feature map sparsity to shrink memory access.
In [9, 10], the authors leverage both ‘sparsity’ and ‘smooth-
ness’ of feature maps, by decomposing the streamed input
into zero and non-zero streams, and applying run-length encoding to compress the lengthy runs of zeros. They then
compress the non-zero values with bit-plane compression.
Similarly, [11] induces sparse feature maps by adding L1
regularisation when finetuning pre-trained networks. Further-
more, they use linear (uniform) quantisation of the feature
maps, as well as entropy coding for the resulting sparse fea-
ture maps. Lastly, [12] proposes an alternative method for
inducing sparse activations, based on finetuning with Hoyer-
sparsity regularisation and a new discontinuous Forced Acti-
vation Threshold ReLU (FATReLU) defined for some thresh-
old T as 0 if x < T and x otherwise. Given that the sparsity
of the feature maps of the aforementioned methods is in-
put dependent, the compressibility and resulting compute
resources needed for inference with this class of methods are
not guaranteed or known ahead of time.
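As a concrete illustration of these sparsity-based schemes, the sketch below implements the FATReLU activation as defined above, together with a toy zero run-length encoding (the function names and the (run, value) pair format are assumptions for the example, not the cited accelerators' hardware formats):

```python
import torch

def fat_relu(x, threshold):
    """FATReLU as defined above: activations below the threshold T are
    forced to zero, larger values pass through unchanged."""
    return torch.where(x < threshold, torch.zeros_like(x), x)

def zero_run_length_encode(x):
    """Toy zero run-length encoding of a flattened feature map: store a
    (preceding_zero_run, value) pair for each non-zero entry."""
    pairs, run = [], 0
    for v in x.flatten().tolist():
        if v == 0.0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

sparse = fat_relu(torch.randn(4, 4), threshold=0.5)
print(zero_run_length_encode(sparse))
```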
An alternative line of work focuses on transform-based
compression. In [13], the authors propose applying a 1D Discrete Cosine Transform (DCT) along the channel dimension of all feature maps, which are then masked and zero-value coded.
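A minimal sketch of such a channel-wise 1D DCT is given below; the low-pass mask used here is a simplification chosen for the example, and the actual masking and zero-value coding in [13] differ:

```python
import numpy as np
from scipy.fft import dct, idct

def channel_dct_compress(fmap, k):
    """Apply a 1D DCT along the channel axis of a (channels, H, W)
    feature map, zero all but the k lowest-frequency coefficients,
    and reconstruct the (lossy) feature map."""
    coeffs = dct(fmap, axis=0, norm="ortho")
    coeffs[k:] = 0.0  # crude mask: keep only low-frequency coefficients
    return idct(coeffs, axis=0, norm="ortho")

fmap = np.random.randn(64, 56, 56).astype(np.float32)
approx = channel_dct_compress(fmap, k=16)
```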