IR2Net: Information Restriction and Information
Recovery for Accurate Binary Neural Networks
Ping Xue, Yang Lu, Jingfei Chang, Xing Wei, and Zhen Wei
Abstract—Weight and activation binarization can efficiently
compress deep neural networks and accelerate model inference,
but cause severe accuracy degradation. Existing optimization
methods for binary neural networks (BNNs) focus on fitting full-
precision networks to reduce quantization errors, and suffer from
the trade-off between accuracy and computational complexity. In
contrast, considering the limited learning ability and information
loss caused by the limited representational capability of BNNs, we
propose IR2Net to stimulate the potential of BNNs and improve
the network accuracy by restricting the input information and
recovering the feature information, including: 1) information restriction: for a BNN, we evaluate its learning ability on the input information, discard some of the information it cannot focus on, and limit the amount of input information to match its learning ability; 2) information recovery: because of the information loss in forward propagation, the output feature information of the network is not sufficient to support accurate classification, so we select shallow feature maps with richer information and fuse them with the final feature maps to recover the feature information. In addition, the computational cost is reduced by streamlining the information recovery method to strike a better trade-off between accuracy and efficiency.
Experimental results demonstrate that our approach still achieves
comparable accuracy even with 10x floating-point operations
(FLOPs) reduction for ResNet-18. The models and code are
available at https://github.com/pingxue-hfut/IR2Net.
Index Terms—Model compression, information restriction &
recovery, image classification, deep learning.
I. INTRODUCTION
Deep Convolutional Neural Networks (CNNs) have made
much progress in a wide variety of computer vision
applications [1]–[4]. However, as the research advances, the
depth of the networks has expanded from a few layers to hun-
dreds of layers [5]–[8]. The huge number of parameters and
This work was supported in part by the National Key Research and
Development Program under Grant 2018YFC0604404, in part by the National
Natural Science Foundation of China under Grant 61806067, in part by the
Anhui Provincial Key R&D Program (202004a05020040), and in part by the
Intelligent Network and New Energy Vehicle Special Project of Intelligent
Manufacturing Institute of HFUT (IMIWL2019003). (Corresponding author:
Yang Lu.)
Ping Xue and Jingfei Chang are with the School of Computer Science
and Information Engineering, Hefei University of Technology, Hefei 230009,
China (e-mail: xueping1001@126.com; cjfhfut@mail.hfut.edu.cn).
Yang Lu and Zhen Wei are with the School of Computer Science and
Information Engineering, Hefei University of Technology, Hefei 230009,
China, with the Anhui Mine IOT and Security Monitoring Technology Key
Laboratory, Hefei 230088, China, and also with the Engineering Research
Center of Safety Critical Industrial Measurement and Control Technology,
Ministry of Education, Hefei University of Technology, Hefei 230009, China
(e-mail: luyang.hf@126.com; weizhen@gocom.cn).
Xing Wei is with the School of Computer Science and Information Engi-
neering, Hefei University of Technology, Hefei 230009, China, and also with
the Intelligent Manufacturing Institute of HeFei University of Technology,
Hefei 230009, China (e-mail: weixing@hfut.edu.cn).
[Figure: attention maps at Levels 1-4 (low-level to high-level) for a full-precision network (FP) and a BNN on the same input.]
Fig. 1. The differences of attention maps for full-precision network (FP) and BNN.
the ultra-high computational complexity of CNNs make their
deployment very constrained, especially under the conditions
of applications with high real-time requirements or limited
storage capacity. To solve this problem, various compression
techniques for CNNs have emerged. Network pruning [9]–[11]
reduces model redundancy by pruning convolutional kernels or
channels, efficient architecture design [12]–[14] replaces con-
ventional convolutional layers with well-designed lightweight
modules to speed up network inference, knowledge distillation
[15], [16] attempts to transfer knowledge from complex net-
works (teachers) to compact networks (students), quantization
[17]–[22] replaces 32-bit weights and activations with low-
bit (e.g., 16-bit) ones to reduce both memory footprint and
computational complexity. The extreme of quantization is
binarization. Compared with 32-bit floating-point networks,
network binarization constrains both the weights and activa-
tions to {-1, +1}, i.e., the parameters of binary neural networks
(BNNs) need only 1-bit representation, which greatly reduces
the storage requirement; furthermore, while binarizing the
network weights and activations, the computationally intensive
matrix multiplication and addition operations in full-precision
networks are replaced with low-cost XNOR and bitcount,
which greatly reduces the network inference delay. Therefore,
benefiting from the high compression ratio, acceleration, and
energy-saving, network binarization is considered as one of
the most promising techniques for network compression and
is the focus of this work.
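To make the XNOR/bitcount substitution concrete, the following is a generic sketch (ours, not code from this paper) of the standard encoding: with -1 stored as the bit 0 and +1 as the bit 1, the dot product of two {-1, +1} vectors of length n equals 2 * popcount(XNOR(w, a)) - n, so multiply-accumulate reduces to bitwise logic.

```python
import numpy as np

# Generic sketch (not from the paper): with -1 encoded as bit 0 and +1 as
# bit 1, the dot product of two {-1, +1} vectors equals
# 2 * popcount(XNOR(w, a)) - n, so multiply-accumulate can be replaced
# by XNOR and bitcount.
def dot_multiply_accumulate(w, a):
    return int(np.dot(w, a))

def dot_xnor_bitcount(w, a):
    w_bits = w > 0
    a_bits = a > 0
    matches = np.count_nonzero(w_bits == a_bits)  # XNOR + bitcount
    return 2 * matches - len(w)

rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=64)
a = rng.choice([-1, 1], size=64)
assert dot_multiply_accumulate(w, a) == dot_xnor_bitcount(w, a)
```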
Network binarization has attracted a lot of attention due
to its advantages in compression and acceleration. Although
much progress has been made, the existing binarization meth-
ods still suffer from a trade-off between accuracy and effi-
ciency. For example, although XNOR-Net [23] and Bi-Real Net [24] improve the accuracy of BNNs with negligible extra computation, a large accuracy gap remains between them and their full-precision counterparts; whereas Group-Net
[25] and MeliusNet [26] achieve comparable accuracy to that
of full-precision networks, but they introduce a noticeable
additional computational cost, which significantly offsets the
advantages of network binarization. Therefore, one of the
motivations for this work is to strike a better trade-off between
the accuracy and computational complexity for BNNs.
In addition, the performance degradation of BNNs is mainly
caused by their limited representational capability. BNNs
represent weights and activations with 1-bit, which means the
theoretical representation precision is only 1/2^31 compared to
the full-precision counterparts. The limited representational
capability leads to two drawbacks in BNNs: limited data
information acceptance (i.e., learning ability) and severe infor-
mation loss during forward propagation. As shown in Figure
1, at level 4 of the attention maps [27], it can be seen that the
full-precision network can focus on much larger information
regions of interest (the highlighted regions of the attention
maps) than the BNN does, which is only able to accept limited
information; besides, the information loss during the forward
propagation of the BNN is also evident in the flow of the
attention maps from low to high levels. IR-Net [28] and BBG
[29] reduce the information loss in forward propagation by
balancing and normalizing the weights to achieve maximum
information entropy, which improves the network accuracy
to some extent. However, these methods do not consider the limited information acceptance of BNNs and still suffer significant accuracy degradation on large-scale datasets (e.g., ImageNet).
To solve the aforementioned problems, from the perspective
of the representational capability of BNNs themselves, we
propose IR2Net, a binarization approach to enhance BNNs via
restricting input information and recovering feature informa-
tion: 1) intuitively, different students (networks) have different
learning abilities, for those with strong learning abilities, more
information can be provided for their learning and refining,
whereas for those with weak learning abilities, discarding
redundant information is needed for better learning. IR2Net introduces the information restriction method to restrict the input information and regularize the networks, thus forcing BNNs to focus on the more critical information with their limited learning abilities; 2) for information loss during for-
ward propagation in BNNs, IR2Net leverages the information
recovery method to fuse the shallow feature information with
the final feature information before the classifier (or other task-
specific modules) to fix the information loss and improve the
accuracy.
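To make these two ideas concrete, the following is only a rough conceptual sketch in PyTorch under our own simplifying assumptions; the attention-map generation, the keep ratio, and the concatenation-based fusion are illustrative placeholders, not the exact operators defined by IR2Net.

```python
import torch
import torch.nn.functional as F

# Conceptual sketch only (our assumptions, not the authors' exact operators):
# "information restriction" is illustrated as masking the input with a
# spatial attention map, and "information recovery" as fusing a pooled
# shallow feature map with the final feature map before the classifier.

def restrict_input(x, attention, keep_ratio=0.7):
    """Zero out the input regions a (hypothetical) attention map ranks lowest.

    x:         input images,   shape (N, 3, H, W)
    attention: attention maps, shape (N, 1, H, W), larger = more important
    """
    flat = attention.flatten(1)
    k = int(keep_ratio * flat.shape[1])
    threshold = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    mask = (attention >= threshold).float()
    return x * mask  # discard information the network cannot focus on

def recover_information(shallow_feat, final_feat):
    """Fuse a richer shallow feature map with the final feature map.

    shallow_feat: (N, C1, H1, W1) from an early block
    final_feat:   (N, C2, H2, W2) from the last block
    """
    pooled = F.adaptive_avg_pool2d(shallow_feat, final_feat.shape[-2:])
    return torch.cat([pooled, final_feat], dim=1)  # fed to the classifier
```

In the paper's actual design, the recovery path is additionally streamlined into a compact variant so that the extra computation stays small.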
With the abovementioned designs, the proposed IR2Net can
effectively force BNNs to focus on important information,
defend against information loss in forward propagation, and
then achieve advanced performance and a good trade-off
between accuracy and efficiency on various networks and
datasets.
The main contributions can be summarized as follows.
1) We propose IR2Net, the first approach to mitigate the information loss and the mismatch between learning ability and information quantity from the perspective of the limited representational capability of BNNs caused by quantization.
2) An information restriction method is designed to restrict
the input information by the generated attention masks so that
the amount of input information matches the learning ability
of the network, and then the representational capability of the
network is fully utilized without introducing additional costs.
3) An information recovery method is proposed to resist
the information loss in forward propagation by fusing shallow
and deep information; a compact information recovery method
is also proposed to reduce additional computational cost and
empower the network to trade-off accuracy and computational
complexity.
4) Extensive experimental evaluations demonstrate that the
proposed IR2Net achieves new state-of-the-art performance on
both CIFAR-10 and ImageNet, and also has good versatility.
II. RELATED WORK
A. Network Binarization
The pioneering study of network binarization dates back
to BNN [30], which obtains comparable accuracy on small
datasets (including MNIST, SVHN [31], and CIFAR-10 [32]),
yet encounters severe performance degradation while on large-
scale datasets (e.g., ImageNet [33]). Therefore, substantial
research efforts are invested in minimizing the accuracy gap
between BNNs and full-precision ones. Enhancing BNNs usually requires introducing additional computation. Some works focus on using a fractional amount
of real-valued operations in exchange for significant accuracy
gains. For instance, XNOR-Net [23] improves the performance
of BNNs on ImageNet to some extent by introducing real-
valued scaling factors. XNOR-Net++ [34] builds on top of this by fusing the separate weight and activation scaling factors into one, which is learned discriminatively via backpropagation.
Bi-Real Net [24] connects the real-valued activation of adja-
cent layers to enhance the network representational capability.
BBG [29] adds a gated module to the connection. Real-to-Bin
[35] obtains the activation scaling factors via SE [36]. RBNN
[37] further reduces the quantization error from the perspective
of intrinsic angular bias. In contrast, some other works relax the constraints on additional computational complexity in exchange for higher accuracy. ABC-Net [38] uses linear combinations of
multiple binary bases to approximate the real-valued weights
and activations. HORQ-Net [39] reduces the residual between
real-valued activations and binary activations by utilizing a
high-order approximation scheme. CBCN [40] enhances the
diversity of intermediate feature maps by rotating the weight
matrix. MeliusNet [26] designs Dense Block and Improvement
Block to improve the feature capability and quality, respec-
tively. Group-Net [25] and BENN [41] use multiple BNNs for
combination or ensemble to obtain significant improvement.
Although great progress has been made in the research
of BNNs, the existing methods either leave a significant
accuracy gap compared with full-precision networks, or in-
troduce a large amount of computation for comparable perfor-
mance, which largely offsets the advantages in compression
and acceleration and deviates from the original purpose of
network binarization. Therefore, IR2Net is proposed, aiming
at acquiring higher network accuracy with less computational
complexity.

[Figure: the IR2Net pipeline, from the input through a 7×7 convolution (stride 2), max-pooling, and four stages of binary convolution (BinConv) blocks to global average pooling and a fully connected (FC) layer; the Information Restriction module produces the masked input, and the Information Recovery module fuses shallow feature maps with the last feature maps into the fused information fed to the classifier.]
Fig. 2. Illustration of IR2Net. ResNet-18 is used as an example backbone. Batch-normalization layer (BN), nonlinear layer, and shortcut are omitted for simplicity. The output feature maps before downsampling layers and of the penultimate layer are selected for information recovery to retain more information with less computational cost.

Moreover, the trade-off between accuracy and effi-
ciency is pursued by adjusting the hyperparameters introduced
in IR2Net, i.e., to achieve better accuracy with comparable
computational cost, or to obtain comparable accuracy with less
computation complexity.
B. Efficient Architecture Design
The main point of this line is to design compact architectures for model compression and acceleration. AlexNet [1]
introduces group convolution to overcome the GPU memory
constraints by partitioning input feature channels into mutually
exclusive groups for convolution independently. However,
group operation blocks the information interaction between
different groups, so ShuffleNet [13] introduces channel shuffle
operation on top of group convolution to maintain the con-
nections between groups. IGCNets [42] uses two successive
interleaved group convolutions to achieve complementarity.
Xception [43] proposes depthwise-separable convolution, which factorizes a standard convolution into a depthwise convolution and a pointwise convolution. MobileNet [12] uses depthwise-separable convolutions to lighten the network. Based on the
similarity between feature maps, GhostNet [14] introduces
the Ghost module to replace the conventional convolution to
build compact neural networks. Approaches along this line are orthogonal to binarization; inspired by these lightweight structure designs, we propose the compact information recovery method, which empowers BNNs to trade off accuracy and efficiency while reducing the extra computational cost.
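For reference, a minimal generic sketch of the depthwise-separable factorization mentioned above, built from standard PyTorch layers (not code from this paper or from the cited architectures):

```python
import torch.nn as nn

# Generic sketch of depthwise-separable convolution: a k x k convolution
# applied per channel (groups=in_ch) followed by a 1 x 1 pointwise
# convolution that mixes channels. For large channel counts this cuts the
# multiply-accumulate count roughly by a factor of k*k.
def depthwise_separable(in_ch, out_ch, k=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, stride=stride, padding=k // 2,
                  groups=in_ch, bias=False),      # depthwise
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```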
III. PRELIMINARIES
In full-precision convolutional neural networks, the basic
operation can be formalized as:
$$z = \omega_r \otimes A_r \qquad (1)$$

where $\omega_r$ indicates the real-valued weight, $A_r$ is the real-valued input activation, and $\otimes$ denotes the real-valued convolution.
During the inference, the real-valued convolution operation
contains a large number of floating-point operations and is
computationally intensive. Network binarization aims to rep-
resent weights and activations with only 1-bit. By constraining
the weights and activations to {-1, +1}, the convolution
operations can be implemented using efficient XNOR and
bitcount, which is given as follows:
$$\omega_b = \mathrm{sign}(\omega_r), \qquad A_b = \mathrm{sign}(A_r), \qquad z = \omega_b \odot A_b \qquad (2)$$
where $\omega_b$ and $A_b$ denote the binary weight and the binary input activation, respectively, and $\odot$ denotes the binary convolution implemented with XNOR and bitcount. $\mathrm{sign}(\cdot)$ is the binarization function, which converts the real-valued weights and activations into binary ones and takes the form:

$$\mathrm{sign}(x) = \begin{cases} +1, & \text{if } x \ge 0 \\ -1, & \text{otherwise} \end{cases} \qquad (3)$$
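A minimal sketch of Eqs. (2)-(3) in PyTorch (our illustration, not the released IR2Net code); the binary convolution is simulated here with an ordinary floating-point convolution over {-1, +1} tensors, whereas deployed BNN kernels realize it with XNOR and bitcount:

```python
import torch
import torch.nn.functional as F

# Sketch of Eqs. (2)-(3): binarize weights and activations with sign(.)
# and convolve the resulting {-1, +1} tensors. For clarity the binary
# convolution is simulated with a float convolution here.
def binary_sign(x):
    # Eq. (3): +1 if x >= 0, -1 otherwise
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

def binary_conv2d(a_r, w_r, stride=1, padding=1):
    a_b = binary_sign(a_r)   # A_b = sign(A_r)
    w_b = binary_sign(w_r)   # w_b = sign(w_r)
    return F.conv2d(a_b, w_b, stride=stride, padding=padding)  # z = w_b ⊙ A_b
```

During training, $\mathrm{sign}(\cdot)$ has zero gradient almost everywhere, so BNN methods typically approximate its gradient with a straight-through or piecewise estimator; that detail is orthogonal to the formulation above.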
Usually, binarization causes performance degradation and
most methods [23], [24], [34], [35], [37], [44] introduce real-
valued scaling factors to reduce the quantization error and the
binary convolution operation is replaced as:
$$z = \alpha \beta \, (\omega_b \odot A_b) \qquad (4)$$

where $\alpha$ and $\beta$ are the scaling factors for the weights and activations, respectively (which may not be used simultaneously).
Unlike these methods, in this paper, considering the limited representational capability of BNNs, we optimize them
via information restriction and information recovery, so that
the scaling factors can be safely removed (although they could
also be retained for compatibility with existing optimization
methods).
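Continuing the sketch above, Eq. (4) would simply apply the scaling factors after the binary convolution; how $\alpha$ and $\beta$ are obtained (computed analytically or learned) varies across the cited methods, so they are taken here as given, broadcastable tensors.

```python
# Sketch of Eq. (4): apply the weight scaling factor alpha and the
# activation scaling factor beta to the binary convolution output.
# alpha and beta are assumed given and broadcastable over z (e.g.,
# per output channel); reuses binary_conv2d from the previous sketch.
# IR2Net itself removes these factors.
def scaled_binary_conv2d(a_r, w_r, alpha, beta, stride=1, padding=1):
    z = binary_conv2d(a_r, w_r, stride=stride, padding=padding)
    return alpha * beta * z   # z = alpha * beta * (w_b ⊙ A_b)
```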