IR2Net: Information Restriction and Information
Recovery for Accurate Binary Neural Networks
Ping Xue, Yang Lu, Jingfei Chang, Xing Wei, and Zhen Wei
Abstract—Weight and activation binarization can efficiently
compress deep neural networks and accelerate model inference,
but cause severe accuracy degradation. Existing optimization
methods for binary neural networks (BNNs) focus on fitting full-
precision networks to reduce quantization errors, and suffer from
the trade-off between accuracy and computational complexity. In
contrast, considering the limited learning ability and information
loss caused by the limited representational capability of BNNs, we
propose IR2Net to stimulate the potential of BNNs and improve
the network accuracy by restricting the input information and
recovering the feature information, including: 1) information restriction: for a BNN, we evaluate its learning ability on the input information, discard some of the information it cannot focus on, and limit the amount of input information to match its learning ability; 2) information recovery: because of the information loss in forward propagation, the output feature information of the network is not sufficient to support accurate classification, so we select shallow feature maps with richer information and fuse them with the final feature maps to recover the feature information. In addition, the computational cost is reduced by streamlining the information recovery method to strike a better trade-off between accuracy and efficiency.
Experimental results demonstrate that our approach still achieves
comparable accuracy even with 10x floating-point operations
(FLOPs) reduction for ResNet-18. The models and code are
available at https://github.com/pingxue-hfut/IR2Net.
Index Terms—Model compression, information restriction &
recovery, image classification, deep learning.
I. INTRODUCTION
Deep Convolutional Neural Networks (CNNs) have made
much progress in a wide variety of computer vision
applications [1]–[4]. However, as the research advances, the
depth of the networks has expanded from a few layers to hun-
dreds of layers [5]–[8]. The huge number of parameters and
This work was supported in part by the National Key Research and
Development Program under Grant 2018YFC0604404, in part by the National
Natural Science Foundation of China under Grant 61806067, in part by the
Anhui Provincial Key R&D Program (202004a05020040), and in part by the
Intelligent Network and New Energy Vehicle Special Project of Intelligent
Manufacturing Institute of HFUT (IMIWL2019003). (Corresponding author:
Yang Lu.)
Ping Xue and Jingfei Chang are with the School of Computer Science
and Information Engineering, Hefei University of Technology, Hefei 230009,
China (e-mail: xueping1001@126.com; cjfhfut@mail.hfut.edu.cn).
Yang Lu and Zhen Wei are with the School of Computer Science and
Information Engineering, Hefei University of Technology, Hefei 230009,
China, with the Anhui Mine IOT and Security Monitoring Technology Key
Laboratory, Hefei 230088, China, and also with the Engineering Research
Center of Safety Critical Industrial Measurement and Control Technology,
Ministry of Education, Hefei University of Technology, Hefei 230009, China
(e-mail: luyang.hf@126.com; weizhen@gocom.cn).
Xing Wei is with the School of Computer Science and Information Engi-
neering, Hefei University of Technology, Hefei 230009, China, and also with
the Intelligent Manufacturing Institute of HeFei University of Technology,
Hefei 230009, China (e-mail: weixing@hfut.edu.cn).
[Figure: attention maps at Levels 1-4 (low-level to high-level) for a full-precision network (FP) and a BNN on the same input.]
Fig. 1. The differences of attention maps for full-precision network (FP) and BNN.
the ultra-high computational complexity of CNNs make their
deployment very constrained, especially under the conditions
of applications with high real-time requirements or limited
storage capacity. To solve this problem, various compression
techniques for CNNs have emerged. Network pruning [9]–[11]
reduces model redundancy by pruning convolutional kernels or
channels, efficient architecture design [12]–[14] replaces con-
ventional convolutional layers with well-designed lightweight
modules to speed up network inference, knowledge distillation
[15], [16] attempts to transfer knowledge from complex net-
works (teachers) to compact networks (students), quantization
[17]–[22] replaces 32-bit weights and activations with low-
bit (e.g., 16-bit) ones to reduce both memory footprint and
computational complexity. The extreme of quantization is
binarization. Compared with 32-bit floating-point networks,
network binarization constrains both the weights and activa-
tions to {-1, +1}, i.e., the parameters of binary neural networks
(BNNs) need only 1-bit representation, which greatly reduces
the storage requirement; furthermore, while binarizing the
network weights and activations, the computationally intensive
matrix multiplication and addition operations in full-precision
networks are replaced with low-cost XNOR and bitcount,
which greatly reduces the network inference delay. Therefore,
benefiting from the high compression ratio, acceleration, and
energy-saving, network binarization is considered as one of
the most promising techniques for network compression and
is the focus of this work.
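To make the XNOR/bitcount substitution concrete, the following is a generic sketch (ours, not code from this paper) of the standard encoding: with -1 stored as the bit 0 and +1 as the bit 1, the dot product of two {-1, +1} vectors of length n equals 2 * popcount(XNOR(w, a)) - n, so multiply-accumulate reduces to bitwise logic.

```python
import numpy as np

# Generic sketch (not from the paper): with -1 encoded as bit 0 and +1 as
# bit 1, the dot product of two {-1, +1} vectors equals
# 2 * popcount(XNOR(w, a)) - n, so multiply-accumulate can be replaced
# by XNOR and bitcount.
def dot_multiply_accumulate(w, a):
    return int(np.dot(w, a))

def dot_xnor_bitcount(w, a):
    w_bits = w > 0
    a_bits = a > 0
    matches = np.count_nonzero(w_bits == a_bits)  # XNOR + bitcount
    return 2 * matches - len(w)

rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=64)
a = rng.choice([-1, 1], size=64)
assert dot_multiply_accumulate(w, a) == dot_xnor_bitcount(w, a)
```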
Network binarization has attracted a lot of attention due
to its advantages in compression and acceleration. Although
much progress has been made, the existing binarization meth-
ods still suffer from a trade-off between accuracy and effi-
ciency. For example, although XNOR-Net [23] and Bi-Real Net [24] improve the accuracy of BNNs with negligible extra computation, a large accuracy gap remains between them and their full-precision counterparts; whereas Group-Net
[25] and MeliusNet [26] achieve comparable accuracy to that
of full-precision networks, but they introduce a noticeable
additional computational cost, which significantly offsets the
advantages of network binarization. Therefore, one of the
motivations for this work is to strike a better trade-off between
the accuracy and computational complexity for BNNs.
In addition, the performance degradation of BNNs is mainly
caused by their limited representational capability. BNNs
represent weights and activations with 1-bit, which means the
theoretical representation precision is only 1/2^31 compared to
the full-precision counterparts. The limited representational
capability leads to two drawbacks in BNNs: limited data
information acceptance (i.e., learning ability) and severe infor-
mation loss during forward propagation. As shown in Figure
1, at level 4 of the attention maps [27], it can be seen that the
full-precision network can focus on much larger information
regions of interest (the highlighted regions of the attention
maps) than the BNN does, which is only able to accept limited
information; besides, the information loss during the forward
propagation of the BNN is also evident in the flow of the
attention maps from low to high levels. IR-Net [28] and BBG
[29] reduce the information loss in forward propagation by
balancing and normalizing the weights to achieve maximum
information entropy, which improves the network accuracy
to some extent. However, these methods do not consider the limited information acceptance of BNNs and still suffer significant accuracy degradation on large-scale datasets (e.g., ImageNet).
To solve the aforementioned problems, from the perspective
of the representational capability of BNNs themselves, we
propose IR2Net, a binarization approach to enhance BNNs via
restricting input information and recovering feature informa-
tion: 1) intuitively, different students (networks) have different
learning abilities, for those with strong learning abilities, more
information can be provided for their learning and refining,
whereas for those with weak learning abilities, discarding
redundant information is needed for better learning. IR2Net introduces the information restriction method to restrict the input information and regularize the networks, thus forcing BNNs to focus on the more critical information with their limited learning abilities; 2) for information loss during for-
ward propagation in BNNs, IR2Net leverages the information
recovery method to fuse the shallow feature information with
the final feature information before the classifier (or other task-
specific modules) to fix the information loss and improve the
accuracy.
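To make these two ideas concrete, the following is only a rough conceptual sketch in PyTorch under our own simplifying assumptions; the attention-map generation, the keep ratio, and the concatenation-based fusion are illustrative placeholders, not the exact operators defined by IR2Net.

```python
import torch
import torch.nn.functional as F

# Conceptual sketch only (our assumptions, not the authors' exact operators):
# "information restriction" is illustrated as masking the input with a
# spatial attention map, and "information recovery" as fusing a pooled
# shallow feature map with the final feature map before the classifier.

def restrict_input(x, attention, keep_ratio=0.7):
    """Zero out the input regions a (hypothetical) attention map ranks lowest.

    x:         input images,   shape (N, 3, H, W)
    attention: attention maps, shape (N, 1, H, W), larger = more important
    """
    flat = attention.flatten(1)
    k = int(keep_ratio * flat.shape[1])
    threshold = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    mask = (attention >= threshold).float()
    return x * mask  # discard information the network cannot focus on

def recover_information(shallow_feat, final_feat):
    """Fuse a richer shallow feature map with the final feature map.

    shallow_feat: (N, C1, H1, W1) from an early block
    final_feat:   (N, C2, H2, W2) from the last block
    """
    pooled = F.adaptive_avg_pool2d(shallow_feat, final_feat.shape[-2:])
    return torch.cat([pooled, final_feat], dim=1)  # fed to the classifier
```

In the paper's actual design, the recovery path is additionally streamlined into a compact variant so that the extra computation stays small.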
With the abovementioned designs, the proposed IR2Net can
effectively force BNNs to focus on important information,
defend against information loss in forward propagation, and
then achieve advanced performance and a good trade-off
between accuracy and efficiency on various networks and
datasets.
The main contributions can be summarized as follows.
1) We propose IR2Net, the first approach to mitigate the information loss and the mismatch between learning ability and information quantity from the perspective of the limited representational capability of BNNs caused by quantization.
2) An information restriction method is designed to restrict
the input information by the generated attention masks so that
the amount of input information matches the learning ability
of the network, and then the representational capability of the
network is fully utilized without introducing additional costs.
3) An information recovery method is proposed to resist
the information loss in forward propagation by fusing shallow
and deep information; a compact information recovery method
is also proposed to reduce additional computational cost and
empower the network to trade-off accuracy and computational
complexity.
4) Extensive experimental evaluations demonstrate that the
proposed IR2Net achieves new state-of-the-art performance on
both CIFAR-10 and ImageNet, and also has good versatility.
II. RELATED WORK
A. Network Binarization
The pioneering study of network binarization dates back
to BNN [30], which obtains comparable accuracy on small
datasets (including MNIST, SVHN [31], and CIFAR-10 [32]),
yet encounters severe performance degradation while on large-
scale datasets (e.g., ImageNet [33]). Therefore, substantial
research efforts are invested in minimizing the accuracy gap
between BNNs and full-precision ones. Enhancing BNNs usually requires introducing additional computation. Some works focus on using a fractional amount
of real-valued operations in exchange for significant accuracy
gains. For instance, XNOR-Net [23] improves the performance
of BNNs on ImageNet to some extent by introducing real-
valued scaling factors. XNOR-Net++ [34] builds on top of this by fusing the separate weight and activation scaling factors into one, which is learned discriminatively via backpropagation.
Bi-Real Net [24] connects the real-valued activation of adja-
cent layers to enhance the network representational capability.
BBG [29] adds a gated module to the connection. Real-to-Bin
[35] obtains the activation scaling factors via SE [36]. RBNN
[37] further reduces the quantization error from the perspective
of intrinsic angular bias. In contrast, some other works relax the constraints on additional computational complexity in exchange for higher accuracy. ABC-Net [38] uses linear combinations of
multiple binary bases to approximate the real-valued weights
and activations. HORQ-Net [39] reduces the residual between
real-valued activations and binary activations by utilizing a
high-order approximation scheme. CBCN [40] enhances the
diversity of intermediate feature maps by rotating the weight
matrix. MeliusNet [26] designs Dense Block and Improvement
Block to improve the feature capability and quality, respec-
tively. Group-Net [25] and BENN [41] use multiple BNNs for
combination or ensemble to obtain significant improvement.
Although great progress has been made in the research
of BNNs, the existing methods either leave a significant
accuracy gap compared with full-precision networks, or in-
troduce a large amount of computation for comparable perfor-
mance, which largely offsets the advantages in compression
and acceleration and deviates from the original purpose of
network binarization. Therefore, IR2Net is proposed, aiming
at acquiring higher network accuracy with less computational
complexity.

[Figure: the IR2Net pipeline, from the input through a 7×7 convolution (stride 2), max-pooling, and four stages of binary convolution (BinConv) blocks to global average pooling and a fully connected (FC) layer; the Information Restriction module produces the masked input, and the Information Recovery module fuses shallow feature maps with the last feature maps into the fused information fed to the classifier.]
Fig. 2. Illustration of IR2Net. ResNet-18 is used as an example backbone. Batch-normalization layer (BN), nonlinear layer, and shortcut are omitted for simplicity. The output feature maps before downsampling layers and of the penultimate layer are selected for information recovery to retain more information with less computational cost.

Moreover, the trade-off between accuracy and effi-
ciency is pursued by adjusting the hyperparameters introduced
in IR2Net, i.e., to achieve better accuracy with comparable
computational cost, or to obtain comparable accuracy with less
computation complexity.
B. Efficient Architecture Design
The main point of this line is to design compact architectures for model compression and acceleration. AlexNet [1]
introduces group convolution to overcome the GPU memory
constraints by partitioning input feature channels into mutually
exclusive groups for convolution independently. However,
group operation blocks the information interaction between
different groups, so ShuffleNet [13] introduces channel shuffle
operation on top of group convolution to maintain the con-
nections between groups. IGCNets [42] uses two successive
interleaved group convolutions to achieve complementarity.
Xception [43] proposes depthwise-separable convolution, which factorizes a standard convolution into a depthwise convolution and a pointwise convolution. MobileNet [12] uses depthwise-separable convolutions to lighten the network. Based on the
similarity between feature maps, GhostNet [14] introduces
the Ghost module to replace the conventional convolution to
build compact neural networks. Approaches along this line are orthogonal to binarization; inspired by these lightweight structure designs, we propose the compact information recovery method, which empowers BNNs to trade off accuracy and efficiency while reducing the extra computational cost.
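For reference, a minimal generic sketch of the depthwise-separable factorization mentioned above, built from standard PyTorch layers (not code from this paper or from the cited architectures):

```python
import torch.nn as nn

# Generic sketch of depthwise-separable convolution: a k x k convolution
# applied per channel (groups=in_ch) followed by a 1 x 1 pointwise
# convolution that mixes channels. For large channel counts this cuts the
# multiply-accumulate count roughly by a factor of k*k.
def depthwise_separable(in_ch, out_ch, k=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, stride=stride, padding=k // 2,
                  groups=in_ch, bias=False),      # depthwise
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```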
III. PRELIMINARIES
In full-precision convolutional neural networks, the basic
operation can be formalized as:
$$z = \omega_r \otimes A_r \qquad (1)$$

where $\omega_r$ indicates the real-valued weight, $A_r$ is the real-valued input activation, and $\otimes$ denotes the real-valued convolution.
During the inference, the real-valued convolution operation
contains a large number of floating-point operations and is
computationally intensive. Network binarization aims to rep-
resent weights and activations with only 1-bit. By constraining
the weights and activations to {-1, +1}, the convolution
operations can be implemented using efficient XNOR and
bitcount, which is given as follows:
$$\omega_b = \mathrm{sign}(\omega_r), \qquad A_b = \mathrm{sign}(A_r), \qquad z = \omega_b \odot A_b \qquad (2)$$
where $\omega_b$ and $A_b$ denote the binary weight and the binary input activation, respectively, and $\odot$ denotes the binary convolution implemented with XNOR and bitcount. $\mathrm{sign}(\cdot)$ is the binarization function, which converts the real-valued weights and activations into binary ones and takes the form:

$$\mathrm{sign}(x) = \begin{cases} +1, & \text{if } x \ge 0 \\ -1, & \text{otherwise} \end{cases} \qquad (3)$$
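A minimal sketch of Eqs. (2)-(3) in PyTorch (our illustration, not the released IR2Net code); the binary convolution is simulated here with an ordinary floating-point convolution over {-1, +1} tensors, whereas deployed BNN kernels realize it with XNOR and bitcount:

```python
import torch
import torch.nn.functional as F

# Sketch of Eqs. (2)-(3): binarize weights and activations with sign(.)
# and convolve the resulting {-1, +1} tensors. For clarity the binary
# convolution is simulated with a float convolution here.
def binary_sign(x):
    # Eq. (3): +1 if x >= 0, -1 otherwise
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

def binary_conv2d(a_r, w_r, stride=1, padding=1):
    a_b = binary_sign(a_r)   # A_b = sign(A_r)
    w_b = binary_sign(w_r)   # w_b = sign(w_r)
    return F.conv2d(a_b, w_b, stride=stride, padding=padding)  # z = w_b ⊙ A_b
```

During training, $\mathrm{sign}(\cdot)$ has zero gradient almost everywhere, so BNN methods typically approximate its gradient with a straight-through or piecewise estimator; that detail is orthogonal to the formulation above.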
Usually, binarization causes performance degradation and
most methods [23], [24], [34], [35], [37], [44] introduce real-
valued scaling factors to reduce the quantization error and the
binary convolution operation is replaced as:
$$z = \alpha \beta \, (\omega_b \odot A_b) \qquad (4)$$

where $\alpha$ and $\beta$ are the scaling factors for the weights and activations, respectively (which may not be used simultaneously).
Unlike these methods, in this paper, considering the limited representational capability of BNNs, we optimize them
via information restriction and information recovery, so that
the scaling factors can be safely removed (although they could
also be retained for compatibility with existing optimization
methods).
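Continuing the sketch above, Eq. (4) would simply apply the scaling factors after the binary convolution; how $\alpha$ and $\beta$ are obtained (computed analytically or learned) varies across the cited methods, so they are taken here as given, broadcastable tensors.

```python
# Sketch of Eq. (4): apply the weight scaling factor alpha and the
# activation scaling factor beta to the binary convolution output.
# alpha and beta are assumed given and broadcastable over z (e.g.,
# per output channel); reuses binary_conv2d from the previous sketch.
# IR2Net itself removes these factors.
def scaled_binary_conv2d(a_r, w_r, alpha, beta, stride=1, padding=1):
    z = binary_conv2d(a_r, w_r, stride=stride, padding=padding)
    return alpha * beta * z   # z = alpha * beta * (w_b ⊙ A_b)
```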