them and the full-precision counterparts; whereas Group-Net [25] and MeliusNet [26] achieve accuracy comparable to that of full-precision networks but introduce a noticeable additional computational cost, which significantly offsets the advantages of network binarization. Therefore, one motivation of this work is to strike a better trade-off between accuracy and computational complexity for BNNs.
In addition, the performance degradation of BNNs is mainly
caused by their limited representational capability. BNNs
represent weights and activations with 1-bit, which means the
theoretical representation precision is only 1/2^31 compared to
the full-precision counterparts. The limited representational
capability leads to two drawbacks in BNNs: limited data
information acceptance (i.e., learning ability) and severe infor-
mation loss during forward propagation. As shown in Figure 1, at level 4 of the attention maps [27], the full-precision network focuses on much larger regions of interest (the highlighted regions of the attention maps) than the BNN does, which can only take in limited information; moreover, the information loss during the forward propagation of the BNN is also evident in the flow of the attention maps from low to high levels. IR-Net [28] and BBG
[29] reduce the information loss in forward propagation by
balancing and normalizing the weights to achieve maximum
information entropy, which improves the network accuracy
to some extent. However, these methods do not consider the limited information acceptance of BNNs and still suffer significant accuracy degradation on large-scale datasets (e.g., ImageNet).
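As a quick check of the 1/2^31 figure above (our reading: it counts representable states per value), a 1-bit value can take $2^1$ states while a 32-bit one can take $2^{32}$, giving

$$\frac{2^{1}}{2^{32}} = \frac{1}{2^{31}}.$$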
To solve the aforementioned problems, from the perspective
of the representational capability of BNNs themselves, we
propose IR2Net, a binarization approach to enhance BNNs via
restricting input information and recovering feature informa-
tion: 1) intuitively, different students (networks) have different learning abilities: those with strong learning abilities can be given more information to learn from and refine, whereas those with weak learning abilities need redundant information discarded for better learning. IR2Net introduces the information restriction method to restrict the input information and regularize the networks, thus forcing BNNs to focus on the more critical information with their limited learning abilities; 2) for the information loss during for-
ward propagation in BNNs, IR2Net leverages the information
recovery method to fuse the shallow feature information with
the final feature information before the classifier (or other task-
specific modules) to compensate for the information loss and improve accuracy.
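To make the information restriction idea concrete, the following is a minimal PyTorch sketch under our own assumptions: the mask is a per-sample top-k spatial attention map derived from intermediate activations, and the class name InformationRestriction and the keep_ratio parameter are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InformationRestriction(nn.Module):
    """Illustrative sketch (not the paper's code): restrict the input by an
    attention mask so the information quantity matches the BNN's learning
    ability. keep_ratio is a hypothetical knob for how much is kept."""

    def __init__(self, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # Spatial attention: average feature magnitude over channels.
        attn = feat.abs().mean(dim=1, keepdim=True)              # (N, 1, h, w)
        attn = F.interpolate(attn, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Keep the top-k most salient spatial positions, zero out the rest.
        n, _, hh, ww = attn.shape
        flat = attn.flatten(1)                                   # (N, hh*ww)
        k = max(1, int(self.keep_ratio * flat.shape[1]))
        thresh = flat.topk(k, dim=1).values[:, -1:]              # per-sample cutoff
        mask = (flat >= thresh).float().view(n, 1, hh, ww)
        return x * mask                                          # restricted input
```

Used during training, such a mask acts as a regularizer that concentrates the BNN's limited learning ability on the most salient regions; at inference the restriction branch can be dropped, which is presumably how no extra cost is incurred (see contribution 2 below).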
With the above designs, the proposed IR2Net can effectively force BNNs to focus on important information, resist information loss in forward propagation, and thus achieve competitive performance and a good trade-off between accuracy and efficiency on various networks and datasets.
The main contributions can be summarized as follows.
1) We propose IR2Net, the first method to mitigate both the information loss and the mismatch between learning ability and information quantity from the perspective of the limited representational capability of BNNs caused by quantization.
2) An information restriction method is designed to restrict the input information via generated attention masks, so that the amount of input information matches the learning ability of the network and its representational capability is fully utilized, without introducing additional cost.
3) An information recovery method is proposed to resist the information loss in forward propagation by fusing shallow and deep information (see the sketch after this list); a compact information recovery method is also proposed to reduce the additional computational cost and empower the network to trade off accuracy against computational complexity.
4) Extensive experimental evaluations demonstrate that the
proposed IR2Net achieves new state-of-the-art performance on
both CIFAR-10 and ImageNet, and also has good versatility.
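As referenced in contribution 3, the following is a minimal PyTorch sketch of the information recovery idea under our own assumptions (1x1 reduction of the shallow features, pooling to the deep resolution, concatenation, then a 1x1 fusion); names such as InformationRecovery and reduce_ch are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InformationRecovery(nn.Module):
    """Illustrative sketch (not the paper's code): fuse a shallow feature map
    with the final one before the classifier to resist information loss.
    The 1x1 reduction keeps the extra cost small (a 'compact' variant)."""

    def __init__(self, shallow_ch: int, deep_ch: int, reduce_ch: int = 64):
        super().__init__()
        # Compact projection of the shallow features to a few channels.
        self.reduce = nn.Sequential(
            nn.Conv2d(shallow_ch, reduce_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(reduce_ch),
        )
        self.fuse = nn.Conv2d(deep_ch + reduce_ch, deep_ch,
                              kernel_size=1, bias=False)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        s = self.reduce(shallow)
        s = F.adaptive_avg_pool2d(s, deep.shape[-2:])  # match deep resolution
        return self.fuse(torch.cat([s, deep], dim=1))  # fused features for classifier
```

Shrinking reduce_ch lowers the added computation at the cost of less recovered information, which is one way to realize the accuracy-complexity trade-off that contribution 3 alludes to.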
II. RELATED WORK
A. Network Binarization
The pioneering study of network binarization dates back
to BNN [30], which obtains comparable accuracy on small
datasets (including MNIST, SVHN [31], and CIFAR-10 [32]),
yet encounters severe performance degradation on large-
scale datasets (e.g., ImageNet [33]). Therefore, substantial
research efforts are invested in minimizing the accuracy gap
between BNNs and full-precision ones. Enhancing BNNs usually requires introducing additional computation. Some works focus on spending a small number of real-valued operations in exchange for significant accuracy
gains. For instance, XNOR-Net [23] improves the performance
of BNNs on ImageNet to some extent by introducing real-
valued scaling factors. XNOR-Net++ [34] builds on this by fusing the separate weight and activation scaling factors into one, which is learned discriminatively via backpropagation.
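For reference, the scaling-factor idea can be sketched in a few lines of PyTorch (forward pass only, and simplified: XNOR-Net also binarizes activations):

```python
import torch

def binarize_with_scale(w: torch.Tensor) -> torch.Tensor:
    """XNOR-Net-style weight binarization, simplified: approximate the
    real-valued conv weight W by alpha * sign(W), with one scale
    alpha = mean(|W|) per output channel."""
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # (out_ch, 1, 1, 1)
    return alpha * w.sign()
```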
Bi-Real Net [24] connects the real-valued activation of adja-
cent layers to enhance the network representational capability.
BBG [29] adds a gated module to the connection. Real-to-Bin
[35] obtains the activation scaling factors via SE [36]. RBNN
[37] further reduces the quantization error from the perspective
of intrinsic angular bias. In contrast, other works relax the constraints on additional computational complexity in pursuit of higher accuracy. ABC-Net [38] uses linear combinations of
multiple binary bases to approximate the real-valued weights
and activations. HORQ-Net [39] reduces the residual between
real-valued activations and binary activations by utilizing a
high-order approximation scheme. CBCN [40] enhances the
diversity of intermediate feature maps by rotating the weight
matrix. MeliusNet [26] designs the Dense Block and the Improvement Block to increase feature capacity and improve feature quality, respec-
tively. Group-Net [25] and BENN [41] use multiple BNNs for
combination or ensemble to obtain significant improvement.
Although great progress has been made in the research of BNNs, existing methods either retain a significant accuracy gap compared with full-precision networks or introduce a large amount of computation for comparable perfor-
mance, which largely offsets the advantages in compression
and acceleration and deviates from the original purpose of
network binarization. Therefore, IR2Net is proposed, aiming to achieve higher network accuracy with less computational