is the classical sparse data modeling that, as shown in the pioneering work of [2], can easily discover
meaningful structures from natural image patches. Backed by its ability to learn interpretable
representations and by strong theoretical guarantees [3, 4, 5, 6, 7, 8] (e.g., for handling corrupted
data), sparse modeling has been used broadly in many signal and image processing applications
[9]. However, the empirical performance of sparse methods has been surpassed by that of deep learning
methods for the classification of modern image datasets.
Because of the complementary benefits of sparse modeling and deep learning, many efforts have
leveraged sparse modeling to gain theoretical insights into ConvNets and/or to develop
computational methods that further improve upon existing ConvNets. One pioneering work is
[10], which interprets a ConvNet as approximately solving a multi-layer convolutional sparse coding
model. Based on this interpretation, the work [10] and its follow-ups [11, 12, 13, 14] presented
alternative algorithms and models to further enhance the practical performance of such
learning systems. However, there has been no empirical evidence that such systems can handle modern
image datasets such as ImageNet and obtain performance comparable to deep learning. The only
exception, to the best of our knowledge, is the work of [15, 16], which exhibited performance on par
with (on ImageNet) or better than (on CIFAR-10) that of ResNet. However, the method in [15, 16] 1) requires a
dedicated network architecture design that may limit its applicability, 2) is orders
of magnitude slower to train, and 3) does not demonstrate benefits in terms of interpretability or
robustness. In a nutshell, sparse modeling has yet to demonstrate the practicality that would enable its broad
application.
Paper contributions.
In this paper, we revisit sparse modeling for image classification and demonstrate, through a
simple design, that sparse modeling can be combined with deep learning to obtain
performance on par with standard ConvNets but with better layer-wise interpretability and stability.
Our method encapsulates the sparse modeling into an implicit layer [17, 18, 19] and uses it as a
drop-in replacement for any convolutional layer in standard ConvNets. The layer implements the
convolutional sparse coding (CSC) model of [20] and is referred to as a CSC-layer; in it, the input
signal is approximated by a sparse linear combination of atoms from a convolutional dictionary. The
convolutional dictionary is treated as the parameters of the CSC-layer, which are amenable to training
via back-propagation. The overall network with CSC-layers may then be trained in an end-to-end
fashion from labeled data by minimizing the cross-entropy loss as usual; see the illustrative sketch
after the list of benefits below. This paper demonstrates
that such a learning framework has the following benefits:
• Performance on standard datasets.
We demonstrate that our network obtains better (on CIFAR-100) or on-par (on CIFAR-10 and ImageNet)
performance, with similar training time, compared with standard architectures such as ResNet [21].
To the best of our knowledge, this provides the first evidence of the strong empirical
performance of sparse modeling for deep learning. Compared to previous sparse methods [15] that
obtained similar performance, our method is orders of magnitude faster.
• Robustness to input perturbations.
The stable recovery property of the convolutional sparse model equips the CSC-layers with the
ability to remove perturbations in the layer input and to recover the clean sparse code. As a result,
our networks with CSC-layers are more robust to perturbations in the input images than classical
neural networks. Unlike existing approaches to robustness that require heavy data augmentation [22]
or additional training techniques [23], our method is lightweight and does not require modifying
the training procedure at all.
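To make the CSC-layer concrete, the following is a minimal sketch of how such a layer could be realized, assuming an unrolled ISTA solver for the convolutional lasso problem in PyTorch. The class name CSCLayer, the unrolled-solver choice, the zero initialization, and the hyper-parameters (lam, step_size, num_steps) are illustrative assumptions rather than the exact implicit-layer implementation of this paper; the sketch only conveys the interface: a trainable convolutional dictionary whose forward pass returns a sparse code of its input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CSCLayer(nn.Module):
    """Illustrative CSC-layer: computes a sparse code z of the input x under
        min_z  0.5 * ||x - D z||_2^2 + lam * ||z||_1,
    where D z denotes synthesis with a convolutional dictionary D (a transposed
    convolution). The problem is approximated here by a few unrolled ISTA steps,
    and the dictionary is a parameter trained by back-propagation.
    """

    def __init__(self, in_channels, num_atoms, kernel_size=3,
                 lam=0.1, step_size=0.1, num_steps=2):
        super().__init__()
        # Convolutional dictionary: num_atoms filters of shape
        # (in_channels, kernel_size, kernel_size).
        self.weight = nn.Parameter(
            0.01 * torch.randn(num_atoms, in_channels, kernel_size, kernel_size))
        self.lam = lam
        self.step_size = step_size
        self.num_steps = num_steps
        self.pad = kernel_size // 2

    @staticmethod
    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1.
        return torch.sign(v) * torch.clamp(v.abs() - t, min=0.0)

    def forward(self, x):
        # ISTA iteration: z <- soft_threshold(z - eta * D^T (D z - x), eta * lam).
        z = x.new_zeros(x.shape[0], self.weight.shape[0], x.shape[2], x.shape[3])
        for _ in range(self.num_steps):
            recon = F.conv_transpose2d(z, self.weight, padding=self.pad)  # D z
            grad = F.conv2d(recon - x, self.weight, padding=self.pad)     # D^T (D z - x)
            z = self.soft_threshold(z - self.step_size * grad,
                                    self.step_size * self.lam)
        return z  # sparse feature map with num_atoms channels


# Example: a hypothetical drop-in replacement for a 3x3 conv producing 64 channels.
layer = CSCLayer(in_channels=3, num_atoms=64)
features = layer(torch.randn(8, 3, 32, 32))  # shape (8, 64, 32, 32)
```

The returned sparse feature map has the same interface as the output of a standard convolutional layer, which is what allows a CSC-layer of this kind to serve as a drop-in replacement inside existing architectures.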
2 Related Work
Implicit layers.
The idea of trainable layers defined by implicit functions can be traced back at
least to the work of [24]. Recently, there has been a revival of interest in implicit layers
[17, 18, 19, 25, 26, 27, 28, 29] as an attractive alternative to the explicit layers of existing neural
networks. However, a majority of the works cited above define an implicit layer by a fixed-point
iteration, typically motivated by existing explicit layers such as residual layers; therefore, they do not
have a clear interpretation in terms of modeling the layer input. Consequently, such models do not have
the ability to deal with input perturbations. The only exceptions are differentiable optimization
layers [30, 17, 18, 31] that incorporate complex dependencies between hidden layers through the
formulation of convex optimization. Nevertheless, most of the above works focus on differentiating
through the convex optimization layers (such as disciplined parametrized programming [18]) without
specializing in any