Theoretical contributions. In this work, we systematically investigate how to design expressive Lipschitz neural networks (w.r.t. the ℓ∞-norm) through the novel lens of representing discrete Boolean functions, which provides a deeper understanding of the aforementioned problems. Specifically, we first identify a fundamental limitation of standard Lipschitz networks in representing a class of logical operations called symmetric Boolean functions (SBF), which include the basic logical AND/OR as special cases. We prove that for any non-constant SBF of d variables, there exists a finite dataset of size O(d) such that the certified ℓ∞ robust radius must vanish as O(1/d) for any classifier induced by a standard Lipschitz network. Remarkably, since logical AND/OR operations correspond to perhaps the most basic classifiers, our result indicates an intrinsic difficulty of such networks in fitting high-dimensional real-world datasets with guaranteed certified ℓ∞ robustness.
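For concreteness, a Boolean function is symmetric if its value depends only on the number of ones in its input; the block below records this standard formulation together with AND/OR as special cases (an illustrative restatement, not the exact definitions used in our proofs).

```latex
% A symmetric Boolean function (SBF) depends on its input only through the number of ones:
\[
  f:\{0,1\}^d \to \{0,1\} \text{ is an SBF}
  \iff
  f(x) = g\Big(\textstyle\sum_{i=1}^{d} x_i\Big)
  \ \text{ for some } g:\{0,1,\dots,d\}\to\{0,1\}.
\]
% AND and OR are the two simplest non-constant examples:
\[
  \mathrm{AND}(x)=\mathbb{1}\Big[\textstyle\sum_{i=1}^{d} x_i = d\Big],
  \qquad
  \mathrm{OR}(x)=\mathbb{1}\Big[\textstyle\sum_{i=1}^{d} x_i \ge 1\Big].
\]
```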
Our analysis extends readily to the Lipschitz function approximation setting. We point out the relationship between monotone SBFs and order statistics (which are 1-Lipschitz functions), and then prove that any d-dimensional order statistic (including the max/min function) on a compact domain cannot be approximated by standard Lipschitz networks: the approximation error is Ω(1 − 1/d) regardless of the network size. This impossibility result is significant in that:
(i) it applies to all Lipschitz activations (thus extending prior works [2, 24]); (ii) it resolves an open problem raised recently in [43]; and (iii) it establishes a quantitative lower bound on the approximation error.
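As a concrete instance of the relationship mentioned above (an illustrative identity rather than the precise statements proved in Section 3.3), restricting to Boolean inputs turns the k-th order statistic into the monotone SBF "at least k ones", and every order statistic is 1-Lipschitz w.r.t. the ℓ∞-norm:

```latex
% On the Boolean cube, the k-th largest entry is exactly a monotone SBF:
\[
  x \in \{0,1\}^d:
  \qquad
  x_{(k)} \;=\; \mathbb{1}\Big[\textstyle\sum_{i=1}^{d} x_i \ge k\Big],
  \qquad
  x_{(1)} \ge x_{(2)} \ge \dots \ge x_{(d)}.
\]
% Every order statistic is 1-Lipschitz with respect to the l_infty-norm:
\[
  \big|x_{(k)} - y_{(k)}\big| \;\le\; \|x - y\|_\infty
  \qquad \text{for all } x, y \in \mathbb{R}^d.
\]
```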
Equipped with the above impossibility results, we proceed to examine two advanced Lipschitz architectures: the GroupSort network [2] and the recently proposed ℓ∞-distance net [73, 74]. We find that besides the linear operation, both networks incorporate other Lipschitz aggregation operations into their neuron design, in particular the order statistic functions, which sheds light on how they work. However, for the MaxMin network [2] (a computationally efficient version of the GroupSort network implemented in practice), representing Boolean functions and order statistics is possible only when the network is very deep. In particular, we prove that representing certain d-dimensional Boolean functions requires a depth of Ω(d), implying that shallow MaxMin networks are not Lipschitz-universal function approximators. In contrast, we show that a two-layer ℓ∞-distance net suffices to represent any order statistic function on a compact domain, or even all Boolean functions. This strongly justifies the empirical success of the ℓ∞-distance net over GroupSort (MaxMin) networks.
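To make the contrast concrete, the sketch below (a minimal NumPy illustration with hypothetical function names, not the reference implementations of [2] or [73]) shows the two aggregation operations discussed above: the MaxMin activation, which sorts pairs of pre-activations, and the ℓ∞-distance neuron, which is itself an order statistic of coordinate-wise distances.

```python
import numpy as np

def maxmin_activation(z):
    """MaxMin activation (GroupSort with group size 2): split the units of z into
    consecutive pairs and output the max and min of each pair.  Illustrative sketch;
    the reference implementation in [2] may order the output units differently.
    Assumes len(z) is even."""
    z = z.reshape(-1, 2)
    return np.concatenate([z.max(axis=1), z.min(axis=1)])

def linf_distance_neuron(x, w, b=0.0):
    """l_inf-distance neuron [73, 74]: y = ||x - w||_inf + b, i.e. the largest
    coordinate-wise distance to the weight vector, a 1-Lipschitz order statistic."""
    return np.max(np.abs(x - w)) + b

x = np.array([0.2, 0.9, 0.4, 0.1])
print(maxmin_activation(x))                  # pairwise max/min of (0.2, 0.9) and (0.4, 0.1)
print(linf_distance_neuron(x, np.zeros(4)))  # 0.9 = ||x||_inf
```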
Practical contributions. Our theoretical insights can also guide the design of better Lipschitz network architectures. Inspired by the importance of order statistics, we propose a general form of Lipschitz network, called SortNet, that extends both GroupSort and ℓ∞-distance networks and incorporates them into a unified framework. However, the full-sort operation is computationally expensive and leads to optimization difficulties (as with the GroupSort network). We therefore propose a specialized SortNet that can be trained efficiently by choosing each weight vector w to be a geometric series, i.e., w_i ∝ ρ^i for some 0 ≤ ρ < 1. This yields a restricted version of SortNet that still covers the ℓ∞-distance net as a special case. For this particular SortNet, we derive a stochastic estimation that gives an unbiased approximation of the neuron output without performing sorting operations explicitly. This eventually yields an efficient training strategy with a cost similar to that of training standard networks, thus making certified robust training free. Extensive experiments demonstrate that the proposed SortNet is scalable, efficient, and consistently achieves better certified robustness than prior Lipschitz networks across multiple datasets and perturbation radii. In particular, our approach even scales to a variant of ImageNet and surpasses the best-known result [69] with a 22-fold decrease in training time, thanks to our “free” certified training approach.
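To illustrate why geometric weights admit sorting-free unbiased estimation, the sketch below uses the identity that a geometrically weighted sum of sorted values equals the expected maximum over a random subset that keeps each entry independently with probability 1 − ρ. This is an illustration of the general principle with hypothetical names and a simplified neuron; the exact parameterization and the estimator we actually use are given in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def sorted_geometric_sum(z, rho):
    """Exact neuron output under geometric weights: sum_i (1-rho)*rho**(i-1) * z_(i),
    where z_(1) >= ... >= z_(d) are the entries of z sorted in descending order.
    (rho = 0 recovers the plain max, i.e. the l_inf-distance aggregation.)"""
    z_sorted = np.sort(z)[::-1]
    weights = (1.0 - rho) * rho ** np.arange(len(z))
    return weights @ z_sorted

def stochastic_estimate(z, rho, n_samples=100_000):
    """Unbiased, sorting-free estimator: keep each entry independently with
    probability 1 - rho and take the max of the kept entries (0 if none survive)."""
    keep = rng.random((n_samples, len(z))) < (1.0 - rho)
    masked = np.where(keep, z, -np.inf)
    sample_max = masked.max(axis=1)
    sample_max = np.where(np.isneginf(sample_max), 0.0, sample_max)  # empty subset -> 0
    return sample_max.mean()

z = np.array([0.3, 1.2, 0.7, 0.1, 0.9])
print(sorted_geometric_sum(z, rho=0.5))  # exact weighted sum of sorted values
print(stochastic_estimate(z, rho=0.5))   # Monte-Carlo estimate, matches in expectation
```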
The contributions and organization of this paper can be summarized as follows:
• We develop a systematic study of the expressive power of Lipschitz neural networks using tools from Boolean function theory. We prove impossibility results for standard Lipschitz networks in two settings: a) certified ℓ∞ robustness on discrete datasets (Section 3.2); b) Lipschitz function approximation (Section 3.3).
• We provide insights into how recently proposed networks can bypass these impossibility results. In particular, we show that a two-layer ℓ∞-distance net can precisely represent any Boolean function, while shallow GroupSort networks cannot (Section 3.4).
• We propose SortNet, a Lipschitz network that generalizes GroupSort and the ℓ∞-distance net. For a special type of SortNet, we derive a stochastic training approach that bypasses the difficulty of computing sorting operations explicitly and makes certified training free (Section 4).
• Extensive experiments demonstrate that SortNet exhibits better certified robustness than baseline methods on several benchmark datasets with high training efficiency (Section 5).