
Table 2: Overview of ZC proxy evaluations in NAS-Bench-Suite-Zero. ∗Note that EPE-NAS is only defined for classification tasks [20].

Search space                  Tasks   Num. ZC proxies   Num. architectures   Total ZC proxy evaluations
NAS-Bench-101                   1          13               10 000                  130 000
NAS-Bench-201                   3          13               15 625                  609 375
NAS-Bench-301                   1          13               11 221                  145 873
TransNAS-Bench-101-Micro        7          12∗               3 256                  273 504
TransNAS-Bench-101-Macro        7          12∗               4 096                  344 064
Add’l. 201, 301, TNB-Micro      9          13                  600                   23 400
Total                          28          13               44 798                1 526 216
while ProxyBO uses three, the algorithm dynamically chooses one in each iteration (so individual predictions are made using a single ZC proxy at a time). Recently, NAS-Bench-Zero was introduced [2], a new benchmark based on the popular computer vision models ResNet [12] and MobileNetV2 [30], which includes 10 ZC proxies. However, the NAS-Bench-Zero dataset is currently not publicly available. For more details on related work, see Appendix B.
Only two prior works combine the information of multiple ZC proxies in architecture predictions [1, 2], and both use only a voting strategy to combine at most four ZC proxies. Our work is the first to publicly release ZC proxy values, combine ZC proxies in a nontrivial way, and exploit the complementary information of 13 ZC proxies simultaneously.
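For intuition, the voting strategy used by prior work can be sketched as a majority vote over pairwise comparisons: each ZC proxy votes for the architecture it scores higher. The snippet below is a minimal illustrative sketch with placeholder proxy names and scores, not the exact implementation from [1, 2]:

```python
# Minimal sketch of the voting strategy from prior work: each ZC proxy "votes"
# for the architecture it scores higher, and the majority decides the pairwise
# comparison. The proxy scores below are placeholders, not real measurements.

def vote_compare(scores_a: dict, scores_b: dict) -> int:
    """Return +1 if the majority of proxies prefers architecture A, else -1."""
    votes = sum(1 if scores_a[p] > scores_b[p] else -1 for p in scores_a)
    return 1 if votes > 0 else -1

arch_a = {"synflow": 1.2e8, "jacov": -310.5, "snip": 45.1}
arch_b = {"synflow": 0.9e8, "jacov": -295.0, "snip": 47.3}
print(vote_compare(arch_a, arch_b))  # -1: two of the three proxies prefer B
```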
3 Overview of NAS-Bench-Suite-Zero
In this section, we give an overview of the NAS-Bench-Suite-Zero codebase and dataset, which allow researchers to quickly develop ZC proxies, compare against existing ZC proxies across diverse datasets, and integrate them into NAS algorithms, as shown in Sections 4 and 5.
We implement all ZC proxies from Table 1 in the same codebase (NASLib [29]). For all ZC proxies, we use the default implementation from the original work. While this list covers 13 ZC proxies, the majority of ZC proxies released to date, we did not yet include a few others: for example, because they require a trained supernetwork to make evaluations [4, 34] (and would therefore require implementing a supernetwork on all 28 benchmarks), are implemented in TensorFlow rather than PyTorch [25], or have unreleased code. Our modular framework easily allows additional ZC proxies to be added to NAS-Bench-Suite-Zero in the future.
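To illustrate how lightweight such a computation can be, the following is a minimal sketch of a grad-norm-style score (sum of parameter-gradient norms after a single backward pass on one minibatch) for an arbitrary PyTorch model. The toy network and random data are stand-ins; this is not the NASLib interface or the exact default implementations we use:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_norm_proxy(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """Illustrative grad-norm-style ZC proxy: sum of parameter-gradient norms
    after one forward/backward pass on a single minibatch (a sketch only)."""
    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    return sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None)

# Toy stand-in network and a random CIFAR-10-sized minibatch (batch size 64).
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))
print(grad_norm_proxy(net, x, y))
```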
To build NAS-Bench-Suite-Zero, we extend the collection of NASLib’s publicly available benchmarks, known as NAS-Bench-Suite [21]. This allows us to evaluate and fairly compare all ZC proxies in the same framework, without confounding factors stemming from different implementations, software versions, or training pipelines. Specifically, for the search spaces and tasks, we
use NAS-Bench-101 (CIFAR-10), NAS-Bench-201 (CIFAR-10, CIFAR-100, and ImageNet16-120),
NAS-Bench-301 (CIFAR-10), and TransNAS-Bench-101 Micro and Macro (Jigsaw, Object Classifi-
cation, Scene Classification, Autoencoder) from NAS-Bench-Suite. We add the remaining tasks from
TransNAS-Bench-101 (Room Layout, Surface Normal, Semantic Segmentation), and three tasks each
for NAS-Bench-201, NAS-Bench-301, and TransNAS-Bench-101-Micro: Spherical-CIFAR-100,
NinaPro, and SVHN. This yields a total of 28 benchmarks in our analysis. For all NAS-Bench-201
and TransNAS-Bench-101 tasks, we evaluate all ZC proxy values and the respective runtimes for all architectures. For NAS-Bench-301, we evaluate all 11 221 randomly sampled architectures from the NAS-Bench-301 dataset, since exhaustively evaluating the full set of 10^18 architectures is computationally infeasible. Similarly, we evaluate 10 000 architectures from NAS-Bench-101.
Finally, for Spherical-CIFAR-100, NinaPro, and SVHN, we evaluate 200 architectures per search
space, since only 200 architectures are fully trained for each of these tasks. See Table 2.
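As a sanity check, the per-benchmark counts multiply out to the totals reported in Table 2; the last row corresponds to 200 architectures for each of the nine additional benchmarks, i.e., 600 distinct architectures across the three search spaces:

```python
# Sanity check of the totals in Table 2:
# benchmarks (search-space/task pairs) x ZC proxies x architectures per benchmark.
rows = {
    "NAS-Bench-101":              (1, 13, 10_000),
    "NAS-Bench-201":              (3, 13, 15_625),
    "NAS-Bench-301":              (1, 13, 11_221),
    "TransNAS-Bench-101-Micro":   (7, 12, 3_256),
    "TransNAS-Bench-101-Macro":   (7, 12, 4_096),
    "Add'l. 201, 301, TNB-Micro": (9, 13, 200),   # 200 architectures per benchmark
}
for name, (benchmarks, proxies, archs) in rows.items():
    print(f"{name}: {benchmarks * proxies * archs:,}")
print("Total:", f"{sum(b * p * a for b, p, a in rows.values()):,}")  # 1,526,216
```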
We run all ZC proxies from Table 1 on Intel Xeon Gold 6242 CPUs and save their evaluations in order to create a queryable table of pre-computed values. We use a batch size of 64 for all ZC proxy evaluations, except for TransNAS-Bench-101: due to the extreme memory usage of the Taskonomy tasks (>30 GB of memory), we use a batch size of 32 there. The total computation time for all 1.5M evaluations was 1100 CPU hours.
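The pre-computed table turns every ZC proxy evaluation into a constant-time lookup. The sketch below shows how one might query such a table, assuming a hypothetical JSON layout keyed by architecture identifier and proxy name; the actual NAS-Bench-Suite-Zero file format and the NASLib query API may differ:

```python
import json

# Hypothetical file name and key layout for the pre-computed ZC proxy table;
# the actual NAS-Bench-Suite-Zero data files and the NASLib query API may differ.
with open("zc_nasbench201_cifar10.json") as f:
    zc_table = json.load(f)

arch_id = "(0, 1, 2, 3, 4, 0)"        # placeholder architecture identifier
entry = zc_table[arch_id]["synflow"]  # assumed to hold the proxy value and its runtime
print(entry["score"], entry["time"])
```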