
NAS-based Recursive Stage Partial Network (RSPNet) for Light-Weight Semantic
Segmentation
Anonymous Authors
Abstract
Current NAS-based semantic segmentation methods focus
on accuracy improvements rather than light-weight design.
In this paper, we propose a two-stage framework to design
our NAS-based RSPNet model for light-weight semantic seg-
mentation. The first architecture search determines the inner
cell structure, and the second architecture search considers
exponentially growing paths to finalize the outer structure of
the network. It was shown in the literature that the fusion
of high- and low-resolution feature maps produces stronger
representations. To find the expected macro structure with-
out manual design, we adopt a new path-attention mecha-
nism to efficiently search for suitable paths to fuse useful
information for better segmentation. Our search for repeat-
able micro-structures from cells leads to a superior network
architecture in semantic segmentation. In addition, we propose an RSP (Recursive Stage Partial) architecture to search for a light-weight design for NAS-based semantic segmentation. The proposed architecture is so efficient, simple, and effective that both the macro- and micro-structure searches can be completed in five days of computation on two V100 GPUs.
The light-weight NAS architecture, with only 1/4 of the parameter size of SoTA architectures, achieves SoTA performance on semantic segmentation on the Cityscapes dataset without using any backbones.
Introduction
Neural Architecture Search (NAS) (Elsken, Metzen, and Hutter 2019b) is a computational approach for automating the design of neural architectures. Although deep learning has been widely used for semantic segmentation, the most common deep networks in practice are still designed manually. In this work, we focus on applying NAS to semantic segmentation as the targeted application. To optimize the NAS search for the best segmentation architecture, the task can be decomposed into three parts: (i) a supernet that generates all possible architecture candidates, (ii) a global search of neural
architecture paths from the supernet, and (iii) a local search
of the cell architectures, namely operations including the
conv/deconv kernels and the pooling parameters. The NAS
space to explore is exponentially large w.r.t. the number of generated candidates, the paths between nodes, the candidate depths, and the available cell operations to choose from.
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
The computational burden of NAS for image segmentation is much higher than for other tasks such as image classification, because each architecture-verification step takes longer to complete. As a result, fewer NAS methods work successfully for image segmentation, and none is designed for light-weight semantic segmentation, which is very important for AV (Autonomous Vehicle) applications.
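To make the exponential growth of the search space concrete, the toy count below multiplies the independent choices made in such a supernet. The function name and all the numbers are illustrative assumptions, not values from this paper:

```python
# Toy estimate of a joint NAS search-space size (all numbers hypothetical).
# Assume a supernet with `depth` sequential nodes, `paths_per_node` candidate
# incoming paths per node, and a cell with `edges_per_cell` edges, each
# choosing one of `ops_per_edge` operations.
def search_space_size(depth: int, paths_per_node: int,
                      ops_per_edge: int, edges_per_cell: int) -> int:
    path_choices = paths_per_node ** depth          # one path decision per node
    cell_choices = ops_per_edge ** edges_per_cell   # one op decision per edge
    return path_choices * cell_choices

# Even modest settings explode combinatorially, which is why joint
# macro/micro search without pruning is impractical.
print(search_space_size(depth=12, paths_per_node=3, ops_per_edge=8, edges_per_cell=14))
```

This also motivates the two-stage framework: searching the cell (micro) and the paths (macro) separately multiplies two much smaller spaces instead of enumerating their product.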
The main challenge of NAS is how to deal with the exponentially large search space when exploring and evaluating neural architectures. We tackle this problem with a formulation that makes explicit what must be considered first and how to effectively reduce the search complexity. Most segmentation network designs (Ronneberger, Fischer, and Brox 2015; Fourure et al. 2017; Weng et al. 2019; Liu et al. 2019) use U-Nets to achieve better accuracies in image segmentation. For example, AutoDeepLab (Liu
in image segmentation. For example, AutoDeepLab (Liu
et al. 2019) designs a level-based U-Net as the supernet
whose search space grows exponentially according to its
level number L and depth parameter D. Joining the search
for network-level and cell-level architectures creates huge
challenges and inefficiency in determining the best architec-
ture. To avoid exponential growth in the cell search space,
only one path in AutoDeepLab is selected and sent to the
next node. This restriction is suboptimal, because feeding more inputs to the next node can generate richer features for image segmentation.
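The benefit of fusing several incoming paths rather than hard-selecting one can be sketched as a DARTS-style softmax mixture. The `softmax` and `fuse_paths` helpers and the alpha values below are illustrative stand-ins under that assumption, not the exact path-attention mechanism proposed here:

```python
import math

def softmax(alphas):
    """Convert learnable architecture parameters into path weights."""
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_paths(features, alphas):
    """Weighted sum of candidate input paths (each a flat feature vector).

    Instead of hard-selecting one path, every path contributes in
    proportion to its attention weight, so the fused feature keeps
    information from all resolutions feeding the node.
    """
    weights = softmax(alphas)
    fused = [0.0] * len(features[0])
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused

# Three candidate paths feeding the next node; equal alphas give a
# plain average, while training skews the weights toward useful paths.
paths = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(fuse_paths(paths, [0.0, 0.0, 0.0]))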
A “repeatable” concept is adopted in this paper to con-
struct our model. Similar to repeatable cell architecture de-
sign, our model contains repeated units that share the same
structure. The proposed model architecture for image seg-
mentation is shown in Fig. 1. It is based on differential learn-
ing and is as efficient as the DARTS (Differentiable AR-
chiTecture Searching) method (Liu, Simonyan, and Yang
2019), compared to the other NAS methods based on RL and
EV. In addition, we modify the concept of CSPNet (Wang
et al. 2020) to recursively use only half of the channels to
pass through the cell we searched for. The RSPNet (Re-
cursive Stage Partial Network) makes our search procedure
much more efficient and results in a light-weight architecture for semantic segmentation. The proposed architecture is
simple, efficient, and effective in image segmentation. Both
the macro- and micro-structure searches can be completed
arXiv:2210.00698v1 [cs.CV] 3 Oct 2022
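One plausible reading of the recursive stage-partial idea, adapted from CSPNet, can be sketched on plain channel lists. The `rsp_stage` helper and the toy doubling cell are hypothetical stand-ins, since the real network operates on feature-map tensors with a searched cell:

```python
def rsp_stage(channels, cell, depth):
    """Recursively route only half of the channels through the searched cell.

    `channels` is a list of per-channel values (a stand-in for a feature map),
    `cell` is any callable on such a list (a stand-in for the searched cell),
    and `depth` controls how many recursive splits are applied.
    """
    if depth == 0 or len(channels) < 2:
        return channels
    half = len(channels) // 2
    kept, processed = channels[:half], channels[half:]
    # Only the second half pays the cost of the cell; the recursion means
    # deeper levels touch ever-smaller channel subsets, saving computation.
    processed = rsp_stage(cell(processed), cell, depth - 1)
    return kept + processed  # concatenate the bypassed and processed halves

# Example: a trivial "cell" that doubles every channel value.
out = rsp_stage([1, 1, 1, 1, 1, 1, 1, 1],
                cell=lambda xs: [2 * x for x in xs], depth=2)
print(out)  # -> [1, 1, 1, 1, 2, 2, 4, 4]
```

Half the channels bypass the cell untouched, which is where the parameter and computation savings of the stage-partial design come from.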