2 Related Work
Automated Machine Learning (AutoML) is the cornerstone of discovering state-of-the-art model designs without massive human effort. We introduce four types of related work below.
Sample-based AutoML methods. Existing sample-based approaches explore the search space by sampling candidate designs; they include heuristic search algorithms, e.g., Simulated Annealing, Bayesian Optimization approaches [22, 25, 27], evolutionary methods [28, 29], and reinforcement-based methods [5, 30, 8]. However, they tend to train thousands of models from scratch, which results in low sample efficiency. For example, [5, 8] typically require hundreds of GPUs for several days, hindering the adoption of AutoML in real-world applications [3]. Some hyper-parameter search methods aim to reduce this computational cost. For example, Successive Halving [31] allocates training resources to the potentially more valuable models based on early-stage training information. Li et al. [32] further extend it with multiple budgets to find the best configurations, avoiding the trade-off between the number of configurations considered and the budget allocated to each. Jaderberg et al. [33] combine parallel search and sequential optimisation methods to conduct a fast search. However, their selection mechanisms are based only on model performance and lack deeper knowledge of the design space, which yields little insight into the relations among design variables and limits sample efficiency.
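As a concrete illustration of the budget-allocation idea behind Successive Halving [31], the sketch below repeatedly trains all surviving configurations for a small budget and keeps the better half; the train_for callback and configuration format are hypothetical placeholders, not the interface of any cited work.

import math

def successive_halving(configs, train_for, total_budget):
    # Minimal sketch: each round, train all surviving configurations with
    # an equal share of the round's budget, then keep the better half, so
    # the per-configuration budget doubles as the pool halves.
    # train_for(config, budget) is a hypothetical callback that returns a
    # validation score after `budget` units of training.
    survivors = list(configs)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    budget_per_round = total_budget / rounds
    while len(survivors) > 1:
        per_config = budget_per_round / len(survivors)
        scored = sorted(((train_for(c, per_config), c) for c in survivors),
                        key=lambda t: t[0], reverse=True)
        survivors = [c for _, c in scored[:len(scored) // 2]]
    return survivors[0]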
One-shot AutoML methods. One-shot approaches [1, 2, 34, 3, 35] have become popular for their high search efficiency. Specifically, they train a super-net representing the design space, i.e., containing every candidate design, and share the weights of the same computational cell across candidates. Nevertheless, weight sharing degrades the reliability of design ranking, as it fails to reflect the true performance of the candidate designs [36].
Graph-based AutoML methods. The key insight of our work is to construct the design space as a design graph, where nodes are candidate designs and edges denote design similarities, and to deploy a Graph Neural Network, i.e., meta-GNN, to predict design performance. Graph HyperNetwork [37] directly generates weights for each node in a computation graph representation. [21] study network generators that output relational graphs and analyze the link between their predictive performance and the graph structure. Recently, [38] considers both the micro- (i.e., a single block) and macro-architecture (i.e., block connections) of each design in the graph domain. AutoGML [39] designs a meta-graph to capture the relations among models and graphs and takes a meta-learning approach to estimate the relevance of models to different graphs. Notably, none of these works model the search space as a design graph.
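To make the design-graph abstraction concrete, the sketch below enumerates a toy design space and connects designs that differ in exactly one design dimension; the toy dimensions and this particular notion of "design similarity" are illustrative assumptions, not the exact construction of any cited method.

from itertools import product

# A toy design space; the dimensions and values are hypothetical.
DESIGN_SPACE = {
    "num_layers": [2, 3, 4],
    "batch_norm": [True, False],
    "aggregation": ["mean", "max", "sum"],
}

def build_design_graph(space):
    # Nodes: every full assignment of design choices. Edges: pairs of
    # designs that differ in exactly one dimension (one plausible way
    # to encode design similarity).
    keys = sorted(space)
    nodes = [dict(zip(keys, vals)) for vals in product(*(space[k] for k in keys))]
    edges = [(i, j)
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if sum(nodes[i][k] != nodes[j][k] for k in keys) == 1]
    return nodes, edges

nodes, edges = build_design_graph(DESIGN_SPACE)  # 18 nodes for this toy space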
Design performance predictor. Previous works predict the performance of a design using learning curves [40], layer-wise features [41], the computational graph structure [37, 25, 42, 27, 43, 44], or by additionally encoding dataset information [44] via a dataset encoder. In contrast, FALCON explicitly models the relations among model designs. Moreover, it leverages performance information on training instances to provide task-specific signals beyond the design features; this is differently motivated from [45], which employs meta-learning techniques and incorporates hardware features to rapidly adapt to unseen devices. In addition, unlike [44], meta-GNN is applicable to both images and graphs.
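For reference, a generic predictor from this family can be as simple as a small regressor over encoded design features; the sketch below is such a baseline stand-in (all dimensions are illustrative), not FALCON's meta-GNN or any cited predictor.

import torch
import torch.nn as nn

class DesignPerformancePredictor(nn.Module):
    # Generic neural performance predictor: regress a design's expected
    # validation performance from its feature vector (e.g., one-hot
    # encoded hyper-parameter choices). A simplified baseline, not the
    # meta-GNN proposed in this paper.
    def __init__(self, feature_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, design_features):  # (batch, feature_dim)
        return self.mlp(design_features).squeeze(-1)

# Fit on a few (design features, observed performance) pairs, then use
# the trained predictor to rank the unexplored designs.
predictor = DesignPerformancePredictor(feature_dim=8)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()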
3 Proposed Method
This section introduces our proposed approach, FALCON, for sample-based AutoML. In Section 3.1, we describe the construction of the design graph and formulate the AutoML goal as a search over the design graph for the node with the best task performance. In Section 3.2, we introduce our novel neural predictor, consisting of a task-agnostic module and a task-specific module, which predicts the performance of unknown designs. Finally, we detail our search strategy in Section 3.3. We refer the reader to Figure 1 (b) for a high-level overview of FALCON.
3.1 Design Space as a Graph
Motivation. Previous works generally consider each design choice in isolation from other designs. However, it is often observed that designs sharing the same design features, e.g., graph neural networks (GNNs) with more than 3 layers and batch normalization layers, may have similar performance. Moreover, an inductive bias over the relations between design choices can provide valuable information for navigating the design space toward the best design. For example, suppose we find that setting the batch normalization of a 3-layer GCN [46] and a 4-layer GIN [47] to false both