Efficient Automatic Machine Learning via Design Graphs
Shirley Wu Jiaxuan You Jure Leskovec Rex Ying
Department of Computer Science, Stanford University
{shirwu, jiaxuan, jure, rexy}@cs.stanford.edu
Abstract
Despite the success of automated machine learning (AutoML), which aims to find the best design, including the architecture of deep networks and hyper-parameters, conventional AutoML methods are computationally expensive and provide few insights into the relations among different model design choices. To tackle these challenges, we propose FALCON, an efficient sample-based method to search for the optimal model design. Our key insight is to model the design space of possible model designs as a design graph, where the nodes represent design choices and the edges denote design similarities. FALCON features 1) a task-agnostic module, which performs message passing on the design graph via a Graph Neural Network (GNN), and 2) a task-specific module, which conducts label propagation of the known model performance information on the design graph. Both modules are combined to predict the performances of designs in the design space, navigating the search direction. We conduct extensive experiments on 27 node and graph classification tasks from various application domains, and on an image classification task on the CIFAR-10 dataset. We empirically show that FALCON efficiently obtains well-performing designs for each task by exploring only 30 nodes. Specifically, FALCON has a time cost comparable to one-shot approaches while achieving an average improvement of 3.3% over the best baselines.
1 Introduction
Automated machine learning (AutoML) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] has demonstrated great success in various domains including computer vision [11, 12, 13], language modeling [5, 14], and recommender systems [15]. It is an essential component of state-of-the-art deep learning models [16, 17, 18, 19].
Given a machine learning task, e.g., a node/graph classification task on graphs or an image classification task, the goal of AutoML is to search for a model architecture and hyper-parameter setting from a design space that yields the best test performance on the task. Following previous works [9], we define a design as a set of architecture and hyper-parameter choices (e.g., 3 layers, 64 embedding dimensions, batch normalization, skip connections between consecutive layers), and define the design space as the space of all possible designs for a given task.
However, AutoML is very computationally intensive. The design space of interest often involves millions of possible designs [20, 21]. Sample-based AutoML [5, 8, 22, 23, 24] performs the search by sampling candidate designs from the design space to explore. One central challenge of existing sample-based AutoML solutions is sample efficiency: they need to train as few models as possible to identify the best-performing model in the vast design space. To improve efficiency, existing research focuses on developing good search algorithms to navigate the design space [25, 26, 27].
However, these methods do not model the effect of model design choices, which provides strong inductive biases for searching for the best-performing model. For example, humans often rely on two types of intuitions when designing architectures and hyper-parameters: (1) boundedness: "higher embedding dimensions result in higher performance until reaching a threshold"; (2) dependency: "using a large number of layers without batch normalization degrades the performance". An efficient search strategy should thus leverage such learned inductive biases to rapidly rule out large subsets of the design space.

Figure 1: Overview of FALCON. (a) Design graph example: a design graph on the TU-COX2 graph classification dataset. The design choices are #pre ∈ {1, 2}, #mp ∈ {2, 4, 6, 8}, and #post ∈ {2, 3}, denoting the numbers of pre-processing, message passing, and post-processing layers, respectively; stage ∈ {skipsum, skipconcat}; and agg ∈ {min, max, add}. Darker node colors indicate better design performance. (b) FALCON search strategy. Red: explored nodes. Green: candidate nodes to be sampled from. Blue: the best node. Gray: other nodes. Locally, FALCON extends the design subgraph via the search strategy detailed in Section 3.3. Globally, FALCON approaches the optimal design, navigated by the inductive bias of the design relations.
Proposed approach. To overcome the limitations, we propose FALCON, an AutoML framework
that achieves state-of-the-art sample efficiency and performance by leveraging model design insights.
Our key insight is to build a design graph over the design space of architecture and hyper-parameter
choices. FALCON extracts model design insights by learning a meta-model that captures the relation
between the design graph and model performance and uses it to inform a sample-efficient search
strategy. FALCON consists of the following two novel components.
Design space as a graph. Previous works view the model design space as a high-dimensional space with isolated design choices [9], which offers few insights into the relations between different design choices. For example, if trial runs show that models with more than 3 layers do not work well without batch normalization, this knowledge can help us reduce the search space by excluding all model designs with more than 3 layers and batch normalization set to false. While such insights are hard to obtain with existing AutoML algorithms [1, 2, 8, 5, 6], FALCON achieves this by
constructing a graph representation, design graph, among all the design choices. Figure 1(a) shows a
visualization of a design graph, where each node represents a candidate design, and edges denote the
similarity between the designs. See Section 3.1 for details on the similarity and graph construction.
Search by navigating on the design graph. Given the design graph, FALCON deploys a Graph Neural Network predictor, meta-GNN for short, which is supervised by the performances of explored nodes and learns to predict the performance of a specific design given the corresponding node in the design
graph. The meta-GNN is designed with 1) a task-agnostic module, which performs message passing
on the design graph, and 2) a task-specific module, which conducts label propagation of the known
model performance information on the design graph. Furthermore, we propose a search strategy that
uses meta-GNN predictions to navigate the search in the design graph efficiently.
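As a rough sketch (ours, not the paper's code) of how these pieces could drive a sample-efficient loop: `subgraph_fn`, `meta_gnn`, and `train_and_eval` are hypothetical placeholders for design-subgraph construction (Section 3.1), the meta-GNN predictor, and model training, and the greedy argmax stands in for the fuller search strategy of Section 3.3.

```python
def falcon_search(subgraph_fn, meta_gnn, train_and_eval, seed_designs, budget=30):
    """Sketch of a meta-GNN-guided search loop over the design graph.
    Designs are hashable node ids; `train_and_eval` returns a validation score."""
    explored = {d: train_and_eval(d) for d in seed_designs}
    while len(explored) < budget:
        G_s = subgraph_fn(explored.keys())     # design subgraph (Section 3.1)
        meta_gnn.fit(G_s, explored)            # supervise on explored performances
        scores = meta_gnn.predict(G_s)         # dict: candidate -> predicted score
        best = max((d for d in scores if d not in explored), key=scores.get)
        explored[best] = train_and_eval(best)  # actually train the chosen design
    return max(explored, key=explored.get)     # best explored design
```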
Experiments. Without loss of generality, we mainly focus on AutoML for graph representation
learning in this paper. We conduct extensive experiments on 27 graph datasets, covering node- and
graph-level tasks with distinct distributions. Moreover, FALCON can work well on other datasets
such as the CIFAR-10 image dataset. Our code is available at https://github.com/Wuyxin/FALCON-GLFrontiers.
2 Related Work
Automatic Machine Learning (AutoML) is the cornerstone of discovering state-of-the-art model designs without massive human effort. We introduce four types of related work below.
Sample-based AutoML methods. Existing sample-based approaches explore the search space by sampling candidate designs; they include heuristic search algorithms (e.g., Simulated Annealing), Bayesian Optimization approaches [22, 25, 27], and evolutionary [28, 29] and reinforcement-based methods [5, 30, 8]. However, they tend to train thousands of models from scratch, which results in low sample efficiency. For example, [5, 8] typically involve training on hundreds of GPUs for several days, hindering the adoption of AutoML in real-world applications [3]. Some hyper-parameter search methods aim to reduce this computational cost. For example, Successive Halving [31] allocates training resources to potentially more valuable models based on early-stage training information. Li et al. [32] further extend it by running with different budgets to find the best configurations, avoiding the trade-off between the number of configurations selected and the budget allocated to each. Jaderberg et al. [33] combine parallel search and sequential optimization for fast search. However, their selection mechanisms are based only on model performance and lack deeper knowledge, drawing little insight into the relations among design variables and limiting sample efficiency.
One-shot AutoML methods. One-shot approaches [1, 2, 34, 3, 35] are popular for their high search efficiency. Specifically, they train a super-net representing the design space, i.e., containing every candidate design, and share weights within the same computational cell. Nevertheless, weight sharing degrades the reliability of the design ranking, as it fails to reflect the true performance of the candidate designs [36].
Graph-based AutoML methods. The key insight of our work is to construct the design space as a design graph, where nodes are candidate designs and edges denote design similarities, and to deploy a Graph Neural Network, i.e., meta-GNN, to predict design performance. Graph HyperNetwork [37] directly generates weights for each node in a computation graph representation. [21] studies network generators that output relational graphs and analyzes the link between their predictive performance and the graph structure. Recently, [38] considers both the micro-architecture (i.e., a single block) and macro-architecture (i.e., block connections) of each design in the graph domain. AutoGML [39] designs a meta-graph to capture the relations among models and graphs and takes a meta-learning approach to estimate the relevance of models to different graphs. Notably, none of these works model the search space as a design graph.
Design performance predictor. Previous works predict the performance of a design using learning curves [40], layer-wise features [41], the computational graph structure [37, 25, 42, 27, 43, 44], or additionally dataset information [44] via a dataset encoder. In contrast, FALCON explicitly models the relations among model designs. Moreover, it leverages performance information on training instances to provide task-specific signals beyond the design features, which is differently motivated from [45], which employs meta-learning techniques and incorporates hardware features to rapidly adapt to unseen devices. Besides, unlike [44], meta-GNN is applicable to both images and graphs.
3 Proposed Method
This section introduces our proposed approach, FALCON, for sample-based AutoML. In Section 3.1, we introduce the construction of the design graph and formulate the AutoML goal as a search on the design graph for the node with the best task performance. In Section 3.2, we introduce our novel neural predictor, consisting of a task-agnostic module and a task-specific module, which predicts the performance of unexplored designs. Finally, we detail our search strategy in Section 3.3. We refer the reader to Figure 1(b) for a high-level overview of FALCON.
3.1 Design Space as a Graph
Motivation. Previous works generally consider each design choice in isolation from other designs. However, it is often observed that designs sharing certain design features, e.g., graph neural networks (GNNs) with more than 3 layers and batch normalization layers, have similar performance. Moreover, the inductive bias of the relations between design choices can provide valuable information for navigating the design space toward the best design. For example, suppose we find that setting batch normalization to false in both a 3-layer GCN [46] and a 4-layer GIN [47] degrades performance. Then we can reasonably infer that a 3-layer GraphSAGE [48] with batch normalization outperforms the one without. We can leverage such knowledge and only search the designs that are more likely to improve task performance. To the best of our knowledge, FALCON is the first method to explicitly exploit such relational information among model designs.
Design graph. We denote the design graph as G(N, E), where the nodes N include the candidate designs, and the edges E denote the similarities between the candidate designs. Specifically, we use the notion of design distance to decide the graph connectivity, which we elaborate on below.
Design distance. For each numerical design dimension, two design choices have a distance of 1 if they are adjacent in the ordered list of design choices. For example, if the hidden dimension size can take values [16, 32, 64, 128], then the distance between 16 and 32 is 1, and the distance between 32 and 128 is 2. For each categorical design dimension, any two distinct design choices have a distance of 1.
We then define the connectivity of the design graph in terms of the design distance:
Definition 1 (Design Graph Connectivity). The design graph can be expressed as G(N, E), where the nodes N = {d1, . . . , dn} are model designs, and (di, dj) ∈ E iff the design distance between di and dj is 1.
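To make the distance and connectivity definitions concrete, here is a minimal Python sketch under an illustrative design space; the dimension names and value lists are ours, loosely following Figure 1(a), not the paper's actual configuration.

```python
from itertools import combinations

# Illustrative design space (hypothetical values, not the paper's exact space):
# numerical dimensions have an ordered value list, categorical ones a value set.
NUMERICAL = {"num_mp_layers": [2, 4, 6, 8], "hidden_dim": [16, 32, 64, 128]}
CATEGORICAL = {"stage": ["skipsum", "skipconcat"], "agg": ["min", "max", "add"]}

def design_distance(d1: dict, d2: dict) -> int:
    """Design distance: numerical dimensions contribute the gap between the
    positions of their values in the ordered list; categorical dimensions
    contribute 1 whenever the values differ."""
    dist = 0
    for dim, values in NUMERICAL.items():
        dist += abs(values.index(d1[dim]) - values.index(d2[dim]))
    for dim in CATEGORICAL:
        dist += int(d1[dim] != d2[dim])
    return dist

def build_design_graph(designs: list) -> list:
    """Edges of the design graph (Definition 1): connect two designs iff their
    design distance is exactly 1. Varying one categorical dimension yields
    cliques; varying numerical dimensions yields grid-like structures."""
    return [(i, j) for i, j in combinations(range(len(designs)), 2)
            if design_distance(designs[i], designs[j]) == 1]
```

For instance, two designs identical except for hidden_dim values 16 and 32 are adjacent, whereas 16 and 128 are three hops apart along a grid line.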
Structure of the design graph. The definition of edges implies that the design graph is highly structured, with the following properties: (1) all designs that are identical except for one categorical design dimension form a clique subgraph; (2) all designs that are identical except for k numerical design dimensions form a grid graph structure. Moreover, we use a special design-distance calculation for combinations of design dimensions that have dependencies. For example, the design dimensions of pooling operations, pooling layers, and the number of layers can depend on each other, making the design graph structure more complex. See the details in Appendix A.2.
Design subgraph. The design graph may contain millions of nodes, so directly applying the meta-model to the entire design graph is computationally intensive. Moreover, a reliable performance estimate for an unexplored node depends on its similarity to the nodes already explored by the search algorithm. Therefore, we apply the meta-model to a dynamic subgraph, i.e., the design subgraph, containing the nodes explored at the current search stage and the candidate nodes to be sampled in the next step. The candidate set can be constructed by selecting the multi-hop neighbors of explored nodes on the design graph. The design subgraph is defined as:
Definition 2 (Design Subgraph). During a search, suppose the explored node set is Ne and the candidate set is Nc. The design subgraph is formulated as Gs(Ns, Es), where Ns = Ne ∪ Nc are the nodes and Es = {(u, v) | u ∈ Ns, v ∈ Ns, (u, v) ∈ E} are the edges.
Given the design subgraph, we formulate the AutoML problem as searching for the node, i.e., design
choice, with the best task performance.
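As one way to materialize this dynamic subgraph, here is a minimal sketch using networkx; this is our illustrative implementation choice, not prescribed by the paper.

```python
import networkx as nx

def design_subgraph(G: nx.Graph, explored: set, num_hops: int = 1) -> nx.Graph:
    """Induce the design subgraph G_s (Definition 2): explored nodes N_e plus
    candidate nodes N_c, where candidates are the unexplored nodes within
    `num_hops` hops of any explored node."""
    frontier, candidates = set(explored), set()
    for _ in range(num_hops):
        frontier = {v for u in frontier for v in G.neighbors(u)}
        candidates |= frontier
    candidates -= set(explored)
    # subgraph() keeps every design-graph edge whose endpoints are both selected
    return G.subgraph(set(explored) | candidates)
```

The `num_hops` radius trades off the size of the candidate pool against how far the meta-model must extrapolate from explored designs.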
3.2 Meta-GNN for Performance Prediction
Here we introduce a meta-model, named meta-GNN, to predict the performance of model designs, i.e., nodes of the design subgraph. The goal of meta-GNN is to learn the inductive bias of design relations, which is used to navigate the search path on the design graph. As illustrated in Figure 2, meta-GNN comprises a task-agnostic module and a task-specific module, which capture knowledge of model design and task performance, respectively.
Task-agnostic module. The task-agnostic module uses a design encoder to encode the design
features on nodes of the design subgraph, and a relation encoder to capture the design similarities and
differences on edges of the design subgraph. After that, it performs message passing on the design
subgraph. We introduce each component below:
Design encoder: it computes the node features of the design subgraph as the concatenation of the feature encodings of the individual design dimensions. For numerical design dimensions, we min-max normalize the values to obtain node features. For categorical design dimensions, such as the aggregation operator which takes one of (SUM, MAX, MEAN), we use a one-hot encoding.
Relation encoder: it captures the similarity relationships between connected designs. For each (di, dj) ∈ E, we encode the design dimension in which di and dj differ as a one-hot vector.
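A minimal PyTorch sketch of the two encoders, reusing the illustrative NUMERICAL/CATEGORICAL dimensions from the earlier sketch; the function names are ours, not from the paper's code.

```python
import torch

def encode_design(design: dict, numerical: dict, categorical: dict) -> torch.Tensor:
    """Design encoder: node feature = concatenation of per-dimension encodings,
    min-max normalized scalars for numerical dimensions and one-hot vectors
    for categorical dimensions."""
    feats = []
    for dim, values in numerical.items():
        lo, hi = min(values), max(values)
        feats.append(torch.tensor([(design[dim] - lo) / (hi - lo)]))
    for dim, values in categorical.items():
        one_hot = torch.zeros(len(values))
        one_hot[values.index(design[dim])] = 1.0
        feats.append(one_hot)
    return torch.cat(feats)

def encode_relation(d1: dict, d2: dict, dims: list) -> torch.Tensor:
    """Relation encoder: edge feature = one-hot over design dimensions, marking
    the single dimension in which the two adjacent designs differ."""
    one_hot = torch.zeros(len(dims))
    for k, dim in enumerate(dims):
        if d1[dim] != d2[dim]:
            one_hot[k] = 1.0
    return one_hot
```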