2 Related Work
Automated Machine Learning (AutoML) is the cornerstone of discovering state-of-the-art model designs without massive human effort. We introduce four types of related work below.
Sample-based AutoML methods. Existing sample-based approaches explore the search space by sampling candidate designs; they include heuristic search algorithms, e.g., Simulated Annealing, Bayesian Optimization approaches [22, 25, 27], evolutionary methods [28, 29], and reinforcement-based methods [5, 30, 8]. However, they tend to train thousands of models from scratch, which results in low sample efficiency. For example, [5, 8] typically require hundreds of GPUs for several days, hindering the adoption of AutoML in real-world applications [3]. Some hyper-parameter search methods aim to reduce this computational cost. For example, Successive Halving [31] allocates training resources to the potentially more valuable models based on early-stage training information. Li et al. [32] further extend it with multiple budgets to find the best configurations, avoiding the trade-off between the number of configurations considered and the budget allocated to each. Jaderberg et al. [33] combine parallel search and sequential optimisation methods to conduct a fast search. However, their selection mechanisms are based only on model performance and lack deeper knowledge of the design space, which yields little insight into the relations among design variables and limits sample efficiency.
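As a concrete illustration of the budget-allocation idea behind Successive Halving [31], the sketch below repeatedly trains all surviving configurations for a small budget and keeps the better half; the train_for callback and configuration format are hypothetical placeholders, not the interface of any cited work.

import math

def successive_halving(configs, train_for, total_budget):
    # Minimal sketch: each round, train all surviving configurations with
    # an equal share of the round's budget, then keep the better half, so
    # the per-configuration budget doubles as the pool halves.
    # train_for(config, budget) is a hypothetical callback that returns a
    # validation score after `budget` units of training.
    survivors = list(configs)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    budget_per_round = total_budget / rounds
    while len(survivors) > 1:
        per_config = budget_per_round / len(survivors)
        scored = sorted(((train_for(c, per_config), c) for c in survivors),
                        key=lambda t: t[0], reverse=True)
        survivors = [c for _, c in scored[:len(scored) // 2]]
    return survivors[0]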
One-shot AutoML methods. One-shot approaches [1, 2, 34, 3, 35] have become popular for their high search efficiency. Specifically, they train a super-net representing the design space, i.e., containing every candidate design, and share the weights of the same computational cell across candidates. Nevertheless, weight sharing degrades the reliability of design ranking, as it fails to reflect the true performance of the candidate designs [36].
Graph-based AutoML methods. The key insight of our work is to construct the design space as a design graph, where nodes are candidate designs and edges denote design similarities, and to deploy a Graph Neural Network, i.e., meta-GNN, to predict design performance. Graph HyperNetwork [37] directly generates weights for each node in a computation graph representation. [21] study network generators that output relational graphs and analyze the link between their predictive performance and the graph structure. Recently, [38] considers both the micro- (i.e., a single block) and macro-architecture (i.e., block connections) of each design in the graph domain. AutoGML [39] designs a meta-graph to capture the relations among models and graphs and takes a meta-learning approach to estimate the relevance of models to different graphs. Notably, none of these works model the search space as a design graph.
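To make the design-graph abstraction concrete, the sketch below enumerates a toy design space and connects designs that differ in exactly one design dimension; the toy dimensions and this particular notion of "design similarity" are illustrative assumptions, not the exact construction of any cited method.

from itertools import product

# A toy design space; the dimensions and values are hypothetical.
DESIGN_SPACE = {
    "num_layers": [2, 3, 4],
    "batch_norm": [True, False],
    "aggregation": ["mean", "max", "sum"],
}

def build_design_graph(space):
    # Nodes: every full assignment of design choices. Edges: pairs of
    # designs that differ in exactly one dimension (one plausible way
    # to encode design similarity).
    keys = sorted(space)
    nodes = [dict(zip(keys, vals)) for vals in product(*(space[k] for k in keys))]
    edges = [(i, j)
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if sum(nodes[i][k] != nodes[j][k] for k in keys) == 1]
    return nodes, edges

nodes, edges = build_design_graph(DESIGN_SPACE)  # 18 nodes for this toy space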
Design performance predictor. Previous works predict the performance of a design using learning curves [40], layer-wise features [41], the computational graph structure [37, 25, 42, 27, 43, 44], or by additionally encoding dataset information [44] via a dataset encoder. In contrast, FALCON explicitly models the relations among model designs. Moreover, it leverages performance information on training instances to provide task-specific signals beyond the design features; this is differently motivated from [45], which employs meta-learning techniques and incorporates hardware features to rapidly adapt to unseen devices. In addition, unlike [44], meta-GNN is applicable to both images and graphs.
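For reference, a generic predictor from this family can be as simple as a small regressor over encoded design features; the sketch below is such a baseline stand-in (all dimensions are illustrative), not FALCON's meta-GNN or any cited predictor.

import torch
import torch.nn as nn

class DesignPerformancePredictor(nn.Module):
    # Generic neural performance predictor: regress a design's expected
    # validation performance from its feature vector (e.g., one-hot
    # encoded hyper-parameter choices). A simplified baseline, not the
    # meta-GNN proposed in this paper.
    def __init__(self, feature_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, design_features):  # (batch, feature_dim)
        return self.mlp(design_features).squeeze(-1)

# Fit on a few (design features, observed performance) pairs, then use
# the trained predictor to rank the unexplored designs.
predictor = DesignPerformancePredictor(feature_dim=8)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()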
3 Proposed Method
This section introduces our proposed approach, FALCON, for sample-based AutoML. In Section 3.1, we describe the construction of the design graph and formulate the AutoML goal as a search over the design graph for the node with the best task performance. In Section 3.2, we introduce our novel neural predictor, consisting of a task-agnostic module and a task-specific module, which predicts the performance of unknown designs. Finally, we detail our search strategy in Section 3.3. We refer the reader to Figure 1 (b) for a high-level overview of FALCON.
3.1 Design Space as a Graph
Motivation. Previous works generally consider each design choice in isolation from other designs. However, it is often observed that designs sharing the same design features, e.g., graph neural networks (GNNs) with more than 3 layers and batch normalization layers, may have similar performance. Moreover, an inductive bias over the relations between design choices can provide valuable information for navigating the design space toward the best design. For example, suppose we find that setting the batch normalization of a 3-layer GCN [46] and a 4-layer GIN [47] to false both