over a time horizon $h$ by considering the influence of the choice of $x_{t:t+h-1}$ on $\tilde{f}_{t+h}$ through multi-step lookahead [8, 9, 10]. Such methods promise improved query efficiency – needing fewer configuration evaluations to achieve threshold performance – and improved final performance once all evaluations are considered, when compared to myopic approaches. As recent ML models such as pre-trained language and vision models become larger, and thus both more expensive and more time-consuming to train [11, 12, 13], these promises have become all the more enticing.
Despite these promises, planning-based HPO has not been widely adopted. This is because suggested approaches have been both technically complex to implement and computationally intensive to execute. In this paper, we circumvent both challenges by leveraging recent advances in Transformer-based [14], natural-language-interfaced HPO. Specifically, Chen et al. [2] recently proposed an approach dubbed OptFormer, which presents a unified, natural language interface for hyperparameter optimization and thus allows transfer learning between HPO problems. Relevant to us is the structure of the OptFormer model: it jointly performs both hyperparameter suggestion $x_t$ and function prediction $\tilde{f}_t(x_t)$ as autoregressive generation.
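To illustrate this structure, the sketch below frames both operations as sampling from a single autoregressive model conditioned on a serialized trial history. The `OptFormerLike` class and its `suggest`/`predict` methods are hypothetical stand-ins of our own naming, not the released OptFormer API:

```python
# A minimal, hypothetical stand-in for the joint suggest/predict interface;
# NOT the released OptFormer API, only an illustration of how hyperparameter
# suggestion and function prediction can both be framed as sampling from one
# autoregressive model conditioned on the trial history.
import random

class OptFormerLike:
    def suggest(self, history):
        """Sample the next configuration x_t given the serialized history."""
        return {"lr": 10 ** random.uniform(-5, -1)}  # toy stand-in

    def predict(self, history, x):
        """Sample a predicted outcome f~_t(x), again by autoregressive
        generation conditioned on the history plus the candidate x."""
        return random.gauss(0.8, 0.05)  # toy stand-in

model = OptFormerLike()
history = []                       # (x, y) pairs observed so far
x_t = model.suggest(history)       # generation pass 1: propose x_t
y_t = model.predict(history, x_t)  # generation pass 2: predict f~_t(x_t)
```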
By leveraging the ease and speed of autoregressive generation with OptFormer, we present a novel planning-based HPO algorithm that is simple and scalable, thus bypassing the typical challenges of traditional methods. We incorporate multi-step lookahead into both candidate generation (instead of the global search typical of most HPO approaches) and candidate evaluation (we provide a planning-based acquisition function $\alpha$), since performing rollouts with OptFormer is simply multi-step generation for $h$ timesteps. In Section 4 we present our method in detail, and in Section 6 we empirically validate our approach in two settings: the BBOB dataset [15], a synthetic benchmark, and the RealWorldData dataset [2], which is constructed by aggregating actual hyperparameter tuning experiments. Our results highlight that incorporating planning into Transformer-based hyperparameter optimization models is a promising research direction, and we invite the community to explore this setting further.
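As a rough illustration of such a rollout-based acquisition function, the sketch below scores a candidate $x$ by simulating $h$ steps of future suggestions and predictions, averaging the best simulated outcome over several rollouts. It reuses the hypothetical `suggest`/`predict` interface sketched above; the names and the specific scoring rule are our assumptions, not the method's exact definition:

```python
# A Monte Carlo sketch of a rollout-style acquisition alpha(x): simulate h
# lookahead steps by autoregressive generation and score x by the best
# fantasized outcome. Hypothetical illustration, not the paper's exact code.
def rollout_acquisition(model, history, x, h=3, n_rollouts=8):
    total = 0.0
    for _ in range(n_rollouts):
        sim = list(history)
        y = model.predict(sim, x)           # imagined outcome for x
        sim.append((x, y))
        best = y
        for _ in range(h - 1):              # remaining lookahead steps
            x_next = model.suggest(sim)     # imagined future suggestion
            y_next = model.predict(sim, x_next)
            sim.append((x_next, y_next))
            best = max(best, y_next)
        total += best                       # best value in this rollout
    return total / n_rollouts               # estimate of alpha(x)
```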
2 Related Work
As already discussed, most approaches to hyperparameter optimization leverage Bayesian optimization (BO). These methods typically build a Gaussian process (GP) model [5, 16] as the surrogate $\tilde{f}_t(x)$ and proceed to suggest $x_t \approx \operatorname{argmax}_{x \in \mathcal{X}} \alpha(x; \tilde{f}_t)$, where $\alpha$ is an acquisition function. An acquisition function measures the expected utility after one additional evaluation of $f$ at $x$. Past work has explored a variety of acquisition functions, such as probability of improvement (PI) [5], expected improvement (EI) [17, 18], and upper confidence bound (UCB) [19, 20]. These acquisition functions are myopic since they depend solely on the surrogate at the current timestep and so do not directly account for the impact of a greedy choice on downstream evaluations.
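For concreteness, here is a minimal sketch of one such myopic BO step, fitting a GP surrogate with scikit-learn and maximizing EI over a candidate grid; the toy objective and the grid search are our simplifications:

```python
# A minimal sketch of a myopic BO step: fit a GP surrogate, then pick
# x_t ~ argmax_x EI(x) over a candidate grid (maximization convention).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    mu, std = gp.predict(X_cand, return_std=True)
    std = np.maximum(std, 1e-9)                # avoid division by zero
    z = (mu - y_best) / std
    return (mu - y_best) * norm.cdf(z) + std * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (5, 1))                  # configurations tried so far
y = np.sin(6 * X[:, 0])                        # their observed scores (toy f)
gp = GaussianProcessRegressor().fit(X, y)      # GP surrogate f~_t
X_cand = np.linspace(0, 1, 200)[:, None]       # candidate grid over X
x_t = X_cand[np.argmax(expected_improvement(gp, X_cand, y.max()))]
```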
Unlike myopic HPO methods, planning-based approaches fundamentally require building models of the future to assess the impact of a current decision on later timesteps. Though these methods also rely on a GP as a surrogate model, each point in multi-step planning involves fantasizing/imagining an updated GP posterior $\tilde{f}_{t+1} \mid \tilde{x}_t, \; \ldots, \; \tilde{f}_{t+h} \mid \tilde{x}_t, \tilde{x}_{t+1}, \ldots, \tilde{x}_{t+h-1}$ based on simulated choices from lookaheads $\{(\tilde{x}_t, \tilde{y}_t), \ldots, (\tilde{x}_{t+h-1}, \tilde{y}_{t+h-1})\}$ [21, 22]. Note that we use $\tilde{x}_t$ to represent a fantasized decision, while $x_t$ is the actual choice made at timestep $t$. While multi-step planning is promising, constructing the posterior of a GP model requires matrix inversion, which is a compute-intensive operation [23]. Even outside of this limitation, traditional planning-based approaches are compute-intensive due to (i) the poor scaling behavior of the search tree, which is $O(q^h)$ where $q$ is the number of choices at each decision point for each lookahead step [9, 21], forcing most methods to explore short horizons, typically $h \in \{1, 2\}$; and (ii) nested expectation and maximization: marginalizing the future observations $\tilde{y}_{t+j}, \; j < h$, and globally searching the acquisition function to obtain the query $\tilde{x}_{t+j}$ at every lookahead step.
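The sketch below illustrates this fantasized lookahead tree: at each of $h$ steps it branches over $q$ candidates, draws one fantasized observation from the current GP posterior, refits the GP, and recurses, visiting $O(q^h)$ nodes. Using a single fantasy sample per branch, and a max in place of the nested expectation, is our simplification to keep the example short:

```python
# An illustrative q^h fantasized-lookahead tree (simplified): each recursion
# level branches over q candidates, samples one fantasy observation from the
# GP posterior, conditions an updated GP on it, and recurses.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def lookahead_value(X, y, candidates, h, rng):
    """Best fantasized outcome reachable within h steps; visits O(q^h) nodes."""
    if h == 0:
        return y.max()
    gp = GaussianProcessRegressor().fit(X, y)  # posterior update (matrix inversion)
    best = -np.inf
    for x in candidates:                       # branch over q choices
        mu, std = gp.predict(x[None, :], return_std=True)
        y_fant = mu[0] + std[0] * rng.standard_normal()  # fantasized observation
        val = lookahead_value(np.vstack([X, x[None, :]]),  # condition on (x~, y~)
                              np.append(y, y_fant),
                              candidates, h - 1, rng)
        best = max(best, val)
    return best
```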
Another issue is the complexity of implementation. In an attempt to reduce the computational burdens described above, complex approaches have been suggested that require significant expertise to execute properly. For example, Jiang et al. [22] attempt to overcome the problem of nested expectation and maximization by leveraging the reparameterization trick [24] to couple sampling and optimization. They build a differentiable objective over a multi-step tree and then optimize this objective to obtain a proposal for the next suggestion $x_t$; a far from simple method.
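The core reparameterization idea can be sketched briefly: writing a fantasized observation as $\tilde{y} = \mu(x) + \sigma(x)\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$ makes $\tilde{y}$ differentiable in $x$, so an objective built from such samples can be optimized by gradient ascent. The toy mean and standard deviation functions below are our own, purely for illustration:

```python
# A minimal sketch of the reparameterization trick in this context: gradients
# flow through sampled fantasy observations to the query x being optimized.
import torch

def mu(x):    return torch.sin(3 * x)       # stand-in posterior mean
def sigma(x): return 0.1 + 0.05 * x ** 2    # stand-in posterior std

x = torch.tensor(0.5, requires_grad=True)   # candidate query to optimize
eps = torch.randn(64)                       # fixed base noise samples
y_fant = mu(x) + sigma(x) * eps             # reparameterized fantasies
objective = y_fant.mean()                   # e.g. expected fantasy value
objective.backward()                        # differentiates through sampling
print(x.grad)                               # gradient to update x with
```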
Rollout policies [25] provide a tractable alternative for approximating the nested expectation by