Multi-step Planning for Automated Hyperparameter
Optimization with OptFormer
Lucio M. Dery
Carnegie Mellon University
Abram L. Friesen
DeepMind
Nando de Freitas
DeepMind
Marc’Aurelio Ranzato
DeepMind
Yutian Chen
DeepMind
Abstract
As machine learning permeates more industries and models become more expensive and time-consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing. Multi-step planning based approaches to hyperparameter optimization promise improved efficiency over myopic alternatives by more effectively balancing exploration and exploitation. However, the potential of these approaches has not been fully realized due to their technical complexity and computational intensity. In this work, we leverage recent advances in Transformer-based, natural-language-interfaced hyperparameter optimization to circumvent these barriers. We build on top of the recently proposed OptFormer, which casts both hyperparameter suggestion and target function approximation as autoregressive generation, thus making planning via rollouts simple and efficient. We conduct an extensive exploration of different strategies for performing multi-step planning on top of the OptFormer model to highlight its potential for use in constructing non-myopic HPO strategies.
1 Introduction
The performance of a machine learning (ML) model is deeply tied to the choice of hyperparameters used for its training and deployment. As such, the problem of automated hyperparameter optimization (HPO) has become increasingly relevant as ML models have become more ubiquitous [1, 2, 3]. Given a function that describes a model’s performance, $f(x)$¹, and a space of hyperparameters $\mathcal{X}$, the goal of HPO is to find $x^* = \arg\max_{x \in \mathcal{X}} f(x)$ in finite time ($T$) by making a sequence of decisions $S = \{x_1, \ldots, x_T\}$ such that $x^* \in S$.
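The problem statement above can be sketched as a plain sequential decision loop. The sketch below is illustrative only: `suggest` is a hypothetical stand-in for any HPO strategy (here, random search), and the toy objective is ours, not from the paper.

```python
import random

def optimize(f, suggest, T):
    """Generic sequential HPO loop: repeatedly propose a configuration,
    evaluate the (expensive) objective f, and record the outcome."""
    history = []                        # the decision sequence S = [(x_t, y_t), ...]
    for _ in range(T):
        x_t = suggest(history)          # the HPO strategy picks the next query
        y_t = f(x_t)                    # one expensive evaluation of f
        history.append((x_t, y_t))
    # best configuration found within the budget T
    return max(history, key=lambda pair: pair[1])

# Toy instance: maximize f(x) = -(x - 0.3)^2 over [0, 1], with random search
# standing in for the `suggest` strategy.
random.seed(0)
best_x, best_y = optimize(
    f=lambda x: -(x - 0.3) ** 2,
    suggest=lambda history: random.random(),
    T=50,
)
```

Every method discussed in this paper differs only in how `suggest` uses `history`; myopic methods ignore the effect of $x_t$ on future iterations of this loop, while planning-based methods account for it.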
Popular approaches to HPO are based on Bayesian optimization (BO) [4, 5]. In BO, one builds a time-dependent surrogate model $\tilde{f}_t(x)$ using evidence collected from past evaluations of the true function $\{y_1 = f(x_1), \ldots, y_{t-1} = f(x_{t-1})\}$. Most methods proposed under this framework proceed as follows: at any timestep $t$, they suggest $x_t \in \arg\max_{x \in \mathcal{X}} \alpha(x; \tilde{f}_t)$, where $\alpha$ is called an acquisition function that measures the expected utility after one additional evaluation of $f$ at $x$ [5, 6, 7]. These methods are myopic: they make decisions exclusively based on the estimated function at the current timestep, $\tilde{f}_t$, and can therefore be sub-optimal since they do not account for the impact of the current decision on future evaluations. Planning-based HPO attempts to avoid this by appropriately balancing exploration and exploitation during decision making. Such methods attempt to maximize the long-term reward
Work done whilst interning at DeepMind. Correspondence to <ldery@andrew.cmu.edu> or <yutianc@deepmind.com>
¹Here, performance captures the metric that we wish to optimize, such as accuracy, runtime, etc.
Foundation Models for Decision Making Workshop at Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.04971v2 [cs.LG] 16 Nov 2022
over a time horizon, $h$, by considering the influence of the choice of $x_{t:t+h-1}$ on $\tilde{f}_{t+h}$ through multi-step lookahead [8, 9, 10]. Such methods promise improved query efficiency – needing fewer configuration evaluations to achieve threshold performance – and improved performance of the model after all evaluations are considered, when compared to myopic approaches. As recent ML models like pre-trained language and vision models become larger, and thus both more expensive and time-consuming to train [11, 12, 13], the above promises have become more enticing.
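The long-term objective can be made concrete with a Bellman-style recursion. The following is our own sketch of the standard $h$-step lookahead utility, with notation chosen to match the surrogate $\tilde{f}_t$ above (it is not taken verbatim from the paper):

```latex
% One-step utility: expected immediate reward u (e.g. improvement) of querying x_t.
\alpha_1(x_t; \tilde{f}_t) = \mathbb{E}_{\tilde{y}_t \sim \tilde{f}_t(x_t)}\left[ u(\tilde{y}_t) \right]
% h-step utility: immediate reward plus the best achievable (h-1)-step utility under
% the surrogate \tilde{f}_{t+1} updated on the fantasized pair (x_t, \tilde{y}_t).
\alpha_h(x_t; \tilde{f}_t) = \mathbb{E}_{\tilde{y}_t \sim \tilde{f}_t(x_t)}\!\left[ u(\tilde{y}_t)
    + \max_{x_{t+1} \in \mathcal{X}} \alpha_{h-1}\!\left(x_{t+1}; \tilde{f}_{t+1}\right) \right]
```

The nested expectation-then-maximization visible in the second line is exactly what makes exact multi-step planning expensive, as discussed in Section 2.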
Despite these promises, planning-based HPO has not been widely adopted. This is because suggested approaches have been both technically complex to implement and computationally intensive to execute. In this paper, we circumvent both of these challenges by leveraging recent advances in Transformer-based [14], natural-language-interfaced HPO. Specifically, Chen et al. [2] recently proposed an approach dubbed OptFormer, which presents a unified, natural language interface for hyperparameter optimization and thus allows transfer learning between HPO problems. Relevant to us is the structure of the OptFormer model: it jointly performs both hyperparameter suggestion $x_t$ and function prediction $\tilde{f}_t(x_t)$ as autoregressive generation.
By leveraging the ease and speed of autoregressive generation with OptFormer, we present a novel planning-based HPO algorithm that is simple and scalable, thus bypassing the typical challenges of traditional methods. We incorporate multi-step lookahead into both candidate generation (instead of performing a global search, as is typical of most HPO approaches) and candidate evaluation (we provide a planning-based acquisition function $\alpha$), since performing rollouts with OptFormer is simply multi-step generation for $h$ timesteps. In Section 4, we present our method in detail, and in Section 6 we empirically validate our approach under two settings: the BBOB dataset [15], which is a synthetic benchmark, and the RealWorldData dataset [2], which is constructed by aggregating actual hyperparameter tuning experiments. Our results highlight that incorporating planning into transformer-based hyperparameter optimization models is a promising research direction, and we invite the community to explore this setting further.
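Because a model that jointly suggests configurations and predicts their values makes a rollout just repeated generation, planning can be sketched in a few lines. The interface below (`suggest`/`predict`) is a hypothetical stand-in of our own devising, not the actual OptFormer API, and the candidate-scoring rule is one illustrative choice of rollout reward:

```python
def rollout_value(model, history, x_candidate, h, n_rollouts=8):
    """Score a candidate by simulating h steps of the optimization with an
    autoregressive model that can both suggest configurations and predict
    function values (a hypothetical OptFormer-like interface)."""
    total = 0.0
    for _ in range(n_rollouts):
        sim = list(history)
        y = model.predict(sim, x_candidate)   # fantasized outcome for the candidate
        sim.append((x_candidate, y))
        best = y
        for _ in range(h - 1):                # continue the fantasized trajectory
            x = model.suggest(sim)            # model proposes the next query...
            y = model.predict(sim, x)         # ...and fantasizes its value
            sim.append((x, y))
            best = max(best, y)
        total += best                         # reward: best value seen in the rollout
    return total / n_rollouts

def plan_next(model, history, h, n_candidates=16):
    """Pick the candidate whose h-step rollouts look best on average."""
    candidates = [model.suggest(history) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: rollout_value(model, history, x, h))
```

Note that both candidate generation and candidate evaluation here reduce to forward passes of the same model, which is the property the paper exploits.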
2 Related Work
As already discussed, most approaches to hyperparameter optimization leverage Bayesian optimization (BO). These methods typically build a Gaussian process (GP) model [5, 16] as the surrogate $\tilde{f}_t(x)$ and proceed to suggest $x_t \in \arg\max_{x \in \mathcal{X}} \alpha(x; \tilde{f}_t)$, where $\alpha$ is an acquisition function. An acquisition function measures the expected utility after one additional evaluation of $f$ at $x$. Past work has explored a variety of acquisition functions such as probability of improvement (PI) [5], expected improvement (EI) [17, 18] and upper confidence bound (UCB) [19, 20]. These acquisition functions are myopic since they depend solely on the surrogate at the current timestep and so do not directly account for the impact of a greedy choice on downstream evaluations.
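To make the myopia concrete, consider expected improvement, the most widely used of these. For a maximization problem with a Gaussian surrogate prediction $\mathcal{N}(\mu, \sigma^2)$ at $x$ and incumbent $y^* $, it has a closed form; the sketch below implements that standard formula with only the standard library (the function name and interface are ours):

```python
import math

def expected_improvement(mu, sigma, best_y):
    """Closed-form expected improvement of a Gaussian prediction N(mu, sigma^2)
    over the incumbent best_y, for a maximization problem:
        EI = (mu - best_y) * Phi(z) + sigma * phi(z),  z = (mu - best_y) / sigma."""
    if sigma <= 0.0:
        # Degenerate (noiseless, fully certain) prediction: plain improvement.
        return max(mu - best_y, 0.0)
    z = (mu - best_y) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal Phi(z)
    return (mu - best_y) * cdf + sigma * pdf
```

The myopia is visible in the signature: the score depends only on the current surrogate's $(\mu, \sigma)$ at a single point, with no term for how evaluating $x$ would reshape the surrogate for later steps.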
Unlike myopic HPO methods, planning-based approaches fundamentally require building models of the future to assess the impact of a current decision on later timesteps. Though these methods also rely on a GP as a surrogate model, each point in multi-step planning involves fantasizing/imagining an updated GP posterior $\tilde{f}_{t+1} \mid \tilde{x}_t, \ldots, \tilde{f}_{t+h} \mid \tilde{x}_t, \tilde{x}_{t+1}, \ldots, \tilde{x}_{t+h-1}$ based on simulated choices from lookaheads $\{(\tilde{x}_t, \tilde{y}_t), \ldots, (\tilde{x}_{t+h-1}, \tilde{y}_{t+h-1})\}$ [21, 22]. Note that we use $\tilde{x}_t$ to represent a fantasized decision, while $x_t$ is the actual choice made at timestep $t$. Whilst multi-step planning is promising, constructing the posterior of a GP model requires matrix inversion, which is a compute-intensive operation [23]. Even outside of this limitation, traditional planning-based approaches are compute intensive due to (i) poor scaling behavior of the search tree – $O(q^h)$, where $q$ is the number of choices at each decision point for each lookahead step [9, 21] – which forces most methods to explore short horizons, typically $h \in \{1, 2\}$, and (ii) nested expectation and maximization: marginalizing future observations $\tilde{y}_{t+j},\, j < h$, and performing a global search on the acquisition function to obtain the query $\tilde{x}_{t+j}$ at every lookahead step. Another issue is complexity of implementation. In an attempt to reduce the computational burdens described above, complex approaches have been suggested that require significant expertise to execute properly. For example, Jiang et al. [22] attempt to overcome the problem of nested expectation and maximization by leveraging the reparameterization trick [24] to couple sampling and optimization. They build a differentiable objective over a multi-step tree and then optimize over this objective to obtain a proposal for the next suggestion $x_t$ – a far from simple method. Rollout policy [25] provides a tractable alternative to approximate the nested expectation by