Meta-Learning Priors for Safe Bayesian Optimization
Jonas Rothfuss
ETH Zurich
Switzerland
rojonas@ethz.ch
Christopher Koenig
inspire AG, ETH
Switzerland
chkoenig@ethz.ch
Alisa Rupenyan
inspire AG, ETH
Switzerland
ralisa@ethz.ch
Andreas Krause
ETH Zurich
Switzerland
krausea@ethz.ch
Abstract: In robotics, optimizing controller parameters under safety constraints is an important challenge. Safe Bayesian optimization (BO) quantifies uncertainty in the objective and constraints to safely guide exploration in such settings. Hand-designing a suitable probabilistic model, however, can be challenging. In the presence of unknown safety constraints, it is crucial to choose reliable model hyper-parameters to avoid safety violations. Here, we propose a data-driven approach to this problem by meta-learning priors for safe BO from offline data. We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity. As our core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner via empirical uncertainty metrics and a frontier search algorithm. On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches while maintaining safety.
Keywords: Meta-learning, Safety, Controller tuning, Bayesian Optimization
1 Introduction
Optimizing a black-box function with as few queries as possible is a ubiquitous problem in science
and engineering. Bayesian Optimization (BO) is a promising paradigm, which learns a probabilistic
surrogate model (often a Gaussian process, GP) of the unknown function to guide exploration. BO
has been successfully applied for optimizing sensor configurations [1,2] or tuning the parameters of
robotic controllers [3,4,5,6,7]. However, such real-world applications are often subject to safety
constraints which must not be violated in the process of optimization, e.g., the robot not getting damaged. Often, the dependence of the safety constraints on the query inputs is a priori unknown and
can only be observed by measurement. To cope with these requirements, safe BO methods [8,9,10]
model both the objective and constraint functions with GPs. The uncertainty in the constraint is
used to approximate the feasible region from within, to guarantee that no safety violations occur.
Therefore, a critical requirement for safe BO is the reliability of the uncertainty estimates of our
models. Typically, it is assumed that a correct GP prior, upon which the uncertainty estimates hinge,
is exogenously given [e.g., 8,9,10]. In practice, however, appropriate choices for the kernel variance
and lengthscale are unknown and typically have to be chosen very conservatively or hand-tuned by
trial and error, a problematic endeavour in a safety-critical setting. A too conservative choice dramatically reduces sample efficiency, whereas overestimating smoothness may risk safety violations.
Addressing these shortcomings, we develop an approach for meta-learning informative, but safe,
GP priors in a data-driven way from related offline data. We build on the F-PACOH meta-learning
method [11], which is capable of providing reliable uncertainty estimates even in the face of data scarcity and out-of-distribution data. However, this approach still relies on an appropriate kernel choice, on which it falls back in the absence of sufficient data. We propose a novel framework
for choosing safety-compliant kernel hyper-parameters in a data-driven manner based on calibration
and sharpness metrics of the confidence intervals. To optimize these uncertainty metrics, we devise a
frontier search algorithm that efficiently exploits the monotone structure of the problem. The resulting Safe Meta-Bayesian Optimization (SAMBO) approach can be instantiated with existing safe BO
methods and utilize the improved, meta-learned GPs to perform safe optimization more efficiently.
In our experiments, we evaluate and compare our proposed approach on benchmark functions as
well as controller tuning for a high-precision motion system. Throughout, SAMBO significantly
improves the query efficiency of popular safe BO methods, without compromising safety.
2 Related Work
Safe BO aims to efficiently optimize a black-box function under safety-critical conditions, where unknown safety constraints must not be violated. Constrained variants of standard BO methods [12,13,
14] return feasible solutions, but do not reliably exclude unsafe queries. In contrast, SAFEOPT [8,9]
and related variants [15] guarantee safety at all times and have been used to safely tune controllers
in various applications [e.g., 16,17]. While SAFEOPT explores in an undirected manner, GOOSE
[10,18] does so by expanding the safe set in a goal-oriented fashion. All mentioned methods rely on
GPs to model the target and constraint function and assume that correct kernel hyper-parameters are
given. Our work is complementary: We show how to use related offline data to obtain informative
and safe GP priors in a data-driven way that makes the downstream safe BO more query efficient.
Meta-Learning. Common approaches in meta-learning amortize inference [19,20,21], learn a
shared embedding space [22,23,24,25] or a good neural network initialization [26,27,28,29].
However, when the available data is limited, these approaches are prone to overfitting at the meta-level. A body of work studies meta-regularization to prevent overfitting in meta-learning
[30,31,32,33,34,35]. Such meta-regularization methods prevent meta-overfitting for the mean
predictions, but not for the uncertainty estimates. Recent meta-learning methods aim at providing
reliable confidence intervals even when data is scarce and non-i.i.d. [11,36,37]. These methods extend meta-learning to interactive and life-long settings. However, they either make unrealistic model assumptions or hinge on hyper-parameters whose improper choice is critical in safety-constrained settings. Our work uses F-PACOH [11] to meta-learn reliable GP priors, but removes the need for hand-specifying a correct hyper-prior by choosing its parameters in a data-driven, safety-aware manner.
3 Problem Statement and Background
3.1 Problem Statement
We consider the problem of safe Bayesian optimization (safe BO), seeking the global minimizer
$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}) \quad \text{s.t.} \quad q(\mathbf{x}) \leq 0 \qquad (1)$$
of a function $f: \mathcal{X} \to \mathbb{R}$ over a bounded domain $\mathcal{X}$, subject to a safety constraint $q(\mathbf{x}) \leq 0$ with constraint function $q: \mathcal{X} \to \mathbb{R}$. For instance, we may want to optimize the controller parameters of a robot without subjecting it to potentially damaging vibrations or collisions. During the optimization, we iteratively make queries $\mathbf{x}_1, \dots, \mathbf{x}_T \in \mathcal{X}$ and observe noisy feedback $\tilde{f}_1, \dots, \tilde{f}_T$ and $\tilde{q}_1, \dots, \tilde{q}_T$, e.g., via $\tilde{f}_t = f(\mathbf{x}_t) + \epsilon_f$ and $\tilde{q}_t = q(\mathbf{x}_t) + \epsilon_q$, where $\epsilon_f, \epsilon_q$ are $\sigma^2$-sub-Gaussian noise [38,39]. In our setting, performing a query is assumed to be costly, e.g., running the robot and observing relevant measurements. Hence, we want to find a solution as close to the global minimum as possible, in as few iterations as possible, without violating the safety constraint with any query we make.
Additionally, we assume access to datasets $\mathcal{D}_{1,T_1}, \dots, \mathcal{D}_{n,T_n}$ with observations from $n$ statistically similar but distinct data-generating systems, e.g., data of the same robotic platform under different conditions. Each dataset $\mathcal{D}_{i,T_i} = \{(\mathbf{x}_{i,1}, \tilde{f}_{i,1}, \tilde{q}_{i,1}), \dots, (\mathbf{x}_{i,T_i}, \tilde{f}_{i,T_i}, \tilde{q}_{i,T_i})\}$ consists of $T_i$ measurement triples, where $\tilde{f}_{i,t} = f_i(\mathbf{x}_{i,t}) + \epsilon_{f_i}$ and $\tilde{q}_{i,t} = q_i(\mathbf{x}_{i,t}) + \epsilon_{q_i}$ are the noisy target and constraint observations. We assume that the underlying functions $f_1, \dots, f_n$ and $q_1, \dots, q_n$ which generated the data are representative of our current target and constraint functions $f$ and $q$, e.g., that they are all i.i.d. draws from the same stochastic process. However, the data within each dataset may be highly dependent (i.e., non-i.i.d.). For instance, each $\mathcal{D}_{i,T_i}$ may correspond to the queries and observations from previous safe BO sessions with the same robot under different conditions.
In this paper, we ask the question of how we can harness such related data sources to make safe BO
on our current problem of interest more query efficient, without compromising safety.
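To make the data setup concrete, here is a minimal Python sketch of how such offline meta-training data could be organized. The synthetic objective/constraint functions and all names are illustrative placeholders, not the systems used in the paper.

```python
import numpy as np

def make_task_dataset(T: int, seed: int):
    """One offline dataset D_{i,T_i}: T measurement triples (x, f_obs, q_obs)
    from a synthetic system, standing in for, e.g., logs of a past BO session."""
    rng = np.random.default_rng(seed)
    shift = rng.uniform(-0.3, 0.3)            # task-specific variation across systems
    X = rng.uniform(-1.0, 1.0, size=(T, 1))   # queries x_{i,t}
    f_obs = np.sin(3.0 * (X[:, 0] - shift)) + 0.05 * rng.standard_normal(T)
    q_obs = (X[:, 0] - shift) ** 2 - 0.5 + 0.05 * rng.standard_normal(T)
    return X, f_obs, q_obs

# n statistically similar but distinct tasks: D_{1,T_1}, ..., D_{n,T_n}
meta_train_data = [make_task_dataset(T=30, seed=i) for i in range(8)]
```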
3.2 Safe Bayesian Optimization Methods
Safe BO methods construct a Bayesian surrogate model of the functions $f$ and $q$ based on previous observations $\mathcal{D}_t = \{(\mathbf{x}_{t'}, \tilde{f}_{t'}, \tilde{q}_{t'})\}_{t' < t}$. Typically, a Gaussian process $\mathcal{GP}(f(\mathbf{x}) \mid m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$ with mean $m(\mathbf{x})$ and kernel function $k(\mathbf{x}, \mathbf{x}')$ is employed to form posterior beliefs $p(f(\mathbf{x}) \mid \mathcal{D}_t) = \mathcal{N}(\mu^f_t(\mathbf{x}), (\sigma^f_t(\mathbf{x}))^2)$ and $p(q(\mathbf{x}) \mid \mathcal{D}_t) = \mathcal{N}(\mu^q_t(\mathbf{x}), (\sigma^q_t(\mathbf{x}))^2)$ over function values [40]. Based on the predictive posterior, we can form confidence intervals (CIs) to the confidence level $\alpha \in [0, 1]$:
$$\mathrm{CI}^f_\alpha(\mathbf{x} \mid \mathcal{D}_t) := \left[\, \mu^f_t(\mathbf{x}) \pm \beta_f(\alpha)\, \sigma^f_t(\mathbf{x}) \,\right] \qquad (2)$$
where $\beta_f(\alpha)$, the scaling of the standard deviation, is set such that $f(\mathbf{x})$ lies in the CI with probability $\alpha$. For BO, we often employ a shift-invariant kernel $k(\mathbf{x}, \mathbf{x}') = \nu\, \phi(\|\mathbf{x} - \mathbf{x}'\|/l)$, where $\nu$ is its variance, $l$ the lengthscale, and $\phi$ a positive function, e.g., the squared exponential (SE) $\phi(z) = \exp(-z^2)$.
BO methods typically choose their query points by maximizing an acquisition function based on the posterior $p(f(\mathbf{x}) \mid \mathcal{D}_t)$, which trades off exploration and exploitation [41,42,43,44]. When we have additional safety constraints, we need to maintain a safe set
$$S_t(\alpha) = \{\mathbf{x} \in \mathcal{X} \mid \mu^q_t(\mathbf{x}) + \beta_q(\alpha)\, \sigma^q_t(\mathbf{x}) < 0\} \qquad (3)$$
which contains the parts of the domain we know to be safe with high probability $\alpha$. To maintain safety, we can only query points within the current $S_t(\alpha)$. In addition, we need to explore w.r.t. $q$ so that we can expand $S_t$. E.g., SAFEOPT [8,9,16] computes a safe query candidate that optimizes the acquisition function for $f$ as well as a query candidate that promises the best expansion of $S_t$. Then, it selects the one with the highest uncertainty. While SAFEOPT expands $S_t$ undirectedly, GOOSE [10,18] does so in a more directed manner to avoid unnecessary expansion queries that are irrelevant for minimizing $f$. In addition to the pessimistic $S_t$, it also maintains an optimistic safe set which is used to calculate a query candidate $\mathbf{x}^{\mathrm{opt}}_t$ that maximizes the acquisition function for $f$. If $\mathbf{x}^{\mathrm{opt}}_t$ is outside of $S_t$, it chooses safe query points aiming at expanding $S_t$ in the direction of $\mathbf{x}^{\mathrm{opt}}_t$. See Appendix A for more details.
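To make Eqs. (2) and (3) concrete, the following is a minimal numpy sketch of an exact zero-mean GP posterior with an SE kernel and of the resulting pessimistic safe set. The two-sided Gaussian CI scaling $\beta_q(\alpha) = \Phi^{-1}((1+\alpha)/2)$ and the toy constraint data are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.stats import norm

def se_kernel(A, B, nu=1.0, l=0.3):
    """Shift-invariant SE kernel k(x, x') = nu * exp(-||x - x'||^2 / l^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu * np.exp(-sq_dists / l**2)

def gp_posterior(X_tr, y_tr, X_q, nu=1.0, l=0.3, noise_var=1e-2):
    """Exact posterior mean/std of a zero-mean GP at query points X_q."""
    K = se_kernel(X_tr, X_tr, nu, l) + noise_var * np.eye(len(X_tr))
    K_q = se_kernel(X_q, X_tr, nu, l)
    mu = K_q @ np.linalg.solve(K, y_tr)
    var = nu - np.einsum("ij,ji->i", K_q, np.linalg.solve(K, K_q.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

# Toy constraint observations (illustrative): q(x) = x^2 - 0.5 plus noise.
rng = np.random.default_rng(0)
X_tr = rng.uniform(-1.0, 1.0, size=(10, 1))
q_tr = X_tr[:, 0] ** 2 - 0.5 + 0.05 * rng.standard_normal(10)

alpha = 0.95
beta_q = norm.ppf((1 + alpha) / 2)       # Gaussian CI scaling, cf. Eq. (2)
X_cand = np.linspace(-1.0, 1.0, 201)[:, None]
mu_q, sigma_q = gp_posterior(X_tr, q_tr, X_cand)

# Pessimistic safe set S_t(alpha) of Eq. (3): upper confidence bound below 0.
safe_candidates = X_cand[mu_q + beta_q * sigma_q < 0.0]
```

A safe BO method would then restrict its acquisition maximization to `safe_candidates` and pick expansion queries near the boundary of the safe set.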
3.3 Meta-Learning GP Priors
In meta-learning [45,46], we aim to extract prior knowledge (i.e., inductive bias) from a set of related learning tasks. Typically, the meta-learner is given such learning tasks in the form of $n$ datasets $\mathcal{D}_{1,T_1}, \dots, \mathcal{D}_{n,T_n}$ with $\mathcal{D}_{i,T_i} = \{(\mathbf{x}_{i,t}, y_{i,t})\}_{t=1}^{T_i}$ and outputs a refined prior distribution or hypothesis space which can then be used to accelerate inference on a new, but related, learning task.
Prior work proposes to meta-learn GP priors [47,48,34] but fails to maintain reliable uncertainty estimates when data is scarce and/or non-i.i.d. The recently introduced F-PACOH method [11] overcomes this issue by using a regularization approach in function space. As in previous work [e.g., 49,48], they use a learnable GP prior $\rho_\theta(h) = \mathcal{GP}(h(\mathbf{x}) \mid m_\theta(\mathbf{x}), k_\theta(\mathbf{x}, \mathbf{x}'))$ where the mean and kernel function are neural networks with parameters $\theta$, and employ the marginal log-likelihood to fit $\rho_\theta(h)$ to the meta-training data. However, during meta-learning, they regularize $\rho_\theta(h)$ towards a vanilla GP hyper-prior $\rho(h) = \mathcal{GP}(h(\mathbf{x}) \mid 0, k(\mathbf{x}, \mathbf{x}'))$ with the SE kernel. To do so, they uniformly sample random measurement sets $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_m] \overset{\text{i.i.d.}}{\sim} \mathcal{U}(\mathcal{X})$ from the domain and compare the GPs' finite marginals $\rho_\theta(h^{\mathbf{X}}) = p_\theta(h(\mathbf{x}_1), \dots, h(\mathbf{x}_m))$ and $\rho(h^{\mathbf{X}})$ through their KL-divergence. The resulting meta-learning loss with the functional KL regularizer
$$\mathcal{L}(\theta) := \frac{1}{n} \sum_{i=1}^{n} \bigg[ \underbrace{-\frac{1}{T_i} \ln Z(\mathcal{D}_{i,T_i}, \rho_\theta)}_{\text{marginal log-likelihood}} + \Big(\frac{1}{\sqrt{n}} + \frac{1}{n T_i}\Big) \underbrace{\mathbb{E}_{\mathbf{X}}\, \mathrm{KL}\big[\rho_\theta(h^{\mathbf{X}}) \,\|\, \rho(h^{\mathbf{X}})\big]}_{\text{functional KL-divergence}} \bigg] \qquad (4)$$
makes sure that, in the absence of sufficient meta-training data, the learned GP behaves like a vanilla GP. Overall, this allows us to meta-learn a more informative GP prior which still yields reliable confidence intervals, even if the meta-training data was collected via safe BO and is thus not i.i.d.
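To illustrate the functional KL regularizer in Eq. (4), the sketch below estimates $\mathrm{KL}[\rho_\theta(h^{\mathbf{X}}) \,\|\, \rho(h^{\mathbf{X}})]$ between the finite marginals of a learned GP prior and a vanilla SE-kernel GP at a random measurement set. The simple stand-ins for the learned mean and kernel $m_\theta, k_\theta$ are hypothetical placeholders, not F-PACOH's actual neural networks.

```python
import numpy as np

def se_kernel(A, B, nu=1.0, l=0.3):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu * np.exp(-sq_dists / l**2)

def gaussian_kl(mu0, K0, mu1, K1):
    """KL( N(mu0, K0) || N(mu1, K1) ) between m-dimensional Gaussians."""
    m = len(mu0)
    trace_term = np.trace(np.linalg.solve(K1, K0))
    diff = mu1 - mu0
    mahalanobis = diff @ np.linalg.solve(K1, diff)
    logdet0 = np.linalg.slogdet(K0)[1]
    logdet1 = np.linalg.slogdet(K1)[1]
    return 0.5 * (trace_term + mahalanobis - m + logdet1 - logdet0)

# Hypothetical stand-ins for the learned mean m_theta and kernel k_theta:
learned_mean = lambda X: 0.1 * X[:, 0]
learned_kernel = lambda A, B: se_kernel(A, B, nu=1.5, l=0.5)

# Random measurement set X ~ U(domain), as in the functional KL of Eq. (4).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(16, 1))
jitter = 1e-6 * np.eye(len(X))

mu_theta, K_theta = learned_mean(X), learned_kernel(X, X) + jitter
mu_hyper, K_hyper = np.zeros(len(X)), se_kernel(X, X) + jitter  # vanilla GP

kl_term = gaussian_kl(mu_theta, K_theta, mu_hyper, K_hyper)
```

During meta-training, this KL estimate would be averaged over freshly sampled measurement sets and added to the negative marginal log-likelihood with the weighting of Eq. (4).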
4 Choosing the Safe Kernel Hyper-Parameters
Important for safe BO is the reliability of the uncertainty estimates of our objective and constraint models. For GPs, the kernel hyper-parameters have the biggest influence on the CIs. If the kernel variance $\nu$ is chosen too small and/or the lengthscale $l$ too large, our models become over-confident and the corresponding BO unsafe. In the reverse case, the CIs become too conservative and safe BO requires many queries to progress. Despite the assumption commonly made in earlier work [e.g., 8,9,10], appropriate choices for $\nu$ and $l$ are unknown. In practice, they are typically chosen conservatively or hand-tuned by trial and error, which is problematic in safety-critical settings. Aiming to address this issue, we develop a framework for choosing the kernel hyper-parameters in a data-driven manner.
4.1 Assessing kernel hyper-parameters: Calibration and sharpness
Our approach is based on the calibration and sharpness of uncertainty estimates [see e.g., 50,51,52,53]. Naturally, if we construct CIs to the confidence level $\alpha$, we want at least an $\alpha$ fraction of (unseen) observations to fall within these CIs. If this holds in expectation, we say that the uncertainty estimates are calibrated. If the empirical percentage is less than $\alpha$, it indicates that our model's uncertainty estimates are over-confident and we are likely to underestimate the risk of safety violations. To empirically assess how calibrated a probabilistic regression model with hyper-parameters $\boldsymbol{\omega}$, conditioned on a training dataset $\mathcal{D}^{\mathrm{tr}}$, is, we compute its calibration frequency on a test dataset $\mathcal{D}^{\mathrm{test}}$:
$$\text{calib-freq}(\mathcal{D}^{\mathrm{tr}}, \mathcal{D}^{\mathrm{test}}, \boldsymbol{\omega}) := \frac{1}{|\mathcal{A}|} \sum_{\alpha \in \mathcal{A}} \frac{1}{|\mathcal{D}^{\mathrm{test}}|} \sum_{(\mathbf{x}, y) \in \mathcal{D}^{\mathrm{test}}} \frac{\mathbb{1}\left[ y \in \mathrm{CI}^f_\alpha(\mathbf{x} \mid \mathcal{D}^{\mathrm{tr}}, \boldsymbol{\omega}) \right]}{\alpha} \,. \qquad (5)$$
Here, $\mathcal{A} \subset [0, 1]$ is a set of relevant confidence levels (in our case, 20 values equally spaced between 0.8 and 1.0). Since the CIs of our model need to be calibrated at any iteration $t$ during the BO and for any task we may face, we choose the best empirical estimate we can: We compute the average calibration frequency across all meta-training datasets and for any sub-sequence of points within a dataset. In particular, for any task $i = 1, \dots, n$ and $t = 1, \dots, T_i - 1$ we condition/train our model on the data points $\mathcal{D}_{i,\leq t} = \{(\mathbf{x}_{i,t'}, y_{i,t'})\}_{t'=1}^{t}$ and use the remaining data points $\mathcal{D}_{i,>t} = \{(\mathbf{x}_{i,t'}, y_{i,t'})\}_{t'=t+1}^{T_i}$ to compute the calibration frequency. Overall, this gives us
$$\text{avg-calib}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) := \frac{1}{n} \sum_{i=1}^{n} \frac{1}{T_i - 1} \sum_{t=1}^{T_i - 1} \text{calib-freq}(\mathcal{D}_{i,\leq t}, \mathcal{D}_{i,>t}, \boldsymbol{\omega}) \,. \qquad (6)$$
While calibration captures how reliable the uncertainty estimates are, it does not reflect how useful the confidence intervals are for narrowing down the range of possible function values. For instance, a predictor that always outputs a mean with sufficiently wide confidence intervals is calibrated, but useless for BO. Hence, we also consider the sharpness of the uncertainty estimates, which we empirically quantify through the average predictive standard deviation. Similar to (6), we average over all tasks and data sub-sequences within each task:
$$\text{avg-std}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) := \frac{1}{n} \sum_{i=1}^{n} \frac{1}{T_i - 1} \sum_{t=1}^{T_i - 1} \frac{1}{|\mathcal{D}_{i,>t}|} \sum_{(\mathbf{x}, y) \in \mathcal{D}_{i,>t}} \sigma(\mathbf{x} \mid \mathcal{D}_{i,\leq t}, \boldsymbol{\omega}) \,. \qquad (7)$$
The avg-std measures how concentrated the uncertainty estimates are and thus constitutes a natural complement to calibration, which can be trivially achieved by wide/loose confidence intervals.
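The following sketch shows how avg-calib and avg-std of Eqs. (5)-(7) could be estimated for a GP with hyper-parameters $\boldsymbol{\omega} = (\nu, l)$. The GP helper, the two-sided Gaussian CI scaling, and the synthetic tasks are assumptions for illustration, consistent with but not taken from the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def se_kernel(A, B, nu, l):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu * np.exp(-sq_dists / l**2)

def gp_predict(X_tr, y_tr, X_q, nu, l, noise_var=1e-2):
    """Zero-mean GP posterior mean/std at X_q under hyper-parameters (nu, l)."""
    K = se_kernel(X_tr, X_tr, nu, l) + noise_var * np.eye(len(X_tr))
    K_q = se_kernel(X_q, X_tr, nu, l)
    mu = K_q @ np.linalg.solve(K, y_tr)
    var = nu - np.einsum("ij,ji->i", K_q, np.linalg.solve(K, K_q.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

LEVELS = np.linspace(0.8, 1.0, 20)   # confidence levels A, as in the paper

def avg_calib_and_std(datasets, nu, l):
    """Eqs. (6)-(7): average over tasks and over the train/test splits
    D_{i,<=t} / D_{i,>t} within each task."""
    calibs, stds = [], []
    for X, y in datasets:
        for t in range(1, len(y)):
            mu, std = gp_predict(X[:t], y[:t], X[t:], nu, l)
            # calib-freq (Eq. 5): per-level CI coverage, normalized by alpha
            cov = [(np.abs(y[t:] - mu) <= norm.ppf((1 + a) / 2) * std).mean() / a
                   for a in LEVELS]
            calibs.append(np.mean(cov))
            stds.append(std.mean())      # sharpness contribution (Eq. 7)
    return np.mean(calibs), np.mean(stds)

# Illustrative comparison of two hyper-parameter choices on synthetic tasks:
def make_task(seed, T=20):
    r = np.random.default_rng(seed)
    X = r.uniform(-1.0, 1.0, size=(T, 1))
    return X, np.sin(3.0 * X[:, 0]) + 0.05 * r.standard_normal(T)

tasks = [make_task(i) for i in range(5)]
print(avg_calib_and_std(tasks, nu=1.0, l=0.2))  # conservative: calibrated but wide
print(avg_calib_and_std(tasks, nu=0.2, l=1.0))  # aggressive: likely over-confident
```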
4.2 Choosing good hyper-parameters via Frontier search
Based on the two empirical quantities introduced above, we can optimize the hyper-parameters $\boldsymbol{\omega}$ of our model so as to maximize sharpness (i.e., minimize the avg-std) subject to calibration [50]:
$$\min_{\boldsymbol{\omega}} \ \text{avg-std}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) \quad \text{s.t.} \quad \text{avg-calib}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) \geq 1 \qquad (8)$$
Since computing avg-std and avg-calib requires solving the GP inference problem many times, each query is computationally demanding. Hence, we need an optimization algorithm for (8) that requires as few queries as possible to get close to the optimal solution.
Algorithm 1 FRONTIERSEARCH (details in Appendix C)
Input: domain bounds $\mathbf{z}^l, \mathbf{z}^u$ s.t. $\mathbf{z}^l \leq \mathbf{z} \leq \mathbf{z}^u$
1: $\mathcal{Q}^u \leftarrow \{\mathbf{z}^u\}$, $\mathcal{Q}^l \leftarrow \{\mathbf{z}^l\}$
2: for $k = 1, \dots, K$ do
3:   $(\mathbf{z}_r, \mathbf{z}'_r) \leftarrow$ LARGESTMAXMINRECT($\mathcal{Q}^l, \mathcal{Q}^u$)   // largest max-min rect. between frontiers
4:   $\mathbf{z}^q \leftarrow$ BESTWORSTCASEQUERY($\mathbf{z}_r, \mathbf{z}'_r, \mathcal{Q}^l, \mathcal{Q}^u$)   // best query point to split rectangle
5:   if $c(\mathbf{z}^q) \geq 1$ then
6:     $\mathcal{Q}^u \leftarrow$ PRUNE($\mathcal{Q}^u \cup \{\mathbf{z}^q\}$)
7:   else
8:     $\mathcal{Q}^l \leftarrow$ PRUNE($\mathcal{Q}^l \cup \{\mathbf{z}^q\}$)
Return: $\arg\min_{\mathbf{z} \in \mathcal{Q}^u} s(\mathbf{z})$
Figure 1: Frontier search (FS) on the kernel lengthscale and variance for the constraint model of the Argus robot. Red: areas ruled out because unsafe. Green: safe areas that are ruled out since they are dominated by better safe queries. After a few iterations, FS has already shrunk the set of possible optima (white area between fronts) to points close to the safety border and picked a nearly optimal solution (cross).
We develop an efficient frontier search (FS) algorithm that exploits the monotonicity properties of this optimization problem. Both avg-std and avg-calib are monotonically increasing in the kernel variance $\nu$ and decreasing in the lengthscale $l$.¹ By setting $\mathbf{z} = (l, \nu)$ and writing $s(\mathbf{z}) = \text{avg-std}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, l, \nu)$ and $c(\mathbf{z}) = \text{avg-calib}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, l, \nu)$, we can turn (8) into
$$\min_{\mathbf{z}} \ s(\mathbf{z}) \quad \text{s.t.} \quad c(\mathbf{z}) \geq 1 \quad \text{where } s: \mathbb{R}^2 \mapsto \mathbb{R} \text{ and } c: \mathbb{R}^2 \mapsto \mathbb{R} \text{ are monotone.} \qquad (9)$$
We presume an upper and lower bound $(\mathbf{z}^u, \mathbf{z}^l)$ such that the resulting search domain $\mathcal{Z} = [z^l_1, z^u_1] \times [z^l_2, z^u_2]$ contains the optimal solution $\mathbf{z}^* = \arg\min_{\mathbf{z}: c(\mathbf{z}) \geq 1} s(\mathbf{z})$. Since both $s(\mathbf{z})$ and $c(\mathbf{z})$ are monotone, we know that $\mathbf{z}^*$ must lie on or directly above the constraint boundary $c(\mathbf{z}) = 1$ (see Lemma 2).
In each iteration $k$ of Algorithm 1, we query a point $\mathbf{z}^q_k \in \mathcal{Z}$ and observe the corresponding objective and constraint values $s(\mathbf{z}^q_k)$ and $c(\mathbf{z}^q_k)$. We separate the queries into two sets $\mathcal{Q}^u$ and $\mathcal{Q}^l$ based on whether they lie above or below the constraint boundary. That is, we add $\mathbf{z}^q_k$ to $\mathcal{Q}^u$ if $c(\mathbf{z}^q_k) \geq 1$ and to $\mathcal{Q}^l$ otherwise. Since the optimal solution lies on the constraint boundary and $c(\mathbf{z})$ is monotone, we can rule out entire corners of the search domain: for each $\mathbf{z}^q \in \mathcal{Q}^u$ we can rule out all points $\mathbf{z} > \mathbf{z}^q$ as candidates for the optimal solution and, similarly, for each $\mathbf{z}^q \in \mathcal{Q}^l$ we can rule out all $\mathbf{z} \leq \mathbf{z}^q$. This also allows us to prune the sets $\mathcal{Q}^u$ and $\mathcal{Q}^l$ by removing all points from them that can be ruled out by a new query result. To keep track of which parts of $\mathcal{Z}$ have not been ruled out yet, we construct upper and lower frontiers, here expressed as functions $z_1 \mapsto z_2$ and $z_2 \mapsto z_1$:
$$F^u_2(z_1; \mathcal{Q}^u) = \min\{z'_2 \mid z_1 \geq z'_1,\, \mathbf{z}' \in \mathcal{Q}^u\}, \qquad F^u_1(z_2; \mathcal{Q}^u) = \min\{z'_1 \mid z_2 \geq z'_2,\, \mathbf{z}' \in \mathcal{Q}^u\} \qquad (10)$$
$$F^l_2(z_1; \mathcal{Q}^l) = \max\{z'_2 \mid z_1 \leq z'_1,\, \mathbf{z}' \in \mathcal{Q}^l\}, \qquad F^l_1(z_2; \mathcal{Q}^l) = \max\{z'_1 \mid z_2 \leq z'_2,\, \mathbf{z}' \in \mathcal{Q}^l\} \qquad (11)$$
such that the points $\Gamma = \{(z_1, z_2) \in \mathcal{Z} \mid F^l_2(z_1; \mathcal{Q}^l) \leq z_2 \leq F^u_2(z_1; \mathcal{Q}^u)\} \subseteq \mathcal{Z}$ between the frontiers are still plausible candidates for the optimal solution. For notational brevity, we define $\mathcal{F}^u = \{\mathbf{z} \in \mathcal{Z} \mid F^u_1(z_2; \mathcal{Q}^u) = z_1 \lor F^u_2(z_1; \mathcal{Q}^u) = z_2\}$ and $\mathcal{F}^l$ analogously as the sets of points that lie on the upper and lower frontier, respectively.
¹ Note that the monotonicity of the calibration frequency in $l$ is only an empirical heuristic that holds in almost all cases if $\nu$ is at least as big as the variance of the targets $y$ in a dataset.
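A simplified, runnable sketch of the frontier search loop is given below. It keeps the pruned query sets $\mathcal{Q}^u, \mathcal{Q}^l$ and rules out dominated corners as described above; the candidate-selection heuristic here (query the plausible point farthest from all previous queries) is a stand-in for LARGESTMAXMINRECT / BESTWORSTCASEQUERY, whose exact definitions are in Appendix C, and the objective/constraint functions are synthetic placeholders for avg-std and avg-calib.

```python
import numpy as np

def prune_upper(Q):
    """Keep only minimal elements: drop z if some other w <= z component-wise."""
    return [z for z in Q
            if not any(np.all(w <= z) and not np.array_equal(w, z) for w in Q)]

def prune_lower(Q):
    """Keep only maximal elements: drop z if some other w >= z component-wise."""
    return [z for z in Q
            if not any(np.all(w >= z) and not np.array_equal(w, z) for w in Q)]

def frontier_search(s, c, z_l, z_u, K=30, rng=None):
    """Simplified frontier search for  min s(z)  s.t.  c(z) >= 1,
    with s and c monotonically increasing in z (cf. Eq. (9))."""
    rng = rng or np.random.default_rng(0)
    Q_u, Q_l = [np.array(z_u, float)], [np.array(z_l, float)]
    for _ in range(K):
        # Sample candidates and keep those not ruled out by either frontier:
        # not above any point in Q_u and not below any point in Q_l.
        cand = rng.uniform(z_l, z_u, size=(256, len(z_l)))
        plausible = [z for z in cand
                     if not any(np.all(z >= w) for w in Q_u)
                     and not any(np.all(z <= w) for w in Q_l)]
        if not plausible:
            break
        # Heuristic query choice (stand-in for the rectangle-splitting rule):
        prev = np.array(Q_u + Q_l)
        z_q = max(plausible,
                  key=lambda z: np.min(np.linalg.norm(prev - z, axis=1)))
        if c(z_q) >= 1.0:
            Q_u = prune_upper(Q_u + [z_q])
        else:
            Q_l = prune_lower(Q_l + [z_q])
    return min(Q_u, key=s)   # best calibrated query found

# Illustrative monotone placeholders for avg-std (s) and avg-calib (c):
s = lambda z: z[0] + 2.0 * z[1]
c = lambda z: 0.8 * z[0] + z[1]
z_star = frontier_search(s, c, z_l=[0.0, 0.0], z_u=[2.0, 2.0])
```

Under the monotonicity of $s$ and $c$, every point above a feasible query is feasible but has larger avg-std, and every point below an infeasible query is infeasible; this is exactly what the two pruned sets encode.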