Meta-Learning Priors for Safe Bayesian Optimization
Jonas Rothfuss
ETH Zurich
Switzerland
rojonas@ethz.ch
Christopher Koenig
inspire AG, ETH
Switzerland
chkoenig@ethz.ch
Alisa Rupenyan
inspire AG, ETH
Switzerland
ralisa@ethz.ch
Andreas Krause
ETH Zurich
Switzerland
krausea@ethz.ch
Abstract: In robotics, optimizing controller parameters under safety constraints is an important challenge. Safe Bayesian optimization (BO) quantifies uncertainty in the objective and constraints to safely guide exploration in such settings. Hand-designing a suitable probabilistic model, however, can be challenging. In the presence of unknown safety constraints, it is crucial to choose reliable model hyper-parameters to avoid safety violations. Here, we propose a data-driven approach to this problem by meta-learning priors for safe BO from offline data. We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity. As our core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner via empirical uncertainty metrics and a frontier search algorithm. On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches while maintaining safety.
Keywords: Meta-learning, Safety, Controller tuning, Bayesian Optimization
1 Introduction
Optimizing a black-box function with as few queries as possible is a ubiquitous problem in science
and engineering. Bayesian Optimization (BO) is a promising paradigm, which learns a probabilistic
surrogate model (often a Gaussian process, GP) of the unknown function to guide exploration. BO
has been successfully applied for optimizing sensor configurations [1,2] or tuning the parameters of
robotic controllers [3,4,5,6,7]. However, such real-world applications are often subject to safety
constraints which must not be violated in the process of optimization, e.g., the robot not getting damaged. Often, the dependence of the safety constraints on the query inputs is a priori unknown and
can only be observed by measurement. To cope with these requirements, safe BO methods [8,9,10]
model both the objective and constraint functions with GPs. The uncertainty in the constraint is
used to approximate the feasible region from within, to guarantee that no safety violations occur.
Therefore, a critical requirement for safe BO is the reliability of the uncertainty estimates of our
models. Typically, it is assumed that a correct GP prior, upon which the uncertainty estimates hinge,
is exogenously given [e.g., 8,9,10]. In practice, however, appropriate choices for the kernel variance
and lengthscale are unknown and typically have to be chosen very conservatively or hand-tuned by
trial and error, a problematic endeavour in a safety-critical setting. A too conservative choice dramatically reduces sample efficiency, whereas overestimating smoothness may risk safety violations.
Addressing these shortcomings, we develop an approach for meta-learning informative, but safe,
GP priors in a data-driven way from related offline data. We build on the F-PACOH meta-learning
method [11], which is capable of providing reliable uncertainty estimates even in the face of data scarcity and out-of-distribution data. However, this approach still relies on an appropriate kernel choice, on which it falls back in the absence of sufficient data. We propose a novel framework
for choosing safety-compliant kernel hyper-parameters in a data-driven manner based on calibration
and sharpness metrics of the confidence intervals. To optimize these uncertainty metrics, we devise a
frontier search algorithm that efficiently exploits the monotone structure of the problem. The resulting Safe Meta-Bayesian Optimization (SAMBO) approach can be instantiated with existing safe BO
methods and utilize the improved, meta-learned GPs to perform safe optimization more efficiently.
In our experiments, we evaluate and compare our proposed approach on benchmark functions as
well as controller tuning for a high-precision motion system. Throughout, SAMBO significantly
improves the query efficiency of popular safe BO methods, without compromising safety.
2 Related Work
Safe BO aims to efficiently optimize a black-box function under safety-critical conditions, where unknown safety constraints must not be violated. Constrained variants of standard BO methods [12,13,
14] return feasible solutions, but do not reliably exclude unsafe queries. In contrast, SAFEOPT [8,9]
and related variants [15] guarantee safety at all times and have been used to safely tune controllers
in various applications [e.g., 16,17]. While SAFEOPT explores in an undirected manner, GOOSE
[10,18] does so by expanding the safe set in a goal-oriented fashion. All mentioned methods rely on
GPs to model the target and constraint function and assume that correct kernel hyper-parameters are
given. Our work is complementary: We show how to use related offline data to obtain informative
and safe GP priors in a data-driven way that makes the downstream safe BO more query efficient.
Meta-Learning. Common approaches in meta-learning amortize inference [19,20,21], learn a
shared embedding space [22,23,24,25] or a good neural network initialization [26,27,28,29].
However, when the available data is limited, these approaches are prone to overfitting at the meta-level. A body of work studies meta-regularization to prevent overfitting in meta-learning
[30,31,32,33,34,35]. Such meta-regularization methods prevent meta-overfitting for the mean
predictions, but not for the uncertainty estimates. Recent meta-learning methods aim at providing
reliable confidence intervals even when data is scarce and non-i.i.d. [11,36,37]. These methods extend meta-learning to interactive and life-long settings. However, they either make unrealistic model assumptions or hinge on hyper-parameters whose improper choice is critical in safety-constrained settings. Our work uses F-PACOH [11] to meta-learn reliable GP priors, but removes the need for hand-specifying a correct hyper-prior by choosing its parameters in a data-driven, safety-aware manner.
3 Problem Statement and Background
3.1 Problem Statement
We consider the problem of safe Bayesian optimization (safe BO), seeking the global minimizer
$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}) \quad \text{s.t.} \quad q(\mathbf{x}) \leq 0 \qquad (1)$$
of a function $f: \mathcal{X} \to \mathbb{R}$ over a bounded domain $\mathcal{X}$, subject to a safety constraint $q(\mathbf{x}) \leq 0$ with constraint function $q: \mathcal{X} \to \mathbb{R}$. For instance, we may want to optimize the controller parameters of a robot without subjecting it to potentially damaging vibrations or collisions. During the optimization, we iteratively make queries $\mathbf{x}_1, \dots, \mathbf{x}_T \in \mathcal{X}$ and observe noisy feedback $\tilde{f}_1, \dots, \tilde{f}_T$ and $\tilde{q}_1, \dots, \tilde{q}_T$, e.g., via $\tilde{f}_t = f(\mathbf{x}_t) + \epsilon_f$ and $\tilde{q}_t = q(\mathbf{x}_t) + \epsilon_q$, where $\epsilon_f, \epsilon_q$ are $\sigma^2$-sub-Gaussian noise [38,39]. In our setting, performing a query is assumed to be costly, e.g., running the robot and observing relevant measurements. Hence, we want to find a solution as close to the global minimum as possible, in as few iterations as possible, without violating the safety constraint with any query we make.
Additionally, we assume access to datasets $\mathcal{D}_{1,T_1}, \dots, \mathcal{D}_{n,T_n}$ with observations from $n$ statistically similar but distinct data-generating systems, e.g., data of the same robotic platform under different conditions. Each dataset $\mathcal{D}_{i,T_i} = \{(\mathbf{x}_{i,1}, \tilde{f}_{i,1}, \tilde{q}_{i,1}), \dots, (\mathbf{x}_{i,T_i}, \tilde{f}_{i,T_i}, \tilde{q}_{i,T_i})\}$ consists of $T_i$ measurement triples, where $\tilde{f}_{i,t} = f_i(\mathbf{x}_{i,t}) + \epsilon_{f_i}$ and $\tilde{q}_{i,t} = q_i(\mathbf{x}_{i,t}) + \epsilon_{q_i}$ are the noisy target and constraint observations. We assume that the underlying functions $f_1, \dots, f_n$ and $q_1, \dots, q_n$ which generated the data are representative of our current target and constraint functions $f$ and $q$, e.g., that they are all i.i.d. draws from the same stochastic process. However, the data within each dataset may be highly dependent (i.e., non-i.i.d.). For instance, each $\mathcal{D}_{i,T_i}$ may correspond to the queries and observations from previous safe BO sessions with the same robot under different conditions.
In this paper, we ask the question of how we can harness such related data sources to make safe BO
on our current problem of interest more query efficient, without compromising safety.
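To make the data setup concrete, here is a minimal Python sketch of how such offline meta-training data could be organized. The synthetic objective/constraint functions and all names are illustrative placeholders, not the systems used in the paper.

```python
import numpy as np

def make_task_dataset(T: int, seed: int):
    """One offline dataset D_{i,T_i}: T measurement triples (x, f_obs, q_obs)
    from a synthetic system, standing in for, e.g., logs of a past BO session."""
    rng = np.random.default_rng(seed)
    shift = rng.uniform(-0.3, 0.3)            # task-specific variation across systems
    X = rng.uniform(-1.0, 1.0, size=(T, 1))   # queries x_{i,t}
    f_obs = np.sin(3.0 * (X[:, 0] - shift)) + 0.05 * rng.standard_normal(T)
    q_obs = (X[:, 0] - shift) ** 2 - 0.5 + 0.05 * rng.standard_normal(T)
    return X, f_obs, q_obs

# n statistically similar but distinct tasks: D_{1,T_1}, ..., D_{n,T_n}
meta_train_data = [make_task_dataset(T=30, seed=i) for i in range(8)]
```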
3.2 Safe Bayesian Optimization Methods
Safe BO methods construct a Bayesian surrogate model of the functions $f$ and $q$ based on previous observations $\mathcal{D}_t = \{(\mathbf{x}_{t'}, \tilde{f}_{t'}, \tilde{q}_{t'})\}_{t' < t}$. Typically, a Gaussian process $\mathcal{GP}(f(\mathbf{x}) \mid m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$ with mean $m(\mathbf{x})$ and kernel function $k(\mathbf{x}, \mathbf{x}')$ is employed to form posterior beliefs $p(f(\mathbf{x}) \mid \mathcal{D}_t) = \mathcal{N}(\mu^f_t(\mathbf{x}), (\sigma^f_t(\mathbf{x}))^2)$ and $p(q(\mathbf{x}) \mid \mathcal{D}_t) = \mathcal{N}(\mu^q_t(\mathbf{x}), (\sigma^q_t(\mathbf{x}))^2)$ over function values [40]. Based on the predictive posterior, we can form confidence intervals (CIs) to the confidence level $\alpha \in [0, 1]$:
$$\mathrm{CI}^f_\alpha(\mathbf{x} \mid \mathcal{D}_t) := \left[\, \mu^f_t(\mathbf{x}) \pm \beta_f(\alpha)\, \sigma^f_t(\mathbf{x}) \,\right] \qquad (2)$$
where $\beta_f(\alpha)$, the scaling of the standard deviation, is set such that $f(\mathbf{x})$ lies in the CI with probability $\alpha$. For BO, we often employ a shift-invariant kernel $k(\mathbf{x}, \mathbf{x}') = \nu\, \phi(\|\mathbf{x} - \mathbf{x}'\|/l)$, where $\nu$ is its variance, $l$ the lengthscale, and $\phi$ a positive function, e.g., the squared exponential (SE) $\phi(z) = \exp(-z^2)$.
BO methods typically choose their query points by maximizing an acquisition function based on the posterior $p(f(\mathbf{x}) \mid \mathcal{D}_t)$, which trades off exploration and exploitation [41,42,43,44]. When we have additional safety constraints, we need to maintain a safe set
$$S_t(\alpha) = \{\mathbf{x} \in \mathcal{X} \mid \mu^q_t(\mathbf{x}) + \beta_q(\alpha)\, \sigma^q_t(\mathbf{x}) < 0\} \qquad (3)$$
which contains the parts of the domain we know to be safe with high probability $\alpha$. To maintain safety, we can only query points within the current $S_t(\alpha)$. In addition, we need to explore w.r.t. $q$ so that we can expand $S_t$. E.g., SAFEOPT [8,9,16] computes a safe query candidate that optimizes the acquisition function for $f$ as well as a query candidate that promises the best expansion of $S_t$. Then, it selects the one with the highest uncertainty. While SAFEOPT expands $S_t$ undirectedly, GOOSE [10,18] does so in a more directed manner to avoid unnecessary expansion queries that are irrelevant for minimizing $f$. In addition to the pessimistic $S_t$, it also maintains an optimistic safe set which is used to calculate a query candidate $\mathbf{x}^{\mathrm{opt}}_t$ that maximizes the acquisition function for $f$. If $\mathbf{x}^{\mathrm{opt}}_t$ is outside of $S_t$, it chooses safe query points aiming at expanding $S_t$ in the direction of $\mathbf{x}^{\mathrm{opt}}_t$. See Appendix A for more details.
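To make Eqs. (2) and (3) concrete, the following is a minimal numpy sketch of an exact zero-mean GP posterior with an SE kernel and of the resulting pessimistic safe set. The two-sided Gaussian CI scaling $\beta_q(\alpha) = \Phi^{-1}((1+\alpha)/2)$ and the toy constraint data are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.stats import norm

def se_kernel(A, B, nu=1.0, l=0.3):
    """Shift-invariant SE kernel k(x, x') = nu * exp(-||x - x'||^2 / l^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu * np.exp(-sq_dists / l**2)

def gp_posterior(X_tr, y_tr, X_q, nu=1.0, l=0.3, noise_var=1e-2):
    """Exact posterior mean/std of a zero-mean GP at query points X_q."""
    K = se_kernel(X_tr, X_tr, nu, l) + noise_var * np.eye(len(X_tr))
    K_q = se_kernel(X_q, X_tr, nu, l)
    mu = K_q @ np.linalg.solve(K, y_tr)
    var = nu - np.einsum("ij,ji->i", K_q, np.linalg.solve(K, K_q.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

# Toy constraint observations (illustrative): q(x) = x^2 - 0.5 plus noise.
rng = np.random.default_rng(0)
X_tr = rng.uniform(-1.0, 1.0, size=(10, 1))
q_tr = X_tr[:, 0] ** 2 - 0.5 + 0.05 * rng.standard_normal(10)

alpha = 0.95
beta_q = norm.ppf((1 + alpha) / 2)       # Gaussian CI scaling, cf. Eq. (2)
X_cand = np.linspace(-1.0, 1.0, 201)[:, None]
mu_q, sigma_q = gp_posterior(X_tr, q_tr, X_cand)

# Pessimistic safe set S_t(alpha) of Eq. (3): upper confidence bound below 0.
safe_candidates = X_cand[mu_q + beta_q * sigma_q < 0.0]
```

A safe BO method would then restrict its acquisition maximization to `safe_candidates` and pick expansion queries near the boundary of the safe set.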
3.3 Meta-Learning GP Priors
In meta-learning [45,46], we aim to extract prior knowledge (i.e., inductive bias) from a set of related learning tasks. Typically, the meta-learner is given such learning tasks in the form of $n$ datasets $\mathcal{D}_{1,T_1}, \dots, \mathcal{D}_{n,T_n}$ with $\mathcal{D}_{i,T_i} = \{(\mathbf{x}_{i,t}, y_{i,t})\}_{t=1}^{T_i}$ and outputs a refined prior distribution or hypothesis space which can then be used to accelerate inference on a new, but related, learning task.
Prior work proposes to meta-learn GP priors [47,48,34] but fails to maintain reliable uncertainty estimates when data is scarce and/or non-i.i.d. The recently introduced F-PACOH method [11] overcomes this issue by using a regularization approach in function space. As in previous work [e.g., 49,48], they use a learnable GP prior $\rho_\theta(h) = \mathcal{GP}(h(\mathbf{x}) \mid m_\theta(\mathbf{x}), k_\theta(\mathbf{x}, \mathbf{x}'))$ where the mean and kernel function are neural networks with parameters $\theta$, and employ the marginal log-likelihood to fit $\rho_\theta(h)$ to the meta-training data. However, during meta-learning, they regularize $\rho_\theta(h)$ towards a vanilla GP hyper-prior $\rho(h) = \mathcal{GP}(h(\mathbf{x}) \mid 0, k(\mathbf{x}, \mathbf{x}'))$ with the SE kernel. To do so, they uniformly sample random measurement sets $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_m] \overset{\text{i.i.d.}}{\sim} \mathcal{U}(\mathcal{X})$ from the domain and compare the GPs' finite marginals $\rho_\theta(h^{\mathbf{X}}) = p_\theta(h(\mathbf{x}_1), \dots, h(\mathbf{x}_m))$ and $\rho(h^{\mathbf{X}})$ through their KL-divergence. The resulting meta-learning loss with the functional KL regularizer
$$\mathcal{L}(\theta) := \frac{1}{n} \sum_{i=1}^{n} \bigg[ \underbrace{-\frac{1}{T_i} \ln Z(\mathcal{D}_{i,T_i}, \rho_\theta)}_{\text{marginal log-likelihood}} + \Big(\frac{1}{\sqrt{n}} + \frac{1}{n T_i}\Big) \underbrace{\mathbb{E}_{\mathbf{X}}\, \mathrm{KL}\big[\rho_\theta(h^{\mathbf{X}}) \,\|\, \rho(h^{\mathbf{X}})\big]}_{\text{functional KL-divergence}} \bigg] \qquad (4)$$
makes sure that, in the absence of sufficient meta-training data, the learned GP behaves like a vanilla GP. Overall, this allows us to meta-learn a more informative GP prior which still yields reliable confidence intervals, even if the meta-training data was collected via safe BO and is thus not i.i.d.
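To illustrate the functional KL regularizer in Eq. (4), the sketch below estimates $\mathrm{KL}[\rho_\theta(h^{\mathbf{X}}) \,\|\, \rho(h^{\mathbf{X}})]$ between the finite marginals of a learned GP prior and a vanilla SE-kernel GP at a random measurement set. The simple stand-ins for the learned mean and kernel $m_\theta, k_\theta$ are hypothetical placeholders, not F-PACOH's actual neural networks.

```python
import numpy as np

def se_kernel(A, B, nu=1.0, l=0.3):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu * np.exp(-sq_dists / l**2)

def gaussian_kl(mu0, K0, mu1, K1):
    """KL( N(mu0, K0) || N(mu1, K1) ) between m-dimensional Gaussians."""
    m = len(mu0)
    trace_term = np.trace(np.linalg.solve(K1, K0))
    diff = mu1 - mu0
    mahalanobis = diff @ np.linalg.solve(K1, diff)
    logdet0 = np.linalg.slogdet(K0)[1]
    logdet1 = np.linalg.slogdet(K1)[1]
    return 0.5 * (trace_term + mahalanobis - m + logdet1 - logdet0)

# Hypothetical stand-ins for the learned mean m_theta and kernel k_theta:
learned_mean = lambda X: 0.1 * X[:, 0]
learned_kernel = lambda A, B: se_kernel(A, B, nu=1.5, l=0.5)

# Random measurement set X ~ U(domain), as in the functional KL of Eq. (4).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(16, 1))
jitter = 1e-6 * np.eye(len(X))

mu_theta, K_theta = learned_mean(X), learned_kernel(X, X) + jitter
mu_hyper, K_hyper = np.zeros(len(X)), se_kernel(X, X) + jitter  # vanilla GP

kl_term = gaussian_kl(mu_theta, K_theta, mu_hyper, K_hyper)
```

During meta-training, this KL estimate would be averaged over freshly sampled measurement sets and added to the negative marginal log-likelihood with the weighting of Eq. (4).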
4 Choosing the Safe Kernel Hyper-Parameters
Important for safe BO is the reliability of the uncertainty estimates of our objective and constraint models. For GPs, the kernel hyper-parameters have the biggest influence on the CIs. If the kernel variance $\nu$ is chosen too small and/or the lengthscale $l$ too large, our models become over-confident and the corresponding BO unsafe. In the reverse case, the CIs become too conservative and safe BO requires many queries to progress. Despite the assumption commonly made in earlier work [e.g., 8,9,10], appropriate choices for $\nu$ and $l$ are unknown. In practice, they are typically chosen conservatively or hand-tuned by trial and error, which is problematic in safety-critical settings. Aiming to address this issue, we develop a framework for choosing the kernel hyper-parameters in a data-driven manner.
4.1 Assessing kernel hyper-parameters: Calibration and sharpness
Our approach is based on the calibration and sharpness of uncertainty estimates [see e.g., 50,51,52,53]. Naturally, if we construct CIs to the confidence level $\alpha$, we want at least an $\alpha$ fraction of (unseen) observations to fall within these CIs. If this holds in expectation, we say that the uncertainty estimates are calibrated. If the empirical percentage is less than $\alpha$, it indicates that our model's uncertainty estimates are over-confident and we are likely to underestimate the risk of safety violations. To empirically assess how calibrated a probabilistic regression model with hyper-parameters $\boldsymbol{\omega}$, conditioned on a training dataset $\mathcal{D}^{\mathrm{tr}}$, is, we compute its calibration frequency on a test dataset $\mathcal{D}^{\mathrm{test}}$:
$$\text{calib-freq}(\mathcal{D}^{\mathrm{tr}}, \mathcal{D}^{\mathrm{test}}, \boldsymbol{\omega}) := \frac{1}{|\mathcal{A}|} \sum_{\alpha \in \mathcal{A}} \frac{1}{|\mathcal{D}^{\mathrm{test}}|} \sum_{(\mathbf{x}, y) \in \mathcal{D}^{\mathrm{test}}} \frac{\mathbb{1}\left[ y \in \mathrm{CI}^f_\alpha(\mathbf{x} \mid \mathcal{D}^{\mathrm{tr}}, \boldsymbol{\omega}) \right]}{\alpha} \,. \qquad (5)$$
Here, $\mathcal{A} \subset [0, 1]$ is a set of relevant confidence levels (in our case, 20 values equally spaced between 0.8 and 1.0). Since the CIs of our model need to be calibrated at any iteration $t$ during the BO and for any task we may face, we choose the best empirical estimate we can: We compute the average calibration frequency across all meta-training datasets and for any sub-sequence of points within a dataset. In particular, for any task $i = 1, \dots, n$ and $t = 1, \dots, T_i - 1$ we condition/train our model on the data points $\mathcal{D}_{i,\leq t} = \{(\mathbf{x}_{i,t'}, y_{i,t'})\}_{t'=1}^{t}$ and use the remaining data points $\mathcal{D}_{i,>t} = \{(\mathbf{x}_{i,t'}, y_{i,t'})\}_{t'=t+1}^{T_i}$ to compute the calibration frequency. Overall, this gives us
$$\text{avg-calib}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) := \frac{1}{n} \sum_{i=1}^{n} \frac{1}{T_i - 1} \sum_{t=1}^{T_i - 1} \text{calib-freq}(\mathcal{D}_{i,\leq t}, \mathcal{D}_{i,>t}, \boldsymbol{\omega}) \,. \qquad (6)$$
While calibration captures how reliable the uncertainty estimates are, it does not reflect how useful the confidence intervals are for narrowing down the range of possible function values. For instance, a predictor that always outputs a mean with sufficiently wide confidence intervals is calibrated, but useless for BO. Hence, we also consider the sharpness of the uncertainty estimates, which we empirically quantify through the average predictive standard deviation. Similar to (6), we average over all tasks and data sub-sequences within each task:
$$\text{avg-std}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) := \frac{1}{n} \sum_{i=1}^{n} \frac{1}{T_i - 1} \sum_{t=1}^{T_i - 1} \frac{1}{|\mathcal{D}_{i,>t}|} \sum_{(\mathbf{x}, y) \in \mathcal{D}_{i,>t}} \sigma(\mathbf{x} \mid \mathcal{D}_{i,\leq t}, \boldsymbol{\omega}) \,. \qquad (7)$$
The avg-std measures how concentrated the uncertainty estimates are and thus constitutes a natural complement to calibration, which can be trivially achieved by wide/loose confidence intervals.
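The following sketch shows how avg-calib and avg-std of Eqs. (5)-(7) could be estimated for a GP with hyper-parameters $\boldsymbol{\omega} = (\nu, l)$. The GP helper, the two-sided Gaussian CI scaling, and the synthetic tasks are assumptions for illustration, consistent with but not taken from the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def se_kernel(A, B, nu, l):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu * np.exp(-sq_dists / l**2)

def gp_predict(X_tr, y_tr, X_q, nu, l, noise_var=1e-2):
    """Zero-mean GP posterior mean/std at X_q under hyper-parameters (nu, l)."""
    K = se_kernel(X_tr, X_tr, nu, l) + noise_var * np.eye(len(X_tr))
    K_q = se_kernel(X_q, X_tr, nu, l)
    mu = K_q @ np.linalg.solve(K, y_tr)
    var = nu - np.einsum("ij,ji->i", K_q, np.linalg.solve(K, K_q.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

LEVELS = np.linspace(0.8, 1.0, 20)   # confidence levels A, as in the paper

def avg_calib_and_std(datasets, nu, l):
    """Eqs. (6)-(7): average over tasks and over the train/test splits
    D_{i,<=t} / D_{i,>t} within each task."""
    calibs, stds = [], []
    for X, y in datasets:
        for t in range(1, len(y)):
            mu, std = gp_predict(X[:t], y[:t], X[t:], nu, l)
            # calib-freq (Eq. 5): per-level CI coverage, normalized by alpha
            cov = [(np.abs(y[t:] - mu) <= norm.ppf((1 + a) / 2) * std).mean() / a
                   for a in LEVELS]
            calibs.append(np.mean(cov))
            stds.append(std.mean())      # sharpness contribution (Eq. 7)
    return np.mean(calibs), np.mean(stds)

# Illustrative comparison of two hyper-parameter choices on synthetic tasks:
def make_task(seed, T=20):
    r = np.random.default_rng(seed)
    X = r.uniform(-1.0, 1.0, size=(T, 1))
    return X, np.sin(3.0 * X[:, 0]) + 0.05 * r.standard_normal(T)

tasks = [make_task(i) for i in range(5)]
print(avg_calib_and_std(tasks, nu=1.0, l=0.2))  # conservative: calibrated but wide
print(avg_calib_and_std(tasks, nu=0.2, l=1.0))  # aggressive: likely over-confident
```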
4.2 Choosing good hyper-parameters via Frontier search
Based on the two empirical quantities introduced above, we can optimize the hyper-parameters $\boldsymbol{\omega}$ of our model so as to maximize sharpness (i.e., minimize the avg-std) subject to calibration [50]:
$$\min_{\boldsymbol{\omega}} \ \text{avg-std}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) \quad \text{s.t.} \quad \text{avg-calib}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, \boldsymbol{\omega}) \geq 1 \qquad (8)$$
Since computing avg-std and avg-calib requires solving the GP inference problem many times, each query is computationally demanding. Hence, we need an optimization algorithm for (8) that requires as few queries as possible to get close to the optimal solution.
Algorithm 1 FRONTIERSEARCH (details in Appendix C)
Input: domain bounds $\mathbf{z}^l, \mathbf{z}^u$ s.t. $\mathbf{z}^l \leq \mathbf{z} \leq \mathbf{z}^u$
1: $\mathcal{Q}^u \leftarrow \{\mathbf{z}^u\}$, $\mathcal{Q}^l \leftarrow \{\mathbf{z}^l\}$
2: for $k = 1, \dots, K$ do
3:   $(\mathbf{z}_r, \mathbf{z}'_r) \leftarrow$ LARGESTMAXMINRECT($\mathcal{Q}^l, \mathcal{Q}^u$)   // largest max-min rect. between frontiers
4:   $\mathbf{z}^q \leftarrow$ BESTWORSTCASEQUERY($\mathbf{z}_r, \mathbf{z}'_r, \mathcal{Q}^l, \mathcal{Q}^u$)   // best query point to split rectangle
5:   if $c(\mathbf{z}^q) \geq 1$ then
6:     $\mathcal{Q}^u \leftarrow$ PRUNE($\mathcal{Q}^u \cup \{\mathbf{z}^q\}$)
7:   else
8:     $\mathcal{Q}^l \leftarrow$ PRUNE($\mathcal{Q}^l \cup \{\mathbf{z}^q\}$)
Return: $\arg\min_{\mathbf{z} \in \mathcal{Q}^u} s(\mathbf{z})$
Figure 1: Frontier search (FS) on the kernel lengthscale and variance for the constraint model of the Argus robot. Red: areas ruled out because unsafe. Green: safe areas that are ruled out since they are dominated by better safe queries. After a few iterations, FS has already shrunk the set of possible optima (white area between fronts) to points close to the safety border and picked a nearly optimal solution (cross).
We develop an efficient frontier search (FS) algorithm that exploits the monotonicity properties of this optimization problem. Both avg-std and avg-calib are monotonically increasing in the kernel variance $\nu$ and decreasing in the lengthscale $l$.¹ By setting $\mathbf{z} = (l, \nu)$ and writing $s(\mathbf{z}) = \text{avg-std}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, l, \nu)$ and $c(\mathbf{z}) = \text{avg-calib}(\{\mathcal{D}_{i,T_i}\}_{i=1}^{n}, l, \nu)$, we can turn (8) into
$$\min_{\mathbf{z}} \ s(\mathbf{z}) \quad \text{s.t.} \quad c(\mathbf{z}) \geq 1 \quad \text{where } s: \mathbb{R}^2 \mapsto \mathbb{R} \text{ and } c: \mathbb{R}^2 \mapsto \mathbb{R} \text{ are monotone.} \qquad (9)$$
We presume an upper and lower bound $(\mathbf{z}^u, \mathbf{z}^l)$ such that the resulting search domain $\mathcal{Z} = [z^l_1, z^u_1] \times [z^l_2, z^u_2]$ contains the optimal solution $\mathbf{z}^* = \arg\min_{\mathbf{z}: c(\mathbf{z}) \geq 1} s(\mathbf{z})$. Since both $s(\mathbf{z})$ and $c(\mathbf{z})$ are monotone, we know that $\mathbf{z}^*$ must lie on or directly above the constraint boundary $c(\mathbf{z}) = 1$ (see Lemma 2).
In each iteration $k$ of Algorithm 1, we query a point $\mathbf{z}^q_k \in \mathcal{Z}$ and observe the corresponding objective and constraint values $s(\mathbf{z}^q_k)$ and $c(\mathbf{z}^q_k)$. We separate the queries into two sets $\mathcal{Q}^u$ and $\mathcal{Q}^l$ based on whether they lie above or below the constraint boundary. That is, we add $\mathbf{z}^q_k$ to $\mathcal{Q}^u$ if $c(\mathbf{z}^q_k) \geq 1$ and to $\mathcal{Q}^l$ otherwise. Since the optimal solution lies on the constraint boundary and $c(\mathbf{z})$ is monotone, we can rule out entire corners of the search domain: for each $\mathbf{z}^q \in \mathcal{Q}^u$ we can rule out all points $\mathbf{z} > \mathbf{z}^q$ as candidates for the optimal solution and, similarly, for each $\mathbf{z}^q \in \mathcal{Q}^l$ we can rule out all $\mathbf{z} \leq \mathbf{z}^q$. This also allows us to prune the sets $\mathcal{Q}^u$ and $\mathcal{Q}^l$ by removing all points from them that can be ruled out by a new query result. To keep track of which parts of $\mathcal{Z}$ have not been ruled out yet, we construct upper and lower frontiers, here expressed as functions $z_1 \mapsto z_2$ and $z_2 \mapsto z_1$:
$$F^u_2(z_1; \mathcal{Q}^u) = \min\{z'_2 \mid z_1 \geq z'_1,\, \mathbf{z}' \in \mathcal{Q}^u\}, \qquad F^u_1(z_2; \mathcal{Q}^u) = \min\{z'_1 \mid z_2 \geq z'_2,\, \mathbf{z}' \in \mathcal{Q}^u\} \qquad (10)$$
$$F^l_2(z_1; \mathcal{Q}^l) = \max\{z'_2 \mid z_1 \leq z'_1,\, \mathbf{z}' \in \mathcal{Q}^l\}, \qquad F^l_1(z_2; \mathcal{Q}^l) = \max\{z'_1 \mid z_2 \leq z'_2,\, \mathbf{z}' \in \mathcal{Q}^l\} \qquad (11)$$
such that the points $\Gamma = \{(z_1, z_2) \in \mathcal{Z} \mid F^l_2(z_1; \mathcal{Q}^l) \leq z_2 \leq F^u_2(z_1; \mathcal{Q}^u)\} \subseteq \mathcal{Z}$ between the frontiers are still plausible candidates for the optimal solution. For notational brevity, we define $\mathcal{F}^u = \{\mathbf{z} \in \mathcal{Z} \mid F^u_1(z_2; \mathcal{Q}^u) = z_1 \lor F^u_2(z_1; \mathcal{Q}^u) = z_2\}$ and $\mathcal{F}^l$ analogously as the sets of points that lie on the upper and lower frontier, respectively.
¹ Note that the monotonicity of the calibration frequency in $l$ is only an empirical heuristic that holds in almost all cases if $\nu$ is at least as big as the variance of the targets $y$ in a dataset.
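A simplified, runnable sketch of the frontier search loop is given below. It keeps the pruned query sets $\mathcal{Q}^u, \mathcal{Q}^l$ and rules out dominated corners as described above; the candidate-selection heuristic here (query the plausible point farthest from all previous queries) is a stand-in for LARGESTMAXMINRECT / BESTWORSTCASEQUERY, whose exact definitions are in Appendix C, and the objective/constraint functions are synthetic placeholders for avg-std and avg-calib.

```python
import numpy as np

def prune_upper(Q):
    """Keep only minimal elements: drop z if some other w <= z component-wise."""
    return [z for z in Q
            if not any(np.all(w <= z) and not np.array_equal(w, z) for w in Q)]

def prune_lower(Q):
    """Keep only maximal elements: drop z if some other w >= z component-wise."""
    return [z for z in Q
            if not any(np.all(w >= z) and not np.array_equal(w, z) for w in Q)]

def frontier_search(s, c, z_l, z_u, K=30, rng=None):
    """Simplified frontier search for  min s(z)  s.t.  c(z) >= 1,
    with s and c monotonically increasing in z (cf. Eq. (9))."""
    rng = rng or np.random.default_rng(0)
    Q_u, Q_l = [np.array(z_u, float)], [np.array(z_l, float)]
    for _ in range(K):
        # Sample candidates and keep those not ruled out by either frontier:
        # not above any point in Q_u and not below any point in Q_l.
        cand = rng.uniform(z_l, z_u, size=(256, len(z_l)))
        plausible = [z for z in cand
                     if not any(np.all(z >= w) for w in Q_u)
                     and not any(np.all(z <= w) for w in Q_l)]
        if not plausible:
            break
        # Heuristic query choice (stand-in for the rectangle-splitting rule):
        prev = np.array(Q_u + Q_l)
        z_q = max(plausible,
                  key=lambda z: np.min(np.linalg.norm(prev - z, axis=1)))
        if c(z_q) >= 1.0:
            Q_u = prune_upper(Q_u + [z_q])
        else:
            Q_l = prune_lower(Q_l + [z_q])
    return min(Q_u, key=s)   # best calibrated query found

# Illustrative monotone placeholders for avg-std (s) and avg-calib (c):
s = lambda z: z[0] + 2.0 * z[1]
c = lambda z: 0.8 * z[0] + z[1]
z_star = frontier_search(s, c, z_l=[0.0, 0.0], z_u=[2.0, 2.0])
```

Under the monotonicity of $s$ and $c$, every point above a feasible query is feasible but has larger avg-std, and every point below an infeasible query is infeasible; this is exactly what the two pruned sets encode.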