ing Safe Meta-Bayesian Optimization (SAMBO) approach can be instantiated with existing safe BO
methods and utilize the improved, meta-learned GPs to perform safe optimization more efficiently.
In our experiments, we evaluate and compare our proposed approach on benchmark functions as
well as controller tuning for a high-precision motion system. Throughout, SAMBO significantly
improves the query efficiency of popular safe BO methods, without compromising safety.
2 Related Work
Safe BO aims to efficiently optimize a black-box function under safety-critical conditions, where un-
known safety constraints must not be violated. Constrained variants of standard BO methods [12,13,
14] return feasible solutions, but do not reliably exclude unsafe queries. In contrast, SAFEOPT [8,9]
and related variants [15] guarantee safety at all times and have been used to safely tune controllers
in various applications [e.g., 16,17]. While SAFEOPT explores in an undirected manner, GOOSE
[10,18] does so by expanding the safe set in a goal-oriented fashion. All mentioned methods rely on
GPs to model the target and constraint function and assume that correct kernel hyper-parameters are
given. Our work is complementary: We show how to use related offline data to obtain informative
and safe GP priors in a data-driven way that makes the downstream safe BO more query efficient.
Meta-Learning. Common approaches in meta-learning amortize inference [19,20,21], learn a
shared embedding space [22,23,24,25] or a good neural network initialization [26,27,28,29].
However, when the available data is limited, these approaches are prone to overfit on the meta-
level. A body of work studies meta-regularization to prevent overfitting in meta-learning
[30,31,32,33,34,35]. Such meta-regularization methods prevent meta-overfitting for the mean
predictions, but not for the uncertainty estimates. Recent meta-learning methods aim at providing
reliable confidence intervals even when data is scarce and non-i.i.d. [11,36,37]. These methods
extend meta-learning to interactive and life-long settings. However, they either make unrealistic model
assumptions or hinge on hyper-parameters whose correct choice is critical in safety-constrained
settings. Our work uses F-PACOH [11] to meta-learn reliable GP priors, but removes the need for
hand-specifying a correct hyper-prior by choosing its parameters in a data-driven, safety-aware manner.
3 Problem Statement and Background
3.1 Problem Statement
We consider the problem of safe Bayesian Optimization (safe BO), seeking the global minimizer
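$$x^* = \arg\min_{x \in \mathcal{X}} f(x) \quad \text{s.t.} \quad q(x) \le 0 \tag{1}$$
of a function $f: \mathcal{X} \to \mathbb{R}$ over a bounded domain $\mathcal{X}$ subject to a safety constraint $q(x) \le 0$ with
constraint function $q: \mathcal{X} \to \mathbb{R}$. For instance, we may want to optimize the controller parameter of a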
robot without subjecting it to potentially damaging vibrations or collisions. During the optimization,
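we iteratively make queries $x_1, ..., x_T \in \mathcal{X}$ and observe noisy feedback $\tilde{f}_1, ..., \tilde{f}_T$ and $\tilde{q}_1, ..., \tilde{q}_T$,
e.g., via $\tilde{f}_t = f(x_t) + \epsilon_f$ and $\tilde{q}_t = q(x_t) + \epsilon_q$, where $\epsilon_f, \epsilon_q$ are $\sigma^2$ sub-Gaussian noise [38,39].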
In our setting, performing a query is assumed to be costly, e.g., running the robot and observing
relevant measurements. Hence, we want to find a solution as close as possible to the global minimum
in as few iterations as possible, without violating the safety constraint with any query we make.
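The query protocol above can be illustrated with a minimal simulation sketch. The target `f`, the constraint `q`, the noise level `sigma`, and the helper names `noisy_query` and `run_queries` are hypothetical choices made for illustration only and are not taken from the paper. Gaussian noise is used as one example of sub-Gaussian noise. The ground-truth safety check is only possible in simulation, since in the actual setting $q$ is unknown.

```python
import random

def f(x):
    """Hypothetical target to minimize (illustrative, not from the paper)."""
    return (x - 0.7) ** 2

def q(x):
    """Hypothetical constraint function; a query x is safe iff q(x) <= 0."""
    return x - 0.5

def noisy_query(x, rng, sigma=0.01):
    """Return noisy observations of f(x) and q(x) with independent Gaussian noise."""
    f_obs = f(x) + rng.gauss(0.0, sigma)
    q_obs = q(x) + rng.gauss(0.0, sigma)
    return f_obs, q_obs

def run_queries(xs, seed=0, sigma=0.01):
    """Query each x in order; return the observed triples and the number of unsafe queries."""
    rng = random.Random(seed)
    history = []
    violations = 0
    for x in xs:
        f_obs, q_obs = noisy_query(x, rng, sigma)
        history.append((x, f_obs, q_obs))
        # Ground-truth check: available here only because q is known in simulation.
        if q(x) > 0:
            violations += 1
    return history, violations

history, violations = run_queries([0.0, 0.25, 0.5, 0.75])
```

In this toy instance the constrained minimizer is x = 0.5, and the last query (x = 0.75) is the only one that violates the constraint. A safe BO method would have to avoid such a query while still approaching the boundary of the safe region.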
Additionally, we assume access to datasets $\mathcal{D}_{1,T_1}, ..., \mathcal{D}_{n,T_n}$ with observations from $n$ statistically
similar but distinct data-generating systems, e.g., data of the same robotic platform under
different conditions. Each dataset $\mathcal{D}_{i,T_i} = \{(x_{i,1}, \tilde{f}_{i,1}, \tilde{q}_{i,1}), ..., (x_{i,T_i}, \tilde{f}_{i,T_i}, \tilde{q}_{i,T_i})\}$ consists of $T_i$
measurement triples, where $\tilde{f}_{i,t} = f_i(x_{i,t}) + \epsilon_{f_i}$ and $\tilde{q}_{i,t} = q_i(x_{i,t}) + \epsilon_{q_i}$ are the noisy target and
constraint observations. We assume that the underlying functions $f_1, ..., f_n$ and $q_1, ..., q_n$ which
generated the data are representative of our current target and constraint functions $f$ and $q$, e.g., that
they are all i.i.d. draws from the same stochastic process. However, the data within each dataset
may be highly dependent (i.e., non-i.i.d.). For instance, each $\mathcal{D}_{i,T_i}$ may correspond to the queries
and observations from previous safe BO sessions with the same robot under different conditions.
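As a rough sketch of this data layout, the meta-training data can be represented as a list of datasets, each holding a list of (x, noisy f, noisy q) triples. The task family in `sample_task`, the random-walk query rule, and all function names below are hypothetical and chosen only for illustration. The random walk mimics the dependence between consecutive queries within one dataset, while the tasks themselves are drawn i.i.d.

```python
import random

def sample_task(rng):
    """Draw one hypothetical task (f_i, q_i) i.i.d. from a shared family."""
    opt = rng.uniform(0.3, 0.9)    # task-specific optimum location (assumption)
    limit = rng.uniform(0.4, 0.6)  # task-specific safety threshold (assumption)

    def f_i(x):
        return (x - opt) ** 2

    def q_i(x):
        return x - limit

    return f_i, q_i

def collect_dataset(f_i, q_i, T, rng, sigma=0.01, step=0.05):
    """Collect T triples; queries follow a local random walk, so they are non-i.i.d."""
    x = rng.uniform(0.0, 0.3)
    data = []
    for _ in range(T):
        f_obs = f_i(x) + rng.gauss(0.0, sigma)
        q_obs = q_i(x) + rng.gauss(0.0, sigma)
        data.append((x, f_obs, q_obs))
        # Next query stays close to the current one and inside the domain [0, 1].
        x = min(1.0, max(0.0, x + rng.uniform(-step, step)))
    return data

def make_meta_datasets(sizes, seed=0):
    """Build one dataset per entry T_i in sizes, each from a freshly sampled task."""
    rng = random.Random(seed)
    datasets = []
    for T in sizes:
        f_i, q_i = sample_task(rng)
        datasets.append(collect_dataset(f_i, q_i, T, rng))
    return datasets

datasets = make_meta_datasets([5, 8, 3])  # n = 3 datasets with T_1 = 5, T_2 = 8, T_3 = 3
```

The datasets may differ in size, matching the task-dependent $T_i$ in the notation above.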