(2003), Amaioua et al. (2018), Audet et al. (2015) extended the popular mesh adaptive direct search to this
setting. Projection-free methods based on the Frank-Wolfe framework have been considered in Balasubramanian
and Ghadimi (2018), Sahu et al. (2019) for the case when the constraint set is a convex subset of R^n.
Furthermore, Li et al. (2020) considered the case when the constraint set is a Riemannian submanifold
embedded in R^n (and the function is defined only over the manifold). None of the above works are directly
applicable to the case of unknown constraints that we consider in this work.
We now discuss some existing methods for solving (variants of) problem (1) in the zeroth-order setting.
For solving (1) in the deterministic setting (i.e., when we can obtain exact evaluations of the objective and the
constraint functions at a given point), filter methods which reduce the objective function while trying to
reduce constraint violations were proposed and analyzed in Audet and Dennis Jr (2004), Echebest et al.
(2017), Pourmohamad and Lee (2020). Barrier methods in the zeroth-order setting were considered in Audet
and Dennis Jr (2006, 2009), Liuzzi and Lucidi (2009), Gratton and Vicente (2014), Fasano et al. (2014),
Liuzzi et al. (2010), Dzahini et al. (2020), with some works also developing line search approaches for setting
the tuning parameters. Model based approaches were considered in the works of Müller and Woodbury
(2017), Tröltzsch (2016), Augustin and Marzouk (2014), Gramacy et al. (2016), Conn and Le Digabel (2013).
Furthermore, Bűrmen et al. (2006), Audet and Tribes (2018) developed extensions of the Nelder–Mead algorithm
to the constrained setting.
Several works in the statistical machine learning community also considered Bayesian optimization
methods in the constrained setting, in both the noiseless and noisy settings. We refer the reader, for example,
to Gardner et al. (2014), Gelbart et al. (2016), Ariafar et al. (2019), Balandat et al. (2020), Bachoc et al.
(2020), Greenhill et al. (2020), Eriksson and Poloczek (2020), Letham et al. (2019), Hernández-Lobato et al.
(2015), Lam and Willcox (2017), Picheny et al. (2016), Acerbi and Ma (2017). On one hand, the above works
demonstrate the interest of the optimization and machine learning communities in developing algorithms for
constrained zeroth-order optimization problems. On the other hand, most of the above works are not designed
to handle the stochastic zeroth-order constrained optimization problems that we consider. Furthermore, a majority of the
above works are methodological, and the few works that develop convergence analyses do so only in the
asymptotic setting. A recent work by Usmanova et al. (2019) considered the case when the constraints are
linear functions (but unknown), and provided a Frank-Wolfe-based algorithm with estimated constraints.
However, the proposed approach is limited to linear constraints, and the oracle complexities established
are highly sub-optimal. To the best of our knowledge, there is no rigorous non-asymptotic analysis of the
oracle complexity of stochastic zeroth-order optimization when the constraint and objective values are available
only via noisy function evaluations.
1.4 Methodology and Main Contributions:
Our methodology is based on a novel constraint extrapolation technique developed for the zeroth-order
setting, extending the work of Boob et al. (2022) in the first-order setting, combined with Gaussian smoothing
based zeroth-order stochastic gradient estimators. Specifically, we propose the SZO-ConEX method in Algorithm 1
for solving problems of the form in (1). We theoretically characterize how to set the tuning parameters of the
algorithm so as to mitigate the issues caused by the bias in the stochastic zeroth-order gradient estimates and
obtain the oracle complexity of our algorithm. More specifically, we make the following main contributions:
• When the functions f_i, i = 0, . . . , m, are convex, in Theorem 3.1, we show that the number of calls
to the stochastic zeroth-order oracle to achieve an appropriately defined ε-optimal solution of (1) (see
Definition 3.1) is of order O((m + 1)n/ε²).
• When the functions are nonconvex, in Proposition 3.1, we show that the number of calls to the stochastic
zeroth-order oracle to achieve an appropriately defined ε-optimal KKT solution of (1) (see Definition 3.2)
is of order O((m + 1)n/ε³).
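As background for the methodology above, a Gaussian smoothing based zeroth-order gradient estimator of the kind our approach relies on can be sketched as follows. This is a minimal illustration with names of our own choosing, not the implementation of Algorithm 1: it averages two-point finite differences along random Gaussian directions, giving a (slightly biased, with bias controlled by the smoothing radius) estimate of the gradient from function values alone.

```python
import numpy as np

def gaussian_smoothing_grad(f, x, mu=1e-5, num_samples=100, rng=None):
    """Two-point Gaussian smoothing gradient estimator (illustrative sketch).

    Averages (f(x + mu*u) - f(x)) / mu * u over standard Gaussian
    directions u. This is an unbiased estimate of the gradient of the
    smoothed function f_mu(x) = E[f(x + mu*u)], and hence a biased
    estimate of grad f(x), with bias vanishing as mu -> 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    g = np.zeros(n)
    fx = f(x)  # one oracle call, reused across all directions
    for _ in range(num_samples):
        u = rng.standard_normal(n)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_samples

# Usage: estimate the gradient of a simple quadratic, where grad f(z) = z.
f = lambda z: 0.5 * np.sum(z**2)
x = np.array([1.0, -2.0, 3.0])
g_hat = gaussian_smoothing_grad(f, x, mu=1e-5, num_samples=20000,
                                rng=np.random.default_rng(0))
```

With many samples the estimate concentrates around the true gradient; in the stochastic setting of (1), each evaluation of f would itself be noisy, which is why the choice of mu and the number of samples must be balanced against the bias, as our analysis makes precise.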