decision set grows exponentially with the natural expression of the problem. Several efficient algorithms do
exist, even for uncountably infinite decision sets, when the loss functions have certain special structure (such
as linearity (Kalai & Vempala, 2005) or convexity (Zinkevich, 2003)). However, such structure is often absent
in the above applications of interest.
Notice that the efficiency of the above specialized methods is usually made possible by assuming that the
corresponding offline optimization problem (i.e., minimizing the (averaged) loss) can be solved efficiently.
This observation motivates the oracle-efficient online learning problem (Hazan & Koren, 2016). In this setting,
the learner has access to a black-box offline oracle, which, given a real-weighted dataset $S = \{(w^{(j)}, y^{(j)})\}_{j=1}^{n}$, can efficiently return the solution to the following problem:
$$\operatorname*{argmin}_{x \in \mathcal{X}} \; \sum_{j=1}^{n} w^{(j)} f(x, y^{(j)}). \tag{1}$$
The goal is to design oracle-efficient algorithms, which query the offline oracle $O(1)$ times each round.
Concrete examples of such an oracle include algorithms for empirical risk minimization (Bishop, 2007),
data-driven market design (Nisan & Ronen, 2007), and dynamic programming (Bertsekas, 2019).
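To illustrate, the following is a minimal sketch of what such an oracle's interface might look like for a finite decision set; the names (`offline_oracle`, `loss_fn`) and the brute-force enumeration are hypothetical stand-ins, since a real oracle would exploit problem structure (e.g., an ERM solver or a dynamic program) rather than enumerate the decision set:

```python
from typing import Callable, Sequence, Tuple

def offline_oracle(
    dataset: Sequence[Tuple[float, object]],      # weighted examples (w_j, y_j)
    decisions: Sequence[object],                  # decision set X (finite here)
    loss_fn: Callable[[object, object], float],   # loss f(x, y)
) -> object:
    """Return argmin_{x in X} sum_j w_j * f(x, y_j), as in (1).

    Illustrative only: a real oracle solves this minimization via
    problem-specific structure instead of enumerating X.
    """
    return min(
        decisions,
        key=lambda x: sum(w * loss_fn(x, y) for (w, y) in dataset),
    )
```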
As pointed out by Hazan & Koren (2016), the design of oracle-efficient algorithms is extremely challenging
and such an algorithm does not exist in the worst case. Nevertheless, recent work (Daskalakis & Syrgkanis,
2016; Syrgkanis et al., 2016; Dudík et al., 2020) has introduced a series of algorithms which are oracle-efficient
when certain sufficient conditions are met. Among them, the state-of-the-art method is the generalized follow-the-perturbed-leader algorithm (GFTPL, Dudík et al., 2020), which is a variant of the classical
follow-the-perturbed-leader (FTPL) algorithm (Kalai & Vempala, 2005). Similar to FTPL, GFTPL perturbs
the cumulative loss of each decision by adding a random variable, and chooses the decision with the smallest
perturbed loss as $x_t$. However, the vanilla FTPL perturbs each decision independently, which requires generating $K$ independent random variables in total. Moreover, the oracle in (1) cannot be directly applied, as it cannot handle the perturbation term. To address these limitations, GFTPL generates only a low-dimensional noise vector (in particular, of dimension much smaller than the size of the decision set) at the beginning, and constructs $K$ dependent perturbations by multiplying this noise vector with a perturbation translation matrix (PTM). The PTM thus greatly reduces the computational cost of the noise generation itself. Furthermore, oracle-efficiency can be achieved by setting the elements of the PTM to carefully designed synthetic losses. Dudík et al. (2020) show that a worst-case optimal regret bound can be obtained when the PTM is admissible, i.e., every pair of rows is substantially distinct; this serves as a sufficient condition for achieving oracle-efficiency.
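To make this mechanism concrete, here is a minimal sketch of a GFTPL-style decision step under illustrative assumptions: a finite decision set of size K, a binary PTM `Gamma` of shape (K, N) with N much smaller than K, and noise drawn once up front. This is not the exact construction of Dudík et al. (2020); in particular, the actual algorithm delegates the argmin to the offline oracle by encoding the PTM entries as synthetic weighted losses, rather than enumerating all K decisions as done here.

```python
import numpy as np

def gftpl_step(cum_loss: np.ndarray, Gamma: np.ndarray, alpha: np.ndarray) -> int:
    """Pick x_t = argmin_i [ cum_loss[i] + (Gamma @ alpha)[i] ].

    cum_loss : (K,) cumulative losses of the K decisions so far
    Gamma    : (K, N) perturbation translation matrix (PTM), N << K
    alpha    : (N,) noise vector, sampled once at the start of the game

    Each decision i receives the dependent perturbation (Gamma @ alpha)[i],
    so only N random variables are generated instead of K independent ones.
    """
    return int(np.argmin(cum_loss + Gamma @ alpha))

# Toy usage: K = 8 decisions, N = 3 noise coordinates (all values illustrative).
rng = np.random.default_rng(0)
K, N = 8, 3
Gamma = rng.integers(0, 2, size=(K, N)).astype(float)  # rows act as synthetic losses
alpha = rng.laplace(scale=1.0, size=N)                 # drawn once, reused every round
x_t = gftpl_step(cum_loss=rng.random(K), Gamma=Gamma, alpha=alpha)
```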
While these results form a solid foundation for general worst-case oracle-efficient online learning, it
remains unclear whether problem-dependent (i.e., data-adaptive) bounds are achievable in conjunction with
oracle-efficiency. In other words, the design of a generally applicable oracle-efficient and adaptive online
learning algorithm has remained open. In this paper, we provide an affirmative answer to this problem, and
make the following contributions.
•
We propose a variant of the GFTPL algorithm (Dudík et al., 2020), and derive a new sufficient condition
for ensuring oracle-efficiency while achieving the small-loss bound. Our key observation is that while
the admissibility condition of the PTM in GFTPL successfully stabilizes the algorithm (by ensuring
that $\Pr[x_t \neq x_{t+1}]$ is small), it does not always enable adaptation. We address this challenge via a new condition on the PTM, called approximability. This condition ensures a stronger notion of stability, i.e.,
the ratio of $\Pr[x_t = x^{(i)}]$ and $\Pr[x_{t+1} = x^{(i)}]$ is upper-bounded by a universal constant for any $i \in [K]$,
which is critical for proving the small-loss bound (the two stability notions are contrasted schematically after this list). In summary, we obtain the small-loss bound by
equipping GFTPL with an approximable PTM, a data-dependent step-size, and a Laplace distribution for
the perturbation noise. As a result of these changes, our analysis path differs significantly from that
of Dudík et al. (2020). Our new condition of approximability is simple and interpretable, and can be easily verified for an arbitrary PTM. It bears both similarities to and differences from the admissibility condition proposed in Dudík et al. (2020); we demonstrate this through several examples where one of the sufficient conditions holds but the other does not.
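Schematically, the contrast between the two stability notions can be written as follows, where $\epsilon$ and $C \ge 1$ are placeholder constants (quantifiers and constants are simplified here):
$$\Pr[x_t \neq x_{t+1}] \le \epsilon \qquad \text{versus} \qquad \frac{\Pr[x_t = x^{(i)}]}{\Pr[x_{t+1} = x^{(i)}]} \le C \quad \text{for all } i \in [K].$$
The left condition (ensured by admissibility) only bounds how often the chosen decision switches, whereas the right condition (ensured by approximability) controls the round-to-round change of the entire decision distribution, which is what the small-loss analysis requires.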