Adaptive Oracle-Efficient Online Learning
Guanghui Wang, Zihao Hu, Vidya Muthukumar, Jacob Abernethy
College of Computing
School of Electrical and Computer Engineering
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30339
{gwang369,zihaohu,vmuthukumar8,prof}@gatech.edu
Abstract
The classical algorithms for online learning and decision-making have the benefit of achieving optimal
performance guarantees, but suffer from computational complexity limitations when implemented at
scale. More recent sophisticated techniques, which we refer to as oracle-efficient methods, address this
problem by dispatching to an offline optimization oracle that can search through an exponentially-large
(or even infinite) space of decisions and select the one that performed best on any dataset. But despite
the benefits of computational feasibility, oracle-efficient algorithms exhibit one major limitation: while
performing well in worst-case settings, they do not adapt well to friendly environments. In this paper
we consider two such friendly scenarios, (a) "small-loss" problems and (b) IID data. We provide a new
framework for designing follow-the-perturbed-leader algorithms that are oracle-efficient and adapt well to
the small-loss environment, under a particular condition which we call approximability (which is spiritually
related to sufficient conditions provided in Dudík et al. (2020)). We identify a series of real-world settings,
including online auctions and transductive online classification, for which approximability holds. We
also extend the algorithm to an IID data setting and establish a "best-of-both-worlds" bound in the
oracle-efficient setting.
1 Introduction
Online learning is a fundamental paradigm for modeling sequential decision making problems (Cesa-Bianchi
& Lugosi, 2006; Shalev-Shwartz, 2011; Hazan, 2016). Online learning is usually formulated as a zero-sum
game between a learner and an adversary. In each round $t = 1, \ldots, T$, the learner first picks an action $x_t$ from
a (finite) set $\mathcal{X} = \{x^{(1)}, \ldots, x^{(K)}\}$ with cardinality $K$. In the meantime, an adversary reveals its
action $y_t \in \mathcal{Y}$. As a consequence, the learner observes $y_t$, and suffers a loss $f(x_t, y_t)$, where $f: \mathcal{X} \times \mathcal{Y} \mapsto [0, 1]$.
The goal is to minimize the regret, which is defined as the difference between the cumulative loss of the
learner, $\sum_{t=1}^T f(x_t, y_t)$, and the cumulative loss of the best action in hindsight, $L_T^* = \min_{x \in \mathcal{X}} \sum_{t=1}^T f(x, y_t)$.
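To make the protocol concrete, the following is a minimal sketch of the interaction and the regret computation (our own illustration, not from the paper; the action set, loss function, and adversary below are hypothetical placeholders):

```python
import numpy as np

# Hypothetical instantiation: K actions, integer adversary moves,
# and a generic bounded loss f: X x Y -> [0, 1].
actions = list(range(5))                 # X = {x^(1), ..., x^(K)}
f = lambda x, y: float(x != y)           # any f with values in [0, 1] works

def play(learner, ys):
    """Run the online protocol; return (learner's cumulative loss, regret)."""
    total = 0.0
    for t, y_t in enumerate(ys):         # rounds t = 1, ..., T
        x_t = learner(t)                 # learner commits to x_t first
        total += f(x_t, y_t)             # then observes y_t and pays f(x_t, y_t)
    # L*_T: cumulative loss of the best fixed action in hindsight
    best = min(sum(f(x, y) for y in ys) for x in actions)
    return total, total - best           # regret = learner's loss - L*_T

# Example: a learner that always plays action 0 against random adversary moves
rng = np.random.default_rng(0)
loss, regret = play(lambda t: 0, rng.integers(0, 5, size=100))
```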
A wide variety of algorithms have been proposed for the goal of minimizing worst-case regret (without any
consideration of computational complexity per iteration); see (Cesa-Bianchi & Lugosi, 2006; Shalev-Shwartz,
2011; Hazan, 2016) for representative surveys of this literature. These algorithms all obtain a worst-case
regret bound of the order $O(\sqrt{T \log K})$, which is known to be minimax-optimal (Cesa-Bianchi & Lugosi,
2006). Over the last two decades, sophisticated adaptive algorithms have been designed that additionally
enjoy problem-dependent performance guarantees, which can automatically lead to better results in friendly
environments. One of the most important examples of this kind of guarantee is the so-called "small-loss"
bound (Hutter & Poland, 2005; Cesa-Bianchi & Lugosi, 2006; Van Erven et al., 2014). Such a bound depends
on the best cumulative loss in hindsight (i.e., $L_T^*$) instead of the total number of rounds ($T$). Thus, this
bound is much tighter than the worst-case bound, especially when the best decision performs well in the sense
of incurring a very small loss. Another example is the "best-of-both-worlds" bound (Van Erven et al., 2014),
which yields an even tighter regret bound for independent and identically distributed (IID) loss functions.
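For concreteness, small-loss bounds in this literature (e.g., for Hedge-style algorithms with adaptive learning rates) typically take the form
$$R_T = O\Big(\sqrt{L_T^* \log K} + \log K\Big),$$
which recovers the worst-case $O(\sqrt{T \log K})$ rate since $L_T^* \leq T$, and collapses to $O(\log K)$ when the best action suffers nearly zero loss.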
However, all of these algorithms applied out-of-the-box suffer a linear dependence on the number of
decisions $K$. This is prohibitively expensive, especially in problems such as network routing (Awerbuch &
Kleinberg, 2008) and combinatorial market design (Cesa-Bianchi et al., 2014), where the cardinality of the
decision set grows exponentially with the natural expression of the problem. Several efficient algorithms do
exist, even for uncountably infinite decision sets, when the loss functions have certain special structure (such
as linearity (Kalai & Vempala, 2005) or convexity (Zinkevich, 2003)). However, such structure is often absent
in the above applications of interest.
Notice that the efficiency of the above specialized methods is usually made possible by assuming that the
corresponding offline optimization problem (i.e., minimizing the (averaged) loss) can be solved efficiently.
This observation motivates the oracle-efficient online learning problem (Hazan & Koren, 2016). In this setting,
the learner has access to a black-box offline oracle, which, given a real-weighted dataset $S = \{(w^{(j)}, y^{(j)})\}_{j=1}^{n}$,
can efficiently return the solution to the following problem:
$$\operatorname*{argmin}_{x \in \mathcal{X}} \sum_{j=1}^{n} w^{(j)} f(x, y^{(j)}). \qquad (1)$$
The goal is to design oracle-efficient algorithms, which query the offline oracle $O(1)$ times each round.
Concrete examples of such an oracle include algorithms for empirical risk minimization (Bishop, 2007),
data-driven market design (Nisan & Ronen, 2007), and dynamic programming (Bertsekas, 2019).
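As a point of reference, the oracle's input-output behavior can be captured by the following brute-force sketch (our own illustration: a genuine oracle would exploit problem structure, e.g., dynamic programming, rather than enumerate $\mathcal{X}$):

```python
def offline_oracle(S, actions, f):
    """Weighted ERM oracle of Eq. (1).

    S       -- real-weighted dataset [(w_1, y_1), ..., (w_n, y_n)]
    actions -- the decision set X (enumerable here only for illustration)
    f       -- loss function with values in [0, 1]
    Returns argmin_{x in X} sum_j w_j * f(x, y_j).
    """
    return min(actions, key=lambda x: sum(w * f(x, y) for (w, y) in S))
```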
As pointed out by Hazan & Koren (2016), the design of oracle-efficient algorithms is extremely challenging,
and such an algorithm does not exist in the worst case. Nevertheless, recent work (Daskalakis & Syrgkanis,
2016; Syrgkanis et al., 2016; Dudík et al., 2020) has introduced a series of algorithms which are oracle-efficient
when certain sufficient conditions are met. Among them, the state-of-the-art method is the generalized
follow-the-perturbed-leader algorithm (GFTPL, Dudík et al., 2020), which is a variant of the classical
follow-the-perturbed-leader (FTPL) algorithm (Kalai & Vempala, 2005). Similar to FTPL, GFTPL perturbs
the cumulative loss of each decision by adding a random variable, and chooses the decision with the smallest
perturbed loss as $x_t$. However, vanilla FTPL perturbs each decision independently, which requires
generating $K$ independent random variables in total. Moreover, the oracle in (1) cannot be applied here,
as it cannot handle the perturbation term. To address these limitations, GFTPL generates only a noise
vector of low dimension (in particular, of dimension much smaller than the size of the decision set) at the
beginning, and constructs $K$ dependent perturbations by multiplying the noise vector
with a perturbation translation matrix (PTM). The PTM therefore critically ensures that the computational
complexity of the noise generation itself is largely reduced. Furthermore, oracle-efficiency can be achieved
by setting the elements of the PTM to carefully designed synthetic losses. Dudík et al. (2020) show that
a worst-case optimal regret bound can be obtained when the PTM is admissible, i.e., every two rows are
substantially distinct. This serves as a sufficient condition for achieving oracle-efficiency.
While these results form a solid foundation for general worst-case oracle-efficient online learning, it
remains unclear whether problem-dependent, or data-adaptive bounds are achievable in conjunction with
oracle-efficiency. In other words, the design of a generally applicable oracle-efficient and adaptive online
learning algorithm has remained open. In this paper, we provide an affirmative answer to this problem, and
make the following contributions.
• We propose a variant of the GFTPL algorithm (Dudík et al., 2020), and derive a new sufficient condition
for ensuring oracle-efficiency while achieving the small-loss bound. Our key observation is that while
the admissibility condition on the PTM in GFTPL successfully stabilizes the algorithm (by ensuring
that $\mathbb{P}[x_t \neq x_{t+1}]$ is small), it does not always enable adaptation. We address this challenge via a new
condition on the PTM, called approximability. This condition ensures a stronger stability measure, i.e.,
the ratio of $\mathbb{P}[x_t = x^{(i)}]$ and $\mathbb{P}[x_{t+1} = x^{(i)}]$ is upper-bounded by a universal constant for any $i \in [K]$,
which is critical for proving the small-loss bound. In summary, we obtain the small-loss bound by
equipping GFTPL with an approximable PTM, a data-dependent step-size, and a Laplace distribution for
the perturbation noise. As a result of these changes, our analysis path differs significantly from that
of Dudík et al. (2020). Our new condition of approximability is simple and interpretable, and can be
easily verified for an arbitrary PTM. It shares both similarities and differences with the admissibility
condition proposed in Dudík et al. (2020). We demonstrate this through several examples where one of
the sufficient conditions holds, but not the other.
• We identify a series of real-world applications for which we can construct approximable PTMs: (a) a
series of online auction problems (Dudík et al., 2020); (b) problems with a small adversary action
space $|\mathcal{Y}|$ (Daskalakis & Syrgkanis, 2016); and (c) transductive online classification (Syrgkanis et al.,
2016; Dudík et al., 2020). This is the first time that the small-loss bound has been obtained in all of these
applications. To achieve this, we introduce novel PTMs and new analyses establishing the approximability
condition for these PTMs.
• We achieve the "best-of-both-worlds" bound, which enjoys even tighter results when the data is IID or
the number of leader changes is small. The main idea is to combine our proposed algorithm with vanilla
follow-the-leader (FTL), leveraging ideas from a meta-algorithm called FlipFlop introduced by Van Erven et al. (2014).
2 Related Work
Our work contributes to two bodies of work: oracle-efficient online learning and adaptive online learning. In
this section, we briefly review the related work in these areas.
2.1 Oracle-efficient online learning
For oracle-efficient online learning, the pioneering work of Hazan & Koren (2016) points out that oracle-efficient
methods do not exist when dealing with general hostile adversaries, which implies that additional assumptions
on the problem structure have to be made. Daskalakis & Syrgkanis (2016) consider the setting in which the
cardinality of the adversary's action set $\mathcal{Y}$ is finite and small, and propose to add a series of "fake" losses
to the learning history based on random samples from $\mathcal{Y}$. They prove that for this setting an $O(\sqrt{|\mathcal{Y}|T})$
regret bound can be obtained. Syrgkanis et al. (2016) study the contextual combinatorial online learning
problem, where each action is associated with a binary vector. They make the assumption that the loss
function set contains all linear functions as a sub-class. The approach in Syrgkanis et al. (2016) constructs a
set of synthetic losses for perturbation based on randomly-selected contexts, and achieves worst-case optimal
bounds when all the contextual information can be obtained beforehand, or when there exists a small set
of contexts that can tell each decision apart. Dudík et al. (2020) is the first work to focus on the general
non-contextual setting, and proposes the generalized FTPL algorithm. This algorithm generates a small
number of random variables at the beginning, and then perturbs the learning history via the inner product
between the PTM and the random variables. The algorithm can be implemented efficiently by setting
the entries of the PTM as carefully designed loss values. Niazadeh et al. (2021) consider a more complicated
combinatorial setting where the offline problem is NP-hard, but a robust approximation oracle exists. For
this case, they propose an online algorithm based on a multiplicative approximation oracle, and prove that it
has low approximate regret, a measure weaker than regret, since it only compares against a fraction of
the cumulative loss of the best decision in hindsight. Note that none of the aforementioned methods can be
easily shown to adapt to friendly structure in data. Recently, several concurrent works (Block et al., 2022;
Haghtalab et al., 2022a) investigate how to obtain tighter bounds oracle-efficiently in the smoothed-analysis
setting, where the distribution of data is close to the uniform distribution (Rakhlin et al., 2011; Haghtalab
et al., 2022b). The main focus there is to adapt to the VC dimension of the hypothesis class, rather than to
improve the dependence on the number of rounds $T$.
In this paper, we mainly focus on the so-called learning-with-expert-advice setting (Cesa-Bianchi &
Lugosi, 2006), where the action set is discrete and the loss can be highly non-convex. On the other hand,
efficient algorithms can be obtained even for continuous action sets when the loss functions have certain
properties, such as linearity (Kalai & Vempala, 2005; Hutter & Poland, 2005; Awerbuch & Kleinberg, 2008),
convexity (Zinkevich, 2003; Hazan et al., 2007), or submodularity (Hazan & Kale, 2012). Finally, we note
that in this paper we mainly focus on the full-information setting, where the learner can observe the whole
loss function after the action is submitted. Oracle-efficient online learning has also been widely studied in the
contextual bandit setting (Langford & Zhang, 2008; Dudik et al., 2011; Agarwal et al., 2014; Foster et al.,
2018; Foster & Rakhlin, 2020). The nature of the oracle-efficiency guarantees for the contextual bandit problem
is much weaker compared to full-information online learning: positive results either assume a stochastic
probability model on the responses given covariates (e.g., Foster et al. (2018); Foster & Rakhlin (2020)) or
significantly stronger oracles than Eq. (1) (e.g., Agarwal et al. (2014)).
2.2 Adaptive online learning
In this paper, we focus on designing oracle-efficient algorithms with problem-dependent regret guarantees.
Note that this kind of bound can be achieved by many inefficient algorithms in general, such as Hedge and
its variants (Cesa-Bianchi & Lugosi, 2006; De Rooij et al., 2014; Luo & Schapire, 2015),
follow-the-perturbed-leader (Kalai & Vempala, 2005; Van Erven et al., 2014), or follow-the-regularized-leader (Orabona, 2019).
Small-loss bounds can also be obtained efficiently when the loss functions are simply linear (Hutter & Poland,
2005; Syrgkanis et al., 2016). On the other hand, in online convex optimization, small-loss bounds can be
obtained when the loss functions are additionally smooth (Srebro et al., 2010; Orabona et al., 2012; Wang
et al., 2020). However, these algorithms heavily rely on the special structure of the loss functions. In this
paper, we take the first step toward extending these methods to support the more complicated (generally non-convex)
problems that appear in real-world applications.
Apart from the small-loss bound, there exist other types of problem-dependent bounds, such as second-order
bounds (Cesa-Bianchi et al., 2005; Gaillard et al., 2014), quantile bounds (Chaudhuri et al., 2009; Koolen
& Erven, 2015), and parameter-free bounds (Luo & Schapire, 2015; Cutkosky & Orabona, 2018). Moreover,
advanced adaptive results can also be obtained by minimizing performance measures other
than regret, such as adaptive regret (Hazan & Seshadhri, 2007; Zhang et al., 2019) or dynamic regret (Zhang
et al., 2018; Zhao et al., 2020). How to obtain these more refined theoretical guarantees in the oracle-efficient
setting remains an interesting open problem.
3 GFTPL with Small-Loss Bound
In this section, we ignore computational complexity for the moment and provide a new FTPL-type
algorithm that enjoys the small-loss bound. We then show, in Section 4, that the proposed algorithm can be implemented
efficiently via the offline oracle. Before diving into the details, we first briefly recall the definition
of online learning and regret.
Preliminaries. The online decision problem we consider can be described as follows. In each round $t$, a
learner picks an action $x_t \in \mathcal{X} = \{x^{(1)}, \ldots, x^{(K)}\}$. After observing the adversary's decision $y_t \in \mathcal{Y}$, the learner
suffers a loss $f(x_t, y_t)$, where the loss function $f: \mathcal{X} \times \mathcal{Y} \mapsto [0, 1]$ is known to the learner and adversary. The
regret of an online learning algorithm $\mathcal{A}$ is defined as
$$R_T^{\mathcal{A}} := \mathbb{E}\left[\sum_{t=1}^{T} f(x_t, y_t) - L_T^*\right],$$
where $L_T^* = \min_{k \in [K]} \sum_{j=1}^{T} f(x^{(k)}, y_j)$ is the cumulative loss of the best action in hindsight, and the expectation
is taken only with respect to the potentially randomized strategy of the learner.
Our proposed algorithm follows the framework of GFTPL (Dudík et al., 2020). We first briefly introduce
the intuition behind this method. Specifically, in each round $t$, GFTPL picks $x_t$ by solving the following
optimization problem:
$$x_t = \operatorname*{argmin}_{k \in [K]} \sum_{j=1}^{t-1} f(x^{(k)}, y_j) + \langle \Gamma^{(k)}, \alpha \rangle,$$
where $\alpha$ is an $N$-dimensional noise vector ($N \ll K$) generated from a uniform distribution, and $\Gamma^{(k)}$ is the $k$-th
row of a matrix $\Gamma \in [0, 1]^{K \times N}$, which is referred to as the perturbation translation matrix (PTM). Compared
to vanilla FTPL, which generates $K$ random variables (one for each expert), GFTPL only generates $N$ random
variables, where $N$ is much smaller than $K$. Each expert is perturbed by a different linear combination
of these random variables based on the PTM $\Gamma$; a minimal sketch of this selection rule is given below. The results
of Dudík et al. (2020) rely on the assumption on $\Gamma$ stated in Definition 1.
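The following sketch writes the selection rule as a direct argmin over all $K$ experts for clarity (our own illustration; the oracle-efficient implementation of Section 4 instead realizes the perturbation term $\langle \Gamma^{(k)}, \alpha \rangle$ as synthetic losses fed to the oracle of Eq. (1)):

```python
import numpy as np

def gftpl_pick(Gamma, alpha, f, actions, history):
    """One round of GFTPL: argmin_k sum_{j<t} f(x^(k), y_j) + <Gamma^(k), alpha>.

    Gamma   -- (K, N) perturbation translation matrix
    alpha   -- (N,) noise vector, drawn once before the first round
    history -- the adversary's past plays y_1, ..., y_{t-1}
    """
    cum_loss = np.array([sum(f(x, y) for y in history) for x in actions])
    perturbed = cum_loss + Gamma @ alpha   # translate N noises into K perturbations
    return actions[int(np.argmin(perturbed))]
```

Note that $\alpha$ is drawn once at the start of the game and held fixed, so the cost of noise generation is $O(N)$ rather than $O(K)$.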
Definition 1 (δ-admissibility; Dudík et al., 2020). Let $\Gamma \in [0, 1]^{K \times N}$ be a matrix, and denote by $\Gamma^{(k)}$
the $k$-th row of $\Gamma$, and by $\Gamma^{(k,i)}$ the $i$-th element of $\Gamma^{(k)}$. Then, $\Gamma$ is $\delta$-admissible if (a) $\forall k \neq k' \in [K]$, $\exists i \in [N]$
such that $\Gamma^{(k,i)} \neq \Gamma^{(k',i)}$; and (b) $\forall i \in [N]$ and $\forall k, k' \in [K]$ such that $\Gamma^{(k,i)} \neq \Gamma^{(k',i)}$, we have $|\Gamma^{(k,i)} - \Gamma^{(k',i)}| \geq \delta$.
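Since the condition is stated row-pairwise, it can be checked directly for a candidate PTM; here is a minimal sketch (our own illustration, not from the paper):

```python
import numpy as np

def is_delta_admissible(Gamma, delta):
    """Check Definition 1 for Gamma in [0,1]^{K x N}."""
    K, _ = Gamma.shape
    for k in range(K):
        for kp in range(k + 1, K):
            diff = np.abs(Gamma[k] - Gamma[kp])
            if not np.any(diff > 0):                 # (a) some column must distinguish the rows
                return False
            if np.any((diff > 0) & (diff < delta)):  # (b) every nonzero gap is at least delta
                return False
    return True
```

This brute-force check enumerates all $O(K^2)$ row pairs, so it is only practical for small $K$; the PTMs used in the applications of this paper are verified analytically.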
The $\delta$-admissibility condition guarantees that every two rows of $\Gamma$ are significantly distinct. As pointed out by
Dudík et al. (2020), this is the essential property required by GFTPL, and it is used to stabilize the algorithm
in the analysis, i.e., to ensure that $\mathbb{P}[x_t \neq x_{t+1}]$ is small. However, the adaptive analysis of inefficient FTPL
(Hutter & Poland, 2005) (i.e., using a noise vector of dimension equal to the size of the decision set) reveals
that this type of stability is insufficient. Instead, one needs to control, for all $t$ and $i \in [K]$, the quantity
$$\frac{\mathbb{P}[x_t = x^{(i)}]}{\mathbb{P}[x_{t+1} = x^{(i)}]}, \qquad (2)$$
the ratio of the probabilities of picking the $i$-th decision in two consecutive rounds. We note that $\delta$-admissibility
is not sufficient to ensure this quantity is bounded, as we establish in the following counterexample lemma
(see Appendix A.1 for the proof).
Lemma 1. There is an instance of a $\delta$-admissible $\Gamma$ and a sequence $\{y_t : t = 1, 2, \ldots\}$ such that, if we run
GFTPL, we can have $\frac{\mathbb{P}[x_t = x^{(i)}]}{\mathbb{P}[x_{t+1} = x^{(i)}]} = \infty$ for some $i \in [K]$ and some $t > 0$.
To address this problem, we propose a new property for $\Gamma$. Define $\mathcal{B}^1_\gamma := \{s \in \mathbb{R}^N : \|s\|_1 \leq \gamma\}$ as the
$\ell_1$-ball of size $\gamma$.

Definition 2 (γ-approximability). Let $\Gamma \in [0, 1]^{K \times N}$. We say that $\Gamma$ is $\gamma$-approximable if
$$\forall k \in [K],\ \forall y \in \mathcal{Y},\ \exists s \in \mathcal{B}^1_\gamma \text{ such that } \forall j \in [K]: \quad \langle \Gamma^{(k)} - \Gamma^{(j)}, s \rangle \geq f(x^{(k)}, y) - f(x^{(j)}, y).$$
It may not be immediately obvious how we arrived at this condition, so let us provide some intuition. The
goal of perturbation methods in sequential decision problems, going back to the early work of Hannan (1957),
is to ensure that the algorithm is "hedging" across all available alternative decisions. A newly observed
data point $y$ may make expert $j$ suddenly look more attractive than expert $k$, as we have now introduced a
new gap $f(x^{(k)}, y) - f(x^{(j)}, y)$ in their measured loss values. With this in mind, we say that $\Gamma$ is a "good"
(i.e., approximable) choice for the PTM if this gap can be overcome (hedged) by some small (i.e., likely)
perturbation $s$, so that $\langle \Gamma^{(k)} - \Gamma^{(j)}, s \rangle$ makes up the difference. The inequality makes this property flexible
and much easier to satisfy in real-world applications: we only need to approximate the gap from above. Later,
we will show that $\gamma$-approximability guarantees the required stability measure in (2), and is thus critical for
the small-loss bound.
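Although the condition only needs to hold existentially, for a small, explicitly enumerable instance it can be verified numerically: for each pair $(k, y)$, finding $s \in \mathcal{B}^1_\gamma$ satisfying Definition 2 is a linear feasibility problem (split $s = u - v$ with $u, v \geq 0$ to linearize the $\ell_1$ constraint). The sketch below is our own illustration under these assumptions and serves verification only; as emphasized next, the algorithm itself never computes $s$.

```python
import numpy as np
from scipy.optimize import linprog

def is_gamma_approximable(Gamma, gamma, losses):
    """Check Definition 2 numerically for a finite instance.

    Gamma  -- (K, N) candidate PTM
    losses -- (K, M) matrix with losses[k, m] = f(x^(k), y^(m)) for an
              enumeration y^(1), ..., y^(M) of the adversary's actions
    """
    K, N = Gamma.shape
    for k in range(K):
        for m in range(losses.shape[1]):
            # Need s = u - v with ||s||_1 <= gamma and, for all j,
            # <Gamma[k] - Gamma[j], s> >= f(x^(k), y) - f(x^(j), y).
            D = Gamma - Gamma[k]                 # row j: Gamma[j] - Gamma[k]
            gaps = losses[k, m] - losses[:, m]   # entry j: f(x^(k),y) - f(x^(j),y)
            A_ub = np.hstack([D, -D])            # <Gamma[j]-Gamma[k], u-v> <= -gap_j
            A_l1 = np.ones((1, 2 * N))           # sum(u) + sum(v) <= gamma
            res = linprog(c=np.zeros(2 * N),
                          A_ub=np.vstack([A_ub, A_l1]),
                          b_ub=np.append(-gaps, gamma),
                          bounds=(0, None), method="highs")
            if res.status != 0:                  # infeasible for this (k, y)
                return False
    return True
```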
We want to emphasize two final points. First, the $\gamma$-approximability condition is purely for analysis
purposes: we never need to compute the quantity $s$ in response to $y$ and $k$. Second, much of the computational
and decision-theoretic challenge rests heavily on the careful design of $\Gamma$. The PTM allows the algorithm to
perform the appropriate hedging across an exponentially-sized set of $K$ experts with only $N \ll K$ dimensions
of perturbation. As we demonstrate in the following example, we can always construct a $\gamma$-approximable $\Gamma$
with $N = O(\log K)$, but at the expense of computational efficiency. The proposed $\Gamma$ will not generally be
compatible with the given oracle, in the sense that the optimization problem underlying GFTPL cannot be
written in the form of Eq. (1). In the next section, we will show how to address this problem via another
condition on $\Gamma$ called implementability.
Simple Example. For any online learning problem we may construct $\Gamma$ as follows. Let $N := \lceil \log_2 K \rceil$, and
define the $k$-th row $\Gamma^{(k)}$ to be the binary representation of the index $k$, with $+1/-1$ values instead of $0/1$.
We claim that this $\Gamma$ is $\gamma$-approximable for $\gamma = \lceil \log_2 K \rceil$. We can satisfy the condition of Definition 2 by
setting $s = \Gamma^{(k)}$. It is easy to see that for any $j \neq k$ we have
$$\langle \Gamma^{(k)} - \Gamma^{(j)}, s \rangle = \langle \Gamma^{(k)} - \Gamma^{(j)}, \Gamma^{(k)} \rangle \geq 2 \geq f(x^{(k)}, y) - f(x^{(j)}, y),$$
where the first inequality holds because the distinct $\pm 1$ rows $\Gamma^{(k)}$ and $\Gamma^{(j)}$ differ in at least one coordinate, which
contributes $2$ to the inner product, and the last inequality holds because $|f(x^{(i)}, y)| \leq 1$ for any $i \in [K]$.
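This construction is easy to check numerically; below is a minimal sketch (our own illustration) that builds the $\pm 1$ binary-representation PTM and verifies $\langle \Gamma^{(k)} - \Gamma^{(j)}, \Gamma^{(k)} \rangle \geq 2$ for all pairs:

```python
import numpy as np

def binary_ptm(K):
    """Rows are +1/-1 binary representations of the indices 0..K-1."""
    N = max(1, int(np.ceil(np.log2(K))))
    bits = (np.arange(K)[:, None] >> np.arange(N)[None, :]) & 1
    return 2.0 * bits - 1.0                 # map {0, 1} -> {-1, +1}

K = 10
Gamma = binary_ptm(K)
for k in range(K):
    s = Gamma[k]                            # the witness from the text
    inner = (Gamma[k] - Gamma) @ s          # entry j: <Gamma[k] - Gamma[j], s>
    assert np.all(np.delete(inner, k) >= 2) # distinct rows differ in some coordinate,
                                            # contributing at least 2 to the inner product
```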