Sequential Decision Making on Unmatched Data using Bayesian Kernel Embeddings

PREPRINT. UNDER REVIEW.
Diego Martinez-Taboada
Department of Statistics & Data Science
Carnegie Mellon University
diegomar@andrew.cmu.edu
Dino Sejdinovic
Department of Statistics
University of Oxford
dino.sejdinovic@stats.ox.ac.uk
ABSTRACT
The problem of sequentially maximizing the expectation of a function seeks to maximize the expected value of a function of interest without having direct control over its arguments. Instead, the distribution of these arguments (the features) depends on a given context and an action taken by an agent. In contrast to Bayesian optimization, the arguments of the function are not under the agent's control, but are indirectly determined by the agent's action given a context. If information about the features is to be included in the maximization problem, the full conditional distribution of the features, rather than only its expectation, needs to be accounted for. Furthermore, the function itself is unknown: only noisy observations of it are available, potentially arising from unmatched data sets. We propose a novel algorithm for this problem which accounts for the uncertainty derived from the estimation of both the conditional distribution of the features and the unknown function, by modeling the former with a Bayesian conditional mean embedding and the latter with a Gaussian process. Our algorithm empirically outperforms the current state-of-the-art algorithm in the experiments conducted.
1 INTRODUCTION
Uncertainty quantification is inherent to sequential decision making problems, where an agent sequentially explores a set of possible actions while seeking to maximize the associated reward (Alagoz et al. [2010], Sutton and Barto [2018]). The algorithms designed for such problems are based on an exploration-exploitation trade-off: the agent tries to exploit its knowledge of the actions that have yielded high rewards, but it also seeks to explore actions about which it has little information (Macready and Wolpert [1998], Audibert et al. [2009]).
The multi-armed bandit (MAB) is one of the sequential decision problems that has received the most attention in the literature (Vermorel and Mohri [2005], Slivkins et al. [2019]). The classical MAB problem is defined by a tuple $\langle \mathcal{A}, F_R \rangle$, where $\mathcal{A}$ is the set of actions and $F_R$ the reward distribution. At time step $t \in \mathbb{N}$, the agent decides to take an action $a_t \in \mathcal{A}$. A reward $r_{a_t}$ drawn from $r_{a_t} \sim F_R(\cdot \mid a_t)$ follows. The MAB aims at maximizing the cumulative reward

$$J_T = \mathbb{E}\left[\sum_{t=1}^{T} r_{a_t}\right], \quad (1)$$

where $a_t$ are the actions sequentially taken. The contextual multi-armed bandit problem (CMAB) generalizes the MAB by including a state space (or context space) $\mathcal{S}$ (Lu et al. [2010], Zhou [2015]). Formally, the CMAB problem is determined by a tuple $\langle \mathcal{S}, \mathcal{A}, F_R \rangle$, and it aims at maximizing

$$J_T = \mathbb{E}\left[\sum_{t=1}^{T} r_{s_t, a_t}\right], \quad (2)$$

where $r_{s_t, a_t} \sim F_R(\cdot \mid s_t, a_t)$, and $s_t$ are sequentially given.
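To fix ideas, the following is a minimal simulation sketch of objective (2); the Gaussian reward model, the finite action set, the oracle policy, and the uniformly drawn contexts are illustrative assumptions, not part of the problem definition.

```python
import numpy as np

# A minimal sketch of the CMAB objective (2); the reward model
# F_R(. | s, a) = N(mean_reward(s, a), 1), the finite action set, and the
# uniformly drawn contexts are illustrative assumptions only.

rng = np.random.default_rng(0)
actions = [0, 1, 2]                       # the action set A
T = 1000

def mean_reward(s, a):
    return np.sin(s + a)                  # hypothetical E[r | s, a]

def draw_reward(s, a):
    return mean_reward(s, a) + rng.normal()   # r_{s_t, a_t} ~ F_R(. | s_t, a_t)

J_T = 0.0
for t in range(T):
    s_t = rng.uniform(0, 2 * np.pi)       # context sequentially given
    a_t = max(actions, key=lambda a: mean_reward(s_t, a))  # oracle policy
    J_T += draw_reward(s_t, a_t)          # accumulates toward J_T in (2)

print(J_T / T)                            # average reward of the oracle policy
```

A bandit algorithm must approach this oracle while only observing the drawn rewards, which is what the exploration-exploitation trade-off addresses.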
Work primarily done at the University of Oxford and finished at Carnegie Mellon University.
arXiv:2210.13692v1 [stat.ML] 25 Oct 2022
However, one may be interested in optimizing the expectation of a function $f(r)$, instead of $r$ itself. In other words, $r_t$ may be seen as an intermediate reward that mediates between the agent's actions $a_t$ and the ultimate reward $f(r_t)$. In various fields of application, it is usual to study a function of the outcome, rather than the outcome itself. For instance, the conditional value at risk (CVaR), which has seen extensive use in financial portfolio optimization (Rockafellar et al. [2000], Zhu and Fukushima [2009]), considers $f(r) = r \, 1_{\{r \leq r_\alpha\}}$. The median and other quantiles may be of interest for noisy data (Even-Dar et al. [2002], Altschuler et al. [2018]).
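As a concrete instance, the following snippet (a minimal sketch; the Gaussian return sample and $\alpha = 0.05$ are illustrative assumptions) evaluates $f(r) = r \, 1_{\{r \leq r_\alpha\}}$ on simulated returns and relates its expectation to the CVaR:

```python
import numpy as np

# A worked instance of f(r) = r * 1{r <= r_alpha} from the CVaR example;
# the Gaussian return sample and alpha = 0.05 are illustrative assumptions.

rng = np.random.default_rng(0)
alpha = 0.05
r = rng.normal(loc=0.01, scale=0.1, size=100_000)   # hypothetical returns

r_alpha = np.quantile(r, alpha)        # the alpha-quantile of r
f_r = r * (r <= r_alpha)               # f(r) = r * indicator(r <= r_alpha)

# E[f(r)] / alpha recovers the conditional value at risk E[r | r <= r_alpha]
print(f_r.mean() / alpha, r[r <= r_alpha].mean())
```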
In more general settings, the function $f$ may itself be unknown. Noisy data of the form

$$(r_i, f(r_i) + \xi_i)_{i=1}^{M} \quad (3)$$

may be the only available information on such dependency. In this case, uncertainty arises not only from the lack of information relating the traditional CMAB variables $r, a, s$, but also from the unknown function $f$.
Note that this scenario opens the door to considering multiple sources of data. In contrast to the usual matched scenario, where the sequential data is of the form $(s_i, a_i, r_i, f(r_i) + \xi_i)_{i=1}^{t}$, it may be of interest to consider an unmatched scenario, where a second data set of the form $(r_j, f(r_j) + \xi_j)_{j=1}^{M}$ is given on top of the sequential data. Algorithms that handle unmatched data sets have been attracting increasing interest in the so-called data fusion problem (Meng et al. [2020]).
For instance, a new movie recommendation system might obtain a multivariate $r_j$ consisting of: whether the user showed interest in the recommendation, the number of clicks after the recommendation screen, the time spent reading the description of the recommendation, etc. The final reward $f(r_j) + \xi_j$ is given by the number of minutes that the client ended up watching the content for. A rich data set $(r_j, f(r_j) + \xi_j)_{j=1}^{M}$ might be obtained from other platforms that have been in the market longer. Not considering this second data set would imply a substantial loss of information, which would lead to suboptimal approaches. A schematic sketch of the two data sources is given below.
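```python
# A schematic sketch of the two data sources in the movie example; all field
# names and values are hypothetical. For illustration, the sequential data is
# taken to contain only (s_i, a_i, r_i), while the second, unmatched data set
# relates intermediate rewards r to final rewards f(r) + noise.

# Sequential data (s_i, a_i, r_i) gathered by the new platform:
sequential_data = [
    # (context s: user profile, action a: recommended title, r: engagement)
    ({"age": 34}, "title_17", {"clicked": 1, "n_clicks": 5, "secs_reading": 12.0}),
]

# Unmatched data (r_j, f(r_j) + xi_j) from platforms longer in the market:
unmatched_data = [
    # (r: engagement features, final reward: minutes watched)
    ({"clicked": 1, "n_clicks": 3, "secs_reading": 8.5}, 47.0),
]
```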
In this work, we revisit the problem of sequentially maximizing the conditional expectation of a function in the CMAB framework. Our contributions are twofold:

• We design a novel algorithm, namely the Contextual Bayesian Mean Process Upper Confidence Bound, which considers the two sources of uncertainty derived from the lack of information on $r, a, s$, as well as from the unknown function $f$. It allows for considering both matched and unmatched data sets.
• We empirically show that the Contextual Bayesian Mean Process Upper Confidence Bound outperforms the state-of-the-art Conditional Mean Embeddings Upper Confidence Bound (Chowdhury et al. [2020]) in the experimental settings considered.
2 RELATED WORK
Multiple algorithms have been proposed for addressing the CMAB problem. The contextual Gaussian process upper confidence bound (CGP-UCB), which motivates the algorithm proposed in this work, was first presented in Krause and Ong [2011]. It generalizes the context-free Gaussian process upper confidence bound (GP-UCB) introduced in Srinivas et al. [2009]. The CGP-UCB attains sublinear contextual regret in many real-life applications, i.e., it is able to compete with the optimal mapping from contexts to actions. However, note that there exist prominent alternatives to the CGP-UCB, such as Thompson sampling (Agrawal and Goyal [2013]). Some algorithms designed for the MAB problem (Chowdhury and Gopalan [2017]) may also be adapted to the CMAB framework.
The problem of sequentially maximizing the conditional expectation has not received as much attention in the literature, although a number of closely related problems have been presented in different forms. Oliveira et al. [2019] proposed a sequential decision problem where both the reward and actions are noisy (execution and localization noise), with applications in robotics and stochastic simulations. The contextual combinatorial multi-armed bandit problem (Chen et al. [2013], Qin et al. [2014]) may be understood as the problem of sequentially maximizing the conditional expectation of a function, where the chosen super arm may be seen as a multivariate binary action, and the dependency between the scores of the individual arms and the reward as the unknown function.
For the problem of sequentially maximizing the expectation of a function, Chowdhury et al. [2020] proposed the
Conditional Mean Embeddings Upper Confidence Bound (CME-UCB) algorithm. The CME-UCB uses the conditional
mean embedding for modeling the conditional distribution of the features. In contrast, alternative approaches for such
conditional density estimation scale poorly with the dimension of the underlying space (Grunewalder et al. [2012]).
The conditional mean embedding, a key element of the CME-UCB algorithm, extends the concept of kernel mean embeddings to conditional distributions (Muandet et al. [2017]). In terms of Bayesian procedures, Flaxman et al. [2016] proposed a Bayesian approach for learning mean embeddings, and Chau et al. [2021a] generalized the approach to Bayesian conditional mean embeddings (BayesCME). Such Bayesian learning of mean embeddings requires the concept of nuclear dominance (Lukić and Beder [2001]), which allows for defining a GP with trajectories in a Reproducing Kernel Hilbert Space with probability 1.
Furthermore, conditional mean processes, which study the integral of a Gaussian process (GP) with respect to a conditional distribution, are now well established (Chau et al. [2021b]). Building on the Bayesian conditional mean embedding and the conditional mean process, Chau et al. [2021a] proposed the BayesIMP algorithm for tackling a data fusion problem in a causal inference setting. This algorithm considers uncertainty derived from two data samples, with observations of a mediating variable included in both samples.
3 BACKGROUND
3.1 RKHS
The concept of Reproducing Kernel Hilbert Spaces (RKHS) is widely used in statistics and machine learning (Hofmann
et al. [2008], Gretton et al. [2012], Sejdinovic et al. [2013]).
Definition 1 (Reproducing Kernel Hilbert Space). Let $\mathcal{X}$ be a non-empty set and let $\mathcal{H}$ be a Hilbert space of functions $f: \mathcal{X} \to \mathbb{R}$ with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. A function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a reproducing kernel of $\mathcal{H}$ if it satisfies

• $k(\cdot, x) \in \mathcal{H}$ for all $x \in \mathcal{X}$,
• (the reproducing property) $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$ for all $x \in \mathcal{X}$ and all $f \in \mathcal{H}$.

If $\mathcal{H}$ has a reproducing kernel, then it is called a reproducing kernel Hilbert space (RKHS).
Based on the Cauchy-Schwarz inequality, the reproducing property implies that the evaluation map $F_x: f \in \mathcal{H} \mapsto f(x)$ is a continuous linear form for all $x$ in $\mathcal{X}$. Therefore, the reproducing property introduces some degree of smoothness relative to the Hilbert space inner product of the functions considered (Gretton [2013]).
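To make the preceding two paragraphs concrete, the following minimal sketch (assuming an RBF kernel with unit lengthscale and a function built as a finite combination of kernel sections) numerically checks the Cauchy-Schwarz bound $|f(x)| \leq \|f\|_{\mathcal{H}} \sqrt{k(x,x)}$ on the evaluation functional:

```python
import numpy as np

# A minimal sketch, assuming an RBF kernel with unit lengthscale. For
# f = sum_i alpha_i k(., x_i) (an element of the RKHS H), the reproducing
# property gives f(x) = <f, k(., x)>_H, and Cauchy-Schwarz then bounds the
# evaluation functional: |f(x)| <= ||f||_H * sqrt(k(x, x)).

def k(x, y):
    return np.exp(-0.5 * (x - y) ** 2)  # RBF kernel

rng = np.random.default_rng(0)
xs = rng.normal(size=5)                 # centers defining f
alpha = rng.normal(size=5)              # coefficients of f

K = k(xs[:, None], xs[None, :])         # Gram matrix K_ij = k(x_i, x_j)
rkhs_norm = np.sqrt(alpha @ K @ alpha)  # ||f||_H^2 = alpha^T K alpha

def f(x):
    return alpha @ k(xs, x)             # f(x) = sum_i alpha_i k(x_i, x)

for x in np.linspace(-3, 3, 7):
    # |f(x)| = |<f, k(., x)>_H| <= ||f||_H sqrt(k(x, x)) by Cauchy-Schwarz
    assert abs(f(x)) <= rkhs_norm * np.sqrt(k(x, x)) + 1e-12
```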
Notation: Given a non-empty set $\mathcal{X}$, we refer to the kernel associated to $\mathcal{X}$ as $k_x$. Given $x \in \mathcal{X}$, the element $k_x(\cdot, x) \in \mathcal{H}$ is also referred to as $\phi_x(x)$, motivated by the notation used when it is understood as a feature representation of $x$. Given a data vector $\mathbf{x} = [x_1, \ldots, x_n]^T$, feature matrices are defined by stacking feature maps along the columns, $\Phi_x := [\phi_x(x_1), \ldots, \phi_x(x_n)]$. The Gram matrix is denoted as $K_{xx} = \Phi_x^T \Phi_x$, and the vector of evaluations as $\mathbf{k}_{x\mathbf{x}} = [k_x(x, x_1), \ldots, k_x(x, x_n)]^T$. Furthermore, $\Phi_x^T \phi_x(x) = \mathbf{k}_{x\mathbf{x}}$. The notation is used analogously for the rest of the variables, such as $k_y$, $\Phi_y$ or $K_{yy}$ for a variable $y \in \mathcal{Y}$.
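A minimal sketch of this notation in code (assuming an RBF kernel on $\mathcal{X} = \mathbb{R}$; the data values are illustrative):

```python
import numpy as np

# A minimal sketch of the notation above, assuming an RBF kernel k_x. Given a
# data vector x = [x_1, ..., x_n]^T, the Gram matrix K_xx has entries
# k_x(x_i, x_j), and k_xx(x_new) stacks the evaluations k_x(x_new, x_i).

def k_x(a, b):
    return np.exp(-0.5 * (a - b) ** 2)  # RBF kernel on X = R

x = np.array([0.1, 0.5, 1.2, 2.0])      # data vector [x_1, ..., x_n]^T

K_xx = k_x(x[:, None], x[None, :])      # Gram matrix, (K_xx)_ij = k_x(x_i, x_j)

def k_xx(x_new):
    # vector of evaluations [k_x(x_new, x_1), ..., k_x(x_new, x_n)]^T
    return k_x(x_new, x)

print(K_xx.shape, k_xx(0.7).shape)      # (4, 4) (4,)
```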
The Kernel Mean Embedding (KME), which allows for representing distributions as elements of an RKHS, is the cornerstone of the algorithms that will be discussed in this work.
Definition 2 (Kernel Mean Embedding (KME)). Let $\mathcal{P}$ be the set of all probability measures on a measurable space $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$, and $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a reproducing kernel with associated RKHS $\mathcal{H}$ such that $\sup_{x \in \mathcal{X}} k(x, x) < \infty$. The kernel mean embedding (KME) of $P \in \mathcal{P}$ with respect to $k$ is defined as the following Bochner integral:

$$\mu: \mathcal{P} \to \mathcal{H}, \quad P \mapsto \mu_P := \int k(\cdot, x) \, dP(x).$$
The kernel $k$ is said to be characteristic if $\mu$ is injective. In such a case, $\mu_P$ serves as a representation of $P$. The frequently used Gaussian (RBF), Matérn and Laplace kernels are characteristic. The notion of a characteristic kernel is an analogy to the characteristic function (Fukumizu et al. [2007]). The following lemma immediately follows from the definition of the kernel mean embedding.
Lemma 1. The kernel mean embedding of $P$ maps functions $f \in \mathcal{H}$ to their mean with respect to $P$ through the inner product:

$$\langle \mu_P, f \rangle_{\mathcal{H}} = \mathbb{E}_{X \sim P}[f(X)]. \quad (4)$$
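As an illustration of Lemma 1, the following minimal sketch applies it to the empirical measure: with the standard plug-in embedding $\hat{\mu}_P = \frac{1}{n} \sum_i k(\cdot, x_i)$ (an assumption here, not defined above) and $f$ a finite combination of kernel sections, the inner product $\langle \hat{\mu}_P, f \rangle_{\mathcal{H}}$ equals the sample mean of $f$:

```python
import numpy as np

# A minimal sketch of Lemma 1, assuming an RBF kernel and the usual empirical
# embedding mu_hat = (1/n) sum_i k(., x_i) as a plug-in for mu_P. For
# f = sum_j alpha_j k(., z_j) in H, the reproducing property yields
# <mu_hat, f>_H = (1/n) sum_i f(x_i), i.e. the sample mean of f.

def k(a, b):
    return np.exp(-0.5 * (a - b) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=1000)              # sample from P
z = np.array([-1.0, 0.0, 2.0])         # centers defining f in H
alpha = np.array([0.5, -1.0, 0.3])     # coefficients of f

def f(t):
    # f = sum_j alpha_j k(., z_j), an element of H
    return k(np.atleast_1d(t)[:, None], z[None, :]) @ alpha

# <mu_hat, f>_H = (1/n) sum_i sum_j alpha_j k(x_i, z_j)
inner = np.mean(k(x[:, None], z[None, :]) @ alpha)
assert np.isclose(inner, np.mean(f(x)))   # equals the sample mean of f
```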
The conditional mean embedding $\mu_{Y|X=x}$ considers the kernel mean embedding of the conditional distribution $P_{Y|X=x}$ for every $x \in \mathcal{X}$.
3.2 Bayesian Conditional Mean Embedding
A Bayesian learning framework for conditional mean embeddings was proposed in Chau et al. [2021a]. In such a framework, the conditional mean embedding $\mu_{Y|X=x}(y)$ is modeled by a Gaussian process $\mu_{gp}(x, y)$. A prior $\mu_{gp} \sim \mathcal{GP}(0, k_x \otimes r_y)$ is defined, and the posterior models $\mu_{Y|X=x}(y)$.
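For contrast with this Bayesian construction, the following is a minimal sketch of the classical (non-Bayesian) empirical conditional mean embedding estimate $\hat{\mu}_{Y|X=x} = \Phi_y (K_{xx} + n\lambda I)^{-1} \mathbf{k}_{x\mathbf{x}}$, in the notation of Section 3.1; the RBF kernels, the data-generating process, and the regularizer $\lambda$ are illustrative assumptions:

```python
import numpy as np

# A minimal sketch of the classical empirical conditional mean embedding
# (not the Bayesian posterior of Chau et al. [2021a]); the kernels, the data
# and the regularizer lam are illustrative. The estimate is
#   mu_hat_{Y|X=x} = Phi_y (K_xx + n*lam*I)^{-1} k_xx,
# so mu_hat_{Y|X=x}(y) = sum_i w_i(x) k_y(y_i, y).

def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(2)
n, lam = 200, 1e-3
x = rng.normal(size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)    # Y | X = x concentrated near sin(x)

K_xx = rbf(x, x)                            # Gram matrix on the inputs

def cme_weights(x_new):
    """Weights w(x) with mu_hat_{Y|X=x} = sum_i w_i(x) k_y(., y_i)."""
    k_x_new = rbf(x, np.array([x_new]))[:, 0]
    return np.linalg.solve(K_xx + n * lam * np.eye(n), k_x_new)

w = cme_weights(0.5)
ys = np.linspace(-2.0, 2.0, 5)
mu_vals = rbf(ys, y) @ w                    # mu_hat_{Y|X=0.5}(y) on a grid
print(mu_vals)                              # largest near y = sin(0.5)
```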