
However, one may be interested in optimizing the expectation of a function $f(r)$ instead of $r$ itself. In other words, $r_t$ may be seen as an intermediate reward that mediates between the agent's actions $a_t$ and the ultimate reward $f(r_t)$. In various fields of application, it is common to study a function of the outcome rather than the outcome itself. For instance, the conditional value at risk (CVaR), which has seen extensive use in financial portfolio optimization (Rockafellar et al. [2000], Zhu and Fukushima [2009]), considers $f(r) = r\,\mathbb{1}_{r \leq r_\alpha}$, with $r_\alpha$ the $\alpha$-quantile of $r$. The median and other quantiles may be of interest for noisy data (Even-Dar et al. [2002], Altschuler et al. [2018]).
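As a concrete illustration, the following minimal Python sketch (ours, not part of the cited works; the reward distribution is purely illustrative) applies this CVaR-style transformation to simulated rewards:

```python
import numpy as np

# Sketch of the CVaR-style transformation f(r) = r * 1{r <= r_alpha},
# where r_alpha is the (empirical) alpha-quantile of the rewards.
def cvar_transform(rewards, alpha=0.05):
    r_alpha = np.quantile(rewards, alpha)   # empirical alpha-quantile
    return rewards * (rewards <= r_alpha)   # keep only the lower tail

rng = np.random.default_rng(0)
r = rng.normal(loc=1.0, scale=2.0, size=10_000)  # illustrative reward draws
# E[f(r)] / alpha recovers the conditional value at risk E[r | r <= r_alpha].
print(cvar_transform(r, alpha=0.05).mean() / 0.05)
```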
In more general settings, the function $f$ may itself be unknown. Noisy data of the form
$$\{(r_i,\, f(r_i) + \xi_i)\}_{i=1}^{M} \qquad (3)$$
may be the only available information on such a dependency. In this case, uncertainty arises not only from the lack of information on the relationship between the traditional CMAB variables $r$, $a$, $s$, but also from the unknown function $f$.
Note that this scenario opens the door to considering multiple sources of data. In contrast to the usual matched scenario, where the sequential data is of the form $\{(s_i, a_i, r_i, f(r_i) + \xi_i)\}_{i=1}^{t}$, it may be of interest to consider an unmatched scenario, where a second data set of the form $\{(r_j, f(r_j) + \xi_j)\}_{j=1}^{M}$ is given on top of the sequential data. Algorithms that handle unmatched data sets have been attracting increasing interest in the so-called data fusion problem (Meng et al. [2020]).
For instance, a new movie recommendation system might obtain a multivariate $r_j$ consisting of: whether the user showed interest in the recommendation, the number of clicks after the recommendation screen, the time spent reading the description of the recommendation, etc. The final reward $f(r_j) + \xi_j$ is given by the number of minutes that the client ended up watching the content for. A rich data set $\{(r_j, f(r_j) + \xi_j)\}_{j=1}^{M}$ might be obtained from other platforms that have been in the market longer. Not considering this second data set would imply a substantial loss of information, which would lead to suboptimal approaches.
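A minimal sketch of how such an unmatched data set could be exploited (all data and names here are hypothetical; we use a generic Gaussian process regressor as a stand-in for the model of the unknown $f$):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical unmatched data set {(r_j, f(r_j) + xi_j)}_{j=1}^M from an
# external platform; f is unknown, so we simulate it for illustration.
rng = np.random.default_rng(0)
M = 200
r_ext = rng.uniform(0.0, 1.0, size=(M, 1))                      # external outcomes r_j
y_ext = np.sin(3.0 * r_ext).ravel() + 0.1 * rng.normal(size=M)  # noisy f(r_j) + xi_j

# Fit a GP regressor as a generic model of the unknown f.
f_hat = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
f_hat.fit(r_ext, y_ext)

# Score rewards observed in our own (matched) sequential data.
r_seq = rng.uniform(0.0, 1.0, size=(10, 1))
mean, std = f_hat.predict(r_seq, return_std=True)  # estimate of f(r_t) with uncertainty
```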
In this work, we revisit the problem of sequentially maximising the conditional expectation of a function in the CMAB framework. Our contributions are twofold:

• We design a novel algorithm, the Contextual Bayesian Mean Process Upper Confidence Bound, which accounts for two sources of uncertainty: the lack of information on $r$, $a$, $s$, and the unknown function $f$. It allows for the use of both matched and unmatched data sets.

• We empirically show that the Bayesian Mean Process Upper Confidence Bound outperforms the state-of-the-art Conditional Mean Embeddings Upper Confidence Bound (Chowdhury et al. [2020]) in the experimental settings considered.
2 RELATED WORK
Multiple algorithms have been proposed for addressing the CMAB problem. The contextual Gaussian process upper
confidence bound (CGP-UCB), which motivates the algorithm proposed in this work, was first presented in Krause and
Ong [2011]. It generalizes the context-free Gaussian process upper confidence bound (GP-UCB) introduced in Srinivas
et al. [2009]. The CGP-UCB attains sublinear contextual regret in many real-life applications, i.e., it is able to compete with the optimal mapping from contexts to actions. Note, however, that there exist prominent alternatives to CGP-UCB, such as Thompson sampling (Agrawal and Goyal [2013]). Some algorithms designed for the MAB problem (Chowdhury and Gopalan [2017]) may also be adapted to the CMAB framework.
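To fix ideas, here is a simplified sketch of the CGP-UCB acquisition step (our paraphrase of the rule in Krause and Ong [2011]; the GP interface, $\beta$ value, and names are illustrative assumptions):

```python
import numpy as np

# Simplified CGP-UCB acquisition step: given a GP fitted on joint inputs
# (context, action) -> reward, pick the action with the highest upper
# confidence bound for the current context. `gp` is assumed to expose a
# scikit-learn-style predict(X, return_std=True).
def cgp_ucb_action(gp, context, candidate_actions, beta=2.0):
    X = np.array([np.concatenate([context, a]) for a in candidate_actions])
    mean, std = gp.predict(X, return_std=True)
    return candidate_actions[int(np.argmax(mean + np.sqrt(beta) * std))]
```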
The problem of sequentially maximising the conditional expectation has not received as much attention in the literature,
although a number of closely related problems have been presented in different forms. Oliveira et al. [2019] proposed
a sequential decision problem where both the reward and actions are noisy (execution and localisation noise), with
applications in robotics and stochastic simulations. The contextual combinatorial multi-armed bandit problem (Chen
et al. [2013], Qin et al. [2014]) may be understood as the problem of sequentially maximising the conditional expectation
of a function, where the chosen super arm may be seen as a multivariate binary action, and the dependency between the
scores of the individual arms and the reward as the unknown function.
For the problem of sequentially maximising the expectation of a function, Chowdhury et al. [2020] proposed the
Conditional Mean Embeddings Upper Confidence Bound (CME-UCB) algorithm. The CME-UCB uses the conditional
mean embedding for modeling the conditional distribution of the features. In contrast, alternative approaches for such
conditional density estimation scale poorly with the dimension of the underlying space (Grunewalder et al. [2012]).
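For reference, the standard empirical estimator of a conditional expectation via the conditional mean embedding (see Muandet et al. [2017]) can be sketched as follows; the kernel, regularisation choice, and function names here are ours:

```python
import numpy as np

# Empirical conditional-mean-embedding estimate of E[g(Y) | X = x]:
# g_Y^T (K_XX + n * lam * I)^{-1} k_X(x), for samples {(x_i, y_i)}_{i=1}^n.
def rbf_gram(A, B, length_scale=0.5):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def cme_conditional_expectation(X, g_of_Y, x_query, lam=1e-3):
    n = X.shape[0]
    K = rbf_gram(X, X)                                 # Gram matrix K_XX
    k_x = rbf_gram(X, x_query)                         # cross-kernel k_X(x)
    weights = np.linalg.solve(K + n * lam * np.eye(n), k_x)
    return g_of_Y @ weights                            # one estimate per query point
```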
The conditional mean embedding, a key element of the CME-UCB algorithm, extends the concept of kernel mean
embeddings to conditional distributions (Muandet et al. [2017]). In terms of Bayesian procedures, Flaxman et al. [2016]
proposed a Bayesian approach for learning mean embeddings, and Chau et al. [2021a] generalized the approach for