process, i.e., exploration and exploitation. To solve this problem,
a contextual bandit-based approach is seen as a principled way in
which a learning algorithm sequentially recommends an item to
users based on contextual information about the user and item, while
simultaneously adapting its recommendation strategy based on his-
torical user-click feedback to maximize total user clicks in the long
run. Theoretically, seminal contextual bandit algorithms, including
LinUCB [1] and LinTS [3], have been proven to achieve the optimal balance
between exploration and exploitation; empirically, these
approaches have shown remarkable performance and have been
deployed in various industrial scenarios to deal with cold-start
recommendation [18, 20, 29].
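To make the exploration–exploitation mechanism concrete, the following is a minimal sketch of a LinUCB-style round (the function names, the toy contexts, and the regularizer lambda = 1 are illustrative, not the paper's implementation): maintain the design matrix A and response vector b, score each candidate arm by its ridge estimate plus an exploration bonus, and update the statistics after observing click feedback.

```python
import numpy as np

def linucb_choose(contexts, A, b, alpha=1.0):
    """Pick the arm whose upper confidence bound is highest."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge-regression point estimate of the reward model
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the sufficient statistics after click feedback."""
    return A + np.outer(x, x), b + reward * x

# Toy run: arm 0's context earns a click, arm 1's does not.
d = 2
A, b = np.eye(d), np.zeros(d)             # A starts as lambda * I with lambda = 1
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A, b = linucb_update(A, b, arms[0], 1.0)  # observed click on arm 0
A, b = linucb_update(A, b, arms[1], 0.0)  # no click on arm 1
chosen = linucb_choose(arms, A, b, alpha=0.1)
```

The exploration bonus `alpha * sqrt(x @ A_inv @ x)` shrinks for arms whose contexts have been observed often, which is what drives the balance between exploring uncertain items and exploiting known good ones.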
Despite this remarkable progress, such approaches cannot be
directly employed in many real-world scenarios where the recom-
mendation service provider holds only some portions of the users'
profiles while the remaining user features belong to different depart-
ments and cannot be directly shared with the recommendation
provider due to privacy concerns or data regulations. For example,
in the e-commerce recommendation setting, the service provider
(e.g., Amazon or Taobao) holds only shopping-related informa-
tion, while financial information such as deposits is held by other
departments. This setting is documented as the vertical federated
setting in the literature [4]. A question naturally arises: how can we
design contextual bandits in the vertical federated setting?
To solve this problem, a natural idea is to apply modern privacy-
enhancing techniques such as secure multi-party computation
(MPC) [10] and differential privacy (DP) [8, 9] to existing contex-
tual bandit approaches. However, doing so typically leads to
either high computation/communication costs or degraded
recommendation performance. For example, in multi-party
computation, the multiplication operator is roughly 380 times
slower, and the comparison operator as much as 34,000 times
slower, than its plaintext counterpart [12]. Directly applying these cryptographic techniques
to contextual bandits is therefore intractable, since they involve large
numbers of multiplication and comparison operations. For differential
privacy, injecting sufficient noise into the contextual information
leads to a significant loss of statistical utility and worse recom-
mendation performance [11, 34].
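The utility loss from DP-style perturbation can be seen in a small simulation. This sketch is illustrative only: the Gaussian noise scales are not calibrated to any formal (epsilon, delta) budget, and the ridge regression stands in for the bandit's reward estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 5
theta_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))                 # clean contexts
y = X @ theta_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam=1.0):
    """Ridge-regression estimate, the workhorse inside linear bandits."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Perturb the contexts with increasing noise and track estimation error.
errs = {}
for sigma in [0.0, 1.0, 3.0]:
    X_noisy = X + sigma * rng.standard_normal(X.shape)
    errs[sigma] = float(np.linalg.norm(ridge(X_noisy, y) - theta_true))
    print(f"sigma={sigma}: estimation error {errs[sigma]:.3f}")
```

As the noise scale grows, the estimate of the reward parameter degrades, which translates directly into worse recommendations for a bandit that relies on it.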
All these cryptographic techniques were originally designed for
general-purpose computation. In this paper, we note that the core
computational operators in contextual bandits come with unique
properties, thus providing the opportunity to design customized
encryption schemes with better runtime efficiency and statistical
utility. In general, the exploration mechanism is at the core of pre-
vious bandit algorithms, which typically use the empirical sample
covariance matrix to construct the quantity needed by the explo-
ration mechanism. Based on this property of the sample covariance
matrix, we propose a technique named the orthogonal matrix based
mask mechanism (O3M) for computing this quantity in a private
and lossless manner, motivated by prior work on fed-
erated PCA [4]. O3M is used by the data provider for masking
the local contextual information (i.e., user features) before sending
it to the service provider. Specifically, O3M encodes the raw
contextual information by multiplying it with an orthogonal matrix
(i.e., a mask); the data provider then sends the masked data to
the service provider. Since the service provider has no access to
the mask, it cannot infer the original information even though it
knows the masked data. However, owing to the mathematical properties
of orthogonal matrices, the mask's effect cancels out when the masked data are used in
the construction of the key statistic in the bandit algorithms, and
thus we obtain "lossless" performance in a privacy-preserving
manner (see proof details in Section 4). We further apply the O3M
technique to two commonly used bandit algorithms, LinUCB and
LinTS, and implement two practical protocols named VFUCB and
VFTS for real-world recommendation tasks. For simplicity, in the
following discussions we refer to these protocols collectively as VFCB (Vertical
Federated Contextual Bandit) when there is no ambiguity.
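The lossless property rests on a simple identity: for an orthogonal mask Q, (QX)^T (QX) = X^T Q^T Q X = X^T X, so the Gram/covariance statistic the bandit needs survives masking unchanged while individual rows are mixed. The sketch below demonstrates this identity numerically; the choice of multiplying Q on the left (mixing user rows) and drawing Q via a QR decomposition are our assumptions for illustration, not necessarily the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4

# Data provider's local user features (one row per user).
X = rng.standard_normal((n, d))

# Random orthogonal mask, kept secret by the data provider.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Masked data sent to the service provider; rows no longer match any user.
X_masked = Q @ X

# The key statistic is unchanged: (QX)^T (QX) = X^T Q^T Q X = X^T X.
cov_plain = X.T @ X
cov_masked = X_masked.T @ X_masked
print(np.allclose(cov_plain, cov_masked))
```

The service provider can therefore build exactly the same exploration quantity it would build from the raw features, without ever seeing them.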
In addition, to help the reader better understand our
protocols, we conduct a comprehensive complexity analysis and pro-
vide formal utility and security proofs in Section 4. Through
extensive empirical studies on both synthetic and real-world
datasets, we show that the proposed protocols achieve similar
runtime efficiency and the same recommendation performance as
the centralized bandit, with a theoretical privacy-protection guaran-
tee. We also show that incorporating more user features indeed
improves the performance of the contextual bandit in the vertical
setting, which provides insight into improving recommendation
quality in an online and federated manner.
2 RELATED WORK
2.1 Federated Contextual Bandits
Recently, a few works have explored the concept of federated ban-
dits [2, 21, 27, 35]. Among the federated bandits with linear
reward structures, Huang et al. consider the case where individual
clients face different K-armed stochastic bandits coupled through
common global parameters [16]. They propose a collaborative al-
gorithm to cope with the heterogeneity across clients without ex-
changing local feature vectors or raw data. Dubey and Pentland
design differentially private mechanisms for tackling centralized
and decentralized (peer-to-peer) federated learning [7]. Tewari and
Murphy consider a fully decentralized federated bandit learning
setting with gossiping, where an individual agent only has access
to biased rewards, and agents can only exchange their observed
rewards with their neighbor agents [30]. Li and Wang study the
federated linear contextual bandit problem with heterogeneous
clients, i.e., clients that have various response times
and may even be occasionally unavailable in practice. They propose an asyn-
chronous event-triggered communication framework to improve
the method's robustness against possible delays and the temporary
unavailability of clients [19]. Shi et al. consider a setting similar to
Tewari and Murphy's, but the final payoff is a mixture of the
local and global models [28]. All the above-mentioned papers belong
to the horizontal federated bandit setting, i.e., each client has heteroge-
neous users and all clients jointly train a central bandit model
without exchanging their own raw user features. Since each user's
features are stored entirely by one client, this horizontal federated
setting stands in sharp contrast to the VFL setting. To the best of our knowledge,
our paper is the first to consider the contextual bandit problem in
the VFL setting.
2.2 Vertical Federated Learning
Vertical Federated Learning (VFL), aiming at promoting collabora-
tions among non-competing organizations/entities with vertically