Vertical Federated Linear Contextual Bandits

Zeyu Cao∗
Tencent AI Lab
Shenzhen, China
zc317@cam.ac.uk
Zhipeng Liang∗†
HKUST
Hong Kong, China
zliangao@connect.ust.hk
Shu Zhang
Tencent
Shenzhen, China
bookzhang@tencent.com
Hangyu Li
Tencent
Shenzhen, China
masonhyli@tencent.com
Ouyang Wen
Tencent
Shenzhen, China
gdpouyang@tencent.com
Yu Rong
Tencent AI Lab
Shenzhen, China
royrong@tencent.com
Peilin Zhao
Tencent AI Lab
Shenzhen, China
masonzhao@tencent.com
Bingzhe Wu
Tencent AI Lab
Shenzhen, China
wubingzhe94@gmail.com
ABSTRACT
In this paper, we investigate a novel problem of building contextual
bandits in the vertical federated setting, i.e., contextual information
is vertically distributed over different departments. This problem
remains largely unexplored in the research community. To this end, we
carefully design a customized encryption scheme named orthogonal
matrix-based mask mechanism (O3M) for encrypting local contextual
information while avoiding expensive conventional cryptographic
techniques. We further apply the mechanism to two commonly-used bandit
algorithms, LinUCB and LinTS, and instantiate two practical protocols
for online recommendation under the vertical federated setting. The
proposed protocols can perfectly recover the service quality of
centralized bandit algorithms while achieving satisfactory runtime
efficiency, which is theoretically proved and analyzed in this paper.
By conducting extensive experiments on both synthetic and real-world
datasets, we show the superiority of the proposed method in terms of
privacy protection and recommendation performance.
CCS CONCEPTS
• Security and privacy → Privacy-preserving protocols; • Computing methodologies → Online learning settings.
KEYWORDS
Vertical Federated Learning, Linear Contextual Bandits, Privacy-
Preserving Protocols
1 INTRODUCTION
Personalized recommendation systems are one of the fundamental
building blocks of numerous web service platforms such as e-commerce
and online advertising [14]. From a technical perspective, most of
these recommendation tasks can be modeled as an online sequential
decision process wherein the system needs to recommend items to the
current user based on sequential dynamics (i.e., historical user-item
interactions and user/item contexts).
∗Both authors contributed equally to this research.
†Work done during internship at Tencent AI Lab.
Corresponding author.
Figure 1: Overview of using a bandit algorithm for the recommendation
problem in the vertical federated setting. Department A wants to make
a recommendation for the cold-start user 𝑖. Since the user is new to
the system, instead of only exploiting the user data in department A,
a good strategy to improve the recommendation results is to explore
the user data from other departments, illustrated in different colors.
However, because of compliance requirements, the recommendation system
cannot use these data directly and must apply privacy-preserving
techniques during recommendation; this is illustrated by the lock on
the data when the contextual bandit attempts to use them to make a
recommendation.
In such an online process, the cold-start problem is ubiquitous, since
new users and items continuously join the online service, and these
new users typically come with incomplete profiles and rarely interact
with items [22]. Therefore, cold-start recommendation faces two
critical challenges: a lack of user profiles/contexts and a lack of
user-item interactions.
The key task in cold-start problems is to efficiently collect
user-item interactions when one or both of them are new. This task is
challenging since there are two competing goals in such a collection
process, i.e., exploration and exploitation. To address this problem,
contextual bandit-based approaches are seen as a principled way in
which a learning algorithm sequentially recommends an item to users
based on the contextual information of the user and the item, while
simultaneously adapting its recommendation strategy based on
historical user-click feedback to maximize total user clicks in the
long run. Theoretically, seminal contextual bandit algorithms,
including LinUCB [1] and LinTS [3], are proven to achieve the optimal
balance between exploration and exploitation; empirically, these
approaches have shown remarkable performance and have been deployed in
various industrial scenarios for cold-start recommendation [18, 20, 29].
Despite this remarkable progress, such approaches cannot be directly
employed in many real-world scenarios where the recommendation service
provider only holds some portion of the user's profile while the other
user features belong to different departments and cannot be directly
shared with the recommendation provider due to privacy concerns or
data regulations. For example, in the e-commerce recommendation
setting, the service provider (e.g., Amazon or Taobao) can only hold
shopping-related information, while financial information such as
deposits is held by other departments. This setting is documented as
the vertical federated setting in the literature [4]. A question
naturally arises: how can we design contextual bandits in the vertical
federated setting?
To solve this problem, a natural idea is to apply modern
privacy-enhancing techniques such as secure multi-party computation
(MPC) [10] and differential privacy (DP) [8, 9] to current contextual
bandit approaches. However, this typically leads to either high
computation/communication costs or degraded recommendation
performance. For example, under multi-party computation the
multiplication operator is about 380 times slower and the comparison
operator up to 34,000 times slower than its plaintext counterpart
[12]. Directly applying these cryptographic techniques to contextual
bandits is therefore intractable, since they involve large numbers of
multiplication and comparison operations. For differential privacy,
injecting sufficient noise into the contextual information leads to a
significant loss of statistical utility and worse recommendation
performance [11, 34].
All these cryptographic techniques are originally designed for
general-purpose computation. In this paper, we note that the core
computation operators in contextual bandits come with unique
properties, which provides the opportunity to design customized
encryption schemes with better runtime efficiency and statistical
utility. In general, the exploration mechanism is at the core of
previous bandit algorithms, which typically use the empirical sample
covariance matrix to construct the quantity needed by the exploration
mechanism. Based on the properties of the sample covariance matrix, we
propose a technique named orthogonal matrix-based mask mechanism (O3M)
for computing this quantity in a private and lossless manner,
motivated by prior work on federated PCA [4]. O3M is used by the data
provider to mask the local contextual information (i.e., user
features) before sending it to the service provider. Specifically, O3M
encodes the raw contextual information by multiplying it with an
orthogonal matrix (i.e., the mask); the data provider then sends the
masked data to the service provider. Since the service provider has no
access to the mask, it cannot infer the original information even
though it knows the masked data. However, due to the mathematical
properties of the orthogonal matrix, its effect can be removed when
the masked data are used to construct the key statistic of the bandit
algorithms, and thus we can obtain "lossless" performance in a
privacy-preserving manner (see proof details in Section 4). We further
apply the O3M technique to two commonly-used bandit algorithms, LinUCB
and LinTS, and implement two practical protocols named VFUCB and VFTS
for real-world recommendation tasks. For simplicity, in the following
discussions, we refer to these protocols as VFCB (Vertical Federated
Contextual Bandit) when there is no ambiguity.
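To make the masking idea concrete, the following minimal sketch
(illustrative only; the NumPy setup, the variable names, and the
assumption that rewards are masked with the same matrix are not taken
from the paper) checks the algebraic fact behind O3M: left-multiplying
the locally stored context matrix by a random orthogonal mask changes
every row, yet the Gram-type statistics that linear bandit algorithms
build from the data are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
t, d = 50, 8                          # number of past rounds, feature dimension
contexts = rng.normal(size=(t, d))    # raw local contexts held by the data provider
rewards = rng.normal(size=(t, 1))     # observed rewards (assumed masked with the same matrix)

# Sample a random t x t orthogonal mask via a QR decomposition.
mask, _ = np.linalg.qr(rng.normal(size=(t, t)))

masked_contexts = mask @ contexts     # what would leave the data provider
masked_rewards = mask @ rewards

# The masked rows do not reveal the raw rows ...
assert not np.allclose(masked_contexts, contexts)

# ... but the statistics a linear bandit needs are preserved exactly:
# X^T X (used for the covariance / confidence ellipsoid) and X^T r.
assert np.allclose(masked_contexts.T @ masked_contexts, contexts.T @ contexts)
assert np.allclose(masked_contexts.T @ masked_rewards, contexts.T @ rewards)
```

Because $Q^\top Q = I$ for an orthogonal $Q$, any quantity built solely
from $X^\top X$ and $X^\top r$, such as the ridge estimate and the
confidence width used by LinUCB, is identical whether computed from the
raw or the masked data; the precise protocol and its security analysis
are given in Section 4.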
In addition, to help the reader better understand our protocols, we
conduct a comprehensive complexity analysis and provide formal utility
and security proofs in Section 4. By performing extensive empirical
studies on both synthetic and real-world datasets, we show that the
proposed protocols can achieve similar runtime efficiency and the same
recommendation performance as the centralized bandit, with a
theoretical privacy-protection guarantee. We also point out that
incorporating more user features indeed improves the performance of
the contextual bandit in the vertical setting, which provides insight
into improving recommendation quality in an online and federated
manner.
2 RELATED WORK
2.1 Federated Contextual Bandits
Recently, a few works have explored the concept of federated bandits
[2, 21, 27, 35]. Among federated bandits with linear reward
structures, Huang et al. consider the case where individual clients
face different K-armed stochastic bandits coupled through common
global parameters [16]. They propose a collaborative algorithm to cope
with the heterogeneity across clients without exchanging local feature
vectors or raw data. Dubey and Pentland design differentially private
mechanisms for centralized and decentralized (peer-to-peer) federated
learning [7]. Tewari and Murphy consider a fully decentralized
federated bandit learning setting with gossiping, where an individual
agent only has access to biased rewards and agents can only exchange
their observed rewards with their neighbor agents [30]. Li and Wang
study the federated linear contextual bandit problem with
heterogeneous clients, i.e., clients with varying response times and
even occasional unavailability in practice; they propose an
asynchronous event-triggered communication framework to improve
robustness against possible delays and the temporary unavailability of
clients [19]. Shi et al. consider a setting similar to that of Tewari
and Murphy, but where the final payoff is a mixture of the local and
global models [28]. All of the above papers belong to horizontal
federated bandits, i.e., each client has heterogeneous users and all
clients jointly train a central bandit model without exchanging their
own raw user features. Since each user's features are all stored by
one client, this horizontal federated setting stands in sharp contrast
with the VFL setting. To the best of our knowledge, our paper is the
first to consider the contextual bandit problem in the VFL setting.
Vertical Federated Learning (VFL), which aims to promote collaboration
among non-competing organizations/entities with vertically partitioned
data, has shown huge potential in applications [32] but still remains
relatively underexplored. Most existing work focuses on supervised and
unsupervised learning settings. For unsupervised learning problems,
Chai et al. propose the first masking-based VFL singular value
decomposition (SVD) method; their method recovers the standard SVD
result without sacrificing computation time or communication overhead
[4]. Cheung et al. propose federated principal component analysis and
advanced kernel principal component analysis for vertically
partitioned datasets [6]. For supervised learning problems, Chen et
al. solve vertical FL in an asynchronous fashion, and their solution
allows each client to run stochastic gradient algorithms without
coordination with other clients. Hardy et al. propose a VFL variant of
logistic regression using homomorphic encryption [13]. Wu et al.
propose a VFL decision tree that does not depend on any trusted third
party [33]. Hu et al. design an asynchronous stochastic gradient
descent algorithm to collaboratively train a central model in a VFL
manner [15]. Romanini et al. introduce a framework to train neural
networks in the VFL setting using split neural networks [25].
3 PRELIMINARY
3.1 Problem Description
We rst introduce the denition of Linear Contextual Bandits.
Denition 3.1 (Linear Contextual Bandits). Consider a sequential
decision process, where at each time step
𝑡
, the environment would
release a set of contexts
{𝑥𝑡,𝑎 R𝑑}𝑎A
for some x
A
as the
action set. If context
𝑥𝑡,𝑎𝑡
is been selected, then the environment
would release a reward
𝑟𝑡=𝑥
𝑡,𝑎𝑡𝜃+𝜖𝑡
for some reward gener-
ating parameter
𝜃R𝑑
and i.i.d. noise
𝜖𝑡
. A bandit algorithm
𝐴
is dened as a sequence of maps
{𝜋𝑡}𝑇
𝑡=1
. Each map
𝜋𝑡
is from
the historical information,
{(𝑥𝑠,𝑎𝑠, 𝑟𝑠)}𝑡1
𝑠=1
and the current context
{𝑥𝑡,𝑎 }𝑎∈ A , to the distribution over the action set, Δ(A). The goal
for the bandits algorithm is to minimize the long time regret, where
regret is the gap between best possible achievable rewards and the
rewards obtained by the algorithm. The regret is formally dened
as:
𝑅(𝑇)=E[
𝑇
𝑡=1
𝑟𝑡,𝑎
𝑡]E[
𝑇
𝑡=1
𝑟𝑡,𝑎𝑡],
where
𝑎
𝑡=arg max𝑎A𝑡𝑥
𝑡,𝑎𝑡𝜃
denotes the arm with the optimal
reward.
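As a concrete reading of Definition 3.1, the sketch below (illustrative
only; the symbols $\theta$, $x_{t,a}$, and $\epsilon_t$ follow the
definition, while the dimensions, noise scale, and the deliberately
naive random policy are assumptions) simulates the linear reward model
and accumulates the pseudo-regret using the expected rewards
$x_{t,a}^\top \theta$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 5, 10, 1000                 # feature dimension, number of arms, horizon
theta = rng.normal(size=d)            # unknown reward-generating parameter

regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))          # {x_{t,a}}_{a in A_t} released by the environment
    expected = contexts @ theta                 # x_{t,a}^T theta for every arm
    a_t = rng.integers(K)                       # a deliberately naive uniformly random policy
    reward = expected[a_t] + rng.normal(scale=0.1)   # r_t = x^T theta + noise
    regret += expected.max() - expected[a_t]    # gap to the optimal arm a_t^*

print(f"cumulative regret of the random policy after T={T}: {regret:.1f}")
```

A good bandit algorithm replaces the random choice with a policy
$\pi_t$ built from the history $\{(x_{s,a_s}, r_s)\}$ so that the
cumulative regret grows sublinearly in $T$.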
The task of personalized recommendation can be naturally modeled as a
contextual bandit problem. At a high level, the contextual bandit aims
to sequentially select arms (i.e., recommend items) to serve users as
they arrive in an online stream. The core of the bandit algorithm is
an arm-selection strategy that is based on the contextual information
of the user (e.g., age and hobbies) and can be updated using
historical user-item feedback (click or not). The goal of the
algorithm is to maximize the total reward (i.e., total user clicks) in
the long run. In this paper, we consider the most widely studied
linear contextual bandits, formally defined in Definition 3.1 above.
Figure 2: Comparison between the centralized (a) and vertical federated (b) setting.

Figure 2 shows a high-level comparison between the centralized and the
vertical federated setting. In summary, user-item contextual
information is collected by different participants, and the figure
uses different colors to distinguish the source of the information
(e.g., information marked in green comes from participant 1). The
number of circles denotes the specific feature value. Each row of the
table contains the contextual information of one given user (denoted
as $x_{t,a}$). The top box denotes the service provider, which is
responsible for building contextual bandits for recommending items to
users¹. In the centralized setting (Figure 2a), the service provider
has access to all contextual information even though it belongs to
different participants. In contrast, in the vertical federated setting
(Figure 2b), the service provider cannot directly access the
information from other participants due to privacy issues (there are
locks between the different columns in Figure 2b). Therefore, this
information needs to be used in a privacy-preserving way, which is the
main goal of this paper. Specifically, we present an effective yet
lossless way to use this information while protecting users' privacy.
We present our solution in Section 4.
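To make the data layout of Figure 2b concrete, the toy snippet below
(illustrative only; the participant names and the particular column
split are assumptions) partitions a user-context matrix column-wise,
so that each participant holds only its own slice of every user's
feature vector.

```python
import numpy as np

rng = np.random.default_rng(2)
num_users, d = 4, 8
full_contexts = rng.normal(size=(num_users, d))   # what a centralized bandit (Figure 2a) would see

# Vertical split: each participant owns a disjoint block of feature columns.
column_blocks = {
    "participant_1": slice(0, 2),
    "participant_2": slice(2, 4),
    "participant_3": slice(4, 6),
    "participant_4": slice(6, 8),
}
local_views = {name: full_contexts[:, cols] for name, cols in column_blocks.items()}

# Every participant sees all users (rows) but only its own features (columns);
# no single party holds the full context vector x_{t,a}.
for name, view in local_views.items():
    print(name, view.shape)   # (4, 2) for each participant
```

The goal of the protocol in Section 4 is to let the service provider
run a LinUCB/LinTS-style update as if it saw `full_contexts`, while
each feature block leaves its owner only in masked form.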
3.2 Contextual Bandit in the Centralized Setting
We rst introduce the most commonly-used contextual bandit algo-
rithms, LinUCB and LinTS in the centralized setting, i.e., all user’s
contextual information can be fully accessed by the service provider.
In this setting, the recommendation service provider runs the
linear contextual algorithm
𝐴
which proceeds in discrete trials
indexed by time
𝑡=
1
,
2
,
3
,· · · ,𝑇
. In trial
𝑡
, the learning procedure
can be formulated as the following steps:
(1) $A$ observes a user $u_t$ and the current arm set
$\mathcal{A}_t = [K]$ consisting of $K$ different arms $a$ (e.g.,
items for recommendation). The context $x_{t,a}$ can be seen as a
feature vector describing the contextual information of both the user
$u_t$ and the given arm $a$.
(2) $A$ chooses an arm based on the estimated value $v_{t,a}$ of each
arm:
$$ a_t = \arg\max_{a \in \mathcal{A}_t} v_{t,a}. \quad (1) $$
For LinUCB, detailed in the left part of Figure 2, $v_{t,a}$ is called
the UCB value, defined as $v_{t,a} = \hat{r}_{t,a} + \hat{B}_{t,a}$,
where $\hat{r}_{t,a} \coloneqq x_{t,a}^\top \hat{\theta}_t$
¹In Figure 2, the service provider is distinct from the data providers
for simplicity. In practice, the service provider can also provide
partial contextual information.
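For orientation, the sketch below implements a standard centralized
LinUCB selection rule in the spirit of [1] (illustrative only, not
necessarily the paper's exact formulation; the constant `alpha` and the
regularizer `lam` are assumptions): $\hat\theta_t = V_t^{-1} b_t$ with
$V_t = \lambda I + \sum_s x_{s,a_s} x_{s,a_s}^\top$,
$b_t = \sum_s x_{s,a_s} r_s$, and a common choice of exploration bonus
$\hat{B}_{t,a} = \alpha \sqrt{x_{t,a}^\top V_t^{-1} x_{t,a}}$.

```python
import numpy as np

def linucb_step(V, b, contexts, alpha=1.0):
    """One LinUCB trial: pick the arm maximizing r_hat + B_hat, as in Eq. (1)."""
    theta_hat = np.linalg.solve(V, b)                      # ridge estimate theta_hat_t
    r_hat = contexts @ theta_hat                           # exploitation term x^T theta_hat
    V_inv = np.linalg.inv(V)
    bonus = alpha * np.sqrt(np.einsum("ad,de,ae->a", contexts, V_inv, contexts))
    return int(np.argmax(r_hat + bonus))                   # a_t = argmax_a v_{t,a}

# Usage: maintain V (lam * I + sum of x x^T) and b (sum of x * r) across trials.
d, K, lam = 5, 10, 1.0
V, b = lam * np.eye(d), np.zeros(d)
rng = np.random.default_rng(3)
theta_true = rng.normal(size=d)
for t in range(200):
    contexts = rng.normal(size=(K, d))
    a_t = linucb_step(V, b, contexts)
    r_t = contexts[a_t] @ theta_true + rng.normal(scale=0.1)
    V += np.outer(contexts[a_t], contexts[a_t])            # update statistics with the chosen arm
    b += r_t * contexts[a_t]
```

Note that both $V_t$ and $b_t$ depend on the contexts only through
sums of outer products and context-reward products, which is exactly
the structure the O3M mask of Section 4 is designed to preserve.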