Empowering Long-tail Item Recommendation through Cross
Decoupling Network (CDN)
Yin Zhang
yinzh@google.com
Google Research, Brain Team, USA
Ruoxi Wang
ruoxi@google.com
Google Research, Brain Team, USA
Tiansheng Yao
tyao@google.com
Google Research, Brain Team, USA
Xinyang Yi
xinyang@google.com
Google Research, Brain Team, USA
Lichan Hong
lichan@google.com
Google Research, Brain Team, USA
James Caverlee
caverlee@cse.tamu.edu
Texas A&M University, USA
Ed H. Chi
edchi@google.com
Google Research, Brain Team, USA
Derek Zhiyuan Cheng
zcheng@google.com
Google Research, Brain Team, USA
ABSTRACT
Industry recommender systems usually suer from highly-skewed
long-tail item distributions where a small fraction of the items
receives most of the user feedback. This skew hurts recommender
quality especially for the item slices without much user feedback.
While there have been many research advances made in academia,
deploying these methods in production is very dicult and very
few improvements have been made in industry. One challenge is
that these methods often hurt overall performance; additionally,
they could be complex and expensive to train and serve.
In this work, we aim to improve tail item recommendations
while maintaining the overall performance with less training and
serving cost. We rst nd that the predictions of user preferences
are biased under long-tail distributions. The bias comes from the
dierences between training and serving data in two perspectives:
1) the item distributions, and 2) user’s preference given an item.
Most existing methods mainly attempt to reduce the bias from the
item distribution perspective, ignoring the discrepancy from user
preference given an item. This leads to a severe forgetting issue
and results in sub-optimal performance.
To address the problem, we design a novel Cross Decoupling
Network (CDN) to reduce the two dierences. Specically, CDN
(i) decouples the learning process of memorization and generaliza-
tion on the item side through a mixture-of-expert architecture; (ii)
decouples the user samples from dierent distributions through
a regularized bilateral branch network. Finally, a new adapter is
introduced to aggregate the decoupled vectors, and softly shift the
training attention to tail items. Extensive experimental results show
that CDN signicantly outperforms state-of-the-art approaches on
popular benchmark datasets. We also demonstrate its eectiveness
by a case study of CDN in a large-scale recommendation system at
Google.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
KDD ’23, August 6–10, 2023, Long Beach, CA, USA
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0103-0/23/08.
https://doi.org/10.1145/3580305.3599814
CCS CONCEPTS
• Information systems → Information retrieval.
KEYWORDS
decoupling, recommendation, memorization and generalization
ACM Reference Format:
Yin Zhang, Ruoxi Wang, Tiansheng Yao, Xinyang Yi, Lichan Hong, James
Caverlee, Ed H. Chi, and Derek Zhiyuan Cheng. 2023. Empowering Long-
tail Item Recommendation through Cross Decoupling Network (CDN). In
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM,
New York, NY, USA, 10 pages. https://doi.org/10.1145/3580305.3599814
1 INTRODUCTION
In industry recommender systems, user feedback towards items usually exhibits severe long-tail distributions. That is, a small fraction of items (head items) are extremely popular and receive most of the user feedback, while the rest of the items (tail items) have very little, if any, user feedback. Recommender models trained on this long-tail data usually amplify head items, enabling the “rich get richer” effect while hurting long-term user satisfaction. Models trained on highly skewed data distributions may lead to even worse skewness in real-world applications. Hence, it is critical to address the long-tail distribution problem in industry recommenders.
Some methods have been successfully deployed in production models to alleviate the influence of long-tail distributions, for example, logQ corrections [17, 27] and re-sampling. However, further improvements in this area for industry models are very limited, despite many research advances made in academia [31]. Several challenges make putting this research into production models difficult. First, many works targeting long-tail performance hurt head or overall performance, directly impacting top-line business metrics. Second, production models have very strict latency requirements for real-time inference, while many existing techniques are complex and expensive to serve. Third, production models prefer simplicity for easy adoption and maintenance. Much research (e.g., meta-learning [14], transfer learning [29]) is difficult to productionize due to these challenges.
arXiv:2210.14309v3 [cs.IR] 3 Sep 2023
In our work, we aim to improve tail item recommendations while maintaining the overall performance with less training and serving cost. We draw our inspiration from recent work with great success in computer vision [13]. Its core idea is that representation learning and classification learning require different data distributions, whereas traditional models do not consider such decoupling of model parameters. Specifically, they propose a two-stage decoupled training strategy, where the first stage trains on the original long-tail distribution for item representation learning, and the second stage trains on re-balanced data to improve the predictions of tail items. However, in recommendation applications, we empirically observe that these methods suffer from a severe forgetting issue [21]. This means that the learned knowledge of certain parts of the items (e.g. head) in the first training stage is easily forgotten when the learning focus is shifted to other items (e.g. tail) in the second training stage, leading to a degradation in overall model quality (as shown in Figure 1). Moreover, in large-scale production systems, two-stage training is much more complex to achieve and maintain than a co-training scheme. In light of the pros and cons of this method, we aim to develop a model that improves the decoupling technique to accommodate web-scale recommenders.
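To make the two-stage recipe concrete, here is a toy sketch of the stage-2 re-balancing step; the helper name and the uniform-per-item choice are illustrative, not the cited paper's exact procedure. The re-balanced data samples every item equally often, so tail items are no longer drowned out by head items.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy long-tail training labels: items 0-2 are head, items 3-4 are tail.
labels = np.array([0] * 50 + [1] * 40 + [2] * 30 + [3] * 3 + [4] * 2)

def class_balanced_resample(labels, per_class, rng):
    # Draw `per_class` examples (with replacement) for every distinct item,
    # producing the re-balanced data used in the second training stage.
    classes = np.unique(labels)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
        for c in classes
    ])
    return labels[idx]

balanced = class_balanced_resample(labels, per_class=25, rng=rng)
```

After resampling, each of the five items contributes exactly 25 examples, versus 2 out of 125 for the rarest item under the original distribution.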
Like many other methods tackling cold-start problems, the decoupling methods potentially hurt the overall recommendation performance. We attempt to understand this from a theoretical point of view. In particular, we found that the prediction of user preference towards an item is biased. The bias comes from the differences between training and serving data in two perspectives: 1) the item distributions, and 2) users' preferences given an item. Most existing methods mainly attempt to reduce the bias from the item distribution perspective, ignoring the discrepancy in user preference given an item.
Motivated by the theoretical findings, we propose a novel Cross Decoupling Network (CDN) framework to mitigate the forgetting issue by considering decoupling from both the item and user sides. “Decoupling” means we treat the corresponding learning as two independent processes. In more detail:
• On the item side, the amount of user feedback received by head items and tail items varies significantly. This variation may cause the model to forget the learned knowledge of head items (more memorization) when its attention is shifted to tail items (more generalization)¹. Hence, we propose to decouple memorization and generalization for item representation learning. In particular, we first group features into memorization-related and generalization-related features, and feed them separately into a memorization-focused expert and a generalization-focused expert, which are then aggregated through frequency-based gating. This mixture-of-expert structure allows us to dynamically balance memorization and generalization abilities for head and tail items.
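A minimal numpy sketch of this item-side decoupling follows; the expert weights, the tanh experts, and the log-frequency gating form are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def item_representation(mem_feats, gen_feats, item_freq, W_mem, W_gen, w_gate):
    # Memorization-focused expert: sees memorization features (e.g. item id).
    mem_out = np.tanh(W_mem @ mem_feats)
    # Generalization-focused expert: sees content features (e.g. tags).
    gen_out = np.tanh(W_gen @ gen_feats)
    # Frequency-based gate: with these gate weights, head items (large
    # item_freq) lean on memorization; tail items lean on generalization.
    gate = softmax(np.array([1.0, -1.0]) * w_gate * np.log1p(item_freq))
    return gate[0] * mem_out + gate[1] * gen_out
```

For a very frequent item the gate puts almost all weight on the memorization expert; for an item with zero feedback the two experts are mixed equally.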
• On the user side, we leverage a regularized bilateral branch network to decouple user samples from two distributions. The network consists of two branches: a “main” branch that trains on the original distribution for high-quality representation learning; and a new “regularizer” branch that trains on the re-balanced distribution to add more tail information to the model. These two branches share some hidden layers and are jointly trained to mitigate the forgetting issue. A shared tower on the user side is used for scalability.
¹ Memorization is to learn and exploit the existing knowledge of visited training data. Generalization, on the other hand, is to explore new knowledge that has not occurred in the training dataset, based on transitivity (e.g. data correlation) [5].
Figure 1: The recommender performance (HR@50) of different methods on tail items (x-axis) and overall items (y-axis). Dots and plus signs represent four two-stage decoupling methods. ‘H’ means focusing on head items, ‘T’ means focusing on tail items, and ‘→’ means switching from the 1st to the 2nd stage. When tail (T) is the focus of the second stage, the performance on tail items improves; however, the overall performance significantly degrades. Our model CDN achieves excellent performance for both overall and tail items.
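One common way to build a re-balanced distribution like the one the regularizer branch trains on is smoothed inverse-popularity sampling. This is a hedged sketch: the power-smoothing form and the `alpha` parameter are assumptions for illustration, not necessarily CDN's exact choice.

```python
import numpy as np

def rebalanced_probs(item_counts, alpha=0.5):
    # Raise raw feedback counts to alpha < 1 to flatten the long tail:
    # alpha = 1 recovers the original distribution, alpha = 0 is uniform.
    p = np.asarray(item_counts, dtype=float) ** alpha
    return p / p.sum()
```

For counts `[1000, 10, 1]`, the rarest item's sampling share rises from about 0.1% under the raw distribution to a few percent under `alpha=0.5`.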
Finally, a new adapter (called the γ-adapter) is introduced to aggregate the learned vectors from the user and item sides. By adjusting the hyperparameter γ in the adapter, we are able to shift the training attention to tail items in a soft and flexible way, based on different long-tail distributions.
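The role of the γ-adapter can be sketched as a training-time weighting between the two branches. The quadratic decay below is an illustrative schedule in the spirit of bilateral-branch networks, not necessarily the exact form CDN uses; `gamma_adapter_weight` and `aggregate_logits` are hypothetical names.

```python
def gamma_adapter_weight(step, total_steps, gamma=1.0):
    # Weight on the "main" branch: it decays toward 0 as training
    # progresses, softly shifting attention to the re-balanced
    # "regularizer" branch. A larger gamma slows the shift.
    t = step / total_steps
    return max(0.0, 1.0 - (t / gamma) ** 2)

def aggregate_logits(main_logit, reg_logit, alpha):
    # Convex combination of the two decoupled branch logits.
    return alpha * main_logit + (1.0 - alpha) * reg_logit
```

Early in training the main branch dominates (alpha near 1); by the end, attention has shifted to the regularizer branch, which sees more tail examples.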
The resulting model, CDN, mitigates the forgetting issue of existing models: it not only improves tail item performance, but also preserves or even improves the overall performance. Further, it adopts a co-training scheme that is easy to maintain. These properties make CDN suitable for industrial-caliber applications.
The contributions of this paper are four-fold:
• We provide a theoretical understanding of how the long-tail distribution influences recommendation performance from both item and user perspectives.
• We propose a novel cross decoupling network that decouples the learning process of memorization and generalization, and the sampling strategies. A γ-adapter is utilized to aggregate the learning from the two sides.
• Extensive experimental results on public datasets show that CDN significantly outperforms state-of-the-art methods, improving performance on both overall and tail item recommendations.
• We further provide a case study of applying CDN to a large-scale recommender system at Google. We show that CDN is easy to adapt to real settings and achieves significant quality improvements both offline and online.
2 LONG-TAIL DISTRIBUTION IN
RECOMMENDATION AND MOTIVATION
Problem Settings. Our goal is to predict user engagement (e.g. clicks, installs) with candidates (e.g. videos, apps) that follow long-tail distributions. We start by formulating the problem: Given a set of users U = {1, 2, . . . , m}, a set of items I = {1, 2, . . . , n}, and their content information (e.g. item tags, categories). Let d̂(u, i)