
cost. We draw our inspiration from recent work with great success in computer vision [13]. Its core idea is that representation learning and classification learning require different data distributions, whereas traditional models do not consider such decoupling for model parameters. Specifically, they propose a two-stage decoupled training strategy, where the first stage trains on the original long-tail distribution for item representation learning, and the second stage trains on the re-balanced data to improve the predictions of tail items. However, in recommendation applications, we empirically observe that these methods suffer from a severe forgetting issue [21]. That is, the knowledge learned for certain items (e.g., head items) in the first training stage is easily forgotten when the learning focus shifts to other items (e.g., tail items) in the second training stage, leading to a degradation in overall model quality (as shown in Figure 1). Moreover, in large-scale production systems, two-stage training is much more complex to build and maintain than a co-training scheme. In light of the pros and cons of this method, we aim to develop a model that improves the decoupling technique to accommodate web-scale recommenders.
Like many other methods tackling cold-start problems, the decoupling methods potentially hurt the overall recommendation performance. We attempt to understand this from a theoretical point of view. In particular, we find that the prediction of user preference towards an item is biased. The bias comes from the differences between training and serving data in two respects: 1) the item distributions, and 2) the user preference given an item. Most existing methods mainly attempt to reduce the bias from the item-distribution perspective, ignoring the discrepancy in user preference given an item.
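To make the two bias sources concrete, note that the joint distribution over (user, item) pairs factorizes as follows (notation ours, for illustration; the formal analysis follows in Section 2):

    𝑝_train(𝑢, 𝑖) = 𝑝_train(𝑖) · 𝑝_train(𝑢 | 𝑖),    𝑝_serve(𝑢, 𝑖) = 𝑝_serve(𝑖) · 𝑝_serve(𝑢 | 𝑖).

A model fit to the training distribution is thus unbiased at serving time only if both factors match: re-balancing or re-weighting items corrects a mismatch in 𝑝(𝑖), but leaves any mismatch in 𝑝(𝑢 | 𝑖) untouched.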
Motivated by the theoretical findings, we propose a novel Cross Decoupling Network (CDN) framework to mitigate the forgetting issue by considering decoupling from both the item and user sides. "Decoupling" means we treat the corresponding learning as two independent processes. In more detail:
• On the item side, the amount of user feedback received by head items and tail items varies significantly. This variation may cause the model to forget the learned knowledge of head items (more memorization) when its attention shifts to tail items (more generalization)¹. Hence, we propose to decouple memorization and generalization for item representation learning. In particular, we first group features into memorization-related and generalization-related features, and feed them separately into a memorization-focused expert and a generalization-focused expert, which are then aggregated through a frequency-based gating. This mixture-of-experts structure allows us to dynamically balance memorization and generalization abilities for head and tail items (illustrated in the sketch following this overview).
• On the user side, we leverage a regularized bilateral branch network to decouple user samples from two distributions. The network consists of two branches: a "main" branch that trains on the original distribution for high-quality representation learning, and a new "regularizer" branch that trains on the re-balanced distribution to add more tail information to the model. These two branches share some hidden layers and are jointly trained to mitigate the forgetting issue. A shared tower on the user side is used for scalability (see the sketch below).

¹Memorization is to learn and exploit the existing knowledge of visited training data. Generalization, on the other hand, is to explore new knowledge that has not occurred in the training dataset, based on transitivity (e.g., data correlation) [5].

Figure 1: The recommender performance (HR@50) of different methods on tail items (x-axis) and overall items (y-axis). Dots and plus signs represent four two-stage decoupling methods. 'H' means focusing on head items, 'T' means focusing on tail items, and → means switching from the 1st to the 2nd stage. When tail items (T) are the focus of the second stage, the performance on tail items improves; however, the overall performance significantly degrades. Our model CDN achieves excellent performance on both overall and tail items.
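To make the two decoupled components above concrete, the following is a minimal sketch (runnable PyTorch; all layer sizes, feature groupings, and the gate's input are our own assumptions, not the authors' implementation) of an item tower with a memorization expert, a generalization expert, and a frequency-based gate, together with a bilateral-branch user tower sharing lower layers:

    import torch
    import torch.nn as nn

    class ItemTower(nn.Module):
        """Sketch: decouples memorization and generalization via two experts."""
        def __init__(self, num_items, content_dim, dim=32):
            super().__init__()
            # Memorization-focused expert: consumes ID-like features (e.g. item id).
            self.mem_expert = nn.Embedding(num_items, dim)
            # Generalization-focused expert: consumes content features
            # (e.g. tags, categories) whose knowledge transfers across items.
            self.gen_expert = nn.Sequential(
                nn.Linear(content_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            # Frequency-based gate: item popularity decides how much weight each
            # expert receives (head -> memorization, tail -> generalization).
            self.gate = nn.Linear(1, 2)

        def forward(self, item_id, content, log_freq):
            mem = self.mem_expert(item_id)                  # [B, dim]
            gen = self.gen_expert(content)                  # [B, dim]
            w = torch.softmax(self.gate(log_freq), dim=-1)  # [B, 2]
            return w[:, :1] * mem + w[:, 1:] * gen          # convex expert mix

    class UserTower(nn.Module):
        """Sketch: bilateral branches over one shared tower, for scalability."""
        def __init__(self, user_dim, dim=32):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(user_dim, dim), nn.ReLU())
            self.main_head = nn.Linear(dim, dim)  # fed the original distribution
            self.reg_head = nn.Linear(dim, dim)   # fed the re-balanced distribution

        def forward(self, x_main, x_rebalanced):
            return (self.main_head(self.shared(x_main)),
                    self.reg_head(self.shared(x_rebalanced)))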
Finally, a new adapter (called the 𝛾-adapter) is introduced to aggregate the learned vectors from the user and item sides. By adjusting the hyperparameter 𝛾 in the adapter, we are able to shift the training attention to tail items in a soft and flexible way, based on different long-tail distributions.
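A minimal sketch of how such an adapter could combine the two user branches with the item vector follows; the specific schedule for the mixing weight alpha is our assumption for illustration (the exact form of the 𝛾-adapter is defined later in the paper), in the spirit of bilateral-branch networks where the weight decays over training:

    def gamma_adapter_logit(u_main, u_reg, item_vec, step, total_steps, gamma):
        # Hypothetical schedule: alpha starts near 1 (attention on the original,
        # head-dominated distribution) and decays toward the re-balanced,
        # tail-focused branch; gamma controls how soft/fast the shift is.
        alpha = max(0.0, 1.0 - step / (gamma * total_steps))
        s_main = (u_main * item_vec).sum(-1)  # main-branch dot-product score
        s_reg = (u_reg * item_vec).sum(-1)    # regularizer-branch score
        return alpha * s_main + (1.0 - alpha) * s_reg

Under this reading, a larger 𝛾 keeps the training attention on the original distribution for longer, while a smaller 𝛾 shifts it to tail items sooner, matching the soft, distribution-dependent control described above.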
The resulting model, CDN, mitigates the forgetting issue of existing models: it not only improves tail item performance, but also preserves or even improves the overall performance. Further, it adopts a co-training scheme that is easy to maintain. All of these make CDN suitable for industrial-caliber applications.
The contributions of this paper are four-fold:
• We provide a theoretical understanding of how the long-tail distribution influences recommendation performance from both the item and user perspectives.
• We propose a novel cross decoupling network that decouples the learning processes of memorization and generalization, as well as the sampling strategies. A 𝛾-adapter is utilized to aggregate the learning from the two sides.
• Extensive experimental results on public datasets show that CDN significantly outperforms SOTA methods, improving performance on both overall and tail item recommendations.
• We further provide a case study of applying CDN to a large-scale recommender system at Google. We show that CDN is easy to adapt to real-world settings, and achieves significant quality improvements both offline and online.
2 LONG-TAIL DISTRIBUTION IN
RECOMMENDATION AND MOTIVATION
Problem Settings. Our goal is to predict user engagement (e.g., clicks, installs) with candidate items (e.g., videos, apps) that follow long-tail distributions. We start by formulating the problem: Given a set of users U = {1, 2, . . . , 𝑚}, a set of items I = {1, 2, . . . , 𝑛}, and their content information (e.g., item tags, categories). Let 𝑑̂(𝑢, 𝑖)