
Figure 1: Illustration of common sequential recommendation models.
model (Figure 1a), which predicts the next item a user will interact with based on her historical behaviors [10, 14, 29]. It is a straightforward modeling choice for sequential data. However, in sequential recommendation, the autoregressive schema can weaken the model's expressiveness, because in practice the sequential dependencies of user behaviors may not strictly hold. For example, after purchasing an iPad, a user may click on an Apple Pencil, an iPad case, and headphones, but it is likely that the user clicked on these three products in an arbitrary order. Simply modeling them in a compulsory sequential order loses some overall contextual information, since future data (interactions that occur after the target interaction) also provide rich collaborative information to assist model training. It is therefore reasonable to leverage future data to train better sequential recommendation models.
Recently, researchers have shown that leveraging both past and future contextual information during training significantly boosts recommendation performance compared to autoregressive models [28, 36]. For example, inspired by advances in the field of natural language processing (NLP), BERT4Rec [28] employs a masked language model (MLM) training objective, which predicts masked items based on both historical and future behavior records during training (Figure 1b). BERT4Rec significantly improves recommendation performance compared to its unidirectional autoregressive counterpart SASRec [14].
Despite the richer contextual information brought by training with future interaction data, simply adopting MLM objectives for SR can introduce a severe training-inference gap. Specifically, at training time, the MLM model predicts masked items with both past and future interactions as context, which can be written as $P(i \mid \mathbf{x}_{past}, \mathbf{x}_{future})$. At inference, however, only past behaviors are available for prediction, i.e., $P(i \mid \mathbf{x}_{past}, \mathrm{NULL})$. This discrepancy of context between training and inference can bias the model at inference time and lead to potential performance degradation.
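To make the gap concrete, below is a minimal sketch in PyTorch-style Python (our own illustration; MASK_ID and the helper names are hypothetical, not from the paper) of the two contexts an MLM-style recommender sees: bidirectional at training time, past-only at inference.

```python
import torch

MASK_ID = 0  # hypothetical id reserved for the [mask] token

def mlm_training_input(seq: torch.Tensor, t: int) -> torch.Tensor:
    """Training: mask position t; the model predicts it from both
    x_past = seq[:t] and x_future = seq[t+1:], i.e. P(i | x_past, x_future)."""
    masked = seq.clone()
    masked[t] = MASK_ID
    return masked

def inference_input(seq: torch.Tensor) -> torch.Tensor:
    """Inference: append [mask] after the observed history; only
    x_past is available, i.e. P(i | x_past, NULL)."""
    return torch.cat([seq, torch.tensor([MASK_ID], dtype=seq.dtype)])

history = torch.tensor([12, 7, 33, 5, 19])
print(mlm_training_input(history, t=2))  # tensor([12,  7,  0,  5, 19])
print(inference_input(history))          # tensor([12,  7, 33,  5, 19,  0])
```

The model thus trains on inputs whose masked positions have right-hand context, yet must predict at inference from inputs that never do, which is exactly the distribution shift described above.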
To exploit richer contextual information from the future while alleviating the potential training-inference gap, the following model desiderata should be met:
• Past-future disentanglement: The training-inference gap in existing methods is caused by the use of a single encoder that entangles past and future contextual information, thus interfering with inference. Instead, future data should be modeled separately, without explicitly interfering with the modeling of historical interaction data. If both disentangled encoders are well-trained, the absence of future information will not degrade the performance of the past-information encoder. By this means, we can use only past behaviors for inference, with a minimal gap between training and inference.
• Past-future mutual enhancement: Users' interests captured by past and future behaviors are closely related and complementary. Simply separating the past and future modeling processes hinders leveraging the knowledge each learns from the other. To better exploit future data, an elegant solution is to have the two disentangled modeling processes mutually enhance each other.
In this paper, we propose a framework for better utilization of past and future information in sequential recommendation, named DualRec. To alleviate the training-inference gap, DualRec adopts a dual network structure: for a target interaction, past and future contextual behaviors are modeled by two separate encoders. The two encoders perform dual tasks, i.e., the past encoder performs next-item prediction (primal) while the future encoder performs previous-item prediction (dual). In this way, future information is decoupled from the modeling of past information. During inference, only the past encoder is used to make predictions, thus avoiding the training-inference gap. Second, the dual networks enhance each other through multi-scale knowledge transfer. Specifically, the internal representations of the two networks are constrained to align, based on the assumption that users' interests captured by past and future behaviors are closely related and complementary. Finally, as a general framework, DualRec can be instantiated with different backbone models, including RNNs, Transformers, and filter-based MLPs.
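To ground this description, here is a minimal, hypothetical PyTorch sketch of the dual structure (our own simplification, not the authors' implementation): two encoders trained with next-item and previous-item prediction losses, plus a simple cosine-alignment term standing in for the multi-scale knowledge transfer. The GRU backbone, loss weight, and alignment mechanism are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualRecSketch(nn.Module):
    """Sketch of the dual-network idea: a past encoder (next-item
    prediction, primal) and a future encoder (previous-item prediction, dual)."""

    def __init__(self, n_items: int, d: int = 64):
        super().__init__()
        self.emb = nn.Embedding(n_items, d)
        # The backbone is pluggable (RNN / Transformer / filter-based MLP);
        # GRUs are used here purely for brevity.
        self.past_enc = nn.GRU(d, d, batch_first=True)
        self.future_enc = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, n_items)

    def forward(self, seq: torch.Tensor):
        x = self.emb(seq)                        # (B, L, d)
        h_past, _ = self.past_enc(x)             # reads left-to-right
        h_fut, _ = self.future_enc(x.flip(1))    # reads right-to-left
        return h_past, h_fut.flip(1)             # align both to position t

    def loss(self, seq: torch.Tensor) -> torch.Tensor:
        h_past, h_fut = self(seq)
        # Primal task: past representation at position t predicts item t+1.
        next_logits = self.out(h_past[:, :-1])
        l_next = F.cross_entropy(next_logits.reshape(-1, next_logits.size(-1)),
                                 seq[:, 1:].reshape(-1))
        # Dual task: future representation at position t predicts item t-1.
        prev_logits = self.out(h_fut[:, 1:])
        l_prev = F.cross_entropy(prev_logits.reshape(-1, prev_logits.size(-1)),
                                 seq[:, :-1].reshape(-1))
        # Knowledge transfer (simplified): align the two views of the user,
        # assuming past- and future-derived interests are complementary.
        l_align = 1 - F.cosine_similarity(h_past, h_fut, dim=-1).mean()
        return l_next + l_prev + 0.1 * l_align   # 0.1 is an arbitrary weight

# At inference, predictions rely only on the past encoder:
#   h_past, _ = model(seq); scores = model.out(h_past[:, -1])
```

Because the prediction path touches only the past encoder and the output layer, the future encoder can be dropped at serving time without changing the model's predictions.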
To summarize, our contributions are as follows:
• We highlight the training-inference gap that exists in sequential recommendation models when leveraging future data. To handle this problem, we propose a novel framework, DualRec, that achieves the disentanglement and mutual enhancement of past-future modeling.
• DualRec explicitly decouples the modeling of past and future information into two separate encoders, thus alleviating the training-inference gap, and further employs past-future knowledge transfer to learn enhanced representations.
• We conduct comprehensive experiments on four public datasets. Experimental results demonstrate the effectiveness of our proposed DualRec compared with several baseline models. Further analysis illustrates its compatibility with different backbone models.
2 RELATED WORK
Sequential recommenders are designed to model the sequential dynamics in user behaviors. Early efforts leverage the Markov Chain (MC) assumption [25] and model item-item transitions to predict a user's next action based on the last visited items. Recently, different neural network-based models have been applied, including Recurrent Neural Networks (RNNs) [20], Convolutional Neural Networks (CNNs) [17], Attention Networks [31], and Graph Neural Networks [16]. GRU4Rec [10] is a pioneering work that employs an RNN to capture the dynamic characteristics of user behaviors. Hidasi and Karatzoglou further extend GRU4Rec with enhanced ranking functions as well as effective sampling strategies. Another line of research is based on CNNs. Caser [29] treats the embedding matrix of items as a 2D image and models user behavior sequences with convolution. The main advantage of CNN-based models is that they are much easier to parallelize on GPUs compared with RNN-based