a recommended new game, The Witcher 3, because he clicked on some similar games last week, so the item side fields should be included, as all existing works do. Or it may be because he is currently in the game zone, where any historical click on games indicates a strong interest in games. In the latter case, the game zone feature from the context side plays an important role in capturing his interest from his behaviors.
• Second, existing works interact all behavior fields with all target item side fields. Recent studies [8], [9] show that some interactions in attention are unnecessary and harm performance. Involving more fields as the query may introduce more irrelevant field interactions and further degrade performance.
• Third, as part of the input layer of a more complicated DNN model for CTR prediction, the procedure for generating the user interest vector should be lightweight. Unfortunately, most existing methods use an MLP to calculate the attention weights, which leads to high computational complexity.
To resolve these challenges, we propose to include all item/user/context fields as the query in the attention unit, and to calculate a learnable weight for each field pair between the user behavior fields and these query fields. To avoid introducing noisy field pairs, we further propose to automatically select the most important ones by pruning these weights. In addition, we adopt a simple dot product rather than an MLP as the attention function, leading to a much lower computation cost. We summarize the AUC as well as the average inference time of AutoAttention and several baseline models in Fig. 1. Except for Sum Pooling, which has a very low inference time due to its simplicity, the proposed AutoAttention achieves a higher AUC than all the other baseline models with low inference time. The main contributions of this paper are summarized as follows:
• We propose to involve all item/user/context fields as the query in the attention unit for user interest modeling. A weight is assigned to each field pair between the user behavior fields and these query fields. Pruning these weights automates the field pair selection, preventing the performance deterioration caused by irrelevant field pairs (see the sketch after this list).
• We propose to use a simple dot-product attention rather than the MLP used in existing methods. This greatly reduces the time complexity while achieving comparable or even better performance.
• We conduct extensive experiments on public and production datasets to compare AutoAttention with state-of-the-art methods. Evaluation results verify the effectiveness of AutoAttention. We also study the learned field pair weights and find that AutoAttention does identify several field pairs involving user or context side fields, which are ignored by expert knowledge in existing works.
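To make the first two contributions concrete, below is a minimal PyTorch sketch of the idea (not the paper's exact formulation): one learnable weight per (behavior field, query field) pair scales a plain dot-product attention score, and pairs whose weights fall below a magnitude threshold are pruned. The tensor shapes, the threshold `tau`, and the class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FieldPairDotAttention(nn.Module):
    """Sketch: dot-product attention scaled by learnable field-pair weights.

    Shapes, the pruning threshold, and the class name are illustrative
    assumptions, not AutoAttention's exact design.
    """
    def __init__(self, n_behavior_fields: int, n_query_fields: int, tau: float = 1e-2):
        super().__init__()
        # One learnable weight per (behavior field B_p, query field F_j) pair.
        self.w = nn.Parameter(torch.ones(n_behavior_fields, n_query_fields))
        self.tau = tau  # pruning threshold (assumed)

    def forward(self, vb: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # vb: (batch, H, P, K) per-field embeddings of the H behaviors.
        # q:  (batch, M, K)    embeddings of all item/user/context query fields.
        w = self.w * (self.w.abs() >= self.tau)         # prune noisy field pairs
        scores = torch.einsum('bhpk,bmk->bhpm', vb, q)  # dot product per pair
        a = (w * scores).sum(dim=(2, 3))                # (batch, H) attention
        v = vb.sum(dim=2)                               # v_i = sum_p v_{B_p}
        return (a.unsqueeze(-1) * v).sum(dim=1)         # v_u: (batch, K)
```

Since each attention score here is a plain dot product rather than an MLP pass, the pooling stays lightweight even for long behavior sequences.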
The rest of the paper is organized as follows. Section II provides the preliminaries of existing user behavior modeling methods. In Section III, we describe AutoAttention and its connection with several existing methods. Experiment settings and evaluation results are presented in Section IV. Finally, Sections V and VI discuss the related work and conclude the paper, respectively.
II. PRELIMINARIES
In this section, we present the preliminaries of user behavior modeling in CTR prediction. A CTR prediction model aims at predicting the probability that a user clicks an item given a context (e.g., time, location, and publisher information). It takes fields from three sides as the input:
$$\mathrm{pCTR} = f(\mathrm{user}, \mathrm{item}, \mathrm{context})$$
where the user side fields consist of user demographic fields and user behavior fields, and item and context denote the fields from the item and context sides, respectively. In this paper, we focus on how to capture a user's interest from user behaviors.
Given a user $u$ and her corresponding behaviors $\{v_1, v_2, \dots, v_H\}$, her interest is represented as a fixed-length vector as follows:
$$\mathbf{v}_u = f(\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_H, \mathbf{e}_{F_1}, \mathbf{e}_{F_2}, \dots, \mathbf{e}_{F_M}) \tag{1}$$
where $\mathbf{v}_i$ denotes the embedding of the $i$-th behavior, $H$ denotes the length of the user behaviors, and $\mathbf{e}_{F_j} \in \mathbb{R}^K$ denotes the feature embedding of any other field $F_j$ besides the user behaviors (e.g., item/user/context side fields). Each behavior is usually represented by multiple item side fields. Denoting the set of fields that represent behaviors as $B = \{B_p\}$, each behavior is represented as $\mathbf{v}_i = \sum_{B_p \in B} \mathbf{v}_{B_p}$, where $\mathbf{v}_{B_p} \in \mathbb{R}^K$ denotes the feature embedding of field $B_p$ of the $i$-th behavior.
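As a concrete illustration of this representation, the following minimal PyTorch sketch sums per-field embeddings to form each behavior embedding $\mathbf{v}_i$; the field names, vocabulary size, and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

K, H, VOCAB = 16, 50, 1000                 # illustrative sizes, not from the paper
BEHAVIOR_FIELDS = ["item_id", "category"]  # hypothetical field set B = {B_p}

# One embedding table per behavior field B_p.
tables = nn.ModuleDict({f: nn.Embedding(VOCAB, K) for f in BEHAVIOR_FIELDS})

# ids[f][b, i] is the id of field f for the i-th behavior: shape (batch, H).
ids = {f: torch.randint(VOCAB, (2, H)) for f in BEHAVIOR_FIELDS}

# v_i = sum over fields B_p of v_{B_p}: shape (batch, H, K).
v = torch.stack([tables[f](ids[f]) for f in BEHAVIOR_FIELDS]).sum(dim=0)
print(v.shape)  # torch.Size([2, 50, 16])
```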
A straightforward way to calculate $\mathbf{v}_u$ is to perform a sum or mean pooling over all these $\mathbf{v}_i$ embedding vectors [7]. However, this neglects the importance of each behavior given a specific target item. Recently, a commonly used behavior modeling strategy is to adopt an attention mechanism over the user's historical behaviors. It learns an attentive weight for each behavior $i$ w.r.t. a given target item $t$ and then conducts a weighted sum pooling, i.e., $\mathbf{v}_u = \sum_{i=1}^{H} a(i, t)\,\mathbf{v}_i$, where $a(i, t)$ denotes an attention function. For example, Deep Interest Network (DIN) considers the influence of the target item on user behaviors [4]: it assigns larger weights to those behaviors that are more important given the target item, as shown in Eqn. (2).
$$\mathbf{v}_u = f(\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_H, \mathbf{e}_t) = \sum_{i=1}^{H} a(i, t)\,\mathbf{v}_i = \sum_{i=1}^{H} \mathrm{MLP}(\mathbf{v}_i, \mathbf{e}_t)\,\mathbf{v}_i \tag{2}$$
where $\mathbf{e}_t$ denotes the embedding vector of the target item $t$, and $\mathrm{MLP}(\cdot)$ denotes an MLP whose output is the attention weight.
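For illustration, here is a minimal PyTorch sketch of the weighted sum pooling in Eqn. (2) with an MLP attention unit; feeding $[\mathbf{v}_i; \mathbf{e}_t; \mathbf{v}_i \odot \mathbf{e}_t]$ into the MLP and the hidden size are assumptions for the sketch, not the exact DIN architecture.

```python
import torch
import torch.nn as nn

class MLPAttentionPooling(nn.Module):
    """Weighted sum pooling with an MLP attention unit, as in Eqn. (2).

    The MLP input [v_i, e_t, v_i * e_t] and the hidden size are
    illustrative assumptions, not the exact DIN architecture.
    """
    def __init__(self, K: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * K, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, v: torch.Tensor, e_t: torch.Tensor) -> torch.Tensor:
        # v: (batch, H, K) behavior embeddings; e_t: (batch, K) target item.
        e = e_t.unsqueeze(1).expand_as(v)           # broadcast to (batch, H, K)
        a = self.mlp(torch.cat([v, e, v * e], -1))  # a(i, t): (batch, H, 1)
        return (a * v).sum(dim=1)                   # v_u: (batch, K)

# Usage: v_u = MLPAttentionPooling(K=16)(torch.randn(2, 50, 16), torch.randn(2, 16))
```

Note that the MLP must run once per behavior, which is precisely the computational overhead that the dot-product attention sketched in Section I avoids.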
Following DIN, DIEN [5] further considers the evolution of
user interest, and DSIN [6] considers the homogeneity and
heterogeneity of a user’s interests within and among sessions.
DIF-SR [10] proposes to only consider the interaction between