AutoAttention: Automatic Field Pair Selection for
Attention in User Behavior Modeling
Zuowu Zheng, Xiaofeng Gao, Junwei Pan, Qi Luo, Guihai Chen, Dapeng Liu, and Jie Jiang
Shanghai Jiao Tong University, Shanghai, China
waydrow@sjtu.edu.cn, {gao-xf, gchen}@cs.sjtu.edu.cn
Tencent Inc., Shenzhen, China
{jonaspan, rocliu, zeus}@tencent.com
Shandong University, Shandong, China
luoqi2018@mail.sdu.edu.cn
Abstract—In Click-through rate (CTR) prediction models, a
user’s interest is usually represented as a fixed-length vector
based on her history behaviors. Recently, several methods have been
proposed to learn an attentive weight for each user behavior
and conduct weighted sum pooling. However, these methods only
manually select several fields from the target item side as the
query to interact with the behaviors, neglecting the other target
item fields, as well as user and context fields. Directly including all
these fields in the attention may introduce noise and deteriorate
the performance. In this paper, we propose a novel model named
AutoAttention, which includes all item/user/context side fields as
the query, and assigns a learnable weight for each field pair
between behavior fields and query fields. Pruning these field
pairs via the learnable weights leads to automatic field pair
selection, identifying and removing noisy field pairs. Despite
including more fields, the computation cost of AutoAttention
remains low, thanks to a simple attention function and field
pair selection. Extensive experiments on a public dataset and
Tencent’s production dataset demonstrate the effectiveness of the
proposed approach.
Index Terms—Click-Through Rate Prediction, User Behavior
Modeling, Recommendation System
I. INTRODUCTION
Click-through rate (CTR) prediction is one of the most
fundamental tasks for online advertising systems, and it has
attracted much attention from both industrial and academic
communities [1]–[3]. Modeling a user’s interest through his
or her history behaviors on items has proven to be one of the
most successful advances in the CTR prediction task [4]–[6].
In the Embedding & Multi-Layer Perceptron (MLP) algo-
rithms for online advertising and recommendation systems,
a user’s interest is usually represented as a fixed-length
embedding vector, based on her history behaviors [4], [7].
Traditional methods simply perform a sum or mean pooling
over all behavior embedding vectors to generate
one embedding [7]. However, this ignores the fact that some
behaviors are more important than others given the target item,
user and context features.
Recently, several user behavior modeling methods have been
proposed to calculate attentive weights for different behaviors w.r.t.
¹ Z. Zheng, X. Gao, and G. Chen are with the MoE Key Lab of Artificial
Intelligence, Department of Computer Science and Engineering, Shanghai Jiao
Tong University. X. Gao is the corresponding author.
[Fig. 1 plot: AUC (0.600–0.616) vs. inference time in ms (0.00–0.06) for Sum Pooling, DIN, DIEN, DSIN, MAF-C, MAF-S, DIN+, DotProduct, and AutoAttention]
Fig. 1: AUC and inference time comparison of the proposed
AutoAttention with baselines on the public Alibaba dataset.
Sum pooling, DIN, DIEN, and DSIN are four existing meth-
ods, which only include several manually selected fields in
the attention unit. MAF-C, MAF-S, DIN+, and DotProduct
are several proposed baselines which include all available
fields. AutoAttention also includes all fields, but conducts
field pair selection and achieves a new state-of-the-art AUC with
low inference time.
a given target item and then conduct a weighted sum pooling,
such as Deep Interest Network (DIN) [4] and its variants [5],
[6]. Even though these methods achieve significant perfor-
mance lift, they still suffer from the following limitations:
First, in real-world recommendation systems, a user’s
interest may not only depend on the target item but also
on the user’s demographic features or context features.
However, existing works only manually select several
fields from the target item side as the query and interact
them with each behavior to calculate the attentive weight.
This neglects the effect of other fields, including other
fields from the target item side, as well as those from
the user and context sides. For example, when browsing
the game zone of a shopping website, a boy will click
a recommended new game The Witcher 3 because he
clicked some similar games last week, so the item side
fields should be included, as all existing works do. Alternatively,
it may be because he is in the game zone now, and any history
click on games indicates a strong interest in games. In
the latter case, the game zone feature from the context
side plays an important role in capturing his interest from
behaviors.
Second, existing works interact all behavior fields with all
target item side fields. Recent studies [8], [9] show that
some interactions in attention are unnecessary and harm
the performance. Involving more fields as the query may
introduce more irrelevant field interactions and further
deteriorate the performance.
Third, as a part of the input layer of a more complicated
DNN model for CTR prediction, the procedure of generating
a user interest vector should be lightweight. Unfortunately,
most existing methods use an MLP to calculate
the attention weight, which leads to high computational
complexity.
To resolve these challenges, we propose to include all
item/user/context fields as the query in the attention unit, and
calculate a learnable weight for each field pair between user
behavior fields and these query fields. To avoid introducing
noisy field pairs, we further propose to automatically select the
most important ones by pruning on these weights. Besides, we
adopt a simple dot product function rather than an MLP as the
attention function, leading to much lower computation cost. We
summarize the AUC as well as the average inference time of
AutoAttention and several baseline models in Fig. 1. Except for
Sum Pooling, which has a very low inference time due to its
simplicity, the proposed AutoAttention achieves a higher AUC than
all the other baseline models while keeping inference time low. The
main contributions of this paper are summarized as follows:
We propose to involve all item/user/context fields as the
query in the attention unit for user interest modeling.
A weight is assigned for each field pair between user
behavior fields and these query fields. Pruning on weights
automates the field pair selection, preventing performance
deterioration due to introducing irrelevant field pairs.
We propose to use a simple dot product attention, rather
than the MLP used in existing methods. This greatly reduces
the time complexity with comparable or even better
performance.
We conduct extensive experiments on public and produc-
tion datasets to compare AutoAttention with state-of-the-
art methods. Evaluation results verify the effectiveness
of AutoAttention. We also study the learned field pair
weights and find that AutoAttention does identify several
important field pairs involving user or context side fields, which
are overlooked by the expert knowledge used in existing works.
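The attention scheme outlined in these contributions (all fields as query, a learnable weight per behavior-field/query-field pair, dot-product scoring, and pruning) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the random embeddings, the softmax normalization over behaviors, and the zero-thresholding used to mimic pruning are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8          # embedding dimension
H = 5          # number of user behaviors
B, Q = 3, 4    # number of behavior fields and query fields

# Hypothetical embeddings: each behavior has B field embeddings; the
# query (all item/user/context side fields) has Q field embeddings.
behavior = rng.normal(size=(H, B, K))
query = rng.normal(size=(Q, K))

# One learnable weight per (behavior field, query field) pair.
# Zeroing a weight prunes that field pair from the attention.
pair_w = rng.normal(size=(B, Q))
pair_w[pair_w < 0] = 0.0   # toy stand-in for pruning low-importance pairs

# Dot-product attention score for behavior i: sum over the kept field
# pairs (p, q) of pair_w[p, q] * <behavior[i, p], query[q]>.
scores = np.einsum('ipk,qk,pq->i', behavior, query, pair_w)
attn = np.exp(scores - scores.max())
attn /= attn.sum()         # softmax over behaviors (one possible choice)

# Weighted sum pooling; each behavior embedding is the sum of its
# field embeddings, as in the paper's preliminaries.
v_u = np.einsum('i,ik->k', attn, behavior.sum(axis=1))
print(v_u.shape)  # (8,)
```

Because the attention function is a plain dot product over a pruned set of field pairs, the per-behavior cost stays far below an MLP-based attention unit.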
The rest of the paper is organized as follows. Section II
provides the preliminaries of existing user behavior methods.
In Section III, we describe AutoAttention and discuss its
connection to several existing methods. Experiment settings
and evaluation results are presented in Section IV. Finally,
Section V discusses related work and Section VI concludes the paper.
II. PRELIMINARIES
In this section, we present the preliminaries of user behavior
modeling in CTR prediction. A CTR prediction model aims
at predicting the probability that a user clicks an item given
a context (e.g., time, location, publisher information, etc.). It
takes fields from three sides as the input:
$\mathrm{pCTR} = f(\text{user}, \text{item}, \text{context})$
where the user side fields consist of user demographic fields and
behavior fields, and item and context denote fields from the item
and context sides, respectively. In this paper, we focus on how
to capture a user’s interest from user behaviors.
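To make this input structure concrete, here is a toy sketch of such a model's interface. The field dimensions, the uniform weights, and the sigmoid scorer are purely hypothetical stand-ins for a trained DNN.

```python
import numpy as np

def predict_ctr(user, item, context, w=None):
    """Toy stand-in for pCTR = f(user, item, context): concatenate the
    field embeddings from the three sides and squash with a sigmoid."""
    x = np.concatenate([user, item, context])
    if w is None:
        w = np.full(x.shape, 0.1)   # hypothetical learned weights
    return 1.0 / (1.0 + np.exp(-(w @ x)))

# Hypothetical 4-dimensional embeddings per side.
p = predict_ctr(np.ones(4), np.zeros(4), np.ones(4))
```

The output is a probability in (0, 1); in a real system each side's vector would itself be built from many embedded fields.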
Given a user $u$ and her corresponding behaviors
$\{v_1, v_2, \ldots, v_H\}$, her interest is represented as a fixed-length
vector as follows:

$v_u = f(v_1, v_2, \ldots, v_H, e_{F_1}, e_{F_2}, \ldots, e_{F_M})$   (1)

where $v_i$ denotes the embedding for the $i$-th behavior, $H$
denotes the length of user behaviors, and $e_{F_j} \in \mathbb{R}^K$ denotes
the feature embedding of any other field $F_j$ besides the user
behaviors (e.g., item/user/context side fields). Each
behavior is usually represented by multiple item side fields.
Denoting the set of fields used to represent behaviors as $\mathcal{B} = \{B_p\}$,
each behavior is represented as $v_i = \sum_{B_p \in \mathcal{B}} v_{B_p}$, where
$v_{B_p} \in \mathbb{R}^K$ denotes the feature embedding for the field $B_p$ of
the $i$-th behavior.
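As a small worked example of this field-sum representation (the field names item_id, category, and brand are hypothetical):

```python
import numpy as np

K = 4  # embedding dimension
# Suppose each behavior is described by three item-side fields.
field_embs = {
    'item_id':  np.array([0.1, 0.2, 0.3, 0.4]),
    'category': np.array([0.5, 0.0, 0.1, 0.2]),
    'brand':    np.array([0.0, 0.3, 0.0, 0.1]),
}
# v_i = sum over fields B_p of v_{B_p}
v_i = sum(field_embs.values())
print(v_i)  # [0.6 0.5 0.4 0.7]
```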
A straightforward way to calculate $v_u$ is to do a sum
or mean pooling over all these $v_i$ embedding vectors [7].
However, this neglects the importance of each behavior given
a specific target item. Recently, a commonly used behavior
modeling strategy is to adopt an attention mechanism over
the user’s historical behaviors. It learns an attentive weight
for each behavior $i$ w.r.t. a given target item $t$ and then
conducts a weighted sum pooling, i.e., $v_u = \sum_{i=1}^{H} a(i, t)\, v_i$,
where $a(i, t)$ denotes an attention function. For example, Deep
Interest Network (DIN) considers the influence of the target
item on user behaviors [4], assigning larger weights to
those behaviors that are more important given the target item,
as shown in Eqn. (2).
$v_u = f(v_1, v_2, \ldots, v_H, e_t) = \sum_{i=1}^{H} a(i, t)\, v_i = \sum_{i=1}^{H} \mathrm{MLP}(v_i, e_t)\, v_i$   (2)

where $e_t$ denotes the embedding vector of the target item $t$, and
$\mathrm{MLP}(\cdot)$ denotes an MLP whose output is the attention weight.
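A minimal NumPy sketch of the weighted sum pooling in Eqn. (2), under illustrative assumptions: the tiny randomly initialized one-hidden-layer MLP stands in for a trained attention unit, and real DIN additionally feeds interaction features (e.g., the element-wise product) into the MLP.

```python
import numpy as np

rng = np.random.default_rng(1)
K, H, hidden = 8, 5, 16

behaviors = rng.normal(size=(H, K))  # v_1, ..., v_H
e_t = rng.normal(size=K)             # target item embedding

# Toy one-hidden-layer MLP with random (untrained) parameters.
W1 = rng.normal(size=(2 * K, hidden))
w2 = rng.normal(size=hidden)

def attn_weight(v_i, e_t):
    """a(i, t) = MLP(v_i, e_t): concatenate, ReLU layer, scalar output.
    The weight is left unnormalized, as in DIN."""
    x = np.concatenate([v_i, e_t])
    h = np.maximum(W1.T @ x, 0.0)
    return w2 @ h

weights = np.array([attn_weight(v, e_t) for v in behaviors])
v_u = weights @ behaviors            # Eqn. (2): sum_i a(i, t) * v_i
print(v_u.shape)  # (8,)
```

Running this MLP for every behavior at every request is exactly the per-inference cost that the dot-product attention in AutoAttention is designed to avoid.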
Following DIN, DIEN [5] further considers the evolution of
user interest, and DSIN [6] considers the homogeneity and
heterogeneity of a user’s interests within and among sessions.
DIF-SR [10] proposes to only consider the interaction between