IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System
Xiangyang Li∗
Peking University
China
xiangyangli@pku.edu.cn
Bo Chen∗
Huawei Noah’s Ark Lab
China
chenbo116@huawei.com
HuiFeng Guo†
Huawei Noah’s Ark Lab
China
huifeng.guo@huawei.com
Jingjie Li
Huawei Noah’s Ark Lab
China
lijingjie1@huawei.com
Chenxu Zhu
Shanghai JiaoTong University
China
zhuchenxv@sjtu.edu.cn
Xiang Long
Beijing University of Posts and Telecommunication
China
xianglong@bupt.edu.cn
Sujian Li
Peking University
China
lisujian@pku.edu.cn
Yichao Wang
Huawei Noah’s Ark Lab
China
wangyichao5@huawei.com
Wei Guo
Huawei Noah’s Ark Lab
China
guowei67@huawei.com
Longxia Mao
Huawei Technologies Co Ltd
China
maolongxia@huawei.com
Jinxing Liu
Huawei Technologies Co Ltd
China
liujinxing5@huawei.com
Zhenhua Dong
Huawei Noah’s Ark Lab
China
dongzhenhua@huawei.com
Ruiming Tang†
Huawei Noah’s Ark Lab
China
tangruiming@huawei.com
ABSTRACT
Scoring a large number of candidates precisely within several milliseconds is vital for industrial pre-ranking systems. Existing pre-ranking systems primarily adopt the two-tower model, since the “user-item decoupling architecture” paradigm balances efficiency and effectiveness. However, the cost of high efficiency is the neglect of the potential information interaction between the user and item towers, which critically hinders prediction accuracy. In this paper, we show that it is possible to design a two-tower model that emphasizes both information interaction and inference efficiency. The proposed model, IntTower (short for Interaction enhanced Two-Tower), consists of Light-SE, FE-Block and CIR modules. Specifically, the lightweight Light-SE module is used to identify the importance of different features and obtain refined feature representations in each tower. The FE-Block module performs fine-grained and early feature interactions to capture the interactive signals between the user and item towers explicitly, and the CIR module leverages a contrastive interaction regularization to further enhance the interactions implicitly. Experimental results on three public datasets show that IntTower outperforms the SOTA pre-ranking models significantly and even achieves performance comparable to the ranking models. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system. The code of IntTower is publicly available¹.
∗ Co-first authors with equal contributions. Work done when Xiangyang Li was an intern at Huawei Noah’s Ark Lab.
† Corresponding authors.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM ’22, October 17–21, 2022, Atlanta, GA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9236-5/22/10...$15.00
https://doi.org/10.1145/3511808.3557072
CCS CONCEPTS
• Information systems → Information retrieval; Recommender systems; • Computing methodologies → Machine learning.
KEYWORDS
Recommender Systems, Pre-Ranking System, Neural Networks
ACM Reference Format:
Xiangyang Li, Bo Chen, HuiFeng Guo, Jingjie Li, Chenxu Zhu, Xiang Long, Sujian Li, Yichao Wang, Wei Guo, Longxia Mao, Jinxing Liu, Zhenhua Dong, and Ruiming Tang. 2022. IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3511808.3557072
¹ https://github.com/archersama/IntTower
arXiv:2210.09890v1 [cs.IR] 18 Oct 2022
1 INTRODUCTION
Existing industrial information services, such as recommender systems, search engines, and advertisement systems, adopt a multi-stage cascade ranking architecture, which balances efficiency and effectiveness better than a single-stage architecture [22]. A typical cascade ranking system consists of Recall, Pre-Ranking, Ranking, and Re-Ranking stages (Figure 1(a)). The early stages face a massive number of candidates and thus use simple models (e.g., LR and DSSM [9]) to guarantee low inference latency. In contrast, the later stages aim to carefully select items that meet the user’s preferences, and hence complex models (e.g., DeepFM [6] and AutoInt [26]) are used to improve the prediction accuracy.
Figure 1: The multi-stage cascade ranking architecture and the comparison of model prediction accuracy and inference efficiency. (a) Cascade ranking system: Recall, Pre-Ranking, Ranking, Re-Ranking, and Display, with the candidate scale shrinking from millions to thousands, hundreds, and tens. (b) Comparison between prediction accuracy and inference efficiency for pre-ranking and ranking models; models plotted: FTRL ’11, DSSM ’13, DCN ’17, DeepFM ’17, xDeepFM ’18, AutoInt ’19, COLD ’20, DAT ’21, FSCD ’21, and IntTower ’22.
The pre-ranking stage sits in the middle of the cascade. It preliminarily filters the items (on the scale of thousands) retrieved by the preceding recall stage and generates candidates (on the scale of hundreds) for the subsequent ranking stage. Therefore, both effectiveness and efficiency need to be carefully considered. Figure 1(b) depicts some representative pre-ranking and ranking models from the perspective of prediction accuracy and inference efficiency. Compared with ranking models, pre-ranking models need to score more candidate items for each user request. Therefore, pre-ranking models have higher inference efficiency but weaker prediction performance due to their simpler structure.
In the evolution of pre-ranking systems, LR (Logistic Regression) [19] is the most basic personalized pre-ranking model, which was widely used in the shallow machine learning era [20, 28, 31]. With the rise of deep learning, many industrial companies have deployed various deep models in their commercial systems. The dominant pre-ranking model in industry is the two-tower model [9] (i.e., DSSM), which utilizes neural networks to capture the interactive signals within the user and item towers. Moreover, the item representations can be pre-calculated offline and stored in a fast retrieval container. During online serving, only the user representations need to be calculated in real time, while the representations of candidate items can be retrieved directly. This “user-item decoupling architecture” paradigm provides sterling efficiency. Besides, COLD [32] and FSCD [18] propose a single-tower structure to fully model feature interactions and further improve the prediction accuracy.
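The offline/online split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' serving code: the one-layer `tower` function, the random weights, the `item_index` dictionary standing in for the "fast retrieval container", and the `serve` helper are all hypothetical names introduced here.

```python
import math
import random

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def tower(features, weights):
    """Toy one-layer tower (assumption): ReLU(W x) followed by L2 normalization."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]
    return l2_normalize(hidden)

rng = random.Random(0)
DIM_IN, DIM_OUT = 4, 3
user_weights = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]
item_weights = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]

# Offline: run the item tower once per item and store the resulting vectors.
item_features = {f"item_{i}": [rng.uniform(0, 1) for _ in range(DIM_IN)] for i in range(1000)}
item_index = {item_id: tower(x, item_weights) for item_id, x in item_features.items()}

def serve(user_features, candidate_ids, top_k=5):
    """Online: a single user-tower pass, then look-ups and inner products."""
    user_vec = tower(user_features, user_weights)
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, item_index[item_id]))
        for item_id in candidate_ids
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

top_items = serve([0.2, 0.9, 0.1, 0.5], list(item_index))
print(top_items)
```

Per request, the user tower runs once regardless of how many candidates are scored; the per-candidate cost is one look-up and one inner product.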
Despite great promise, existing pre-ranking models struggle to balance model effectiveness and inference efficiency. For the two-tower model, the cost of high efficiency is the neglect of the information interaction between the user and item towers. The two towers perform intra-tower information extraction in parallel and independently, and the learned latent representations do not interact until the output layer, which is referred to as “Late Interaction” [36] and critically hinders model performance. However, the interactive signals between user features and item features are vital for prediction [30]. Though DAT [35] attempts to alleviate this issue by implicitly modeling the information interaction between the two towers, the performance gain is still limited. As for the single-tower pre-ranking models (i.e., COLD and FSCD), although several optimization tricks are introduced for acceleration, the efficiency degradation is still severe (×10).
To solve the efficiency-accuracy dilemma, we propose a next generation of two-tower model for pre-ranking systems, named Interaction enhanced Two-Tower (IntTower), as illustrated in Figure 3. The core idea is to enhance the information interaction between the user and item towers while keeping the “user-item decoupling architecture” paradigm. By introducing fine-grained feature interaction modeling, the model capacity of the two-tower model can be improved significantly while its sterling inference efficiency is maintained. Specifically, IntTower first leverages a lightweight Light-SE module to identify the importance of different features and obtain refined feature representations. Based on the refined representations, the user and item towers apply multi-layer nonlinear transformations to extract latent representations. To capture the interactive signals between user and item representations, IntTower designs the FE-Block module and the CIR module from explicit and implicit perspectives, respectively. The FE-Block module performs fine-grained and early feature interactions between multi-layer user representations and the last-layer item representation. Thus, multi-level feature interaction modeling improves the prediction accuracy, while the user-item decoupling architecture enables high inference efficiency. Moreover, CIR introduces a contrastive interaction regularization to further enhance the interactions between user and item representations.
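To make the contrast between late interaction and interaction with multi-layer user representations concrete, here is a toy Python sketch. It is not the FE-Block itself, whose internals are not given in this part of the paper: the aggregation (a plain sum of per-layer dot products), the equal layer widths, and all function names are assumptions made only for illustration.

```python
import random

def relu_layer(x, weights):
    """One fully connected layer without bias, followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(1)
DIM = 4  # assumption: every layer has the same width so dot products are defined

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

user_layers = [rand_matrix(DIM, DIM) for _ in range(3)]
item_layers = [rand_matrix(DIM, DIM) for _ in range(3)]

def user_hidden_states(x):
    """Return the output of every user-tower layer, not just the last one."""
    states = []
    for w in user_layers:
        x = relu_layer(x, w)
        states.append(x)
    return states

def item_vector(x):
    """Return only the last-layer item representation (still cacheable offline)."""
    for w in item_layers:
        x = relu_layer(x, w)
    return x

def late_interaction_score(user_states, item_vec):
    # Classic two-tower: only the final user representation meets the item vector.
    return dot(user_states[-1], item_vec)

def multi_layer_score(user_states, item_vec):
    # Assumed aggregation: every user layer interacts with the item vector.
    return sum(dot(h, item_vec) for h in user_states)

states = user_hidden_states([0.3, 0.7, 0.1, 0.9])
v = item_vector([0.5, 0.2, 0.8, 0.4])
print(late_interaction_score(states, v), multi_layer_score(states, v))
```

The point of the sketch is structural: the item side still contributes a single vector that depends only on item features, so it can be pre-computed, while the user side exposes intermediate layers at the cost of one forward pass per request.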
Our main contributions are summarized as follows: (1) We propose IntTower, the next generation of two-tower model for the pre-ranking system, which emphasizes both high prediction accuracy and inference efficiency. (2) IntTower leverages a lightweight Light-SE module to obtain refined feature representations. Based on this, the FE-Block module and the CIR module are proposed to capture the interactive signals between user and item representations from explicit and implicit perspectives. (3) Comprehensive experiments are conducted on three public datasets to demonstrate the superiority of IntTower in terms of prediction accuracy and inference efficiency. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system.
2 RELATED WORK
2.1 Pre-Ranking System
The pre-ranking system is located in the middle stage of the cascade ranking system and needs to score thousands of candidate items in a few milliseconds, so it is sensitive to both model effectiveness and efficiency. The LR model is the simplest personalized pre-ranking model, which has strong fitting capability and was widely used in the shallow machine learning era. Besides, the FM model [24] is also popular for the pre-ranking stage; it models low-order feature interactions using factorized parameters and can be calculated in linear time.
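The linear-time property of FM rests on a well-known algebraic identity: the sum over all feature pairs of ⟨v_i, v_j⟩ x_i x_j equals one half of the sum, over latent dimensions f, of (Σ_i v_if x_i)² − Σ_i (v_if x_i)². The following Python sketch checks the identity numerically on random data; the function names and the random inputs are illustrative assumptions, and only the second-order interaction term of FM is shown.

```python
import random

def fm_pairwise_naive(x, v):
    """Sum over all pairs i < j of <v_i, v_j> * x_i * x_j; cost O(k * n^2)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
    return total

def fm_pairwise_linear(x, v):
    """Same quantity in O(k * n) via the square-of-sum minus sum-of-squares identity."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))          # sum_i v_if * x_i
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))  # sum_i (v_if * x_i)^2
        total += 0.5 * (s * s - s_sq)
    return total

rng = random.Random(7)
n_features, k_factors = 20, 5
x = [rng.uniform(0, 1) for _ in range(n_features)]
v = [[rng.uniform(-1, 1) for _ in range(k_factors)] for _ in range(n_features)]

naive = fm_pairwise_naive(x, v)
linear = fm_pairwise_linear(x, v)
print(abs(naive - linear) < 1e-9)
```

Both functions compute the same value up to floating-point error, but the second one never enumerates feature pairs, which is what makes FM cheap enough for an early cascade stage.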
With the rise of deep learning [14, 15], neural network-based models have gradually been introduced into industrial pre-ranking systems. The most important one is the two-tower model [9] (i.e., DSSM), which designs two towers to capture the interactive signals within the user and item towers. Moreover, the “user-item decoupling architecture” design paradigm enables sterling inference efficiency. Because the item representations are pre-stored, only a single inference is required during online serving to obtain the user representation for each request. However, the interaction modeling of DSSM is inadequate, which critically hinders model performance. To alleviate this issue, DAT [35] implicitly models the information interaction between the two towers with an Adaptive-Mimic Mechanism. Compared with DSSM and DAT, our IntTower introduces the FE-Block and CIR modules to capture the interactive signals between the two towers from explicit and implicit perspectives, respectively.
To improve the prediction accuracy, COLD [32] and FSCD [18] leverage the single-tower structure to fully model feature interactions. To balance model efficiency and effectiveness, they adopt optimization tricks (e.g., parallel computation and semi-precision calculation) and complexity-aware feature selection, respectively, for acceleration. However, the single-tower structure requires multiple inferences for each user request when there are multiple candidate items, which is more time-consuming than the two-tower structure.
2.2 Ranking System
For the ranking system, where the number of candidate items to score is much smaller than in pre-ranking, user preferences over items need to be learned more accurately. Therefore, models deployed in the ranking system focus on extracting feature interactions with various operations. The DNN model is widely used to capture high-order implicit feature interactions, while explicit feature interaction modeling differs across models. Wide&Deep [4] utilizes handcrafted cross features to memorize important patterns. PNN [23] and DeepFM [6] use the inner product to capture pairwise interactions. CFM [33] and FGCNN [17] leverage the convolution operation to identify local patterns and model feature interactions. DCN [30] and EDCN [2] use a Cross Network to learn certain bounded-degree feature interactions, while xDeepFM [16] extends this to the vector-wise level with a Compressed Interaction Network. AutoInt [26] and DIN [39] leverage the Attention Network to model high-order feature interactions and user historical behaviors. However, these ranking models have a single-tower structure, which incurs large serving latency and cannot be deployed in the pre-ranking system directly. Instead, our IntTower performs fine-grained and early interaction modeling with a two-tower structure, achieving prediction accuracy comparable to the ranking models with higher inference efficiency.
3 PRELIMINARY
The neural network based two-tower model, one of the state-of-the-art pre-ranking models, offers an excellent trade-off between prediction accuracy and inference efficiency. In this section, we first present the details of the two-tower model and then discuss both the advantages and disadvantages of applying it in the pre-ranking system. The architecture, shown in Figure 2, consists of two parallel sub-networks.
Figure 2: The overview of neural network based two-tower model. The user tower takes Fields 1 to m and the item tower takes Fields 1 to n; each tower consists of an embedding layer followed by FC layers. The two tower outputs are combined by an inner product to produce the prediction ŷ, which is compared with the label y in the training loss. For online serving, item vectors are stored in an index.
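The pipeline in Figure 2 (embedding look-up, stacked FC layers with ReLU, L2 normalization, inner product) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the `Tower` class, the random initialization, the vocabulary sizes, and the layer widths are all hypothetical, and no training loss is included.

```python
import math
import random

rng = random.Random(42)
EMB_DIM = 4  # embedding dimension d (illustrative value)

def make_embedding_table(vocab_size):
    return [[rng.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(vocab_size)]

def make_fc(in_dim, out_dim):
    weight = [[rng.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim  # one bias entry per output unit
    return weight, bias

class Tower:
    """One tower: per-field embeddings, FC layers with ReLU, then L2 normalization."""

    def __init__(self, vocab_sizes, layer_widths):
        self.tables = [make_embedding_table(v) for v in vocab_sizes]
        dims = [EMB_DIM * len(vocab_sizes)] + list(layer_widths)  # d_0 = d * m
        self.layers = [make_fc(dims[i], dims[i + 1]) for i in range(len(layer_widths))]

    def forward(self, feature_ids):
        # Embedding look-up for each field, concatenated into h_0 = e.
        h = [value for table, idx in zip(self.tables, feature_ids) for value in table[idx]]
        # h_{i+1} = relu(W_i h_i + b_i) for every FC layer.
        for weight, bias in self.layers:
            h = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
                 for row, b in zip(weight, bias)]
        # L2 normalization of the last hidden layer.
        norm = math.sqrt(sum(x * x for x in h))
        return [x / norm for x in h] if norm > 0 else h

user_tower = Tower(vocab_sizes=[10, 5, 7], layer_widths=[16, 8])  # m = 3 user fields
item_tower = Tower(vocab_sizes=[20, 4], layer_widths=[16, 8])     # n = 2 item fields

h_u = user_tower.forward([3, 1, 6])
h_v = item_tower.forward([17, 2])
score = sum(a * b for a, b in zip(h_u, h_v))  # inner product of the two representations
print(score)
```

Because both outputs are L2-normalized, the inner product is a cosine similarity and stays within [-1, 1]; the two towers share no parameters and never see each other's features before this final step.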
Specifically, the dataset for training pre-ranking models consists of instances $(\mathbf{x}, y)$, where $y \in \{0, 1\}$ indicates the user-item feedback label [25], with 1 denoting a positive signal and 0 a negative one. The features $\mathbf{x}$ can be divided into $m$ user-related features and $n$ item-related features, i.e.,
$$\mathbf{x} = [\underbrace{x_1, x_2, \ldots, x_m}_{\text{user-related}};\ \underbrace{x_1, x_2, \ldots, x_n}_{\text{item-related}}].$$
Then these user-related and item-related features are fed into the corresponding parallel sub-towers (i.e., the user tower and the item tower) to obtain the user and item representations. Take the user tower as an example. The user-related features are first fed into the embedding layer to obtain the user feature embeddings $\mathbf{e} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_m]$ via the embedding look-up operation [37], where $\mathbf{e}_i \in \mathbb{R}^{d}$ denotes the embedding of the $i$-th feature and $d$ is the embedding dimension. Then the feature embeddings are further processed by multiple FC (fully connected) layers:
$$\mathbf{h}_{i+1} = \mathrm{relu}(\mathbf{W}_i \mathbf{h}_i + \mathbf{b}_i), \tag{1}$$
where $\mathbf{W}_i \in \mathbb{R}^{d_{i+1} \times d_i}$ and $\mathbf{b}_i \in \mathbb{R}^{d_{i+1}}$ are the weight and bias of the $i$-th FC layer, $\mathbf{h}_0 = \mathbf{e}$, $d_i$ is the width of the $i$-th FC layer, and $d_0 = d \times m$. Finally, the user representation is obtained by $\mathbf{h}_u = \mathrm{L2Norm}(\mathbf{h}_L)$, where $L$ is the depth of the FC layers. Similarly, the item representation $\mathbf{h}_v$ can be learned via the item tower. Finally, the