

IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System

Xiangyang Li∗

Peking University

China

xiangyangli@pku.edu.cn

Bo Chen∗

Huawei Noah’s Ark Lab

China

chenbo116@huawei.com

HuiFeng Guo†

Huawei Noah’s Ark Lab

China

huifeng.guo@huawei.com

Jingjie Li

Huawei Noah’s Ark Lab

China

lijingjie1@huawei.com

Chenxu Zhu

Shanghai JiaoTong University

China

zhuchenxv@sjtu.edu.cn

Xiang Long

Beijing University of Posts and Telecommunication

China

xianglong@bupt.edu.cn

Sujian Li

Peking University

China

lisujian@pku.edu.cn

Yichao Wang

Huawei Noah’s Ark Lab

China

wangyichao5@huawei.com

Wei Guo

Huawei Noah’s Ark Lab

China

guowei67@huawei.com

Longxia Mao

Huawei Technologies Co Ltd

China

maolongxia@huawei.com

Jinxing Liu

Huawei Technologies Co Ltd

China

liujinxing5@huawei.com

Zhenhua Dong

Huawei Noah’s Ark Lab

China

dongzhenhua@huawei.com

Ruiming Tang†

Huawei Noah’s Ark Lab

China

tangruiming@huawei.com

ABSTRACT

Scoring a large number of candidates precisely in several milliseconds is vital for industrial pre-ranking systems. Existing pre-ranking systems primarily adopt the two-tower model, since the “user-item decoupling architecture” paradigm is able to balance efficiency and effectiveness. However, the cost of high efficiency is the neglect of the potential information interaction between the user and item towers, which critically hinders prediction accuracy. In this paper, we show that it is possible to design a two-tower model that emphasizes both information interactions and inference efficiency. The proposed model, IntTower (short for Interaction enhanced Two-Tower), consists of Light-SE, FE-Block and CIR modules. Specifically, the lightweight Light-SE module is used to identify the importance of different features and obtain refined feature representations in each tower. The FE-Block module performs fine-grained and early feature interactions to capture the interactive signals between user and item towers explicitly, and the CIR module leverages a contrastive interaction regularization to further enhance the interactions implicitly. Experimental results on three public datasets show that IntTower outperforms the SOTA pre-ranking models significantly and even achieves performance comparable to that of ranking models. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system. The code of IntTower is publicly available¹.

∗ Co-first authors with equal contributions. Work done when Xiangyang Li was an intern at Huawei Noah’s Ark Lab.

† Corresponding authors.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

CIKM ’22, October 17–21, 2022, Atlanta, GA, USA

ACM ISBN 978-1-4503-9236-5/22/10...$15.00

https://doi.org/10.1145/3511808.3557072

CCS CONCEPTS

• Information systems → Information retrieval; Recommender systems; • Computing methodologies → Machine learning.

KEYWORDS

Recommender Systems, Pre-Ranking System, Neural Networks

ACM Reference Format:

Xiangyang Li, Bo Chen, HuiFeng Guo, Jingjie Li, Chenxu Zhu, Xiang Long, Sujian Li, Yichao Wang, Wei Guo, Longxia Mao, Jinxing Liu, Zhenhua Dong, and Ruiming Tang. 2022. IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3511808.3557072

¹ https://github.com/archersama/IntTower

arXiv:2210.09890v1 [cs.IR] 18 Oct 2022

1 INTRODUCTION

Existing industrial information services, such as recommender systems, search engines, and advertisement systems, adopt a multi-stage cascade ranking architecture, which balances efficiency and effectiveness better than a single-stage architecture [ ]. A typical cascade ranking system consists of Recall, Pre-Ranking, Ranking, and Re-Ranking stages (Figure 1(a)). The early stages face a massive number of candidates and thus use simple models (e.g., LR and DSSM [ ]) to guarantee low inference latency. In contrast, the later stages pursue carefully selected items that meet the user’s preferences, and hence complex models (e.g., DeepFM [ ] and AutoInt [ ]) are used to improve the prediction accuracy.
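The funnel described above can be illustrated with a toy sketch. The stage sizes, the scorers `cheap_score` and `full_score`, and the random item features are all hypothetical and are not taken from the paper; the point is only that a cheap scorer prunes many candidates early so that a costlier scorer sees few of them.

```python
import numpy as np

def cascade(candidates, stages):
    """Pass candidates through (scorer, keep) stages, keeping the top `keep` rows each time."""
    for scorer, keep in stages:
        scores = scorer(candidates)
        top = np.argsort(-scores)[:keep]   # indices of the highest-scoring rows
        candidates = candidates[top]
    return candidates

rng = np.random.default_rng(0)
items = rng.normal(size=(10_000, 8))       # toy item feature vectors (assumption)
w = rng.normal(size=8)

def cheap_score(x):
    # Cheap stand-in for an early-stage model: looks at two features only.
    return x[:, :2] @ w[:2]

def full_score(x):
    # Costlier stand-in for a later-stage model: uses every feature.
    return np.tanh(x @ w)

shown = cascade(items, [(cheap_score, 1000), (full_score, 100), (full_score, 10)])
print(shown.shape)  # (10, 8)
```

Each stage only ever sees the survivors of the previous one, which is why the per-candidate cost of a model can grow as the candidate set shrinks.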

Figure 1: The multi-stage cascade ranking architecture and the comparison of model prediction accuracy and inference efficiency. (a) Cascade ranking system: Recall, Pre-Ranking, Ranking, Re-Ranking, and Display, with the candidate scale shrinking from millions to thousands, hundreds, and tens. (b) Comparison between prediction accuracy and inference efficiency of pre-ranking and ranking models; the models plotted are FTRL’11, DSSM’13, COLD’20, DAT’21, FSCD’21, IntTower’22, DCN’17, DeepFM’17, xDeepFM’18, and AutoInt’19.

The pre-ranking stage sits in the middle of the cascade. It preliminarily filters the items (on the scale of thousands) retrieved by the previous recall stage and generates candidates (on the scale of hundreds) for the subsequent ranking stage. Therefore, both effectiveness and efficiency need to be carefully considered. Figure 1(b) depicts some representative pre-ranking and ranking models from the perspective of prediction accuracy and inference efficiency. Compared with ranking models, pre-ranking models need to score more candidate items for each user request. Therefore, pre-ranking models have higher inference efficiency but weaker prediction performance due to their simpler structure.

In the evolution of pre-ranking systems, LR (Logistic Regression) [ ] is the most basic personalized pre-ranking model and was widely used in the shallow machine learning era [ ]. With the rise of deep learning, many industrial companies deploy various deep models in their commercial systems. The dominant pre-ranking model in industry is the two-tower model [ ] (i.e., DSSM), which utilizes neural networks to capture the interactive signals within the user and item towers. Moreover, the item representations can be pre-calculated offline and stored in a fast retrieval container. During online serving, only user representations need to be calculated in real time, while the representations of candidate items can be retrieved directly. This “user-item decoupling architecture” paradigm provides sterling efficiency. Besides, COLD [ ] and FSCD [ ] propose a single-tower structure to fully model feature interaction and further improve the prediction accuracy.
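The offline/online split of the two-tower model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's serving code: `ItemIndex` is a hypothetical in-memory stand-in for the fast retrieval container, and `toy_user_tower` is a placeholder single-layer network.

```python
import numpy as np

class ItemIndex:
    """Hypothetical in-memory stand-in for the fast retrieval container."""

    def __init__(self, item_ids, item_vectors):
        self._row = {item_id: i for i, item_id in enumerate(item_ids)}
        self._vectors = np.asarray(item_vectors, dtype=np.float64)

    def lookup(self, item_ids):
        # Retrieval only: no item-tower computation happens online.
        return self._vectors[[self._row[i] for i in item_ids]]

def score_request(user_features, user_tower, index, candidate_ids):
    user_vector = user_tower(user_features)      # one forward pass per request
    item_vectors = index.lookup(candidate_ids)   # precomputed offline
    return item_vectors @ user_vector            # one inner product per candidate

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))

def toy_user_tower(x):
    # Toy single-layer tower; a real tower would be a deeper network.
    return np.maximum(W @ x, 0.0)

# Offline: item vectors are computed once and stored.
index = ItemIndex(["a", "b", "c"], rng.normal(size=(3, 4)))

# Online: score two candidates for one request.
user_features = rng.normal(size=6)
scores = score_request(user_features, toy_user_tower, index, ["c", "a"])
```

The per-request cost is one user-tower forward pass plus one inner product per candidate, independent of how expensive the item tower was to run offline.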

Despite great promise, existing pre-ranking models struggle to balance model effectiveness and inference efficiency. For the two-tower model, the cost of high efficiency is the neglect of the information interaction between the user and item towers. The two towers perform intra-tower information extraction in parallel and independently, and the learned latent representations do not interact until the output layer, which is referred to as “Late Interaction” [ ] and hinders the model performance critically. However, the interactive signals between user features and item features are vital for prediction [ ]. Though DAT [ ] attempts to alleviate this issue by implicitly modeling the information interaction between the two towers, the performance gain is still limited. As for the single-tower pre-ranking models (i.e., COLD and FSCD), although several optimization tricks are introduced for acceleration, the efficiency degradation is still severe (×10).

To solve the efficiency-accuracy dilemma, we propose a next generation of two-tower model for pre-ranking systems, named Interaction enhanced Two-Tower (IntTower), as illustrated in Figure 3. The core idea is to enhance the information interaction between user and item towers while keeping the “user-item decoupling architecture” paradigm. By introducing fine-grained feature interaction modeling, the model capacity of the two-tower model can be improved significantly while its sterling inference efficiency is maintained. Specifically, IntTower first leverages a lightweight Light-SE module to identify the importance of different features and obtain refined feature representations. Based on the refined representations, the user and item towers leverage multi-layer nonlinear transformations to extract latent representations. To capture the interactive signals between user and item representations, IntTower designs the FE-Block module and the CIR module from explicit and implicit perspectives, respectively. The FE-Block module performs fine-grained and early feature interactions between multi-layer user representations and the last-layer item representation. Thus, multi-level feature interaction modeling contributes to improving the prediction accuracy, while the user-item decoupling architecture enables high inference efficiency. Moreover, CIR proposes a contrastive interaction regularization to further enhance the interactions between user and item representations.
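To make the contrast between late and early interaction concrete, the sketch below scores one user-item pair in two ways: from the last user layer only, and from every user layer against the same last-layer item vector. It is an illustration of the general idea only and is not the FE-Block itself, whose definition is not part of this section. The assumptions are hypothetical: all user layers share the item vector's width, and the per-layer scores are simply summed.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def user_tower_all_layers(user_input, weights, biases):
    """Return the output of every FC layer, not only the last one."""
    outputs, h = [], user_input
    for W, b in zip(weights, biases):
        h = relu(W @ h + b)
        outputs.append(h)
    return outputs

def early_interaction_score(user_layers, item_vector):
    """Sum of inner products between each user layer and the final item vector."""
    return float(sum(h @ item_vector for h in user_layers))

rng = np.random.default_rng(0)
widths = [6, 4, 4, 4]                      # input width, then three FC layers
weights = [rng.normal(size=(widths[i + 1], widths[i])) for i in range(3)]
biases = [np.zeros(widths[i + 1]) for i in range(3)]

user_layers = user_tower_all_layers(rng.normal(size=6), weights, biases)
item_vector = rng.normal(size=4)           # would be precomputed offline

late = float(user_layers[-1] @ item_vector)                  # last layer only
early = early_interaction_score(user_layers, item_vector)    # all layers
```

In both variants the item side contributes a single stored vector, so the item representation can still be computed offline; only the amount of user-side information that meets it changes.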

Our main contributions are summarized as follows: (1) We propose IntTower, the next generation of two-tower model for the pre-ranking system, which emphasizes both high prediction accuracy and inference efficiency. (2) IntTower leverages a lightweight Light-SE module to obtain refined feature representations. Based on this, the FE-Block module and the CIR module are proposed to capture the interactive signals between user and item representations from explicit and implicit perspectives. (3) Comprehensive experiments are conducted on three public datasets to demonstrate the superiority of IntTower in prediction accuracy and inference efficiency. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system.


2 RELATED WORK

2.1 Pre-Ranking System

The pre-ranking system is located in the middle stage of the cascade ranking system and needs to score thousands of candidate items in a few milliseconds, so it is sensitive to both model effectiveness and efficiency. The LR model is the simplest personalized pre-ranking model; it has strong fitting capability and was widely used in the shallow machine learning era. Besides, the FM model [ ] is also popular for the pre-ranking stage; it models low-order feature interactions using factorized parameters and can be calculated in linear time.
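The linear-time claim for FM rests on a standard identity for the second-order term: the sum over pairs of ⟨v_i, v_j⟩ x_i x_j equals half of the sum, over latent dimensions, of the squared sum of v_i x_i minus the sum of squared v_i x_i. The sketch below checks this numerically on random data; the function names and toy sizes are illustrative choices, not from the paper.

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """Second-order FM term via the explicit double loop, O(k * n^2)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

def fm_pairwise_linear(x, V):
    """Same term via the sum-of-squares identity, O(k * n)."""
    xv = V * x[:, None]                    # row i is x_i * v_i, shape (n, k)
    square_of_sum = np.sum(xv, axis=0) ** 2
    sum_of_squares = np.sum(xv ** 2, axis=0)
    return 0.5 * float(np.sum(square_of_sum - sum_of_squares))

rng = np.random.default_rng(0)
x = rng.normal(size=20)                    # toy feature vector
V = rng.normal(size=(20, 5))               # one k=5 latent vector per feature

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_linear(x, V)
```

The two results agree up to floating-point error, while the second form touches each feature once per latent dimension.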

With the rise of deep learning [ ], neural network-based models are gradually introduced into industrial pre-ranking systems. The most important one is the two-tower model [ ] (i.e., DSSM), which designs two towers to capture the interactive signals within the user and item towers. Moreover, the “user-item decoupling architecture” design paradigm enables sterling inference efficiency. Because the item representations are pre-stored, only a single inference is required during online serving to obtain the user representation for each request. However, the interaction modeling in DSSM is inadequate, hindering the model performance critically. To alleviate this issue, DAT [ ] implicitly models the information interaction between the two towers with an Adaptive-Mimic Mechanism. Compared with DSSM and DAT, our IntTower proposes the FE-Block and CIR modules to capture the interactive signals between the two towers from explicit and implicit perspectives, respectively.

To improve the prediction accuracy, COLD [ ] and FSCD [ ] leverage the single-tower structure to fully model feature interaction. To balance model efficiency and effectiveness, they adopt optimization tricks (e.g., parallel computation and semi-precision calculation) and complexity-aware feature selection for efficiency acceleration, respectively. However, the single-tower structure requires multiple inferences for each user request given multiple candidate items, which is more time-consuming than the two-tower structure.

2.2 Ranking System

For the ranking system, whose scale of predicted candidate items is much smaller than that in pre-ranking, user preferences over items need to be learned more accurately. Therefore, models deployed in the ranking system focus on extracting feature interactions with various operations. The DNN model is widely used to capture high-order implicit feature interactions, while explicit feature interaction modeling differs across models. Wide&Deep [ ] utilizes handcrafted cross features to memorize important patterns. PNN [ ] and DeepFM [ ] use the inner product to capture pairwise interactions. CFM [ ] and FGCNN [ ] leverage the convolution operation to identify local patterns and model feature interactions. DCN [ ] and EDCN [ ] use a Cross Network to learn certain bounded-degree feature interactions, while xDeepFM [ ] extends this to the vector-wise level with a Compressed Interaction Network. AutoInt [ ] and DIN [ ] leverage the Attention Network to model high-order feature interactions and user historical behaviors. However, these ranking models have a single-tower structure, which incurs large serving latency and cannot be deployed in the pre-ranking system directly. Instead, our IntTower performs fine-grained and early interaction modeling with a two-tower structure, achieving comparable prediction accuracy with higher inference efficiency than the ranking models.

3 PRELIMINARY

The neural network based two-tower model, one of the state-of-the-art pre-ranking models, offers an excellent trade-off between prediction accuracy and inference efficiency. In this section, we first present the details of the two-tower model, then discuss both the advantages and disadvantages of applying it in the pre-ranking system. The architecture is shown in Figure 2, which consists of two parallel sub-networks.

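Figure 2: The overview of the neural network based two-tower model. [Figure omitted: a user tower over Fields 1 to m and an item tower over Fields 1 to n, each with an Embedding Layer followed by FC Layers; the two tower outputs are combined by an Inner Product to produce the prediction ŷ, which is compared with the label y by the training loss; for online serving, item vectors are stored in an index.]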
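The architecture in Figure 2 can be sketched as a short forward pass: per-field embedding look-up, ReLU FC layers, a normalized tower output, and an inner product between the two towers. This is a minimal sketch under stated assumptions rather than the authors' implementation: the vocabulary sizes, layer widths, random initialization, and the helper names `init_tower` and `tower` are hypothetical, and the output normalization is taken to be an L2 norm.

```python
import numpy as np

def init_tower(rng, vocab_sizes, d, widths):
    """Random toy parameters: one embedding table per field, then FC layers."""
    tables = [rng.normal(scale=0.1, size=(v, d)) for v in vocab_sizes]
    dims = [d * len(vocab_sizes)] + list(widths)   # d_0 = d * number of fields
    weights = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i]))
               for i in range(len(widths))]
    biases = [np.full(dims[i + 1], 0.01) for i in range(len(widths))]
    return tables, weights, biases

def tower(field_ids, params, eps=1e-12):
    tables, weights, biases = params
    # Embedding look-up: one row per field, concatenated into h_0.
    h = np.concatenate([tables[f][idx] for f, idx in enumerate(field_ids)])
    for W, b in zip(weights, biases):
        h = np.maximum(W @ h + b, 0.0)             # FC layer with ReLU
    return h / max(np.linalg.norm(h), eps)         # L2-normalized representation

rng = np.random.default_rng(0)
user_params = init_tower(rng, vocab_sizes=[50, 10, 7], d=4, widths=[16, 8])  # m = 3
item_params = init_tower(rng, vocab_sizes=[100, 20], d=4, widths=[16, 8])    # n = 2

h_u = tower([3, 1, 6], user_params)   # user representation
h_v = tower([42, 5], item_params)     # item representation
score = float(h_u @ h_v)              # inner product of the two towers
```

Because the two towers share no parameters and meet only in the final inner product, `h_v` can be computed for every item ahead of time.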

Specifically, the dataset for training pre-ranking models consists of instances $(\mathbf{x}, y)$, where $y \in \{0, 1\}$ indicates the user-item feedback label [ ], with 1 denoting a positive signal and 0 a negative one. The features $\mathbf{x}$ can be divided into $m$ user-related features and $n$ item-related features, i.e.,

$$\mathbf{x} = [\underbrace{x_1, x_2, \ldots, x_m}_{\text{user-related}};\ \underbrace{x_1, x_2, \ldots, x_n}_{\text{item-related}}].$$

Then these user-related and item-related features are fed into the corresponding parallel sub-towers (i.e., the user tower and the item tower) to obtain user and item representations. Take the user tower as an example. The user-related features are first fed into the embedding layer to obtain user feature embeddings $\mathbf{e} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_m]$ via the embedding look-up operation [37], where $\mathbf{e}_i \in \mathbb{R}^{d}$ denotes the embedding of the $i$-th feature and $d$ is the embedding dimension. Then the feature embeddings are further processed by multiple FC (fully connected) layers:

$$\mathbf{h}_{i+1} = \mathrm{relu}(\mathbf{W}_i \mathbf{h}_i + \mathbf{b}_i), \tag{1}$$

where $\mathbf{W}_i \in \mathbb{R}^{d_{i+1} \times d_i}$ and $\mathbf{b}_i \in \mathbb{R}^{d_{i+1}}$ are the weight and bias of the $i$-th FC layer, $\mathbf{h}_0 = \mathbf{e}$, $d_i$ is the width of the $i$-th FC layer, and $d_0 = d \times m$. Finally, the user representation can be obtained by $\mathbf{h}_u = L_2\mathrm{Norm}(\mathbf{h}_L)$, where $L$ is the depth of the FC layers. Similarly, the item representation $\mathbf{h}_v$ can be learned via the item tower. Finally, the
