because both explicit and implicit similarities are crucial to
the problem, and the latter are hard to fully recognize.
In this paper, we propose a method called the Knowledge-aware
Adaptive Session Multi-Topic Network (KAST), which employs
two key modules: an Adaptive Session Segmentation (ASS)
module and a Knowledge-aware Structure-information
Extraction (KSE) module. Starting from the original session,
the ASS module adaptively segments the user behavior sequence
into sessions, which enhances the learning of session-level
topic evolution and precludes interference from casual
behaviors in the sequence; the whole step is end-to-end.
Moreover, the performance of the ASS module depends on the
quality of the embedding matrix: the more similar an item
pair is in reality, the closer the two items should be in the
latent space. Hence, we incorporate the KSE module into KAST
to optimize the user and item embedding matrices. In this
process, the structural information between users and items
on the graph is utilized as novel knowledge for embedding
representation learning. In addition, a margin-based loss is
merged into the main loss function. In this way, the KSE
module further assists the ASS module in accurately learning
the topics of interest at the session level.
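To make the two ideas concrete, the following is a minimal numpy sketch, not the paper's exact formulation: session boundaries are drawn where the cosine similarity between adjacent item embeddings drops below a threshold (the threshold value and function names are illustrative), and a hinge-style margin loss pulls structurally related pairs together in the latent space while pushing unrelated ones apart.

```python
import numpy as np

def segment_sessions(item_embs, threshold=0.5):
    """Split a behavior sequence into sessions wherever the cosine
    similarity between adjacent item embeddings falls below `threshold`.
    item_embs: (T, d) array of item embeddings in behavior order.
    Returns a list of sessions, each a list of position indices."""
    sessions, current = [], [0]
    for t in range(1, len(item_embs)):
        a, b = item_embs[t - 1], item_embs[t]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if sim < threshold:          # topic shift -> start a new session
            sessions.append(current)
            current = []
        current.append(t)
    sessions.append(current)
    return sessions

def margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style margin loss: the squared distance to a structurally
    related positive should be smaller than the distance to a negative
    by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

In this sketch a low similarity between consecutive items is read as a topic shift; the actual KAST criterion is learned end-to-end rather than fixed by a hand-set threshold.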
It should be noted that user behaviors in the sequence, such
as clicks or collections, are not equally spaced in time,
whereas the input sequence of an RNN is assumed to be equally
spaced by default. Hence, having acquired the optimized
sessions, a pooling layer distills the behaviors of each
session into a session-level vector representation. In this
way, the unequal spacing between items within each session no
longer violates the assumptions of the sequence model, as the
problem is alleviated at the session level. Next, a GRU is
utilized to capture the evolution of the topics, after which
an attention layer weighs all the sessions by evaluating the
correlation between the target item and each session. In the
end, the final representation is obtained.
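The pooling, GRU, and attention steps described above can be sketched as follows. This is an illustrative numpy version with mean pooling, a standard GRU cell, and dot-product attention; the parameter names and the choice of mean pooling and dot-product scoring are assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, W, U, b):
    """One standard GRU step. W, U, b each stack the parameters for the
    update gate, reset gate, and candidate state, in that order."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(x @ Wz + h @ Uz + bz)            # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)            # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)
    return (1 - z) * h + z * h_tilde

def session_interest(sessions, target, W, U, b):
    """Mean-pool each session's item embeddings into a session vector,
    run a GRU over the session vectors to track topic evolution, then
    attention-weight the GRU states by their affinity to the target
    item embedding. Returns the final user representation."""
    d = target.shape[0]
    pooled = [np.mean(s, axis=0) for s in sessions]  # session-level pooling
    h = np.zeros(d)
    states = []
    for v in pooled:
        h = gru_step(h, v, W, U, b)
        states.append(h)
    states = np.stack(states)                        # (num_sessions, d)
    scores = states @ target                         # dot-product attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states
```

Because each session is first pooled into one vector, the GRU only sees one input per session, which is why the uneven spacing of the raw behaviors inside a session no longer matters at this stage.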
To sum up, the main contributions of this paper are listed
as follows:
• We propose the KAST network architecture, which captures
the topics of interest from each adaptively divided session
and alleviates the problem of unequal spacing in user
behavior sequences.
• We design the ASS module to update sessions automatically;
it uses the dynamically updated embedding matrix to make
session division more reasonable and reduces the need for
manual feature engineering.
• To enhance the reliability of the ASS module, we further
design the KSE module, which learns structural knowledge from
the graph to improve the quality of the embedding matrix.
• We conduct extensive experiments comparing the proposed
method with many typical methods on public datasets, and we
also evaluate the effectiveness and robustness of the ASS and
KSE modules. The results show that the proposed method
achieves state-of-the-art performance on the CTR prediction
task.
2 Related Work
General Deep Models
Most deep networks are based on the embedding and multi-layer
perceptron (Emb&MLP) structure. Wide&Deep (Cheng et al. 2016)
combines the memorization ability of its linear part with the
generalization ability of its DNN part to improve overall
performance, and forms the basis for most subsequent deep
models. Deep&Cross Network (DCN) (Wang et al. 2017)
explicitly selects feature sets to design higher-order
feature crossing, which avoids useless combined features; its
"cross" net structure can effectively learn bounded-degree
combined features. Compared to Emb&MLP, PNN (Qu et al. 2016)
adds a "product layer" after the embedding to capture
field-based second-order feature correlations. AFM (Xiao et
al. 2017) adds an attention mechanism on top of FM, which
evaluates the importance of feature interactions and reduces
the impact of feature noise. Similar to Wide&Deep, DeepFM
(Guo et al. 2017) is also jointly trained with a shallow part
and a deep part; the major difference is that LR is replaced
by FM in the shallow part, and FM is able to automatically
learn cross features. AutoInt (Song et al. 2019) uses a
multi-head self-attention mechanism to perform automatic
feature-crossing learning, thereby improving the accuracy of
CTR prediction tasks.
Sequence-based Deep Models
A user's behavior sequence contains rich information that
implies the user's interest trends; therefore, modeling
behavior sequences can improve the accuracy of CTR
prediction. FPMC (Rendle, Freudenthaler, and Schmidt-Thieme
2010) introduces a personalized transition matrix based on
Markov chains, which captures both temporal information and
long-term user preferences. YoutubeDNN (Covington, Adams, and
Sargin 2016) uses average pooling to encode user behavior
sequences into a fixed-length vector that is fed into an MLP.
DIN (Zhou et al. 2018) learns the user's historical behavior
representation through an attention mechanism. DIEN (Zhou et
al. 2019) further achieves an efficient characterization of
user behavior sequences by introducing an auxiliary loss, and
then uses AUGRU to capture the evolving trend of user
interests. However, it should be pointed out that the time
intervals of user behavior sequences are not evenly spaced,
so RNN-based techniques are not perfectly suited to this
problem. SLi-Rec (Yu et al. 2019) improves the LSTM structure
and introduces the time difference between adjacent items to
model unequally spaced behavior, which significantly improves
performance.
Session-based Deep Models
In CTR prediction tasks, there are not many session-based
deep methods. GRU4REC (Hidasi et al. 2015) is the first to
use an RNN for session-based recommendation: the user's click
sequence is compressed by an embedding layer into a
continuous low-dimensional vector that is input to a GRU.
After that, neural attentive recommendation ma-