
universal item representations. Based on such item representations,
sequential recommenders pre-trained on the interaction data from
a mixture of multiple domains [10, 18, 50] have shown promising
transferability. Such a paradigm can be denoted as “text ⇒ representation”. Despite their effectiveness, we argue that the binding between item text and item representations is “too tight” in previous approaches [10, 18], leading to two potential issues. First, since these methods employ text encodings to derive item representations (without using item IDs), text semantics have a direct influence on the recommendation model. The recommender might thus over-emphasize the effect of text features (e.g., generating recommendations with very similar texts) instead of the sequential characteristics reflected in interaction data. Second, text encodings from different domains (with varied distributions and semantics [11, 18]) are not naturally aligned in a unified semantic space, and the domain gap in text encodings is likely to cause a performance drop during multi-domain pre-training. The tight binding between text encodings and item representations might exaggerate the negative impact of this domain gap.
Considering these issues, our solution is to incorporate intermediate discrete item indices (called codes in this work) into the item representation scheme and relax the strong binding between item text and item representations, which can be denoted as “text ⇒ code ⇒ representation”. Instead of directly mapping text encodings into item representations, we consider a two-step item representation scheme: given an item, we first map its text to a vector of discrete indices (i.e., the item code), and then aggregate the corresponding embeddings indexed by the item code into the item representation. The merits of such a representation scheme are twofold. First, item text is mainly utilized to generate discrete codes, which reduces its influence on the recommendation model while still injecting useful text semantics. Second, the two mapping steps can be learned or tuned according to downstream domains or tasks, making the scheme more flexible to fit new recommendation scenarios. To develop our approach, we highlight two key challenges to address: (i) how to learn discrete item codes that are sufficiently distinguishable for accurate recommendation; (ii) how to effectively pre-train and adapt the item representations considering the varied distributions and semantics across different domains.
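For intuition, the second step (“code ⇒ representation”) can be viewed as an embedding lookup followed by aggregation. The following is a minimal PyTorch sketch of this step; the code length, codebook size, and the use of summation as the aggregation are illustrative assumptions, not the exact design detailed later.

```python
import torch
import torch.nn as nn

# Minimal sketch of the "code => representation" step, assuming each item has
# already been mapped to a D-dimensional discrete code (e.g., by quantizing its
# text encoding); sizes and the sum aggregation are illustrative assumptions.
D, K, hidden = 32, 256, 256             # code length, codebook size per digit, embedding dim
code_emb = nn.Embedding(D * K, hidden)  # one table shared by all D code digits

def item_representation(code: torch.Tensor) -> torch.Tensor:
    # code: (D,) integer digits in [0, K); offset each digit into its own slice of the table
    offsets = torch.arange(D) * K
    return code_emb(code + offsets).sum(dim=0)   # aggregate the D digit embeddings

item_code = torch.randint(0, K, (D,))            # e.g., the discrete code of one item
item_vec = item_representation(item_code)        # (hidden,) item representation
```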
To this end, we propose VQ-Rec, a novel approach to learn Vector-Quantized item representations for transferable sequential Recommenders. Different from existing transferable recommenders based on PLM encoding, VQ-Rec maps each item into a discrete $D$-dimensional code as the indices for embedding lookup. To obtain semantically-rich and distinguishable item codes, we utilize
optimized product quantization (OPQ) techniques to discretize text
encodings of items. In this way, the discrete codes that preserve
the textual semantics are distributed over the item set in a more
uniform way, so as to be highly distinguishable. Since our repre-
sentation scheme does not modify the underlying backbone (i.e.,
Transformer), it is generally applicable to various sequential ar-
chitectures. To capture transferable patterns based on item codes,
we pre-train the recommender on a mixture of multiple domains
in a contrastive learning approach. Both mixed-domain and semi-
synthetic code representations are used as hard negatives to en-
hance the contrastive training. To transfer the pre-trained model
to a downstream domain, we propose a differentiable permutation-based network to learn the code-embedding alignment, and further update the code embedding table to fit the new domain. Such fine-tuning is highly parameter-efficient, as only the parameters involved in item representations need to be tuned.
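To illustrate the “text ⇒ code” step, the sketch below derives discrete item codes from PLM text encodings with optimized product quantization using the faiss library. The encoding dimension, code length, and codebook size are placeholder values rather than the settings used in our experiments, and the random matrix stands in for real text encodings.

```python
import faiss
import numpy as np

d, M, nbits = 768, 32, 8                                # encoding dim, code length D, bits per digit
text_enc = np.random.rand(10000, d).astype("float32")   # placeholder for PLM item encodings

opq = faiss.OPQMatrix(d, M)                 # learn a rotation that balances the sub-vectors
opq.train(text_enc)
rotated = opq.apply_py(text_enc)

pq = faiss.ProductQuantizer(d, M, nbits)    # M sub-quantizers with 2**nbits centroids each
pq.train(rotated)
item_codes = pq.compute_codes(rotated)      # (num_items, M): one discrete code per item,
                                            # later used as indices for embedding lookup
```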
Empirically, we conduct extensive experiments on six bench-
marks, including both cross-domain and cross-platform scenarios.
Experimental results demonstrate the strong transferability of our
approach. In particular, inductive recommenders purely based on item text can recommend new items without re-training, while also achieving better performance on known items.
2 METHODOLOGY
In this section, we present the proposed transferable sequential Recommendation approach based on Vector-Quantized item indices, named VQ-Rec.
2.1 Approach Overview
Task formulation.
We consider a sequential recommendation setting where multi-domain interaction data is available as training (or pre-training) data. Formally, the interaction data of a user in some domain can be denoted as an interaction sequence $s = \{i_1, i_2, \ldots, i_n\}$ (in chronological order), where each interacted item $i$ is associated with a unique item ID and text data, e.g., a title or description (item text). Since a user is likely to interact with items from multiple domains, we can derive multiple interaction sequences for a user. Considering the large semantic gap across different domains [18], we don't combine the multiple interaction sequences of a user into a single sequence, but instead keep these sequences per domain. Note that item IDs are not explicitly utilized to generate item representations in our approach. The task goal is to pre-train a transferable sequential recommender that can effectively adapt to new domains (unseen in training data).
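For concreteness, the following is a small sketch of how this multi-domain interaction data can be organized; the class and field names are hypothetical and only illustrate the formulation above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    item_id: str        # unique item ID (not used to build item representations)
    text: str           # item text, e.g., title or description

@dataclass
class InteractionSequence:
    user_id: str
    domain: str
    items: List[Item]   # chronologically ordered interactions i_1, ..., i_n

# A user active in several domains contributes one sequence per domain;
# the sequences are kept separate rather than merged into a single sequence.
```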
Solution overview.
To develop the sequential recommender, we adopt the popular Transformer architecture [24] as the backbone of our approach. It is built on the self-attention mechanism, taking item embeddings and positional embeddings as input at each time step. Unlike previous related studies [18], we don't include any additional components (e.g., adaptors) in the Transformer architecture, but instead learn transferable item representations for feeding the backbone. The key novelty of our approach lies in the new item representation scheme for sequential recommenders. In this scheme, we first map item text into a vector of discrete indices (called an item code), and then employ these indices to look up the code embedding table for deriving item representations. Such a scheme can be denoted as “text ⇒ code ⇒ representation”, which removes the tight binding between item text and item representations. In order to learn and transfer such item representations, we further propose specific strategies for contrastive recommender pre-training and cross-domain recommender fine-tuning.
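To make this overview concrete, here is a minimal PyTorch sketch of such a backbone: an unmodified Transformer encoder that consumes code-derived item representations plus positional embeddings, with no adaptor components. The hyperparameters are illustrative placeholders, not the configuration used in our experiments.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the unmodified Transformer backbone: it consumes
# code-derived item representations (see the earlier lookup sketch) plus
# positional embeddings, without any adaptor modules.
hidden, max_len = 256, 50
pos_emb = nn.Embedding(max_len, hidden)
encoder_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(encoder_layer, num_layers=2)

def encode_sequence(item_reprs: torch.Tensor) -> torch.Tensor:
    # item_reprs: (batch, seq_len, hidden) item representations built from item codes
    positions = torch.arange(item_reprs.size(1), device=item_reprs.device)
    x = item_reprs + pos_emb(positions)          # add positional embeddings per time step
    return backbone(x)                           # (batch, seq_len, hidden) sequence states

hidden_states = encode_sequence(torch.randn(4, 20, hidden))  # toy usage
```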
The overall framework of the proposed approach VQ-Rec is
depicted in Figure 1. We consider three key components for devel-
oping transferable recommenders: (i) how to represent the items
with vector-quantized code representation (Section 2.2); (ii) how to
train the recommenders based on the new representation scheme