
KDD Workshop, Aug 15, 2022, Washington DC, USA Diddigi et al.
Figure 1: Proposed architecture of session recommendation.
The previous 𝑘 (three in this example) products are considered
for recommending the product at instant 𝑘+1. First, the
attributes of the products (𝐴, 𝐵, 𝐶) are extracted. Next, these
attributes are fed to the RL agents to generate the next
attribute recommendations. Finally, the attribute recommendations
are combined to obtain the product to be recommended
at instant 𝑘+1 (fourth in this example).
immediate impact (i.e., customers might either like it and continue
exploring or exit the store) and also influences the future actions
of the customers (i.e., making a purchase). Emulating this in the
online scenario requires capturing the dynamic nature of the problem
and balancing short-term and long-term goals. This motivates
us to formulate it in the framework of the Markov Decision Process
(MDP) [3, 24]. We define a session as a sequence of events of a user
that ends in one of the following: (a) a purchase, or (b) the user exiting
the session. This work aims to optimally recommend a sequence of
products that could potentially lead to a purchase event.
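The session definition above can be illustrated with a minimal sketch that cuts a user's event stream at each purchase or exit. The event format, the event type names ("view", "purchase", "exit"), and the function names are assumptions made for illustration; they are not taken from the dataset used in this work.

```python
from typing import List, Tuple

# Hypothetical event format: (product_id, event_type), where event_type is
# one of "view", "purchase", or "exit". These names are illustrative only.
Event = Tuple[str, str]

TERMINAL_EVENTS = ("purchase", "exit")


def split_sessions(events: List[Event]) -> List[List[Event]]:
    """Cut an event stream into sessions that end at a purchase or an exit."""
    sessions: List[List[Event]] = []
    current: List[Event] = []
    for event in events:
        current.append(event)
        if event[1] in TERMINAL_EVENTS:
            sessions.append(current)
            current = []
    if current:
        # Trailing events without a terminal marker are kept as an
        # unfinished session (an assumption, not part of the definition).
        sessions.append(current)
    return sessions


def is_purchase_session(session: List[Event]) -> bool:
    """A session counts as a purchase session if its last event is a purchase."""
    return bool(session) and session[-1][1] == "purchase"


events = [
    ("p1", "view"), ("p2", "view"), ("p2", "purchase"),
    ("p3", "view"), ("", "exit"),
]
sessions = split_sessions(events)
```

In this toy stream, the first session ends in a purchase and the second ends when the user exits.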
Reinforcement Learning (RL) [31] is a popular model-free paradigm
for solving an MDP problem. Here, we train an agent to make
optimal decisions based only on the trajectories of the environment.
When the number of states and actions in the environment
is very high, one resorts to function approximation architectures.
RL algorithms combined with neural network architectures, i.e.,
Deep RL, have achieved a lot of success in recent times [21, 22]. In
this work, we train a Deep RL agent to recommend a sequence of
products in a session. It is important to note that the traditional
RL setup, where agents learn by exploring different actions, is not a
favorable setting for our problem due to the large number of products
in the e-commerce space. Hence, we train the algorithm under
an off-policy setting using the users’ historical session data. The
dataset considered in this work is compiled from the click-stream
data of users on the Myntra e-commerce platform, one of the largest
fashion e-commerce platforms in India.
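To make the off-policy idea concrete, the following is a minimal sketch of Q-learning run purely on logged transitions, with no environment interaction. It uses a tabular Q-function as a simplified stand-in for the deep network used in this work. The transition format, the reward of 1 for a purchase and 0 otherwise, the hyperparameter values, and the restriction of the maximization to actions seen in the logs are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, List, Tuple

# Hypothetical logged transition: (state, action, reward, next_state, done).
Transition = Tuple[Hashable, Hashable, float, Hashable, bool]


def q_learning_from_logs(transitions: List[Transition],
                         gamma: float = 0.9,
                         alpha: float = 0.5,
                         epochs: int = 50):
    """Tabular Q-learning over a fixed log of transitions (no exploration)."""
    q = defaultdict(lambda: defaultdict(float))
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            if done or not q[next_state]:
                next_value = 0.0
            else:
                # Max over the actions observed in the logs for next_state.
                next_value = max(q[next_state].values())
            target = reward + gamma * next_value
            q[state][action] += alpha * (target - q[state][action])
    return q


def greedy_action(q, state):
    """Pick the logged action with the highest estimated value."""
    return max(q[state], key=q[state].get)


# Toy log: from state "s1", recommending "b" led to a purchase, "c" did not.
logged = [
    ("s0", "a", 0.0, "s1", False),
    ("s1", "b", 1.0, "end", True),
    ("s1", "c", 0.0, "end", True),
]
q_table = q_learning_from_logs(logged)
```

Because the update bootstraps from the best logged action in the next state rather than from the action the logging policy actually took, the learned values do not depend on how the historical sessions were generated, which is what allows training from click-stream data alone.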
Training a deep RL agent to recommend the products directly is
not practical due to heterogeneity (in terms of different attributes)
in the users’ browsing history. There are two problems associated
with this training paradigm. First, the number of products (which
constitutes the action space) is huge. Second, there might not be
a definite trend that can be learned from this data, making the
training very unstable. For example, consider a scenario where two
users browse products in a sequence that is similar in every regard
(like product type and color) except a specific attribute, ‘brand.’ Say
the session of the former user ends in a purchase, whereas the
latter is a non-purchase session. The effectiveness of RL training
lies in the generalization of actions, and hence sessions of this
nature will lead to unstable learning. To mitigate this problem, we
propose a divide-and-conquer approach where multiple RL agents
are trained to recommend various attributes of the products, and
these recommendations are combined at the end to generate
product recommendations. This is illustrated in Figure 1. First,
attributes of the products (like color, product type, and brand) are
extracted. These attributes are then sent as inputs to independent
DRL models to obtain the recommendations for attributes in the
next time instant. Finally, the products that match the attributes
are presented as final recommendations to the user.
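The final combination step can be sketched as a filter over a product catalog. The sketch below assumes each attribute agent has already produced one recommended value, and it treats "match" as exact equality on every recommended attribute. The catalog layout, the attribute names, and the function name are hypothetical, and the matching rule is an assumption rather than the rule used in this work.

```python
from typing import Dict, List

# Hypothetical catalog: product id -> attribute name -> attribute value.
Catalog = Dict[str, Dict[str, str]]


def combine_attribute_recommendations(catalog: Catalog,
                                      recommended: Dict[str, str]) -> List[str]:
    """Return ids of products that match every recommended attribute value."""
    return [
        product_id
        for product_id, attributes in catalog.items()
        if all(attributes.get(name) == value
               for name, value in recommended.items())
    ]


catalog = {
    "p1": {"color": "blue", "product_type": "shirt", "brand": "X"},
    "p2": {"color": "blue", "product_type": "shirt", "brand": "Y"},
    "p3": {"color": "red", "product_type": "shoes", "brand": "X"},
}

# Stand-ins for the outputs of the independent attribute agents.
recommended = {"color": "blue", "product_type": "shirt", "brand": "X"}
matches = combine_attribute_recommendations(catalog, recommended)
```

Each agent acts over the values of a single attribute, so its action space is far smaller than the full product catalog, which is the point of the divide-and-conquer design.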
The overall contributions of the paper are as follows:
• We mathematically formulate the problem of User Session
recommendation in the framework of MDP.
• We propose a Deep Q-Learning based model to predict the
next products within a user session while optimizing for
purchase intent.
• We compare the proposed model with a similarity-based
baseline model to showcase our proposed approach’s efficacy.
2 RELATED WORK
Recommendation systems deal with building algorithms for recommending
products to the user to meet various objectives like
user personalization, increased engagement rate, and improving
business goals. The idea here is to accurately predict users’ interests
and recommend products that meet their expectations. Recommendation
systems find applications in various domains like
news recommendation [16, 17, 19, 39, 42], movie recommendation
[2, 26, 30, 35], etc. The importance of recommendation systems is
even more pronounced in the e-commerce business, where the buying
and selling of products are performed online. Therefore,
it is imperative from the business point of view to recommend
relevant and specific products to the users to sustain their interest
over a long period. As a result, a lot of research has been dedicated
to building good recommendation systems [33] in recent times to solve
problems like Click-Through-Rate (CTR) prediction [7, 14, 40, 41],
intent and purchase prediction [6, 13, 37], etc.
Deep Learning is a popular class of machine learning algorithms
that uses artificial neural networks to learn and derive required
patterns from the input data. We will now discuss some popular
deep learning techniques proposed in the literature for solving the
recommendation problem. In [34], two deep learning algorithms
based on the collaborative filtering technique have been proposed
to handle cold-start problems. In [8], a deep learning model has
been deployed to simulate the interaction between item and user by
feeding the pre-trained representations of item and user as input to