
marks impart realistic and reliable information. Marked TPP is a probabilistic framework [6] which aims to model the joint distribution of the time and mark of the next event using the previous event history. Estimating the next event time and mark has practical applications in many domains that exhibit complex time and mark interactions. Such applications include online user engagement [11, 16, 37], information diffusion [28], econometrics [1], and healthcare [10]. In personalized healthcare, a patient could have a complex medical history, and several diseases may depend on each other. Predictive EHR modeling could reveal potential future clinical events and facilitate efficient resource allocation.
Time and mark dependency: While modeling the conditional joint distribution of time and marks, many prior works assume marks to be conditionally independent of time [8, 25]. This assumption on the conditional joint distribution leads to two types of marked TPPs: (i) conditionally independent and (ii) conditionally dependent models. The independence assumption allows factorization of the conditional joint distribution into a product of two independent conditional distributions: a continuous-time distribution and a categorical mark distribution (categorical marks are conventional in prior works), both conditioned on the event history. The independence between time and mark limits the structural design of the neural architecture in conditionally independent models. Thus, such models require fewer parameters to specify the conditional joint distribution of time and marks but fail to capture their dependence. On the contrary, conditionally dependent models capture the dependency between time and mark by conditioning either the time distribution on the mark or the mark distribution on time. A recent study [10] shows that conditionally independent models perform poorly compared to conditionally dependent models.
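To make the distinction concrete, the two families factorize the conditional joint distribution as follows (our notation: $\tau$ denotes the inter-event time, $m$ the mark, and $\mathcal{H}$ the event history):

$p_{\text{indep}}(\tau, m \mid \mathcal{H}) = p(\tau \mid \mathcal{H}) \, p(m \mid \mathcal{H}), \qquad p_{\text{dep}}(\tau, m \mid \mathcal{H}) = p(m \mid \mathcal{H}) \, p(\tau \mid m, \mathcal{H}).$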
Multivariate TPP: A marked TPP defines a joint probability distribution over a given time interval. In order to model time and mark dependency, the time distribution should be conditioned on all possible marks. This leads to a multivariate TPP model in which a tuple of time distributions is learned over the set of categorical marks [21]. For $K$ distinct marks, the $k$-th distribution ($k \in \{1, \dots, K\}$) indicates the joint distribution of the time and the $k$-th mark.
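Equivalently, in intensity terms, a marked TPP with $K$ categorical marks can be viewed as a multivariate TPP with one conditional intensity per mark (a standard construction, stated here for clarity):

$\lambda^*(t, k) = \lambda^*_k(t), \qquad \lambda^*(t) = \sum_{k=1}^{K} \lambda^*_k(t),$

where each $\lambda^*_k(t)$ is conditioned on the event history.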
Intensity-based vs intensity-free modeling: In both conditionally independent and conditionally dependent models, the inter-event time distribution is a key factor of the joint distribution. The standard way of learning the time distribution is by estimating the conditional intensity function. However, the intensity function requires selecting a good parametric formulation [29]. The parametric intensity function often makes assumptions about the latent dynamics of the point process. A simple parametrization has limited expressiveness but makes likelihood computation easy. Though an advanced parametrization adequately captures event dynamics, likelihood computation often involves numerical approximation using Newton-Raphson or Monte Carlo (MC) methods. Besides the intensity-based formulation, other ways to model the conditional inter-event time distribution involve modeling the probability density function (PDF), the cumulative distribution function, the survival function, or the cumulative intensity function [24, 30]. An intensity-free model focuses on closed-form likelihood, closed-form sampling, and the flexibility to approximate any distribution.
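To see why the intensity-based route can be costly, recall the standard TPP log-likelihood over an observation window $[0, T]$ with event times $t_1, \dots, t_N$:

$\log L = \sum_{i=1}^{N} \log \lambda^*(t_i) - \int_{0}^{T} \lambda^*(s) \, ds.$

For flexible intensity parametrizations the integral (the compensator) rarely has a closed form and must be approximated numerically. Modeling the conditional PDF $p^*(\tau)$ of the inter-event time directly sidesteps this integral: apart from the survival term for the interval after the last observed event, the log-likelihood is a sum of $\log p^*(\tau_i)$ terms, each available in closed form when $p^*$ is, for example, a mixture density.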
In this work, we model the inter-dependence between time and mark by learning a conditionally dependent distribution. While inferring the next event, we model a PDF of the inter-event time distribution for each discrete mark. Conditioning the time distribution on marks improves the predictive performance of the proposed models over existing ones. A high-level overview of our approach is shown in Figure 1. In summary, we make the following contributions:
• We overcome the structural design limitation of conditionally independent models by proposing novel conditionally dependent, both intensity-free and intensity-based, multivariate TPP models. To capture the inter-dependence between mark and time, we condition the time distribution on the current mark in addition to the event history.
• We improve the predictive performance of intensity-based models through conditionally dependent modeling. Further, we draw on the intensity-free literature to design a flexible multivariate marked TPP model. We model the PDF of the conditional inter-event time to enable closed-form likelihood computation and closed-form sampling (see the sketch after this list).
• Using multiple metrics, we provide a comprehensive evaluation on a diverse set of synthetic and real-world datasets. The proposed models consistently outperform both conditionally independent and conditionally dependent baselines.
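As a concrete illustration of the second contribution, the sketch below shows one way a per-mark, intensity-free time distribution could be realized; the module, its dimensions, and the choice of a log-normal mixture are our own illustrative assumptions under the general description above, not the exact proposed architecture.

```python
import torch
import torch.nn as nn
import torch.distributions as D


class PerMarkLogNormalMixture(nn.Module):
    """Illustrative sketch: for each of K marks, map a history embedding h to a
    log-normal mixture over the inter-event time, giving both closed-form
    log-likelihood and closed-form sampling (no thinning required)."""

    def __init__(self, hidden_dim: int, num_marks: int, num_components: int = 8):
        super().__init__()
        self.num_marks = num_marks
        self.num_components = num_components
        # One set of mixture parameters (weights, means, log-scales) per mark.
        self.param_net = nn.Linear(hidden_dim, num_marks * num_components * 3)

    def _mixture(self, h: torch.Tensor) -> D.MixtureSameFamily:
        # h: (batch, hidden_dim) -> mixture with batch shape (batch, num_marks)
        params = self.param_net(h).view(-1, self.num_marks, self.num_components, 3)
        logits, means, log_scales = params.unbind(dim=-1)
        components = D.LogNormal(means, log_scales.exp())
        return D.MixtureSameFamily(D.Categorical(logits=logits), components)

    def log_prob(self, h: torch.Tensor, tau: torch.Tensor, mark: torch.Tensor) -> torch.Tensor:
        # log p(tau | mark, history): evaluate all marks, then pick the observed one.
        per_mark = self._mixture(h).log_prob(tau.unsqueeze(-1))     # (batch, num_marks)
        return per_mark.gather(-1, mark.unsqueeze(-1)).squeeze(-1)  # (batch,)

    def sample(self, h: torch.Tensor, mark: torch.Tensor) -> torch.Tensor:
        # Draw an inter-event time for the given mark directly from the mixture.
        taus = self._mixture(h).sample()                            # (batch, num_marks)
        return taus.gather(-1, mark.unsqueeze(-1)).squeeze(-1)
```

Because both `log_prob` and `sample` are closed-form operations of the mixture, training and inference avoid numerical integration and thinning-based simulation.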
2 RELATED WORK
In this section, we provide a brief overview of classical (non-neural)
TPPs and neural TPPs. Later, we discuss conditionally independent
and conditionally dependent models. In the end, we differentiate the
proposed solution against state-of-the-art models in the literature.
2.1 Classical (non-neural) TPPs
TPPs are mainly described via the conditional intensity function. Basic TPP models make suitable assumptions about the underlying stochastic process, resulting in constrained intensity parametrizations. For instance, the Poisson process [18, 26] assumes that inter-event times are independent. In the Hawkes process [14, 23], event excitation is positive, additive over time, and decays exponentially with time. The self-correcting process [15] and the autoregressive conditional duration process [9] propose different conditional intensity parametrizations to capture inter-event time dynamics. These constraints on the conditional intensity limit the expressive power of the models and hurt predictive performance due to model misspecification [8].
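For reference, the exponential-kernel Hawkes intensity mentioned above takes the standard form

$\lambda^*(t) = \mu + \sum_{t_i < t} \alpha \, e^{-\beta (t - t_i)}, \qquad \mu, \alpha, \beta > 0,$

which makes the hand-crafted nature of classical parametrizations explicit: excitation is always positive, additive over past events, and exponentially decaying.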
2.2 Neural TPPs
Neural TPPs are more expressive and computationally efficient than classical TPPs due to their ability to learn complex dependencies. A TPP model that infers the time and mark of the next event sequentially is called an autoregressive (AR) TPP. Seminal works [8, 35] connect point processes with neural networks by realizing the conditional intensity function using a recurrent neural network (RNN). Generally, the event history is encoded using either recurrent encoders or set aggregation encoders [38, 39].
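For concreteness, a minimal sketch of such a recurrent history encoder is given below; the feature choices (a learned mark embedding concatenated with the log inter-event time) and the GRU are illustrative assumptions rather than the specific architectures of [8, 35, 38, 39].

```python
import torch
import torch.nn as nn


class RecurrentHistoryEncoder(nn.Module):
    """Illustrative sketch: encode past events (inter-event time, mark) into
    history vectors with an RNN, as in recurrent neural TPP encoders."""

    def __init__(self, num_marks: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.mark_embedding = nn.Embedding(num_marks, embed_dim)
        # Each event is represented by its mark embedding and its log inter-event time.
        self.rnn = nn.GRU(embed_dim + 1, hidden_dim, batch_first=True)

    def forward(self, taus: torch.Tensor, marks: torch.Tensor) -> torch.Tensor:
        # taus: (batch, seq_len) positive inter-event times; marks: (batch, seq_len) integer ids
        log_tau = taus.clamp_min(1e-8).log().unsqueeze(-1)
        feats = torch.cat([self.mark_embedding(marks), log_tau], dim=-1)
        states, _ = self.rnn(feats)   # (batch, seq_len, hidden_dim)
        return states                 # the state at step i summarizes events up to and including i
```

The history vector at each step can then be fed either to a conditionally independent head or to a per-mark head such as the mixture sketched earlier.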
Conditionally independent models assume that time and mark are independent and are inferred from the history vector representing past events. This assumption makes the neural architecture computationally inexpensive but hurts predictive performance as