
FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
(e.g., Linderman & Adams, 2015). These approaches are designed either in continuous time (Rasmussen, 2013; Zhang et al., 2018; Donnet et al., 2020; Sulem et al., 2021) or in discrete time (Mohler et al., 2013; Linderman & Adams, 2015; Zhang et al., 2018; Browning et al., 2022). These functions allow great flexibility in the shape of the kernel, yet this comes at the risk of poor kernel estimation when only a small amount of data is available (Xu et al., 2017). Another approach to estimating the intensity function is to consider parametrized kernels. Although it may introduce a bias by assuming a particular kernel shape, this approach has several benefits. First, it reduces the inference burden, as the parameter, say η, is typically lower dimensional than non-parametric kernels. Moreover, for kernels satisfying the Markov property (Bacry et al., 2015), computing the conditional intensity function is linear in the total number of timestamps/events. The most popular kernel in this family is the exponential kernel (Ogata, 1981), defined by η = (α, γ) ↦ αγ exp(−γt), where α and γ are the scaling and decay parameters, respectively (Veen & Schoenberg, 2008; Zhou et al., 2013b). However, as pointed out by Lemonnier & Vayatis (2014), the maximum likelihood estimator for MHP with exponential kernels is efficient only if the decay γ is fixed. Thus, only the scaling parameter α is usually inferred. This implies that the hyperparameter γ must be chosen in advance, usually via grid search, random search, or Bayesian optimization, which leads to a computational burden when the dimension of the MHP is high. The second option is to define a decay parameter γ common to all kernels, which results in a loss of expressiveness of the model. In both cases, the relevance of the exponential kernel relies on the choice of the decay parameter, which may not be adapted to the data (Hall & Willett, 2016). For more general parametric kernels that do not satisfy the Markov property, inference with either the MLE or the ℓ2 loss scales poorly, as both have quadratic computational cost in the number of events, limiting their use in practice (see e.g., Bompaire, 2019, Chapter 1).
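The linear-time computation enabled by the Markov property follows from a standard recursion: for the exponential kernel, the contribution of the whole history to the intensity at a new event is the previous contribution shrunk by a single exponential factor. The helper below is an illustrative sketch (not part of any library) for the univariate case with intensity λ(t) = μ + α Σ_{t_j < t} γ exp(−γ(t − t_j)):

```python
import numpy as np

def expkernel_intensity(events, mu, alpha, gamma):
    """Evaluate the conditional intensity of a univariate Hawkes process
    with exponential kernel alpha * gamma * exp(-gamma * t) at each event
    time, in O(n) thanks to the Markov property.

    `events` is a sorted 1-D array of event times; `mu` is the baseline.
    (Illustrative helper; names and signature are ours.)
    """
    intensities = np.empty(len(events))
    s = 0.0  # running sum of gamma * exp(-gamma * (t_i - t_j)) over j < i
    prev = None
    for i, t in enumerate(events):
        if prev is not None:
            # Markov update: the entire past shrinks by one common factor,
            # and the previous event adds a fresh term gamma.
            s = np.exp(-gamma * (t - prev)) * (s + gamma)
        intensities[i] = mu + alpha * s
        prev = t
    return intensities
```

A naive evaluation would loop over all past events for each time point, giving the quadratic cost mentioned above for kernels without this property.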
Recently, neural network-based MHP estimation has been introduced, offering relevant models given sufficient data, albeit at a high computational cost (Mei & Eisner, 2017; Shchur et al., 2019; Pan et al., 2021). These limitations of parametric and non-parametric kernels prevent their usage in some applications, as pointed out by Carreira (2021) in finance or Allain et al. (2021) in neuroscience. Neuroscience applications are indeed a strong motivation for this work.
The quantitative analysis of electrophysiological signals such as electroencephalography (EEG) or magnetoencephalography (MEG) is a challenging modern neuroscience research topic (Cohen, 2014). By providing a non-invasive way to record human neural activity with high temporal resolution, EEG and MEG offer a unique opportunity to study cognitive processes as triggered by controlled stimulation (Baillet, 2017). Convolutional dictionary learning (CDL) is an unsupervised algorithm recently proposed to study M/EEG signals (Jas et al., 2017; Dupré la Tour et al., 2018). It extracts patterns of interest from M/EEG signals by learning a combination of time-invariant patterns – called atoms – and their activation functions to reconstruct the signal sparsely. However, while CDL recovers the local structure of signals, it does not provide any global information, such as interactions between patterns or how their activations are affected by stimuli. Atoms typically correspond to transient bursts of neural activity (Sherman et al., 2016) or to artifacts such as eye blinks or heartbeats. By offering an event-based perspective on non-invasive electromagnetic brain signals, CDL makes Hawkes processes amenable to M/EEG-based studies. Given the estimated events, one important goal is to uncover potential temporal dependencies between external stimuli presented to the subject and the appearance of the atoms in the data. More precisely, one is interested in statistically quantifying such dependencies, e.g., by estimating the mean and variance of the neural response latency following a stimulus. Allain et al. (2021) address this precise problem with an EM algorithm and a truncated Gaussian kernel, which can cope with scarce brain data, as opposed to more data-hungry non-parametric kernels.
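To fix ideas, a truncated Gaussian kernel is a Gaussian density restricted to a finite support and renormalized, so that its mean and standard deviation directly model the mean and spread of the response latency. The sketch below is purely illustrative; the parameter names are ours and do not reflect the notation or implementation of Allain et al. (2021):

```python
import math

def truncated_gaussian_kernel(t, m, sigma, a=0.0, b=1.0):
    """Truncated Gaussian kernel on [a, b]: a Gaussian density with mean m
    and std sigma, restricted to [a, b] and renormalized to integrate to one.
    Here m and sigma model the mean and spread of a neural response latency.
    (Illustrative sketch; parameter names are ours.)
    """
    if t < a or t > b:
        return 0.0  # finite support: no contribution outside [a, b]
    phi = math.exp(-0.5 * ((t - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    # Renormalize by the Gaussian mass that falls inside [a, b].
    Phi = lambda x: 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))
    return phi / (Phi(b) - Phi(a))
```

Because the support is finite, an event stops influencing the intensity after a fixed delay, which is also the key structural assumption FaDIn exploits below.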
Beyond neuroscience, Carreira (2021) uses a likelihood-based approach with exponential kernels to model order book events. Their approach handles high-frequency trading data and accounts for the latency at hand in the proposed loss.
This paper proposes a new inference method – named FaDIn – to estimate general parametric kernels for Hawkes processes. Our approach is based on two key features. First, we use finite-support kernels and a discretization applied to an ERM-inspired least-squares loss. Second, we employ precomputations that significantly reduce the computational cost. We then show, empirically and theoretically, that the implicit bias induced by the discretization procedure is negligible compared to the statistical error. Further, we highlight the computational and statistical efficiency of FaDIn over the non-parametric approach. Finally, we demonstrate the benefit of using a general kernel with MEG data. The flexibility of FaDIn allows us to model neural responses to external stimuli with a much better-adapted kernel than the existing method derived in Allain et al. (2021).
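The two key ingredients – finite-support kernels and grid discretization – can be illustrated schematically: on a grid of step Δ, events become counts per bin, and the intensity reduces to a discrete convolution of the sampled kernel with those counts. The following is a simplified univariate sketch in our own notation, not the actual FaDIn implementation:

```python
import numpy as np

def discretized_intensity(events, mu, kernel, T, delta=0.01, W=1.0):
    """Approximate the intensity of a univariate Hawkes process on a grid
    of step `delta` over [0, T], for a kernel supported on [0, W].

    `kernel` maps an array of lags to kernel values. Events are projected
    onto the grid as counts, and the intensity becomes a discrete
    convolution. (Simplified sketch; not the actual FaDIn implementation.)
    """
    n_grid = int(T / delta) + 1
    counts = np.zeros(n_grid)
    idx = np.clip(np.rint(np.asarray(events) / delta).astype(int), 0, n_grid - 1)
    np.add.at(counts, idx, 1)  # project events onto grid bins
    # Sample the kernel on its finite support (lags delta, 2*delta, ..., W).
    lags = np.arange(1, int(W / delta) + 1) * delta
    kappa = kernel(lags)
    # Discrete convolution: contribution of past bins to each grid point.
    conv = np.convolve(counts, kappa)[:n_grid]
    # Shift by one bin so that an event influences strictly later grid points.
    return mu + np.concatenate(([0.0], conv[:-1]))
```

The finite support makes the convolution cheap (its length is W/Δ, independent of the number of events), which is what opens the door to the precomputations mentioned above.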
2. Fast Discretized Inference for Hawkes
Processes (FaDIn)
After recalling key notions of Hawkes processes, we introduce our proposed framework, FaDIn.