
FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
(e.g., Linderman & Adams, 2015). These approaches are designed either in continuous time (Rasmussen, 2013; Zhang et al., 2018; Donnet et al., 2020; Sulem et al., 2021) or in discrete time (Mohler et al., 2013; Linderman & Adams, 2015; Zhang et al., 2018; Browning et al., 2022). These functions allow great flexibility in the shape of the kernel, yet this comes at the risk of poor kernel estimation when only a small amount of data is available (Xu et al., 2017). Another approach to estimating the intensity function is to consider parametrized kernels. Although it may introduce a bias by assuming a particular kernel shape, this approach has several benefits. First, it reduces the inference burden, as the parameter, say η, is typically lower dimensional than non-parametric kernels. Moreover, for kernels satisfying the Markov property (Bacry et al., 2015), computing the conditional intensity function is linear in the total number of timestamps/events. The most popular kernel in this family is the exponential kernel (Ogata, 1981), defined by η = (α, γ) ↦ αγ exp(−γt), where α and γ are the scaling and decay parameters, respectively (Veen & Schoenberg, 2008; Zhou et al., 2013b). However, as pointed out by Lemonnier & Vayatis (2014), the maximum likelihood estimator for MHP with exponential kernels is efficient only if the decay γ is fixed. Thus, only the scaling parameter α is usually inferred. This implies that the hyperparameter γ must be chosen in advance, usually via grid search, random search, or Bayesian optimization, which leads to a computational burden when the dimension of the MHP is high. The second option is to define a decay parameter γ common to all kernels, which results in a loss of expressiveness of the model. In both cases, the relevance of the exponential kernel relies on the choice of the decay parameter, which may not be adapted to the data (Hall & Willett, 2016). For more general parametric kernels that do not satisfy the Markov property, inference with either the MLE or the ℓ2 loss scales poorly, as both have quadratic computational cost in the number of events, limiting their use in practice (see e.g., Bompaire, 2019, Chapter 1).
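The linear-time computation enabled by the Markov property follows from a standard recursion: for the exponential kernel, the contribution of the whole history to the intensity at a new event is the previous contribution shrunk by a single exponential factor. The helper below is an illustrative sketch (not part of any library) for the univariate case with intensity λ(t) = μ + α Σ_{t_j < t} γ exp(−γ(t − t_j)):

```python
import numpy as np

def expkernel_intensity(events, mu, alpha, gamma):
    """Evaluate the conditional intensity of a univariate Hawkes process
    with exponential kernel alpha * gamma * exp(-gamma * t) at each event
    time, in O(n) thanks to the Markov property.

    `events` is a sorted 1-D array of event times; `mu` is the baseline.
    (Illustrative helper; names and signature are ours.)
    """
    intensities = np.empty(len(events))
    s = 0.0  # running sum of gamma * exp(-gamma * (t_i - t_j)) over j < i
    prev = None
    for i, t in enumerate(events):
        if prev is not None:
            # Markov update: the entire past shrinks by one common factor,
            # and the previous event adds a fresh term gamma.
            s = np.exp(-gamma * (t - prev)) * (s + gamma)
        intensities[i] = mu + alpha * s
        prev = t
    return intensities
```

A naive evaluation would loop over all past events for each time point, giving the quadratic cost mentioned above for kernels without this property.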
Recently, neural network-based MHP estimation has been introduced, offering relevant models given sufficient data, albeit at a high computational cost (Mei & Eisner, 2017; Shchur et al., 2019; Pan et al., 2021). These limitations of parametric and non-parametric kernels prevent their usage in some applications, as pointed out by Carreira (2021) in finance or Allain et al. (2021) in neuroscience. Neuroscience applications are indeed a strong motivation for this work.
The quantitative analysis of electrophysiological signals such as electroencephalography (EEG) or magnetoencephalography (MEG) is a challenging modern neuroscience research topic (Cohen, 2014). By providing a non-invasive way to record human neural activity with high temporal resolution, EEG and MEG offer a unique opportunity to study cognitive processes as triggered by controlled stimulation (Baillet, 2017). Convolutional dictionary learning (CDL) is an unsupervised algorithm recently proposed to study M/EEG signals (Jas et al., 2017; Dupré la Tour et al., 2018). It extracts patterns of interest from M/EEG signals by learning a combination of time-invariant patterns – called atoms – and their activation functions to reconstruct the signal sparsely. However, while CDL recovers the local structure of signals, it does not provide any global information, such as interactions between patterns or how their activations are affected by stimuli. Atoms typically correspond to transient bursts of neural activity (Sherman et al., 2016) or to artifacts such as eye blinks or heartbeats. By offering an event-based perspective on non-invasive electromagnetic brain signals, CDL makes Hawkes processes amenable to M/EEG-based studies. Given the estimated events, one important goal is to uncover potential temporal dependencies between external stimuli presented to the subject and the appearance of the atoms in the data. More precisely, one is interested in statistically quantifying such dependencies, e.g., by estimating the mean and variance of the neural response latency following a stimulus. Allain et al. (2021) address this precise problem with an EM algorithm and a truncated Gaussian kernel, which can cope with scarce brain data, as opposed to more data-hungry non-parametric kernels.
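To fix ideas, a truncated Gaussian kernel is a Gaussian density restricted to a finite support and renormalized, so that its mean and standard deviation directly model the mean and spread of the response latency. The sketch below is purely illustrative; the parameter names are ours and do not reflect the notation or implementation of Allain et al. (2021):

```python
import math

def truncated_gaussian_kernel(t, m, sigma, a=0.0, b=1.0):
    """Truncated Gaussian kernel on [a, b]: a Gaussian density with mean m
    and std sigma, restricted to [a, b] and renormalized to integrate to one.
    Here m and sigma model the mean and spread of a neural response latency.
    (Illustrative sketch; parameter names are ours.)
    """
    if t < a or t > b:
        return 0.0  # finite support: no contribution outside [a, b]
    phi = math.exp(-0.5 * ((t - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    # Renormalize by the Gaussian mass that falls inside [a, b].
    Phi = lambda x: 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))
    return phi / (Phi(b) - Phi(a))
```

Because the support is finite, an event stops influencing the intensity after a fixed delay, which is also the key structural assumption FaDIn exploits below.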
Beyond neuroscience, Carreira (2021) uses a likelihood-based approach with exponential kernels to model order book events. Their approach handles high-frequency trading data and accounts for the latency at hand in the proposed loss.
This paper proposes a new inference method – named FaDIn – to estimate general parametric kernels for Hawkes processes. Our approach is based on two key features. First, we use finite-support kernels and a discretization applied to an ERM-inspired least-squares loss. Second, we employ precomputations that significantly reduce the computational cost. We then show, empirically and theoretically, that the implicit bias induced by the discretization procedure is negligible compared to the statistical error. Further, we highlight the computational and statistical efficiency of FaDIn over the non-parametric approach. Finally, we demonstrate the benefit of using a general kernel with MEG data. The flexibility of FaDIn allows us to model neural responses to external stimuli with a much better-adapted kernel than the existing method derived in Allain et al. (2021).
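The two key ingredients – finite-support kernels and grid discretization – can be illustrated schematically: on a grid of step Δ, events become counts per bin, and the intensity reduces to a discrete convolution of the sampled kernel with those counts. The following is a simplified univariate sketch in our own notation, not the actual FaDIn implementation:

```python
import numpy as np

def discretized_intensity(events, mu, kernel, T, delta=0.01, W=1.0):
    """Approximate the intensity of a univariate Hawkes process on a grid
    of step `delta` over [0, T], for a kernel supported on [0, W].

    `kernel` maps an array of lags to kernel values. Events are projected
    onto the grid as counts, and the intensity becomes a discrete
    convolution. (Simplified sketch; not the actual FaDIn implementation.)
    """
    n_grid = int(T / delta) + 1
    counts = np.zeros(n_grid)
    idx = np.clip(np.rint(np.asarray(events) / delta).astype(int), 0, n_grid - 1)
    np.add.at(counts, idx, 1)  # project events onto grid bins
    # Sample the kernel on its finite support (lags delta, 2*delta, ..., W).
    lags = np.arange(1, int(W / delta) + 1) * delta
    kappa = kernel(lags)
    # Discrete convolution: contribution of past bins to each grid point.
    conv = np.convolve(counts, kappa)[:n_grid]
    # Shift by one bin so that an event influences strictly later grid points.
    return mu + np.concatenate(([0.0], conv[:-1]))
```

The finite support makes the convolution cheap (its length is W/Δ, independent of the number of events), which is what opens the door to the precomputations mentioned above.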
2. Fast Discretized Inference for Hawkes
Processes (FaDIn)
After recalling key notions of Hawkes processes, we introduce our proposed framework, FaDIn.