FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels

Guillaume Staerman*¹, Cédric Allain*¹, Alexandre Gramfort¹, Thomas Moreau¹

*Equal contribution. ¹Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France. Correspondence to: Guillaume Staerman <guillaume.staerman@inria.fr>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
Abstract
Temporal point processes (TPP) are a natural tool for modeling event-based data. Among all TPP models, Hawkes processes have proven to be the most widely used, mainly due to their adequate modeling for various applications, particularly when considering exponential or non-parametric kernels. Although non-parametric kernels are an option, such models require large datasets. While exponential kernels are more data efficient and relevant for specific applications where events immediately trigger more events, they are ill-suited for applications where latencies need to be estimated, such as in neuroscience. This work aims to offer an efficient solution to TPP inference using general parametric kernels with finite support. The developed solution consists of a fast $\ell_2$ gradient-based solver leveraging a discretized version of the events. After theoretically supporting the use of discretization, the statistical and computational efficiency of the novel approach is demonstrated through various numerical experiments. Finally, the method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG). Given the use of general parametric kernels, results show that the proposed approach leads to an improved estimation of pattern latency compared to the state-of-the-art.
1. Introduction
The statistical framework of Temporal Point Processes
(TPPs; see e.g., Daley & Vere-Jones 2003) is well adapted
for modeling event-based data. It offers a principled way
to predict the rate of events as a function of time and the previous events' history. TPPs are historically used to model intervals between events, such as in renewal theory, which studies the sequence of intervals between successive replacements of a component susceptible to failure. TPPs find many applications in neuroscience, in particular to model single-cell recordings and neural spike trains (Truccolo et al., 2005; Okatan et al., 2005; Kim et al., 2011; Rad & Paninski, 2011), occasionally associated with spatial statistics (Pillow et al., 2008) or network models (Galves & Löcherbach, 2015). Multivariate Hawkes processes (MHP; Hawkes 1971) are likely the most popular, as they can model interactions between each univariate process. They also have the peculiarity that a process can be self-exciting, meaning that a past event will increase the probability of having another event in the future on the same process. The conditional intensity function is the key quantity for TPPs. With MHP, it is composed of a baseline parameter and kernels, and it describes the probability of occurrence of an event depending on time. The kernel function represents how processes influence each other or themselves. The most commonly used inference method to obtain the baseline and the kernel parameters of MHP is maximum likelihood estimation (MLE; see e.g., Daley & Vere-Jones, 2007 or Lewis & Mohler, 2011). One alternative and often overlooked estimation criterion is the least-squares $\ell_2$ error, inspired by the theory of empirical risk minimization (ERM; Reynaud-Bouret & Rivoirard 2010; Hansen et al. 2015; Bacry et al. 2020).
A key feature of MHP modeling is the choice of kernels, which can be either non-parametric or parametric. In the non-parametric setting, kernel functions are approximated by histograms (Lewis & Mohler, 2011; Lemonnier & Vayatis, 2014), by a linear combination of pre-defined functions (Zhou et al., 2013a; Xu et al., 2016) or, alternatively, by functions lying in an RKHS (Yang et al., 2017). In addition to the frequentist approach, many Bayesian approaches, such as Gibbs sampling (Ishwaran & James, 2001) or (stochastic) variational inference (Hoffman et al., 2013), have been adapted to MHP, in particular to fit non-parametric kernels. Bayesian methods also rely on the modeling of the kernel by histograms (e.g., Donnet et al., 2020) or by a linear combination of pre-defined functions
(e.g., Linderman & Adams, 2015). These approaches are designed either in continuous time (Rasmussen, 2013; Zhang et al., 2018; Donnet et al., 2020; Sulem et al., 2021) or in discrete time (Mohler et al., 2013; Linderman & Adams, 2015; Zhang et al., 2018; Browning et al., 2022). These functions allow great flexibility for the shape of the kernel, yet this comes at the risk of poor estimation when only a small amount of data is available (Xu et al., 2017). Another approach to estimating the intensity function is to consider parametrized kernels. Although it can introduce a potential bias by assuming a particular kernel shape, this approach has several benefits. First, it reduces the inference burden, as the parameter, say $\eta$, is typically lower dimensional than non-parametric kernels. Moreover, for kernels satisfying the Markov property (Bacry et al., 2015), computing the conditional intensity function is linear in the total number of timestamps/events. The most popular kernel belonging to this family is the exponential kernel (Ogata, 1981). It is defined by $\eta = (\alpha, \gamma) \mapsto \alpha\gamma \exp(-\gamma t)$, where $\alpha$ and $\gamma$ are the scaling and the decay parameters, respectively (Veen & Schoenberg, 2008; Zhou et al., 2013b). However, as pointed out by Lemonnier & Vayatis (2014), the maximum likelihood estimator for MHP with exponential kernels is efficient only if the decay $\gamma$ is fixed. Thus, only the scaling parameter $\alpha$ is usually inferred. This implies that the hyperparameter $\gamma$ must be chosen in advance, usually using a grid search, a random search, or Bayesian optimization. This leads to a computational burden when the dimension of the MHP is high. The second option is to define a decay parameter $\gamma$ common to all kernels, which results in a loss of expressiveness of the model. In both cases, the relevance of the exponential kernel relies on the choice of the decay parameter, which may not be adapted to the data (Hall & Willett, 2016). For more general parametric kernels, which do not verify the Markov property, inference with either MLE or the $\ell_2$ loss scales poorly, as the computational cost is quadratic in the number of events, making these procedures of limited use in practice (see e.g., Bompaire, 2019, Chapter 1). Recently, neural network-based MHP estimation has been introduced, offering, with sufficient data, relevant models at the price of a high computational cost (Mei & Eisner, 2017; Shchur et al., 2019; Pan et al., 2021). These limitations of parametric and non-parametric kernels prevent their usage in some applications, as pointed out by Carreira (2021) in finance or Allain et al. (2021) in neuroscience. Neuroscience applications are also a strong motivation for this work.
The quantitative analysis of electrophysiological signals such as electroencephalography (EEG) or magnetoencephalography (MEG) is a challenging modern neuroscience research topic (Cohen, 2014). By giving a non-invasive way to record human neural activity with a high temporal resolution, EEG and MEG offer a unique opportunity to study cognitive processes as triggered by controlled stimulation (Baillet, 2017). Convolutional dictionary learning (CDL) is an unsupervised algorithm recently proposed to study M/EEG signals (Jas et al., 2017; Dupré la Tour et al., 2018). It consists in extracting patterns of interest in M/EEG signals: it learns a combination of time-invariant patterns – called atoms – and their activation functions to reconstruct the signal sparsely. However, while CDL recovers the local structure of signals, it does not provide any global information, such as interactions between patterns or how their activations are affected by stimuli. Atoms typically correspond to transient bursts of neural activity (Sherman et al., 2016) or artifacts such as eye blinks or heartbeats. By offering an event-based perspective on non-invasive electromagnetic brain signals, CDL makes Hawkes processes amenable to M/EEG-based studies. Given the estimated events, one important goal is to uncover potential temporal dependencies between external stimuli presented to the subject and the appearance of the atoms in the data. More precisely, one is interested in statistically quantifying such dependencies, e.g., by estimating the mean and variance of the neural response latency following a stimulus. In Allain et al. (2021), the authors address this precise problem. Their approach is based on an EM algorithm and a truncated Gaussian kernel, which can cope with limited amounts of brain data, as opposed to non-parametric kernels, which are more data-hungry. Beyond neuroscience, Carreira (2021) uses a likelihood-based approach with exponential kernels to model order book events. Their approach uses high-frequency trading data and accounts for the latency at hand in the proposed loss.
This paper proposes a new inference method – named FaDIn – to estimate any parametric kernel for Hawkes processes. Our approach is based on two key features. First, we use finite-support kernels and a discretization applied to the ERM-inspired least-squares loss. Second, we propose to employ precomputations that significantly reduce the computational cost. We then show, empirically and theoretically, that the implicit bias induced by the discretization procedure is negligible compared to the statistical error. Further, we highlight the efficiency of FaDIn in computation and statistical estimation over the non-parametric approach. Finally, we demonstrate the benefit of using a general kernel with MEG data. The flexibility of FaDIn allows us to model neural responses to external stimuli with a much better-adapted kernel than the existing method derived in Allain et al. (2021).
2. Fast Discretized Inference for Hawkes Processes (FaDIn)

After recalling key notions of Hawkes processes, we introduce our proposed framework, FaDIn.
2.1. Hawkes processes
Given a stopping time $T \in \mathbb{R}_+$ and an observation period $[0, T]$, a temporal point process (TPP) is a stochastic process whose realization consists of a set of distinct timestamps $\mathcal{F}_T = \{t_n,\; t_n \in [0, T]\}$ occurring in continuous time. The behavior of a TPP is fully characterized by its intensity function, which corresponds to the expected infinitesimal rate at which events occur at time $t \in [0, T]$. The values of this function may depend on time (e.g., inhomogeneous Poisson processes) or rely on past events, as for self-exciting processes (see Daley & Vere-Jones 2003 for an excellent account of TPPs). For the latter, the occurrence of one event will modify the probability of having a new event in the near future. The conditional intensity function $\lambda\colon [0, T] \to \mathbb{R}_+$ has the following form:
$$\lambda(t \,|\, \mathcal{F}_t) := \lim_{\mathrm{d}t \to 0} \frac{\mathbb{P}\big(N_{t+\mathrm{d}t} - N_t = 1 \,|\, \mathcal{F}_t\big)}{\mathrm{d}t},$$
where $N_t := \sum_{n \ge 1} \mathbb{1}_{t_n \le t}$ is the counting process associated with the point process. Among this family, Multivariate Hawkes processes (MHP; Hawkes, 1971) model the interactions of $p \in \mathbb{N}$ self-exciting TPPs. Given $p$ sets of timestamps $\mathcal{F}_T^i = \{t_n^i,\; t_n^i \in [0, T]\}_{n=1}^{N_T^i}$, $i = 1, \ldots, p$, each process is described by the following intensity function:
$$\lambda_i(t) = \mu_i + \sum_{j=1}^{p} \int_0^t \phi_{ij}(t - s)\,\mathrm{d}N_s^j, \qquad (1)$$
where $\mu_i$ is the baseline parameter, $N_t = [N_t^1, \ldots, N_t^p]$ the associated multivariate counting process, and $\phi_{ij}\colon [0, T] \to \mathbb{R}_+$ the excitation function – called kernel – representing the influence of the $j$-th process' past events onto the $i$-th process' future events. From an inference perspective, the goal is to estimate the baseline and kernels associated with the MHP from the data. In this paper, we focus on the ERM-inspired least-squares loss. Assuming a class of parametric kernels parametrized by $\eta$, the objective is to find parameters that minimize (see e.g., Eq. (I.2) in Bompaire, 2019, Chapter 1):
$$\mathcal{L}(\theta, \mathcal{F}_T) = \frac{1}{N_T} \sum_{i=1}^{p} \left( \int_0^T \lambda_i(s)^2\,\mathrm{d}s - 2 \sum_{t_n^i \in \mathcal{F}_T^i} \lambda_i\big(t_n^i\big) \right), \qquad (2)$$
where $N_T = \sum_{i=1}^{p} N_T^i$ is the total number of timestamps, and where $\theta := (\mu, \eta)$. Interestingly, when used with an exponential kernel, this loss benefits from precomputations of complexity $O(N_T)$, making the subsequent iterative optimization procedure independent of $N_T$. This computational ease is the main advantage of the loss $\mathcal{L}$ over the log-likelihood function. However, when using a general parametric kernel, these precomputations require $O(N_T^2)$ operations, killing the computational benefit of the $\ell_2$ loss $\mathcal{L}$ over the log-likelihood. It is worth noting that this loss differs from the quadratic error minimized between the counting processes and the integral of the intensity function, as used in Wang et al. (2016); Eichler et al. (2017) and Xu et al. (2018).
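To make the two terms of the loss (2) concrete, here is a brute-force evaluation for a univariate Hawkes process with an exponential kernel. This is only an illustrative sketch: the function names, the trapezoidal quadrature used to approximate the integral, and the explicit loop over events are our own choices, and the $O(N_T^2)$ cost of that loop is precisely what the exponential-kernel precomputations mentioned above avoid.

```python
import numpy as np

def intensity_exp(t, events, mu, alpha, gamma):
    """Conditional intensity (1) of a univariate Hawkes process with the
    exponential kernel phi(t) = alpha * gamma * exp(-gamma * t)."""
    past = events[events < t]
    return mu + np.sum(alpha * gamma * np.exp(-gamma * (t - past)))

def l2_loss(events, T, mu, alpha, gamma, n_quad=2000):
    """Naive evaluation of the least-squares loss (2): the integral of
    lambda^2 is approximated by trapezoidal quadrature and the sum over
    events is computed with an explicit loop (O(N_T^2) overall)."""
    grid = np.linspace(0.0, T, n_quad)
    lam = np.array([intensity_exp(s, events, mu, alpha, gamma) for s in grid])
    integral = np.trapz(lam ** 2, grid)
    sum_term = sum(intensity_exp(t, events, mu, alpha, gamma) for t in events)
    return (integral - 2.0 * sum_term) / len(events)
```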
2.2. FaDIn
The approach we propose in this paper fills the need for general parametric kernels in many applications. We provide a computationally and statistically efficient solver – coined FaDIn – that works with many parametric kernels using gradient-based algorithms. Precisely, it relies on three key ideas: (i) the use of parametric finite-support kernels, (ii) a discretization of the time interval $[0, T]$, and (iii) precomputations allowing an efficient optimization procedure, detailed below.
Finite-support kernels. A core bottleneck for MLE or $\ell_2$ estimation of parametric kernels is the need to compute the intensity function for all events. For general kernels, the intensity function usually requires $O(N_T^2)$ operations, which makes it intractable for processes observed over long time spans. To make this computation more efficient, we consider finite-support kernels. Using a finite-support kernel amounts to setting a limit in time on the influence of a past event on the intensity, i.e., $\forall t \notin [0, W],\ \phi_{ij}(t) = 0$, where $W$ denotes the length of the kernel's support. This assumption matches applications where an event cannot have influence far in the future, such as in neuroscience (Krumin et al., 2010; Eichler et al., 2017; Allain et al., 2021), genetics (Reynaud-Bouret & Schbath, 2010) or high-frequency trading (Bacry et al., 2015; Carreira, 2021). The intensity function (1) can then be reformulated as a convolution between the kernel $\phi_{ij}$ and the sum of Dirac functions $z_i(t) = \sum_{t_n^i \in \mathcal{F}_t^i} \delta_{t_n^i}(t)$ located at the event occurrences $t_n^i$:
$$\lambda_i(t) = \mu_i + \sum_{j=1}^{p} \big(\phi_{ij} * z_j\big)(t), \quad t \in [0, T].$$
As $\phi_{ij}$ has finite support, the intensity can be computed efficiently with this formula: only events in the interval $[t - W, t]$ need to be considered. See Section B.2 for more details.
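As a small illustration of why the finite support helps, the following sketch evaluates the intensity of an MHP at a single time point using only the events falling in the window $[t - W, t)$. The data layout (one array of timestamps per dimension and one callable per kernel) is an assumption made for the example, not an interface of FaDIn.

```python
import numpy as np

def intensity_finite_support(t, events_per_dim, baselines, kernels, W):
    """Intensity (1) when every kernel phi_ij has support [0, W]: only the
    events in [t - W, t) contribute, so the per-evaluation cost depends on
    the local event density rather than on the total number of events N_T.
    `kernels[i][j]` is a callable implementing phi_ij on [0, W]."""
    p = len(events_per_dim)
    lam = np.array(baselines, dtype=float)
    for i in range(p):
        for j in range(p):
            t_j = events_per_dim[j]
            recent = t_j[(t_j >= t - W) & (t_j < t)]
            lam[i] += kernels[i][j](t - recent).sum()
    return lam
```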
Discretization. To make these computations even more efficient, we propose to rely on discretized processes. Most Hawkes process estimation procedures involve a continuous paradigm to minimize (2) or its log-likelihood counterpart. Discretization has so far been investigated for non-parametric kernels (Kirchner, 2016; Kirchner & Bercher, 2018; Kurisu, 2016). The discretization of a TPP consists in projecting each event $t_n^i$ on a regular grid $\mathcal{G} = \{0, \Delta, 2\Delta, \ldots, G\Delta\}$, where $G\Delta = T$. We refer to $\Delta$ as the stepsize of the discretization. Let $\widetilde{\mathcal{F}}_T^i$ be the set of projected timestamps of $\mathcal{F}_T^i$ on the grid $\mathcal{G}$.
The intensity function of the $i$-th process of our discretized MHP is then defined as:
$$\tilde{\lambda}_i[s] = \mu_i + \sum_{j=1}^{p} \sum_{\tilde{t}_m^j \in \widetilde{\mathcal{F}}_s^j} \phi_{ij}\big(s\Delta - \tilde{t}_m^j\big) = \mu_i + \sum_{j=1}^{p} \underbrace{\sum_{\tau=1}^{L} \phi_{ij}^{\Delta}[\tau]\, z_j[s - \tau]}_{(\phi_{ij}^{\Delta} \,*\, z_j)[s]}, \quad s \in \llbracket 0, G \rrbracket, \qquad (3)$$
where $L = \lfloor W/\Delta \rfloor$ denotes the number of points on the discretized support, $\phi_{ij}^{\Delta}[s] = \phi_{ij}(s\Delta)$ is the kernel value on the grid, and $z_i[s] = \#\{t_n^i : |t_n^i - s\Delta| \le \frac{\Delta}{2}\}$ denotes the number of events projected on the grid timestamp $s$. Here $\lfloor \cdot \rfloor$ denotes the floor function. From now on and throughout the rest of the paper, we denote $\phi_{ij}(\cdot)\colon \mathbb{R}_+ \to \mathbb{R}_+$ as a function while $\phi_{ij}^{\Delta}[\cdot]$ represents the discrete vector $\phi_{ij}^{\Delta} \in \mathbb{R}_+^L$. Compared to the continuous formulation, the intensity function can be computed more efficiently as one can rely on discrete convolutions, whose worst-case complexity scales as $O(N_T L)$. It can also be further accelerated using the Fast Fourier Transform when $N_T$ is large. Another benefit of the discretization is that for kernels whose values are costly to compute, at most $L$ values need to be calculated. This can have a strong computational impact when $N_T \gg L$, as all values can be precomputed and stored.
While discretization improves the computational efficiency, it also introduces a bias in the computation of the intensity function and thus, potentially, in the estimation of the kernel parameters. The impact of the discretization on the estimation is considered in Section 2.3 and Section 3.1. Note that this bias is similar to the one incurred by quantizing the kernel as histograms for non-parametric estimators.
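The sketch below illustrates the discretization step and the convolutional form of (3): events are binned onto the grid, and the discretized intensity is obtained with discrete convolutions. Prepending a zero to the kernel vector (so that $\tau$ starts at 1) and the overall data layout are our own illustrative choices.

```python
import numpy as np

def discretize_events(events, T, delta):
    """Project timestamps onto the grid {0, delta, ..., G*delta} and return
    the event-count vector z[s] appearing in (3)."""
    G = int(np.round(T / delta))
    idx = np.clip(np.round(events / delta).astype(int), 0, G)
    return np.bincount(idx, minlength=G + 1)

def discretized_intensity(z_list, baselines, kernel_grids):
    """Discretized intensity (3) computed with discrete convolutions.
    `kernel_grids[i][j]` is the vector (phi_ij(delta), ..., phi_ij(L*delta))."""
    p = len(z_list)
    G = len(z_list[0]) - 1
    lam = [np.full(G + 1, mu_i, dtype=float) for mu_i in baselines]
    for i in range(p):
        for j in range(p):
            # prepend a zero so that kernel[tau] multiplies z_j[s - tau] for tau >= 1
            kernel = np.concatenate(([0.0], kernel_grids[i][j]))
            lam[i] += np.convolve(z_list[j], kernel)[: G + 1]
    return lam
```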
Loss and precomputations. FaDIn aims at minimizing the discretized $\ell_2$ loss, which approximates the integral in the left part of (2) by a sum on the grid $\mathcal{G}$ after projecting the timestamps of $\mathcal{F}_T$ on it. It boils down to optimizing the following loss $\mathcal{L}_{\mathcal{G}}\big(\theta, \widetilde{\mathcal{F}}_T\big)$ defined as:
$$\frac{1}{N_T} \sum_{i=1}^{p} \left( \Delta \sum_{s \in \llbracket 0, G \rrbracket} \tilde{\lambda}_i[s]^2 - 2 \sum_{\tilde{t}_n^i \in \widetilde{\mathcal{F}}_T^i} \tilde{\lambda}_i\big[\tilde{t}_n^i\big] \right). \qquad (4)$$
To find the parameters $\theta$ of the intensity function, FaDIn minimizes $\mathcal{L}_{\mathcal{G}}$ using a first-order gradient-based algorithm. The computational bottleneck of the proposed algorithm is thus the computation of the gradient $\nabla \mathcal{L}_{\mathcal{G}}$ with respect to the parameters $\theta$. Using the discretized finite-support kernel, this gradient can be computed using convolutions, with the same computational complexity as the computation of the intensity function, $O(N_T L)$. However, the gradient computation can still be too expensive for long processes with many events to get reasonable inference times. Using the least-squares error of the process (4), one can further reduce the complexity of computing the gradient by precomputing some constants $\Phi_j(\tau; G)$, $\Psi_{j,k}(\tau, \tau'; G)$ and $\Phi_j\big(\tau; \widetilde{\mathcal{F}}_T^i\big)$ that do not depend on the parameter $\theta$. Indeed, by developing and rearranging the terms in (4), one obtains:
$$\begin{aligned}
N_T\, \mathcal{L}_{\mathcal{G}}\big(\theta, \widetilde{\mathcal{F}}_T\big) ={}& (T + \Delta) \sum_{i=1}^{p} \mu_i^2 + 2\Delta \sum_{i=1}^{p} \mu_i \sum_{j=1}^{p} \sum_{\tau=1}^{L} \phi_{ij}^{\Delta}[\tau] \underbrace{\left( \sum_{s=1}^{G} z_j[s - \tau] \right)}_{\Phi_j(\tau;\, G)} \\
&+ \Delta \sum_{i,j,k} \sum_{\tau=1}^{L} \sum_{\tau'=1}^{L} \phi_{ij}^{\Delta}[\tau]\, \phi_{ik}^{\Delta}[\tau'] \underbrace{\left( \sum_{s=1}^{G} z_j[s - \tau]\, z_k[s - \tau'] \right)}_{\Psi_{j,k}(\tau, \tau';\, G)} \\
&- 2 \left( \sum_{i=1}^{p} N_T^i\, \mu_i + \sum_{i,j} \sum_{\tau=1}^{L} \phi_{ij}^{\Delta}[\tau] \underbrace{\left( \sum_{\tilde{t}_n^i \in \widetilde{\mathcal{F}}_T^i} z_j\Big[\tfrac{\tilde{t}_n^i}{\Delta} - \tau\Big] \right)}_{\Phi_j(\tau;\, \widetilde{\mathcal{F}}_T^i)} \right).
\end{aligned}$$
The term $\Psi_{j,k}(\tau, \tau'; G)$ dominates the computational cost of our precomputations. It requires $O(G)$ operations for each tuple $(\tau, \tau')$ and $(j, k)$. Thus, it has a total complexity of $O(p^2 L^2 G)$ and is the bottleneck of the precomputation phase. For any $m \in \{1, \ldots, p\}$, the gradient of the loss w.r.t. the baseline parameter is given by:
$$N_T \frac{\partial \mathcal{L}_{\mathcal{G}}}{\partial \mu_m} = 2(T + \Delta)\mu_m - 2 N_T^m + 2\Delta \sum_{j=1}^{p} \sum_{\tau=1}^{L} \phi_{mj}^{\Delta}[\tau]\, \Phi_j(\tau; G).$$
For any tuple $(m, l) \in \{1, \ldots, p\}^2$, the gradient of the loss w.r.t. $\eta_{ml}$ is:
$$\begin{aligned}
N_T \frac{\partial \mathcal{L}_{\mathcal{G}}}{\partial \eta_{ml}} ={}& 2\Delta\, \mu_m \sum_{\tau=1}^{L} \frac{\partial \phi_{ml}^{\Delta}[\tau]}{\partial \eta_{ml}}\, \Phi_l(\tau; G) \\
&+ 2\Delta \sum_{k=1}^{p} \sum_{\tau=1}^{L} \sum_{\tau'=1}^{L} \frac{\partial \phi_{ml}^{\Delta}[\tau]}{\partial \eta_{ml}}\, \phi_{mk}^{\Delta}[\tau']\, \Psi_{l,k}(\tau, \tau'; G) \\
&- 2 \sum_{\tau=1}^{L} \frac{\partial \phi_{ml}^{\Delta}[\tau]}{\partial \eta_{ml}}\, \Phi_l\big(\tau; \widetilde{\mathcal{F}}_T^m\big).
\end{aligned}$$
The gradients of the kernel parameters dominate the computational cost. The complexity is $O(pL^2)$ for each kernel parameter, leading to a total complexity of $O(p^3 L^2)$, independent of the number of events $N_T$. Thus, a trade-off can be made between the precision of the method and its computational efficiency by varying the size of the kernel's support or the discretization stepsize.
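For concreteness, the following sketch computes the three precomputed constants with plain loops, following the notation above. It is illustrative only (these sums can be vectorized), but the nested loops make the $O(p^2 L^2 G)$ bottleneck of the $\Psi_{j,k}$ term explicit.

```python
import numpy as np

def precompute_constants(z_list, events_grid_idx, L):
    """Illustrative computation of the precomputed constants:
      Phi_G[j, tau]        = sum_s z_j[s - tau]                  (Phi_j(tau; G))
      Psi[j, k, tau, tau'] = sum_s z_j[s - tau] * z_k[s - tau']  (Psi_{j,k}(tau, tau'; G))
      Phi_F[i, j, tau]     = sum_n z_j[s_n^i - tau]              (Phi_j(tau; F_T^i))
    `z_list[j]` is the event-count vector of process j on the grid and
    `events_grid_idx[i]` holds the grid indices of the events of process i."""
    p = len(z_list)
    G = len(z_list[0]) - 1
    Phi_G = np.zeros((p, L + 1))
    Psi = np.zeros((p, p, L + 1, L + 1))
    Phi_F = np.zeros((p, p, L + 1))
    for j in range(p):
        for tau in range(1, L + 1):
            Phi_G[j, tau] = z_list[j][: G + 1 - tau].sum()
    for j in range(p):
        for k in range(p):
            for tau in range(1, L + 1):
                for tau_p in range(1, L + 1):
                    lo = max(tau, tau_p)  # first grid index s with both terms defined
                    Psi[j, k, tau, tau_p] = np.dot(
                        z_list[j][lo - tau: G + 1 - tau],
                        z_list[k][lo - tau_p: G + 1 - tau_p],
                    )
    for i in range(p):
        for j in range(p):
            for tau in range(1, L + 1):
                idx = events_grid_idx[i] - tau
                Phi_F[i, j, tau] = z_list[j][idx[idx >= 0]].sum()
    return Phi_G, Psi, Phi_F
```

Once these constants are stored, every loss or gradient evaluation only touches arrays whose sizes depend on $p$ and $L$, not on $N_T$, which is what makes the iterative optimization cheap.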
Remark 2.1. The primary motivation for the $\ell_2$ loss is the presence of terms that can be precomputed, in contrast to the log-likelihood (Reynaud-Bouret & Rivoirard, 2010; Reynaud-Bouret et al., 2014; Bacry et al., 2020). A comparison is performed in Section B.1.
Optimization. The inference is then conducted using gradient descent on the $\ell_2$ loss $\mathcal{L}_{\mathcal{G}}$. FaDIn thus allows for very general parametric kernels, as exact gradients for each parameter involved in the kernels can be derived efficiently as long as the kernel is differentiable and has finite support. Gradient-based optimization algorithms can, therefore, be used without limitation, in contrast with the EM algorithm, which requires a closed-form solution to zero the gradient, something that is difficult to obtain for many kernels. A critical remark is that the problem is generally non-convex, so the optimization may converge to a local minimum.
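To illustrate how general the kernel can be, here is a minimal univariate sketch that fits the discretized loss (4) by automatic differentiation with a Gaussian-shaped kernel evaluated on the truncated support $[0, W]$. It is not the reference FaDIn solver: it recomputes the intensity at every iteration instead of using the precomputations above, and the softplus parametrization, the optimizer, and the kernel shape are illustrative assumptions.

```python
import torch

def truncated_gaussian(grid, m, sigma, alpha):
    """Differentiable finite-support kernel evaluated on the grid:
    an (unnormalized) Gaussian bump scaled by alpha; any other
    differentiable parametric form could be substituted."""
    return alpha * torch.exp(-0.5 * ((grid - m) / sigma) ** 2)

def fit_discretized_l2(z, delta, W, n_iter=2000, lr=1e-2):
    """Gradient-based fit of the discretized l2 loss (4) for a univariate
    process, assuming the events are already binned into z (see the
    discretization sketch above)."""
    z = torch.as_tensor(z, dtype=torch.float64)
    G = z.shape[0] - 1
    L = int(W / delta)
    grid = torch.arange(1, L + 1, dtype=torch.float64) * delta
    params = torch.zeros(4, dtype=torch.float64, requires_grad=True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(n_iter):
        mu, m, sigma, alpha = torch.nn.functional.softplus(params)
        phi = truncated_gaussian(grid, m, sigma, alpha)
        kernel = torch.cat([torch.zeros(1, dtype=torch.float64), phi])
        # discrete convolution (phi * z)[s]; conv1d cross-correlates, hence the flip
        lam = mu + torch.nn.functional.conv1d(
            z.view(1, 1, -1), kernel.flip(0).view(1, 1, -1), padding=L
        ).view(-1)[: G + 1]
        loss = (delta * (lam ** 2).sum() - 2.0 * (lam * z).sum()) / z.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.nn.functional.softplus(params).detach()  # (mu, m, sigma, alpha)
```

Because autograd provides exact gradients for any differentiable kernel evaluated on the grid, switching to another parametric family only requires replacing `truncated_gaussian`.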
2.3. Impact of the discretization
While discretization allows for efficient computations, it also introduces a perturbation in the loss value. In this section, we quantify the impact of this perturbation on the parameter estimation when $\Delta$ goes to 0. Throughout this section, we observe a process $\mathcal{F}_T$ whose intensity function is given by the parametric form $\lambda(\cdot\,; \theta^*)$. Note that if the intensity of the process $\mathcal{F}_T$ is not in the parametric family $\lambda(\cdot\,; \theta)$, $\theta^*$ is defined as the best approximation of its intensity function in the $\ell_2$ sense. The goal of the inference is thus to recover the parameters $\theta^*$.
When working with the discrete process $\widetilde{\mathcal{F}}_T$, the events $t_n^i$ of the original process are replaced with their projection on the grid, $\tilde{t}_n^i = t_n^i + \delta_n^i$. Here, $\delta_n^i$ is uniformly distributed on $[-\Delta/2, \Delta/2]$. We consider the discrete FaDIn estimator $\hat{\theta}_{\Delta}$ defined as $\hat{\theta}_{\Delta} = \arg\min_{\theta} \mathcal{L}_{\mathcal{G}}(\theta)$. We can upper-bound the error incurred by $\hat{\theta}_{\Delta}$ by the decomposition:
$$\big\|\hat{\theta}_{\Delta} - \theta^*\big\|_2 \le \underbrace{\big\|\hat{\theta}_c - \theta^*\big\|_2}_{(*)} + \underbrace{\big\|\hat{\theta}_{\Delta} - \hat{\theta}_c\big\|_2}_{(**)}, \qquad (5)$$
where $\hat{\theta}_c = \arg\min_{\theta} \mathcal{L}(\theta)$ is the reference estimator for $\theta^*$ based on the standard $\ell_2$ estimator for continuous point processes. This decomposition involves the statistical error $(*)$ and the bias error $(**)$ induced by the discretization. The statistical term measures how far the parameters obtained by minimizing the continuous $\ell_2$ loss with access to a finite amount of data are from the true ones. In contrast, the term $(**)$ represents the discretization bias induced by minimizing the discrete loss (4) instead of the continuous one (2). In the following proposition, we focus on the discretization error $(**)$, which is related to the computational trade-off offered by our method, and not on the statistical error $(*)$ of the continuous $\ell_2$ estimator. Our work showcases that this often disregarded estimator can be computed efficiently, and we hope it will promote research describing its asymptotic behavior. We now study the perturbation of the loss due to discretization.
Proposition 2.2. Let $\mathcal{F}_T$ and $\widetilde{\mathcal{F}}_T$ be respectively a MHP process and its discretized version on a grid $\mathcal{G}$ with stepsize $\Delta$. Assume that the intensity function of $\mathcal{F}_T$ possesses continuously differentiable finite-support kernels on $[0, W]$. Then, assuming $\Delta < \min_{t_n^i, t_m^j \in \mathcal{F}_T} |t_n^i - t_m^j|$, for any $i \in \llbracket 1, p \rrbracket$, it holds:
$$\tilde{\lambda}_i[s] = \lambda_i(s\Delta) - \sum_{j=1}^{p} \sum_{t_m^j \in \mathcal{F}_s^j} \delta_m^j\, \frac{\partial \phi_{ij}}{\partial t}\big(s\Delta - t_m^j; \theta\big) + O(\Delta^2),$$
and
$$\mathcal{L}_{\mathcal{G}}(\theta) = \mathcal{L}(\theta) + \frac{2}{N_T} \sum_{i,j} \sum_{\substack{t_n^i \in \mathcal{F}_T^i \\ t_m^j \in \mathcal{F}_s^j}} \big(\delta_m^j - \delta_n^i\big)\, \frac{\partial \phi_{ij}}{\partial t}\big(t_n^i - t_m^j; \theta\big) + \Delta\, h(\theta) + O(\Delta^2).$$
The technical proof is deferred to Section A.1 in the Appendix. The first result is a direct application of the Taylor expansion of the kernels appearing in the intensity. For the loss, the first perturbation term $\Delta\, h(\theta)$ comes from approximating the integral with a finite Euler sum (Tasaki, 2009), while the second one derives from the perturbation of the intensity. This proposition shows that, as the discretization step $\Delta$ goes to 0, the perturbed intensity and $\ell_2$ loss are good estimates of their continuous counterparts. We now quantify the discretization error $(**)$ as $\Delta$ goes to 0.
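Before moving to the next result, the elementary computation behind the first claim of Proposition 2.2 can be made explicit (a sketch of the argument, not the proof of Section A.1): for a projected event $\tilde{t}_m^j = t_m^j + \delta_m^j$ with $|\delta_m^j| \le \Delta/2$, a first-order Taylor expansion of the kernel gives
$$\phi_{ij}\big(s\Delta - \tilde{t}_m^j\big) = \phi_{ij}\big(s\Delta - t_m^j\big) - \delta_m^j\, \frac{\partial \phi_{ij}}{\partial t}\big(s\Delta - t_m^j\big) + O\big((\delta_m^j)^2\big),$$
and summing over events and dimensions, with $|\delta_m^j| = O(\Delta)$, yields the stated expansion of $\tilde{\lambda}_i[s]$.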
Proposition 2.3. We consider the same assumptions as in Proposition 2.2. Then, if the estimators $\hat{\theta}_c = \arg\min_{\theta} \mathcal{L}(\theta)$ and $\hat{\theta}_{\Delta} = \arg\min_{\theta} \mathcal{L}_{\mathcal{G}}(\theta)$ are uniquely defined, $\hat{\theta}_{\Delta}$ converges to $\hat{\theta}_c$ as $\Delta \to 0$. Moreover, if $\mathcal{L}$ is $C^2$ and its Hessian $\nabla^2 \mathcal{L}\big(\hat{\theta}_c\big)$ is positive definite with smallest eigenvalue $\varepsilon > 0$, then
$$\big\|\hat{\theta}_{\Delta} - \hat{\theta}_c\big\|_2 \le \frac{\Delta}{\varepsilon}\, g\big(\hat{\theta}_{\Delta}\big), \quad \text{with} \quad g\big(\hat{\theta}_{\Delta}\big) = O(1).$$
This proposition shows that, asymptotically in $\Delta$, the estimator $\hat{\theta}_{\Delta}$ is equivalent to $\hat{\theta}_c$. It also shows that the discrete estimator converges to the continuous one at the same speed as $\Delta$ decreases. This is confirmed experimentally by the results shown in Figure B.6 in the Appendix. Thus, one should select $\Delta$ so that the discretization error is small compared to the statistical one. Notice that the assumptions of Proposition 2.3 are not too restrictive. Indeed, they require the existence of unique minimizers of $\mathcal{L}$ and $\mathcal{L}_{\mathcal{G}}$. Moreover, if $\mathcal{L}$ is $C^2$ at $\hat{\theta}_c$, the previous hypothesis also implies strong local convexity at this point.
3. Numerical experiments
We present various synthetic data experiments to support the advantages of the proposed approach. To begin, we investigate the bias induced by the discretization in Section 3.1. Afterwards, the statistical and computational efficiency of FaDIn is highlighted through a benchmark with