
rameters of the Hawkes process correspond to the
linguistic influence of each paper, aggregated over thousands of linguistic changes. The changes themselves are identified through analysis of contextual embeddings, with the goal of finding words whose meaning has shifted over time (Traugott and Dasher, 2001). Although there are several computational methods to detect semantic changes (e.g., Kim et al., 2014; Hamilton et al., 2016; Rosenfeld and Erk, 2018; Dubossarsky et al., 2019), including methods based on contextual embeddings (e.g., Kutuzov and Giulianelli, 2020), our proposed method focuses on detecting smooth, non-bursty semantic changes; we also go further than other methods by distinguishing old and contemporary usages of an identified semantic change.
We show through a multivariate regression that our estimates of each paper's semantic influence are positively correlated with its long-term citations, even after controlling for its initial citations, its content in terms of topics, and its lexical influence (see Figure 1). Further, we formulate long-term citation prediction as an online prediction task, constructing test sets for successive years. Adding semantic influence features to the model once again improves its predictive performance over the baselines.
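The online prediction setup described above can be illustrated with a generic expanding-window split by publication year. This is a minimal sketch under our own assumptions, which may differ from the protocol in § 5.4: the `Paper` record and the `yearly_splits` function are hypothetical, and the sketch ignores the lag needed before the long-term citation counts of training papers become observable.

```python
from typing import Iterator, List, Tuple

# Hypothetical record: (paper_id, publication_year). Real records would also
# carry features such as initial citations, topics, and influence estimates.
Paper = Tuple[str, int]


def yearly_splits(papers: List[Paper], first_test_year: int,
                  last_test_year: int) -> Iterator[Tuple[int, List[str], List[str]]]:
    """Yield (test_year, train_ids, test_ids) for successive test years.

    Expanding window: the model evaluated on year y is trained only on
    papers published strictly before y.
    """
    for year in range(first_test_year, last_test_year + 1):
        train = [pid for pid, y in papers if y < year]
        test = [pid for pid, y in papers if y == year]
        yield year, train, test
```

At each step, a model with and without the semantic-influence features would be refit on `train` and scored on `test`, so every comparison uses only information available before the test year.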
In summary, our contributions are as follows:¹
• We empirically demonstrate a link between long-term citation count and short-term linguistic influence, using both regression analysis (§ 5.3) and an online prediction task (§ 5.4).
• We present a method to estimate semantic influence using a parametric Hawkes process (§ 2.1). To achieve this, we find semantic changes and convert the usage of each change into a cascade (§ 2.2). We also show that the method can be applied to quantify lexical influence.
• We present a method to identify monotonic semantic changes from timestamped text using contextual embeddings (see § 2.2.1).
2 Methodology
This section describes our method for estimating the linguistic influence of each document in a timestamped collection. Our work builds on the theory of point process models (Daley et al., 2003), in which the basic unit of data is a set of marked event timestamps. In our case, the events correspond to uses of an innovative word or usage, and the mark corresponds to the document in which the word or usage appears. To estimate the linguistic influence of individual documents, we fit a parametric model in which per-document influence parameters explain the density of events in subsequent documents. We first describe the modeling framework in which these influence parameters are estimated (§ 2.1) and then describe how event cascades are constructed (§ 2.2) from semantic changes (§ 2.2.1) and lexical innovations (§ 2.2.2).

¹ The code and relevant data from our paper can be found at http://github.com/sandeepsoni/contextual-leadership
2.1 Estimating document influence from timestamped events
A marked cascade is a set of marked events $\{e_1, e_2, \ldots, e_N\}$, in which each event $e_i = (t_i, p_i)$ is a tuple of a timestamp $t_i$ and a mark $p_i$. Assume a set of marked cascades, indexed by $w \in W$, with each mark belonging to a finite set that is shared across all cascades. In our application, each cascade corresponds to the appearances of an individual word or word sense, and each mark is the identity of the document in which the word or word sense appears.
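These definitions map onto a simple data structure. The sketch below is our own illustration, assuming documents arrive as hypothetical `(doc_id, timestamp, tokens)` triples and that the target set $W$ is already known (here it is simply given as input; in this work it comes from the detection steps of § 2.2). It groups appearances into one time-sorted marked cascade per word, with document identities as the shared mark set. Recording at most one event per document per word is a simplifying assumption of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    t: float   # timestamp t_i
    mark: str  # mark p_i: identity of the document where the usage appears


def build_cascades(docs: Iterable[Tuple[str, float, List[str]]],
                   targets: Set[str]) -> Dict[str, List[Event]]:
    """Map each target word w in W to its marked cascade, sorted by time.

    Hypothetical inputs: docs are (doc_id, timestamp, tokens) triples and
    targets is the set of words whose appearances form cascades.
    """
    cascades: Dict[str, List[Event]] = defaultdict(list)
    for doc_id, t, tokens in docs:
        # Assumption: a word contributes at most one event per document.
        for w in set(tokens) & targets:
            cascades[w].append(Event(t, doc_id))
    return {w: sorted(evs, key=lambda e: e.t) for w, evs in cascades.items()}
```

For example, `build_cascades([("d1", 2.0, ["bert", "model"]), ("d2", 1.0, ["model"])], {"bert", "model"})` yields a `"model"` cascade whose events are marked `"d2"` and then `"d1"`, and a `"bert"` cascade with a single event marked `"d1"`.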
Point process models define probability distributions over cascades. In an inhomogeneous point process, the distribution of the count of events between any two timestamps $(t_1, t_2)$ is governed by the integral of an intensity function $\lambda(t, w)$. A Hawkes process is a special case in which the intensity function is the sum of terms associated with previous events (Hawkes, 1971). We choose the following special form:
$$\lambda(t, w) = c_w + \sum_{i:\, t_i^{(w)} < t} \alpha_{p_i^{(w)}}\, \kappa\big(t - t_i^{(w)}\big), \qquad (1)$$
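where $\kappa$ is a time-decay kernel such as the exponential kernel $\kappa(\Delta t) = e^{-\gamma \Delta t}$ with decay rate $\gamma > 0$, and $c_w$ is a constant base rate for cascade $w$. The parameters of interest are the per-document weights $\alpha$: $\alpha_{p_i^{(w)}}$ quantifies the influence exerted by the document $p_i^{(w)}$ on subsequent events.²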
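Equation (1) with the exponential kernel can be evaluated directly. The sketch below is our own illustration, not the released implementation: it assumes a cascade observed on the window $[0, T]$ with all timestamps inside that window, and the names `intensity`, `compensator`, and `log_likelihood` are hypothetical. Besides the intensity, it computes the closed-form integral of the intensity over the window (the quantity that governs event counts) and the standard point-process log-likelihood, i.e., the sum of log-intensities at the observed events minus that integral. The estimation procedure actually used in this work may differ.

```python
import math
from typing import Dict, List, Tuple

Cascade = List[Tuple[float, str]]  # (t_i, p_i) pairs for one cascade w


def intensity(t: float, cascade: Cascade, c_w: float,
              alpha: Dict[str, float], gamma: float) -> float:
    """Eq. (1) with kappa(dt) = exp(-gamma * dt); only events with t_i < t count."""
    return c_w + sum(alpha[p] * math.exp(-gamma * (t - ti))
                     for ti, p in cascade if ti < t)


def compensator(T: float, cascade: Cascade, c_w: float,
                alpha: Dict[str, float], gamma: float) -> float:
    """Closed-form integral of Eq. (1) over [0, T], assuming all t_i >= 0."""
    total = c_w * T  # contribution of the constant base rate
    for ti, p in cascade:
        if ti < T:
            # integral of alpha_p * exp(-gamma (t - t_i)) for t in [t_i, T]
            total += alpha[p] / gamma * (1.0 - math.exp(-gamma * (T - ti)))
    return total


def log_likelihood(T: float, cascade: Cascade, c_w: float,
                   alpha: Dict[str, float], gamma: float) -> float:
    """Sum of log-intensities at the events minus the compensator on [0, T]."""
    ll = sum(math.log(intensity(ti, cascade, c_w, alpha, gamma))
             for ti, _ in cascade)
    return ll - compensator(T, cascade, c_w, alpha, gamma)
```

Because $\alpha$ is indexed by document rather than by cascade, summing this log-likelihood over all cascades $w \in W$ and maximizing it (for example, under nonnegativity constraints on $\alpha$) would pool evidence about each document across many words; the optimizer is omitted here.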
² In the more general multivariate Hawkes process, the intensity function can depend on the identity of the “receiver” of influence. This enables the estimation of pairwise excitation parameters $\alpha_{i,j}$, as in the work of Lukasik et al. (2016), to give an example from the NLP literature. However, it would be difficult to estimate pairwise excitation between thousands of documents, as required by our setting, since the number of such parameters grows quadratically with the number of documents.