
of the historical, expectation or consequence sen-
tences.
We first summarize the distributional association
between the position of reference mentions and discourse
content types in §2.3. Then, we propose
a knowledge distillation-based method to incorpo-
rate discourse knowledge into the TDG system.
We experiment with the BERT (Devlin et al., 2019)
and RoBERTa (Liu et al., 2019) pre-trained language
models and find that the proposed knowledge
distillation-based TDG system is effective in
using discourse-level cues and achieves improved
performance on identifying cross-sentence refer-
ence mentions while retaining performance on the
intra-sentence mention pairs.
2 Background and Analysis
2.1 News Discourse Profiling (DP)
Following the news content schemata proposed by
Van Dijk (Teun A, 1986; Van Dijk, 1988a,b), DP
(Choubey et al., 2020) defines eight content types.
Each content type describes the functional role of a
sentence in describing the main news event. A Main
Event (M1) sentence describes the major events and
subjects of the news article. Consequence (M2) describes
events that are triggered by the main event.
Previous Event (C1) describes recent events that are
a possible cause of the main event. Current Context
(C2) describes the remaining contextual information.
Historical Event (D1) describes past events that
precede the main event by months or years. Anecdotal
Event (D2) describes unverifiable facts. Evaluation
(D3) describes opinionated content from
immediate participants, experts, or journalists.
Expectation (D4) describes speculations or possible
consequences of the main or context events.
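For reference, the eight content types can be written down as a small lookup table. The following is a minimal sketch, not part of the DP system itself; the class name ContentType is a hypothetical choice, and only the codes and names come from the description above.

```python
from enum import Enum


class ContentType(Enum):
    """The eight DP content types, keyed by the codes used in the text.

    The class name and layout are illustrative assumptions; only the
    codes (M1 ... D4) and the type names come from the description above.
    """
    M1 = "Main Event"
    M2 = "Consequence"
    C1 = "Previous Event"
    C2 = "Current Context"
    D1 = "Historical Event"
    D2 = "Anecdotal Event"
    D3 = "Evaluation"
    D4 = "Expectation"


# Look up a content type by its code.
historical = ContentType["D1"]
```

Keying the table by code keeps sentence-level labels compact while the full name stays available through the value attribute.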
2.2 Temporal Dependency Graph (TDG)
TDG (Yao et al., 2020) is a directed edge-labeled
graph in which each node is either an event, a
timex, or a meta node (e.g. document creation
time). The reference for each timex/event node is
another timex node or a meta node. Optionally,
the temporal position of some events can be more
precisely determined by referencing them to an-
other event, and thus they can also have a reference
event node. For instance, in Figure 2, the event
incident can only be temporally positioned with
respect to the timex August 23 while the tempo-
ral order of event broke can be determined with
respect to both the timex later and the event occurred.
The edges between event/timex node pairs
are labeled with one of the overlap, after, before,
and included temporal relations, while the edges
between a timex node and a meta node are assigned
a generic depend-on label. In this work, we focus
exclusively on identifying the reference timex (and
event) for each timex (event) without predicting the
temporal relations between them.
Figure 2: An example TDG.
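The structural constraints described in this subsection can be sketched as a small data structure. This is an illustrative sketch under stated assumptions, not the implementation of Yao et al. (2020): the class TDG and its methods are hypothetical names, edge labels are optional because this work predicts references only, and we assume that depend-on is reserved for timex-to-meta edges. In the example graph, only the references of incident and broke follow the text; the remaining edges are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

NODE_KINDS = {"event", "timex", "meta"}
TEMPORAL_LABELS = {"overlap", "after", "before", "included"}


@dataclass
class TDG:
    """Hypothetical container for a temporal dependency graph."""
    kinds: Dict[str, str] = field(default_factory=dict)
    # refs[child] holds (reference node, optional edge label) pairs.
    refs: Dict[str, List[Tuple[str, Optional[str]]]] = field(default_factory=dict)

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[name] = kind
        self.refs.setdefault(name, [])

    def add_reference(self, child: str, ref: str, label: Optional[str] = None) -> None:
        child_kind, ref_kind = self.kinds[child], self.kinds[ref]
        if child_kind == "meta":
            raise ValueError("meta nodes do not take a reference")
        if ref_kind == "event" and child_kind != "event":
            raise ValueError("only an event may reference another event")
        if label is not None:
            # Assumption: depend-on is reserved for timex -> meta edges.
            timex_to_meta = child_kind == "timex" and ref_kind == "meta"
            allowed = {"depend-on"} if timex_to_meta else TEMPORAL_LABELS
            if label not in allowed:
                raise ValueError(f"label {label!r} not allowed on this edge")
        self.refs[child].append((ref, label))

    def missing_time_reference(self) -> List[str]:
        """Events/timexes that lack the required timex or meta reference."""
        return [
            node for node, kind in self.kinds.items()
            if kind != "meta"
            and not any(self.kinds[r] in {"timex", "meta"} for r, _ in self.refs[node])
        ]


g = TDG()
g.add_node("DCT", "meta")
for timex in ("August 23", "later"):
    g.add_node(timex, "timex")
for event in ("incident", "occurred", "broke"):
    g.add_node(event, "event")

# References stated in the text.
g.add_reference("incident", "August 23")
g.add_reference("broke", "later")
g.add_reference("broke", "occurred")  # optional reference event

# Assumed edges, invented only to make the toy graph complete.
g.add_reference("August 23", "DCT", "depend-on")
g.add_reference("later", "DCT", "depend-on")
g.add_reference("occurred", "August 23")
```

Calling g.missing_time_reference() on this toy graph returns an empty list, since every event and timex has a timex or meta reference.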
2.3 Analysis of TDG Structures w.r.t. DP Sentence Types
As illustrated in Figure 1, discourse roles have tem-
poral interpretations that are useful to locate event
and timex relations in a document. Therefore, we
use the recently proposed discourse profiling system
by Choubey and Huang (2021)³ to assign content
type labels to all sentences in the training data
and analyze the distribution of reference timex and
event mentions across different content types. Note
that our analyses are based on content types predicted
by a neural network model, which are noisy.
Additionally, a sentence often contains more
than one event or timex mention, so its content
type can only provide a broad temporal ordering
for its constituent mentions.
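The kind of tabulation behind such an analysis can be sketched as follows. This is a minimal sketch, not the authors' analysis script: the function reference_distribution is a hypothetical name, each input pair is assumed to hold the predicted content type of the mention's sentence and the source of its reference (either "DCT" or the content type of the sentence containing the reference), and the toy pairs are invented purely to exercise the code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def reference_distribution(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Percentage of reference sources for each mention content type.

    Each pair is (content type of the mention's sentence, source of its
    reference). Both fields are assumptions about how the data is encoded.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for mention_type, ref_source in pairs:
        counts[mention_type][ref_source] += 1
    return {
        mention_type: {
            source: 100.0 * n / sum(counter.values())
            for source, n in counter.items()
        }
        for mention_type, counter in counts.items()
    }


# Invented toy input; these are NOT numbers from the paper.
toy_pairs = [
    ("M1", "DCT"), ("M1", "DCT"), ("M1", "DCT"), ("M1", "M1"),
    ("D1", "D1"), ("D1", "DCT"),
]
dist = reference_distribution(toy_pairs)
```

Running the same function separately over timex mentions and event mentions would give one table per mention kind, with rows indexed by content type.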
First, we observe that the reference timex for both
timex (66% to 100%) and event (54% to 80%) mentions
from all content types except the historical is
predominantly the DCT. Further, among the events from
non-historical sentences that are not referenced to
the DCT, we observe that the majority (71% to 89%) of
them are referenced to a time expression from main,
³ The discourse profiling system was obtained from
https://github.com/prafulla77/Discoure_Profiling_RL_EMNLP21Findings.