Detecting Narrative Elements in Informational Text
Effi Levi1, Guy Mor2, Tamir Sheafer2,3, Shaul R. Shenhav2
1Institute of Computer Science, The Hebrew University of Jerusalem
2Department of Political Science, The Hebrew University of Jerusalem
3Department of Communication and Journalism, The Hebrew University of Jerusalem
efle@cs.huji.ac.il
{guy.mor|tamir.sheafer|shaul.shenhav}@mail.huji.ac.il
Abstract
Automatic extraction of narrative elements
from text, combining narrative theories with
computational models, has been receiving increasing
attention over the last few years. Previous
works have utilized the oral narrative theory
by Labov and Waletzky to identify various
narrative elements in personal story texts.
Instead, we direct our focus to informational
texts, specifically news stories.
We introduce NEAT (Narrative Elements An-
noTation) – a novel NLP task for detecting nar-
rative elements in raw text. For this purpose,
we designed a new multi-label narrative anno-
tation scheme, better suited for informational
text (e.g. news media), by adapting elements
from the narrative theory of Labov and Walet-
zky (Complication and Resolution)
and adding a new narrative element of our own
(Success). We then used this scheme to annotate
a new dataset of 2,209 sentences, compiled
from 46 news articles from various category
domains [1]. We trained a number of supervised
models in several different setups over
the annotated dataset to identify the different
narrative elements, achieving an average F1
score of up to 0.77. The results demonstrate
the holistic nature of our annotation scheme as
well as its robustness to domain category.
1 Introduction
Automatic extraction of narrative elements from
texts is a multidisciplinary field of research, com-
bining narrative theories with computational mod-
els, which has been receiving increasing attention
over the last few years. Examples include modeling
narrative structures for story generation (Gervás
et al., 2006), using unsupervised methods to detect
narrative event chains (Chambers and Jurafsky, 2008)
and detecting content zones (Baiamonte et al., 2016)
in news articles, using semantic features to detect
narreme boundaries in fictitious prose (Delmonte and
Marchesini, 2017), identifying turning points in movie
plots (Papalampidi et al., 2019) and using temporal
word embeddings to analyze the evolution of characters
in the context of a narrative plot (Volpetti et al., 2020).
[1] https://github.com/efle/NEAT
A recent and more specific line of work focuses
on using the theory laid out by Labov and Walet-
zky (1967) and later refined by Labov (2013) to
characterize narrative elements in personal experi-
ence texts. Swanson et al. (2014) relied on Labov
and Waletzky (1967) to annotate a corpus of 50
personal stories from weblog posts, and tested several
models over hand-crafted features to classify
clauses into three narrative clause types: orientation,
evaluation and action. Ouyang and McKeown
(2014) constructed a corpus from 20 oral narratives
of personal experience collected by Labov (2013),
and utilized logistic regression over hand-crafted
features to detect instances of complicating actions.
More recently, Li et al. (2017) utilized a combi-
nation of ideas from Labov and Waletzky (1967)
and Freytag (1894) to annotate a collection of short
stories, and Saldias and Roy (2020) used convolu-
tional neural networks (CNNs) to classify clauses
from spoken personal texts into the same three nar-
rative clause types as Swanson et al. (2014).
While these works concentrated their effort on
narrative analysis of personal experience texts, we
direct our focus to detecting narrative patterns in
informational texts, such as news stories. The so-
cial impact of news stories distributed by the media
and their role in creating and shaping public
opinion incentivized our efforts to adapt narrative
analysis approaches to this domain. To the best of
our knowledge, this is the first attempt to automati-
cally detect narrative elements based on Labov and
Waletzky (1967) and later works by Labov (1972,
2013) in news articles.
arXiv:2210.03028v1 [cs.CL] 6 Oct 2022
In this work, we introduce NEAT (Narrative Elements
AnnoTation) – a novel NLP task for detecting
narrative elements in raw text. For this purpose,
we adapted two elements from the narrative
theory presented in Labov and Waletzky (1967) and
Labov (1972, 2013), namely Complication and
Resolution, while adding a new narrative element,
Success, to create a new multi-label narrative
annotation scheme. This scheme was designed
with two main objectives in mind. First, capturing
elements oriented towards discourse structure,
rather than semantic content. Second, possessing
the flexibility required to capture narrative
characteristics within a wide variety of text types,
specifically informational text (as opposed to per-
sonal experience), and not only literary and well-
structured stories. We used this scheme to anno-
tate a newly-constructed dataset of 2,209 sentences,
compiled from 46 English news articles; each sen-
tence was tagged with a subset of the three narrative
elements (or, in some cases, none of them), thus
defining a novel multi-label classification task.
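As an illustration of what "a subset of the three narrative elements" means for a classifier, the following minimal sketch encodes each sentence's annotation as a multi-hot vector. The class name `AnnotatedSentence`, the label order and the example sentences are assumptions made for illustration; they are not taken from the NEAT dataset or its released code.

```python
from dataclasses import dataclass, field

# The three narrative elements of the scheme, in an arbitrary but fixed order
# (the order is an assumption of this sketch).
LABELS = ("Complication", "Resolution", "Success")


@dataclass
class AnnotatedSentence:
    """A sentence together with the (possibly empty) set of elements it carries."""
    text: str
    elements: frozenset = field(default_factory=frozenset)

    def to_multi_hot(self):
        """Return a 0/1 list with one position per entry of LABELS."""
        unknown = self.elements - set(LABELS)
        if unknown:
            raise ValueError(f"unknown narrative elements: {sorted(unknown)}")
        return [1 if label in self.elements else 0 for label in LABELS]


# Hypothetical sentences, invented for this example.
examples = [
    AnnotatedSentence("The storm cut power to thousands of homes.",
                      frozenset({"Complication"})),
    AnnotatedSentence("Crews restored electricity after two days of outages.",
                      frozenset({"Complication", "Resolution"})),
    AnnotatedSentence("The meeting is scheduled for Tuesday.", frozenset()),
]

vectors = [sentence.to_multi_hot() for sentence in examples]
for sentence, vector in zip(examples, vectors):
    print(vector, sentence.text)
```

A sentence with no narrative element maps to the all-zero vector, which is what distinguishes this multi-label setup from a single-label one with a fixed set of mutually exclusive classes.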
We explored two different approaches towards
solving our new task: splitting into three unrelated
binary classification tasks (Complication,
Resolution and Success), and jointly learning
the three narrative categories as a multi-label
classification task. We experimented with three
supervised models, each based on fine-tuning a
different pre-trained language model: BERT (Devlin
et al., 2018), RoBERTa (Liu et al., 2019) and
DistilBERT (Sanh et al., 2020), achieving an average
F1 score of up to 0.77. An analysis of the results
indicates that our narrative categories are strongly
connected and form a coherent narrative scheme
which is more than just the sum of its parts. Addi-
tional experimentation with cross-domain classifi-
cation demonstrates the task’s robustness to domain
category, suggesting that our annotation scheme is
grounded in discourse characteristics rather
than semantic context.
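To make the evaluation concrete, the sketch below scores multi-hot predictions one label at a time and then averages the three F1 scores. It assumes that "average F1" means the unweighted mean over the three labels, which this section does not state explicitly; the function names and the toy gold and predicted vectors are likewise invented for illustration.

```python
LABELS = ("Complication", "Resolution", "Success")


def binary_f1(gold, pred):
    """F1 of the positive class for two equal-length 0/1 sequences."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_label_f1(gold_rows, pred_rows):
    """Treat each label column as its own binary task and score it."""
    scores = {}
    for i, label in enumerate(LABELS):
        gold_col = [row[i] for row in gold_rows]
        pred_col = [row[i] for row in pred_rows]
        scores[label] = binary_f1(gold_col, pred_col)
    return scores


def average_f1(gold_rows, pred_rows):
    """Unweighted mean of the per-label F1 scores (assumed averaging)."""
    scores = per_label_f1(gold_rows, pred_rows)
    return sum(scores.values()) / len(scores)


# Toy gold annotations and predictions, one multi-hot row per sentence.
gold = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

label_scores = per_label_f1(gold, pred)
mean_score = average_f1(gold, pred)
print(label_scores, round(mean_score, 3))
```

The same scoring applies to both setups described above, since three separate binary classifiers and one joint multi-label classifier both end up producing one 0/1 decision per label and sentence.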
The remainder of this paper is organized as follows:
Section 2 gives a theoretical background
and describes the adjustments we have made to
the scheme in Labov (2013) in order to adapt it to
informational text. Section 3 provides a complete
description of the dataset and of the processes and
methodologies which were used to construct and
annotate it, along with a short analysis and some
examples of annotated sentences. Section 4 describes
the experiments conducted on the dataset,
and Section 5 provides an analysis and a discussion
of the results. Finally, Section 6 contains a
summary of our contributions as well as several
potential directions for future work.
2 Narrative Analysis
2.1 Background
Ever since the emergence of formalism and
structuralist literary criticism (Propp, 1968)
and throughout the development of narratology
(Genette, 1980; Fludernik, 2009; Chatman, 1978;
Rimmon-Kenan, 2003), narrative structure has
been the focus of extensive theoretical and empirical
research. While most of these studies were
conducted in the context of literary analysis, the
interest in narrative structures has made inroads
into social sciences (Shenhav, 2015). The classical
work by Labov and Waletzky (1967) on oral
narratives, as well as later works (Labov, 1972,
2013), exemplify this stream of research by providing
a schema for an overall structure of narratives,
according to which a narrative construction
encompasses the following building blocks (Labov, 1972,
2013): abstract (what the narrative is about),
orientation (information on the time, the place, the
persons and the behavior involved), complicating
action (or simply complication; the forward
progression of narrative clauses), evaluation
(establishing the narrative’s "point"), resolution (what
finally happened), and coda (bringing the time of
reference back to the present time of narration).
These building blocks provide useful and influential
guidelines for oral narrative analysis.
2.2 Adaptation
Despite the substantial influence of Labov and
Waletzky (1967) and Labov (2013), scholars in
the field of communication have noticed that this
overall structure does not necessarily comply with
the form of informational text, such as news stories
(Thornborrow and Fitzgerald, 2004; Van Dijk,
1988), and consequently proposed modified narrative
structures (Thornborrow and Fitzgerald, 2004).
Unlike well-tailored narrative texts, such as personal
experience texts, narrativity in informational
text is somewhat more challenging as it does
not necessarily follow conventional or predefined
genre-related structures. This requires a flexible
coding scheme, unconstrained by a specific type
of text. Instead, it should be open to a wide range
of text types (such as informational text), and allow
the presence of micro stories, encompassing
any combination of all narrative categories even
at the sentence level. We set out to accomplish that
via two objectives: first, formalizing narrative
categories which are oriented towards discourse
structure, rather than semantic context. Second, defining
our task as a multi-labeled one, to allow the flexibility
required to capture sentence-level narrative
characteristics. A special consideration was given to the
variety of contents, forms and writing styles typical
of media texts. For example, we required a coding
scheme that would fit laconic or problem-driven
short reports (too short for full-fledged “Labovian”
narrative style), as well as complicated texts with
multiple story-lines moving from one story to another.
We addressed this challenge by focusing on
two of Labov’s six elements, complicating action
and resolution, considered to be the most fundamental
and relevant for informational text analysis
(Labov, 2013). There are several reasons for our focus
on these particular elements: first, it is in line
with the understanding that worth-telling stories
usually consist of protagonists facing and resolving
problematic experiences (Eggins and Slade, 2005).
Moreover, these elements resonate with what is
considered by Entman (2004) to be the most important
Framing Functions: problem definition and
remedy.

                        Complication   Resolution   Success
# Sentences             1,092          541          312
Proportion in Dataset   49%            24%          14%
Table 1: Overview of the NEAT dataset. Note that the categories are not mutually exclusive, due to the multi-
labeled nature of the annotation scheme.
In order to adapt the original complicating action
and resolution categories to informational content,
we designed our annotation scheme as follows.
Complicating action (henceforth, Complication)
was defined in our narrative scheme as an event,
series of events or situation that points to problems
or tensions. Resolution refers to the way the
story is resolved or to the release of the tension. An
improvement from – or a manner of coping with
– an existing or a hypothetical situation was also
considered to be a Resolution. This choice was
made in order to follow the often tentative or speculative
notion of future resolutions in news stories
(Thornborrow and Fitzgerald, 2004; Bell, 1991).
We have therefore included in this category any
temporary or partial resolutions. The transitional
characteristic of the Resolution motivated us
to add a new category defined as Success. Unlike
Resolution, which refers, implicitly or explicitly,
to a prior situation, this category was designed
to capture any description or indication of
an achievement or a desirable outcome.
3 The Dataset
3.1 Pilot Study
We started by conducting a pilot study, for the pur-
pose of formalizing an annotation scheme and train-
ing our annotators. For this study, sample sentences
were gathered from print news articles, published
between 1995 and 2017 and collected via LexisNexis.
These were used to refine the annotation
scheme described in Section 2.2, as well as to
perform extensive training for our annotators.
Following the conclusion of the pilot study, we
used the sentences which were collected and manu-
ally annotated during the pilot to train a multi-label
classifier, later used to provide labeled candidates
for the annotators during the annotation stage of
the NEAT dataset, in order to optimize annotation
rate and accuracy. The pilot samples were then
discarded.
3.2 News Articles
The news articles for the dataset were sampled from
leading news websites in the English language, all
published between 2017 and 2020. The result is
a corpus of 2,209 sentences taken from 46 news
articles, with an average of 48 sentences per article
(σ² = 39.44), and an average of 20.2 tokens per
sentence (σ² = 11.2). The articles are semantically
diverse, as they were sampled from a wide array of
domain categories.
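The headline figures of the corpus can be cross-checked with a few lines of arithmetic. The sketch below recomputes the average article length from the sentence and article counts, and the per-label proportions from the counts reported in Table 1; rounding to whole numbers is an assumption about how the reported values were produced.

```python
# Figures reported in the text and in Table 1.
TOTAL_SENTENCES = 2209
TOTAL_ARTICLES = 46
LABEL_COUNTS = {"Complication": 1092, "Resolution": 541, "Success": 312}


def mean_sentences_per_article(sentences, articles):
    """Average number of sentences per article."""
    return sentences / articles


def label_proportions(counts, total):
    """Share of sentences carrying each label, as a whole-number percentage."""
    return {label: round(100 * n / total) for label, n in counts.items()}


mean_len = mean_sentences_per_article(TOTAL_SENTENCES, TOTAL_ARTICLES)
proportions = label_proportions(LABEL_COUNTS, TOTAL_SENTENCES)

print(f"mean sentences per article: {mean_len:.2f}")
for label, pct in proportions.items():
    print(f"{label}: {pct}%")
```

Because a sentence may carry several labels or none, the label counts are not expected to add up to the total number of sentences, and the percentages are not expected to add up to 100.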
3.3 Preprocessing
The news articles’ content was extracted using Diffbot.
The texts were scraped and split into sentences
using the Punkt unsupervised sentence segmenter
(Kiss and Strunk,2006). Remaining segmentation
errors were manually corrected.