He Said, She Said: Style Transfer for
Shifting the Perspective of Dialogues
Amanda Bertsch
Carnegie Mellon University
abertsch@cs.cmu.edu
Graham Neubig
Carnegie Mellon University
gneubig@cs.cmu.edu
Matthew R. Gormley
Carnegie Mellon University
mgormley@cs.cmu.edu
Abstract
In this work, we define a new style transfer task: perspective shift, which reframes a dialogue from informal first person to a formal third person rephrasing of the text. This task requires challenging coreference resolution, emotion attribution, and interpretation of informal text. We explore several baseline approaches and discuss further directions on this task when applied to short dialogues. As a sample application, we demonstrate that applying perspective shifting to a dialogue summarization dataset (SAMSum) substantially improves the zero-shot performance of extractive news summarization models on this data. Additionally, supervised extractive models perform better when trained on perspective shifted data than on the original dialogues. We release our code publicly.¹
1 Introduction
Style transfer models change surface attributes of text while preserving the content. Previous work on style transfer has focused on controlling the formality, authorial style, and sentiment of text (Jin et al., 2022). We propose a new style transfer task: perspective shift from dialogue to 3rd person conversational accounts (§2). In this task, we seek to convert an informal 1st person transcription of the dialogue into a 3rd person rephrasing of the conversation, where each line captures the information of a single utterance with relevant contextualizing information added. Table 1 demonstrates an example conversation and its perspective-shifted version.
This task is challenging because it requires the interpretation of many discourse phenomena. In dialogue, speakers commonly use 1st and 2nd person pronouns and casual speech. Speakers also convey their own emotions and opinions in their speech. Converting a multi-party conversation to a single-perspective rephrasing requires pronoun resolution, formalization, and attribution of emotion/stance markers to individuals. While coreference resolution, stance detection, and formalization are often treated as separate tasks, the signal for these objectives is commingled in the dialogues. A pipeline approach would discard information necessary for any one task in the completion of the other two.

¹https://github.com/abertsch72/perspective-shifting
We create a dataset for this task by annotating dialogues from the SAMSum corpus (Gliwa et al., 2019), a dialogue summarization corpus of synthetic text message conversations (§3). For each conversation, annotators rephrase the utterances line-by-line into one or more sentences in 3rd person. Unlike a summary, which condenses information to highlight the most important points, the goal of this transformation is to capture as much of the information from the original utterance as possible in a more standardized form.
We fine-tune BART on this dataset as a supervised baseline under several different problem formulations, and we experiment with incorporating formality data into the training process (§4). As a motivating use case, we demonstrate that extractive summarization over perspective-shifted dialogue is more fluent and has higher ROUGE scores than extractive summarization over the original dialogues (§5). This trend holds for zero-shot performance of extractive summarization models trained on news corpora and for fully supervised training on model-generated perspective shift data.
Perspective shift can be a useful operation for extractive summarization when annotation time is limited; when additional out-of-domain data is available; when the exact length and content of the summary is not known at annotation time; or when high faithfulness is important to the end task but fluency is also a concern (§5.3).
2 Task definition
We define perspective shift as an utterance-level rephrasing task. Given a dialogue and a single selected utterance, the goal of the task is to rewrite that utterance as a formal third person statement. Four operations are required to accomplish this change: coreference resolution, syntactic rewriting, formalization, and emotion attribution. Table 1 shows an example conversation and perspective shift, demonstrating each of these challenges.

| Original | Perspective shifted |
| --- | --- |
| Laura: I need a new printer :/ | Laura is frustrated that she needs a new printer. |
| Laura: thinking about this one | Laura is thinking about a specific printer. |
| Laura: <file_other> | Laura sends a file. |
| Jamie: you’re sure you need a new one? | Jamie asks if Laura is sure she needs a new one. |
| Jamie: I mean you can buy a second hand one | Jamie clarifies that Laura could buy a secondhand printer. |
| Laura: could be | Laura says that’s possible. |

Table 1: An example conversation from the SAMSum dataset with the associated perspective shift.
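Concretely, the task can be framed as a function from a dialogue and an utterance index to a rewritten sentence. The sketch below only illustrates this interface with the example from Table 1; the function name and signature are ours, and the implementation is left to the models in §4.

```python
def perspective_shift(dialogue: list[str], index: int) -> str:
    """Rewrite dialogue[index] as a formal third person statement,
    resolving pronouns and attributing emotion using the whole
    dialogue as context (implemented by the models in Section 4)."""
    raise NotImplementedError  # interface only; see Section 4

# Example input/output pair, taken from Table 1:
dialogue = [
    "Laura: I need a new printer :/",
    "Laura: thinking about this one",
    "Laura: <file_other>",
    "Jamie: you're sure you need a new one?",
    "Jamie: I mean you can buy a second hand one",
    "Laura: could be",
]
reference = "Laura is frustrated that she needs a new printer."
# perspective_shift(dialogue, 0) should produce something close to `reference`.
```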
First-person singular and second-person pronouns are usually easily resolved in a conversational context: first-person singular refers to the speaker, while second-person pronouns generally refer to the other conversational parties. Plural first-person pronouns, however, can be less obvious to resolve. When a party in a conversation uses the pronoun “we,” this plural may be referring to the other parties in the conversation, some but not all of the parties in the conversation, or a party not present in the conversation, e.g. in the utterance “I need to talk to my husband. We might have other plans.” In our hand-annotated dataset, we resolve these pronouns wherever possible; if it is not clear what group the pronoun refers to, we resolve the pronoun as referring to “<the current speaker> and others,” e.g. “Laura: we are busy” becomes “Laura and others are busy”. Other entities in the text may also be difficult to resolve, such as those defined only at the beginning of the conversation, many turns prior to the current reference.
Syntactic rewriting is the problem of converting the syntax of the utterance to reflect 3rd rather than 1st person. This may involve re-conjugating verbs, e.g. converting “Sam: I am busy” to “Sam is busy.” Formalization and emotion attribution are related problems, as much of the emotion and stance information in the text is contained in informal phrases, unconventional punctuation, and emojis (Tagg, 2016). Typical formalization eliminates these markers without replacement (Rao and Tetreault, 2018). However, this makes formalization a highly lossy conversion, which may be undesirable for downstream tasks. We aim to limit the information lost in the perspective shift operation by encoding the meanings of such informal language in the output. Often this takes the form of an adverb (e.g. “Sam angrily says”) or a short descriptive sentence (e.g. “Cam is amused”). This requires interpretation of the informal elements of the text.

Clearly, this task is far more complex than simply swapping pronouns for speaker names. We curate a dataset for the perspective shift operation.
3 Dataset creation
The dataset is an annotated subset of the SAMSum (Gliwa et al., 2019) dataset for dialogue summarization. SAMSum is a dataset of simulated text message conversations, ranging from 3 to 30 lines in length and with between 2 and 20 speakers. The dataset consists of 314 conversations from the train set, 368 conversations from the validation set, and 151 conversations from the test set.² We set aside the 151 conversations from test as a test split and use the other 682 conversations as training and validation data.
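SAMSum is distributed through the Hugging Face hub, so a split of this shape could be reproduced roughly as sketched below. The set of annotated conversation IDs is ours and is not reproduced here, so `annotated_ids` is only a placeholder, not part of the released corpus.

```python
from datasets import load_dataset

# Load the SAMSum corpus (Gliwa et al., 2019); its license terms apply.
samsum = load_dataset("samsum")

# Placeholder for the IDs of the conversations we annotated
# (314 from train, 368 from validation, 151 from test).
annotated_ids = {"train": set(), "validation": set(), "test": set()}

# The 151 annotated test conversations stay a held-out test split;
# the remaining 682 annotated conversations are pooled for training and validation.
test_split = samsum["test"].filter(lambda ex: ex["id"] in annotated_ids["test"])
train_val = [
    ex
    for split in ("train", "validation")
    for ex in samsum[split]
    if ex["id"] in annotated_ids[split]
]
```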
Annotators were instructed to convert each utterance individually to a formal 3rd person rephrasing, while preserving as much of the tone of the utterance as possible. Annotators were required to insert the speaker’s name in each rewritten utterance and remove all 1st-person pronouns. Annotators were also asked to standardize grammar, remove questions, and add additional context (e.g. descriptive adverbs) to convey emotions previously expressed by emoticons. Further information about annotator selection and pay, as well as a full copy of the annotation instructions, is available in Appendix D.
²Due to SAMSum’s restrictive licensing, we are unable to release the dataset at this time. The SAMSum authors did not approve our requested exception.
| Method | ROUGE-1 | ROUGE-2 | ROUGE-L | BARTScore |
| --- | --- | --- | --- | --- |
| no context | 62.57 | 40.45 | 61.41 | -2.38 |
| left context only | 60.80 | 37.50 | 59.27 | -2.39 |
| left and right context | 63.57 | 40.74 | 62.04 | -2.36 |
| conversation-level | 63.20 | 35.04 | 51.80 | -2.67 |

Table 2: Scores on the test set for models trained with different problem formulations.
3.1 Dataset statistics
The perspective shifted conversations differ from the original in several ways. The number of turns in each conversation is preserved, but the average turn length varies: for the perspective shifts, the mean number of words per turn is 11.0, while the mean for the original dialogues is 8.4. (Note that the simplest heuristic would increase each utterance’s word count by 1, as the colon next to the speaker name is swapped out with the word “says”.)

The average word-wise edit distance between original and perspective-shifted utterances is 8.5 words. This is partially due to the insertion of a dialogue tag (e.g. “says”) in each utterance, the removal of emojis (average 0.1 per utterance), and the resolving of first and second person pronouns (average 0.9 per utterance). The part-of-speech³ distribution of the conversations also changes, with a strong (65.8%) decrease in interjections and a slight (5.1%) decrease in adjectives and adverbs. However, in utterances that contain at least one emoji, the number of adjectives and adverbs present increases 12.8%. This is consistent with the annotation guidelines, which instruct annotators to capture the meaning of informal markers such as emoji with descriptors.

³Part-of-speech related statistics are calculated using the spaCy POS tagger (Honnibal et al., 2020).
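Statistics of this kind can be computed with standard tooling; the sketch below shows one way to obtain per-turn word counts, word-level edit distance, and coarse POS counts with spaCy. The helper names are ours, and this is not necessarily the exact script used for the numbers above.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # POS statistics use the spaCy tagger

def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over whitespace tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        curr = [i]
        for j, yj in enumerate(y, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (xi != yj)))   # substitution
        prev = curr
    return prev[-1]

def pos_counts(text: str) -> dict:
    """Count coarse POS tags (e.g. INTJ, ADJ, ADV) in a single turn."""
    counts = {}
    for token in nlp(text):
        counts[token.pos_] = counts.get(token.pos_, 0) + 1
    return counts

original = "Laura: I need a new printer :/"
shifted = "Laura is frustrated that she needs a new printer."
print(len(shifted.split()), word_edit_distance(original, shifted), pos_counts(shifted))
```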
4 Perspective shifting
4.1 Formulation of the Prediction Problem
Methods
We consider several formulations of the perspective shifting task as a prediction problem with different input and output styles. Below, the first three approaches formulate the problem as a line-by-line task: each input example consists of the full conversation with one utterance designated as the utterance to be perspective shifted. The fourth approach below formulates the problem as a conversation-level task in which the entire conversation is perspective shifted at once; a sketch of the line-level input construction follows the list.
1. no context: The input to the model is the utterance u_t, and the output is the perspective shifted version, y_t.

2. left context only: The input is the dialogue up to and including utterance u_t, and the output is the perspective shifted version, y_t. A [SEP] token delimits the left context, u_1, ..., u_{t-1}, from the utterance u_t.

3. left and right context: The input is the full conversation, with [SEP] tokens around the utterance u_t, and the output is the perspective shifted version, y_t.

4. conversation-level: The input is a complete dialogue u_1, ..., u_T, and the output is a complete perspective shift y_1, ..., y_T.
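As a rough illustration of how the line-level inputs might be serialized (our released code may differ in the exact separator handling; `SEP` here is a stand-in for the model's separator token):

```python
SEP = "<sep>"  # stand-in for the separator token

def build_input(dialogue: list[str], t: int, mode: str) -> str:
    """Serialize the model input for the target utterance dialogue[t]
    (0-based index) under the three line-level formulations."""
    left = " ".join(dialogue[:t])        # utterances before the target
    right = " ".join(dialogue[t + 1:])   # utterances after the target
    if mode == "no_context":
        return dialogue[t]
    if mode == "left_context_only":
        return f"{left} {SEP} {dialogue[t]}".strip()
    if mode == "left_and_right_context":
        return f"{left} {SEP} {dialogue[t]} {SEP} {right}".strip()
    raise ValueError(f"unknown mode: {mode}")
```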
For each formulation, we fine-tune a BART-large (Lewis et al., 2020) model for 15 epochs, using early stopping, an effective batch size of 8, and a learning rate of 5e-5.
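For concreteness, a fine-tuning setup along these lines could be written with the Hugging Face Transformers trainer roughly as below. Dataset preparation is omitted, and the specifics (checkpoint name, callbacks, patience) are our assumptions rather than a transcript of the training script used here.

```python
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, EarlyStoppingCallback,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

train_data, val_data = None, None  # tokenized (input, target) pairs; preparation not shown

args = Seq2SeqTrainingArguments(
    output_dir="ps-bart-large",
    num_train_epochs=15,
    per_device_train_batch_size=8,   # effective batch size of 8
    learning_rate=5e-5,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # required for early stopping
    metric_for_best_model="loss",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    eval_dataset=val_data,
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```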
Results
ROUGE 1/2/L scores and BARTScore for each model are listed in Table 2.
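ROUGE can be computed with standard packages; a minimal sketch using the Hugging Face evaluate library is shown below (BARTScore comes from its own separate package and is omitted, and this is not necessarily the exact scoring configuration used for Tables 2 and 3).

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["Laura is frustrated that she needs a new printer."]  # model outputs
references = ["Laura is frustrated that she needs a new printer."]   # gold perspective shifts

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])
```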
The no context model treats this as a purely utterance-level task, but fully precludes the addition of context from other utterances. This means that second-person and first-person plural pronouns cannot be resolved clearly. While this model scores quite highly on all 4 metrics, we observe a high rate of named entity hallucination in the converted outputs. For instance, for the input utterance “Hannah: Hey, do you have Betty’s number?”, the no context model outputs “Hannah asks John if he has Betty’s number.” However, the other conversational partner in this dialogue is “Amanda,” not “John.” Because the gold perspective shifts were annotated with the full conversation available for reference, this model often hallucinates to fill in named entity slots that it does not have the context to resolve.
| Approach | ROUGE-1 | ROUGE-2 | ROUGE-L | BARTScore |
| --- | --- | --- | --- | --- |
| PS ONLY | 63.57 | 40.74 | 62.04 | -2.36 |
| FORMALITY + PS | 62.00 | 39.14 | 60.38 | -2.37 |
| FORMALITY ONLY | 51.25 | 22.12 | 49.96 | -2.57 |
| RULES-BASED HEURISTIC | 61.77 | 35.93 | 55.34 | -2.80 |
| HEURISTIC + FORMALITY | 56.98 | 31.91 | 55.72 | -2.59 |

Table 3: Scores for each of the perspective shift models.
By contrast, the conversation-level model has the clear advantage of referencing the entire conversation at generation time. However, the model is not required to produce the same number of lines as the input and must learn this property during training. We conjecture that this is the reason for this model’s weak performance relative to the left and right context model. Additionally, if the model generates more or fewer lines than the input dialogue, this can be a conflating factor in the extractive summarization example we discuss in Section 5. If the model generates fewer lines than the input, it has performed some part of the summarization process by abstracting the input into a shorter output; if it has generated more lines than the input, it has produced a harder problem for the extractive summarization system by creating more lines to choose the summary from. Because of this model’s weaker performance and this conflating factor, we restrict our remaining experiments in this paper to models that perspective shift one utterance at a time.
The model with left context only mimics how a human might read the conversation for the first time, from top to bottom. This formulation also imposes the constraint that the output has the same number of lines as the input, as desired. However, the dialogues frequently contain cataphora, especially at the start of a conversation, where the first speaker may be addressing a second speaker who has not yet spoken. For instance, the utterance “Hannah: Hey, do you have Betty’s number?” is the first utterance of its dialogue. A model with only left context cannot resolve the word “you” here any better than the no context model.
The left and right context model addresses this concern by providing the full conversation as input, but restricting the output generation to a perspective shift for a single (marked) utterance. This imposes the output length constraint without sacrificing contextual information. This model performs best on all 4 metrics. As the scores for the left and right context and no context models are relatively close, we conduct a human evaluation comparing these two cases. In our blind comparison of 22 conversations, the left and right context model was preferred over the no context model 86% of the time (2 annotators, Cohen’s kappa 0.62).
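Pairwise agreement of this kind can be checked with scikit-learn's implementation of Cohen's kappa; the labels below are illustrative only, not our annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Preference labels per conversation: "LR" = left and right context, "NC" = no context.
annotator_1 = ["LR", "LR", "NC", "LR", "LR"]  # illustrative, not the real annotations
annotator_2 = ["LR", "LR", "NC", "NC", "LR"]

print(cohen_kappa_score(annotator_1, annotator_2))
```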
The conversation-level model may be a good choice for some applications, where output length is less important to the downstream task. This model has a higher degree of abstractiveness, which can lead to increased fluency but also increased hallucination. For tasks where this is a concern, the left and right context model achieves reasonable fluency while adhering more closely to the task, as measured by the automatic metrics.
4.2 Formality and Perspective Shift
Approaches
We observe that the perspective shifting task requires a high degree of formalization. To better understand the role of formalization in perspective shifting, we consider several models ranging from simple rule-based approaches to those relying on an external formalization dataset. The external dataset we consider is the Grammarly Yahoo Answers Formality Corpus (GYAFC) (Rao and Tetreault, 2018): a dataset of approximately 100,000 lines from Yahoo Answers and formal rephrasings of each line.

Our core method is the BART model trained under the left and right context formulation (PS ONLY).
We also consider a heuristic baseline (RULES-BASED HEURISTIC). For each message, we prepend the speaker’s name and the word “says” to the utterance. We replace each instance of the pronoun “I” in the message with the speaker’s name. After observing that most messages are not well-punctuated, we also append a period to the end of each utterance. While this heuristic is simple and ignores many pronoun resolution conflicts, it has the clear advantage of being highly efficient.
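A sketch of this heuristic is shown below. It follows the description above, though details such as the handling of existing end punctuation are our assumptions and may differ from the released implementation.

```python
import re

def heuristic_shift(utterance: str) -> str:
    """Rules-based perspective shift: '<Speaker>: <text>' -> '<Speaker> says <text>.'"""
    speaker, text = utterance.split(":", 1)
    speaker, text = speaker.strip(), text.strip()
    # Replace each instance of the pronoun "I" with the speaker's name.
    text = re.sub(r"\bI\b", speaker, text)
    # Prepend the speaker's name and "says"; append a period if the utterance
    # does not already end with punctuation (assumption).
    shifted = f"{speaker} says {text}"
    if not shifted.endswith((".", "!", "?")):
        shifted += "."
    return shifted

# The heuristic does not re-conjugate verbs, reflecting its known limitations:
print(heuristic_shift("Sam: I am busy"))  # -> "Sam says Sam am busy."
```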
We incorporate the GYAFC corpus as part of