
Approach              | ROUGE-1 | ROUGE-2 | ROUGE-L | BARTScore
PS ONLY               | 63.57   | 40.74   | 62.04   | -2.36
FORMALITY + PS        | 62.00   | 39.14   | 60.38   | -2.37
FORMALITY ONLY        | 51.25   | 22.12   | 49.96   | -2.57
RULES-BASED HEURISTIC | 61.77   | 35.93   | 55.34   | -2.80
HEURISTIC + FORMALITY | 56.98   | 31.91   | 55.72   | -2.59

Table 3: Scores for each of the perspective shift models.
By contrast, the conversation-level model has the clear advantage of referencing the entire conversation at generation time. However, the model is not required to produce the same number of lines as the input and must learn this property during training. We conjecture that this is the reason for its weak performance relative to the left and right context model. Additionally, if the model generates more or fewer lines than the input dialogue, this can be a conflating factor in the extractive summarization example we discuss in Section 5. If the model generates fewer lines than the input, it has performed some part of the summarization process by abstracting the input into a shorter output; if it generates more lines than the input, it has produced a harder problem for the extractive summarization system by creating more lines from which to choose the summary. Because of this model's weaker performance and this conflating factor, we restrict our remaining experiments in this paper to models that perspective shift one utterance at a time.
The model with left context only mimics how a human might read the conversation for the first time, from top to bottom. This choice of model also imposes the constraint that the output has the same number of lines as the input, as desired. However, the dialogues frequently contain cataphora, especially at the start of a conversation, where the first speaker may be addressing a second speaker who has not yet spoken. For instance, the utterance “Hannah: Hey, do you have Betty’s number?” opens its dialogue; a model with only left context cannot resolve the word “you” here any better than the no context model.
The left and right context model addresses this concern by providing the full conversation as input but restricting the output generation to a perspective shift for a single (marked) utterance. This imposes the output length constraint without sacrificing contextual information. This model performs best on all four metrics. As the scores for the left and right context and no context models are relatively close, we conduct a human evaluation comparing these two cases. In our blind comparison of 22 conversations, the left and right context model was preferred over the no context model 86% of the time (2 annotators, Cohen’s kappa 0.62).
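As a point of reference for how such agreement statistics are computed, the sketch below uses scikit-learn's cohen_kappa_score; the per-conversation judgments shown are placeholder values for illustration only, not the annotations collected in this study.

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder judgments, one entry per conversation (not the real annotation data):
# 1 = left and right context output preferred, 0 = no context output preferred.
annotator_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
annotator_b = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# Preference rate: fraction of all judgments favouring the left and right context model.
preference_rate = sum(annotator_a + annotator_b) / (len(annotator_a) + len(annotator_b))
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"preference rate: {preference_rate:.2f}, Cohen's kappa: {kappa:.2f}")
```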
The conversation-level model may be a good choice for applications where output length is less important to the downstream task. This
model has a higher degree of abstractiveness, which
can lead to increased fluency but also increased
hallucination. For tasks where this is a concern, the
left and right context model achieves reasonable
fluency while adhering more closely to the task, as
measured by the automatic metrics.
4.2 Formality and Perspective Shift Approaches
We observe that the perspective shifting task requires a high degree of formalization. We consider several models ranging from simple rule-based approaches to those relying on an external formalization dataset in order to better understand the role of formalization in perspective shifting. The external dataset we consider is the Grammarly Yahoo Answers Formality Corpus (GYAFC) (Rao and Tetreault, 2018): a dataset of approximately 100,000 lines from Yahoo Answers and formal rephrasings of each line.
Our core method is the BART model trained
under the left and right context formulation (PS
ONLY).
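For concreteness, a minimal sketch of how an input could be serialized under this left and right context formulation is shown below. The marker token, the newline separator, and the second utterance of the toy dialogue are illustrative assumptions rather than the exact format used by the model.

```python
def build_marked_input(utterances, target_index, marker="<utt>"):
    """Serialize a full dialogue, wrapping the single utterance to be
    perspective shifted in marker tokens. The marker string and the
    newline separator are illustrative assumptions."""
    lines = []
    for i, (speaker, text) in enumerate(utterances):
        line = f"{speaker}: {text}"
        if i == target_index:
            line = f"{marker} {line} {marker}"
        lines.append(line)
    return "\n".join(lines)

# Toy dialogue; the second utterance is invented for illustration.
dialogue = [("Hannah", "Hey, do you have Betty's number?"),
            ("Amanda", "Sure, give me a minute.")]
print(build_marked_input(dialogue, target_index=0))
```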
We also consider a heuristic baseline (RULES-
BASED HEURISTIC). For each message, we
prepend the speaker’s name and the word “says” to
the utterance. We replace each instance of the pro-
noun “I” in the message with the speaker’s name.
After observing that most messages are not well-
punctuated, we also append a period to the end of
each utterance. While this heuristic is simple and
ignores many pronoun resolution conflicts, it has
the clear advantage of being highly efficient.
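A minimal sketch of this heuristic in Python follows; how contractions (e.g., “I’m”) and already-punctuated messages are handled is an interpretation, since only the core rules are described above.

```python
import re

def heuristic_shift(speaker, utterance):
    """Rules-based heuristic: replace standalone "I" with the speaker's
    name, prepend "<speaker> says", and terminate with a period.
    Contractions such as "I'm" and pronouns other than "I" are left
    untouched, matching the simplicity of the rule set described above."""
    shifted = re.sub(r"\bI\b(?!')", speaker, utterance)
    shifted = f"{speaker} says {shifted}"
    # Assumption: only append a period when the message lacks terminal punctuation.
    if not shifted.endswith((".", "!", "?")):
        shifted += "."
    return shifted

print(heuristic_shift("Hannah", "Hey, do you have Betty's number?"))
# Hannah says Hey, do you have Betty's number?
print(heuristic_shift("Amanda", "I don't have it"))
# Amanda says Amanda don't have it.
```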
We incorporate the GYAFC corpus as part of