
Context: Nicolas hated eating grapes a lot. He had not eaten them since he was a kid. One day, he went to a vineyard. He saw so many grapes that he had to eat one.
Positive Style Ending: He was happy that he had finally tasted a new kind of fruit.
Negative Style Ending: The next day, he was so sick he couldn't eat any food.

Table 1: Given a story's context, we can either generate a positive or a negative ending based on the reader's preference.
sive two-step transfer process, but they also fail to handle finer linguistic phenomena that capture persona (Krishna et al., 2020a).
In this work, we study the effects of unsupervised style transfer using only a handful of samples from the target style. We propose a three-phase training procedure to generate story endings with a required style. We use the PERSONALITY-CAPTIONS dataset (Shuster et al., 2019) to generate our style-specific textual corpus. We learn one set of parameters that captures style semantics and another set of parameters that captures content semantics within the same model. Through extensive evaluation, we show that our approach improves the style of generated story endings by more than 200% over the baseline while maintaining parity with SOTA models on content relevance. The major contributions of our work are as follows:
• A three-phase transfer learning procedure that enables the model to learn style attributes from a style-specific textual corpus and relearn those attributes for the final downstream task. We call this the learn, learn and relearn (LLR) procedure.
• We separate style parameters from content parameters, enabling practitioners to plug and play adapters of different styles while keeping the content parameters as is (see the sketch after this list). We also show that this approach works on more nuanced styles.
• We design evaluation metrics that show the efficacy of our model against the SOTA baselines. We also observe similar results in our human evaluations.
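To make the split between content and style parameters and the three LLR phases concrete, the following is a minimal PyTorch sketch. It is illustrative only: the StyleAdapter bottleneck, the AdaptedLayer wrapper, and the per-phase freezing schedule in llr_phase are assumed names and design choices, not the implementation described in this paper.

```python
# Hypothetical sketch of plug-and-play style adapters over frozen content
# parameters, with a three-phase (learn / learn / relearn) schedule.
import torch
import torch.nn as nn

class StyleAdapter(nn.Module):
    """Small bottleneck module holding the style-specific parameters."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, h):
        # Residual bottleneck: nudges the content representation toward the style.
        return h + self.up(torch.relu(self.down(h)))

class AdaptedLayer(nn.Module):
    """Wraps a 'content' layer with a swappable style adapter."""
    def __init__(self, content_layer: nn.Module, hidden_size: int):
        super().__init__()
        self.content_layer = content_layer        # content parameters
        self.adapter = StyleAdapter(hidden_size)  # style parameters

    def forward(self, h):
        return self.adapter(self.content_layer(h))

def set_trainable(module: nn.Module, flag: bool):
    for p in module.parameters():
        p.requires_grad = flag

def llr_phase(model: nn.ModuleList, phase: str):
    """Assumed parameter split per LLR phase (names are hypothetical)."""
    for layer in model:
        if phase == "learn_content":    # phase 1: fit content parameters on stories
            set_trainable(layer.content_layer, True)
            set_trainable(layer.adapter, False)
        elif phase == "learn_style":    # phase 2: fit style adapter on the style corpus
            set_trainable(layer.content_layer, False)
            set_trainable(layer.adapter, True)
        elif phase == "relearn":        # phase 3: adapt to ending generation
            set_trainable(layer.content_layer, True)
            set_trainable(layer.adapter, True)

# Usage: a toy two-layer stack stepped through the three phases.
hidden = 128
model = nn.ModuleList(
    [AdaptedLayer(nn.Linear(hidden, hidden), hidden) for _ in range(2)]
)
for phase in ["learn_content", "learn_style", "relearn"]:
    llr_phase(model, phase)
```

Under this sketch, changing the target style amounts to swapping only the adapter weights while the content layers stay untouched.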
2 Related Work
Style transfer research has gained significant popularity due to its ability to make text more user-focused and personalized. Such an ability has impact on numerous applications (McDonald and Pustejovsky, 1985), such as persona-based generation (Huang et al., 2018; Niu and Bansal, 2018b), language modeling to imitate specific authors (Syed et al., 2019), and stylistic summarization (Jin et al., 2020b).
There have been two paradigms in the area of style transfer. The distinctions arise in the way each paradigm treats style and content (Jin et al., 2020a). The first paradigm treats non-functional linguistic features (such as formality) as style and the semantics as content. These approaches model the style transfer task as a paraphrase generation task (Madnani and Dorr, 2010; Androutsopoulos and Malakasiotis, 2009; Krishna et al., 2020b). The second paradigm treats differences in parallel corpora (such as happy vs. sad, positive vs. negative) as style and the invariance in the parallel corpora as content (Mou and Vechtomova, 2020).
Traditional style transfer methods were based on token replacement and templates (Sripada et al., 2004; Reiter et al., 2005; Gkatzia et al., 2017). These approaches were difficult to scale as they required hand-crafted, domain-specific templates. With recent advances in deep learning, most recent approaches employ neural methods for style transfer (Zhu et al., 2021a; Syed et al., 2019; Huang et al., 2018; Niu and Bansal, 2018b; Krishna et al., 2020a; Tsai et al., 2021b).
Early neural methods relied on the availability of parallel corpora, where sequence-to-sequence models were applied to perform generation (Rao and Tetreault, 2018). More recently, style transfer on non-parallel corpora has gained significant attention (Krishna et al., 2020a; Zhu et al., 2021a; Niu and Bansal, 2018a; Reid and Zhong, 2021).
Recent work on unsupervised style transfer has shown that monolingual data from another domain can be used to generate stylized responses. However, these methods tend to suffer from a lack of relevance, as they only interpolate between the two domains they use. Another line of work looks at disentangling the style variable from the content variable (Hu et al., 2017; Shen et al., 2017). However, owing to the nature of their training procedure, these methods cannot take advantage of Large Language Models (LLMs) like GPT-2 and hence cannot start from a decoder state capable of generating fluent sentences. They also use corpus-level representations to disentangle style from content. A common theme in all of