Unsupervised Neural Stylistic Text Generation using Transfer learning
and Adapters
Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth
AWS AI Labs
{vinayshk,rgangad,drot}@amazon.com
Abstract
Research has shown that personality is a key driver to improve engagement and user experience in conversational systems (Smestad and Volden, 2018). Conversational agents should also maintain a consistent persona to have an engaging conversation with a user (Gan et al., 2017). However, text generation datasets are often crowd-sourced and thereby have an averaging effect, where the style of the generation model is an average of the styles of all the crowd workers that contributed to the dataset. While one could collect persona-specific datasets for each task, this would be an expensive and time-consuming annotation effort. In this work, we propose a novel transfer learning framework that updates only 0.3% of model parameters to learn style-specific attributes for response generation. For the purpose of this study, we tackle the problem of stylistic story ending generation using the ROC Stories corpus (Mostafazadeh et al., 2016). We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset (Shuster et al., 2019). Through extensive experiments and evaluation metrics, we show that our novel training procedure can improve style generation by 200% over encoder-decoder baselines while maintaining on-par content relevance metrics with the baseline. We also conducted a pilot human subject study to solidify the findings from our study and ground them in the metrics we propose.
1 Introduction
Developing models that are capable of producing content-relevant and stylistic responses is crucial in Conversational AI systems (Zhu et al., 2021b). Such systems provide experiences that are more natural and less robotic. Studies have also shown that customers continue to engage with a conversational system if it has the ability to consistently generate responses in the same style (Gan et al., 2017). Furthermore, psychological studies show that humans tend to interact with each other in similar linguistic styles (Kabbara and Cheung, 2016). Hence, we need a way to train machine learning models that not only produce content that is relevant but also engage with humans in the style of their choice (aka persona).
State Of The Art (SOTA) generation models exhibit inconsistent personas as these models are trained on data from crowd workers having multiple personalities (Zhang et al., 2018). An obvious way to circumvent this problem is to collect parallel data for each persona you want to associate with the agent and train response generation conditioned on each persona (Tsai et al., 2021a). However, such an approach is expensive, time-consuming and not scalable. Hence we need an efficient mechanism to transfer style attributes from a style-specific textual corpus that is present in a different domain (Krishna et al., 2020a; Niu and Bansal, 2018a). Although we target the task of story ending generation in this paper, we take a more holistic approach in this study to control the style of Natural Language Generation (NLG) models. This makes our approach applicable to many other tasks that require NLG, such as summarization and machine translation. Table 1 shows an example of two different endings (positive and negative style endings) to the same story context.
Supervised style transfer has shown promise in story generation and other fields (Peng et al., 2018; Tsai et al., 2021a). Unsupervised style transfer has recently gained momentum as these approaches do not require parallel corpora. However, many of these approaches hurt content relevance when the style coefficient is increased (Niu and Bansal, 2018a). Discriminator-based loss techniques (Prabhumoye et al., 2018) tend to suffer in NLG because of the use of the argmax operation to generate the next token in the sequence. Other techniques require paraphrasing of data, which is not only an expensive two-step transfer process but also fails to handle the finer linguistic phenomena that capture persona (Krishna et al., 2020a).
Context
Nicolas hated eating grapes a lot. He had not eaten them since he was a kid. One day, he went to a vineyard. He saw so many grapes that he had to eat one.
Positive Style Ending
He was happy that he had finally tasted a new kind of fruit.
Negative Style Ending
The next day, he was so sick he couldn't eat any food.
Table 1: Given a story's context, we can either generate a positive or a negative ending based on the reader's preference.
In this work, we study the effects of unsupervised style transfer using only a handful of samples from the target style. We propose a three-phase training procedure to generate story endings with a required style. We use the PERSONALITY-CAPTIONS dataset (Shuster et al., 2019) to build our style-specific textual corpus. We learn one set of parameters that captures style semantics and another set of parameters that captures content semantics within the same model. Through extensive evaluation, we show that our approach improves the style of generated story endings by more than 200% over the baseline while maintaining parity with SOTA models on content relevance. The major contributions of our work are as follows:
• A three-phase transfer learning procedure that enables the model to learn style attributes from a style-specific textual corpus and relearn those attributes for the final downstream task. We call this the learn, learn and relearn (LLR) procedure.
• We separate style parameters from content parameters, enabling practitioners to plug and play adapters of different styles while keeping the content parameters as is. We also show that this approach works on more nuanced styles.
• We design evaluation metrics that show the efficacy of our model against the SOTA baselines. We also observe similar results in our human evaluations.
2 Related Work
Style transfer research has gained significant popularity due to its ability to make text more user-focused and personalized. Such an ability has impact on numerous applications (McDonald and Pustejovsky, 1985), such as persona-based generation (Huang et al., 2018; Niu and Bansal, 2018b), language modeling to imitate specific authors (Syed et al., 2019), and stylistic summarization (Jin et al., 2020b).
There have been two paradigms in the area of style transfer. The distinctions arise in the way each paradigm treats style and content (Jin et al., 2020a). The first paradigm treats non-functional linguistic features (such as formality) as style and the semantics as content. These approaches model the style transfer task as a paraphrase generation task (Madnani and Dorr, 2010; Androutsopoulos and Malakasiotis, 2009; Krishna et al., 2020b). The second paradigm treats differences in parallel corpora (such as happy vs. sad, or positive vs. negative) as style and the invariance in the parallel corpora as content (Mou and Vechtomova, 2020).
Traditional style transfer methods were based on token replacement and templates (Sripada et al., 2004; Reiter et al., 2005; Gkatzia et al., 2017). These approaches were difficult to scale as they required hand-crafted domain-specific templates. With recent advances in deep learning, most recent approaches have proposed neural methods for style transfer (Zhu et al., 2021a; Syed et al., 2019; Huang et al., 2018; Niu and Bansal, 2018b; Krishna et al., 2020a; Tsai et al., 2021b).
Early neural methods relied on the availability of parallel corpora, where sequence-to-sequence models were applied to perform generation (Rao and Tetreault, 2018). More recently, style transfer on non-parallel corpora has gained significant attention (Krishna et al., 2020a; Zhu et al., 2021a; Niu and Bansal, 2018a; Reid and Zhong, 2021). Recent works on unsupervised style transfer have shown that monolingual data from another domain can be used to generate stylized responses. However, these methods tend to suffer from a lack of relevance as they only interpolate between the two domains they use. Another line of work looks at disentangling the style variable from the content variable (Hu et al., 2017; Shen et al., 2017). However, owing to the nature of their training procedure, these methods cannot take advantage of Large Language Models (LLMs) like GPT-2 and hence cannot start from a decoder state that is already capable of generating fluent sentences. They also use corpus-level representations to disentangle style from content. A common theme in all of the above approaches is that they do not have style-specific parameters in the model and use the same set of parameters to encode both the style and the content.
Figure 1: The complete setup of the models and the three phases of our training procedure. The blue blocks represent the trainable parameters in each phase; the weights of the orange blocks are not updated. In Phase 1, we fine-tune the encoder-decoder model on the ROC Stories corpus. In Phase 2, we tune only the adapter weights (AW) on the style corpus with the LM objective. In Phase 3, we fine-tune the adapter weights learned in Phase 2 to complete stories using the ROC Stories corpus.
Generating interesting story endings has also been studied in the past (Gupta et al., 2019a; Chen et al., 2019; Guan et al., 2018). Most of these approaches either bring in commonsense reasoning (Chen et al., 2019) to produce better story endings or try to make the endings diverse across the corpus. Other works have used a discriminator trained on a parallel corpus to generate endings with a particular valence, but the training of those systems is not stable owing to the argmax function used while decoding (Peng et al., 2018).
3 Datasets
3.1 ROC Stories Corpus
We used the ROC Stories corpus (Mostafazadeh et al., 2016) for the task of story ending generation. Each story in this dataset comprises 5 sentences. We used the first 4 sentences as the context of the story, i.e., the input to the model, and the 5th sentence as the ending of the story, which we want to predict. This led to a total of 90,000 samples in the train set and 4,081 samples in the validation and test sets.
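As a concrete illustration, the context/ending split described above amounts to a few lines of preprocessing. The snippet below is a minimal sketch, assuming each story is available as a list of five sentence strings; the function and variable names are ours, not from the original release.

```python
# Minimal sketch of the context/ending split described above.
# Assumes each story is a list of 5 sentence strings; names are illustrative.

def split_story(story_sentences):
    """Return (context, ending): the first 4 sentences vs. the 5th."""
    assert len(story_sentences) == 5, "ROC Stories examples have exactly 5 sentences"
    context = " ".join(story_sentences[:4])   # model input
    ending = story_sentences[4]               # target to predict
    return context, ending

# Example usage with the story from Table 1:
story = [
    "Nicolas hated eating grapes a lot.",
    "He had not eaten them since he was a kid.",
    "One day, he went to a vineyard.",
    "He saw so many grapes that he had to eat one.",
    "He was happy that he had finally tasted a new kind of fruit.",
]
context, ending = split_story(story)
```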
3.2 PERSONALITY-CAPTIONS Dataset
This dataset (Shuster et al., 2019) contains 241,848 captions to images in 215 different styles. Each example in this dataset is a 3-tuple of <Image, Persona, Caption>. For the purpose of this study, we ignored the image and only considered a corpus of text conditioned on a style so that we could pre-train large LMs (LLMs) on that corpus. We grouped together different personas into a style to increase the size of this corpus. We had a total of 4,431 captions that we put together for the "Negative style". The "Negative style" corpus consists of the following finer-grained styles: "Arrogant", "Boyish", "Irritable", "Gloomy", "Fatalistic (Bleak, Gloomy)".
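The grouping of fine-grained personas into a single coarse style can be expressed as a simple filter over the caption corpus. The sketch below assumes the data has already been loaded as (persona, caption) pairs, a simplification of the actual <Image, Persona, Caption> tuples; the loading code and names are hypothetical.

```python
# Sketch of building the "Negative style" corpus by pooling finer-grained personas.
# Assumes captions are available as (persona, caption) pairs; loading code omitted.

NEGATIVE_PERSONAS = {
    "Arrogant", "Boyish", "Irritable", "Gloomy", "Fatalistic (Bleak, Gloomy)",
}

def build_style_corpus(pairs, personas=NEGATIVE_PERSONAS):
    """Keep only captions whose persona belongs to the target style group."""
    return [caption for persona, caption in pairs if persona in personas]

# negative_corpus = build_style_corpus(personality_captions_pairs)
# In our setup this yields roughly 4,431 captions for the "Negative style".
```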
4 Models
In this section, we describe the LLR procedure and compare a model trained using this procedure to the SOTA baselines. Through this procedure, the model learns two different language tasks (story generation and stylized language generation) and then re-parameterizes only a few of the model parameters to adapt to the final end task. Figure 1 shows the model and the way it was trained.
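To make the phase-wise parameter updates concrete, the sketch below shows a standard bottleneck adapter (down-projection, non-linearity, up-projection with a residual connection) and one way to select which parameter groups are trainable in each phase. This is a generic illustration of the adapter idea rather than our exact implementation; the bottleneck size, module names, and the "adapter" naming convention are assumptions.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a transformer sub-layer (illustrative)."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Residual connection keeps the frozen backbone's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def set_trainable(model: nn.Module, phase: int):
    """Freeze/unfreeze parameter groups per LLR phase (parameter names are hypothetical)."""
    for name, param in model.named_parameters():
        if phase == 1:
            # Phase 1: fine-tune the full encoder-decoder on ROC Stories.
            param.requires_grad = True
        else:
            # Phases 2 and 3: only the adapter weights are updated.
            param.requires_grad = "adapter" in name
```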
4.1 Training Encoder-Decoder
As part of the first learning phase, we trained an encoder-decoder model that learns to predict the ending of the story based on the context ($Story_{context}$) provided to it. We chose a BERT (Devlin et al., 2018) encoder and a GPT-2 (Radford et al., 2019) decoder to train the model. We provide the context of the story to the BERT model to obtain encoder embeddings ($Enc_{emb}$). We then provide these embeddings to the GPT-2 model, which is trained with teacher forcing using the story ending. The training procedure is summarized in Equations 1, 2 and 3.
$$Enc_{emb} = \mathrm{BERTEncoder}(Story_{context}) \tag{1}$$

$$P^{V}_{t} = \mathrm{GPT2}(Enc_{emb}, S_{1 \ldots t}) \tag{2}$$

$$loss_{decoder} = -\frac{1}{T_{decoder}} \sum_{t=1}^{T_{decoder}} \log P\left(W^{true}_{t}\right) \tag{3}$$
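A phase-1 training step along the lines of Equations 1-3 can be sketched with the Hugging Face transformers library, which wires a BERT encoder to a GPT-2 decoder with cross-attention and computes the teacher-forcing cross-entropy loss when labels are supplied. This is an approximation of our setup for illustration only; the checkpoint names, tokenization details, and the single-example usage are assumptions, not our exact training code.

```python
from transformers import BertTokenizerFast, GPT2TokenizerFast, EncoderDecoderModel

# BERT encoder + GPT-2 decoder; cross-attention layers are added automatically.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
enc_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
dec_tok = GPT2TokenizerFast.from_pretrained("gpt2")
dec_tok.pad_token = dec_tok.eos_token  # GPT-2 has no pad token by default

model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.pad_token_id

context = ("Nicolas hated eating grapes a lot. He had not eaten them since he was a kid. "
           "One day, he went to a vineyard. He saw so many grapes that he had to eat one.")
ending = "He was happy that he had finally tasted a new kind of fruit."

inputs = enc_tok(context, return_tensors="pt")            # Story_context -> Enc_emb (Eq. 1)
labels = dec_tok(ending, return_tensors="pt").input_ids   # teacher-forcing targets (Eq. 2)

outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
loss = outputs.loss  # token-level cross-entropy, i.e., the loss in Eq. 3
loss.backward()
```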
4.2 Training Adapters to understand style
Adapters are task-specific layers that we can add to the base transformer models in order to perform