
Context: Nicolas hated eating grapes a lot. He had not eaten them since he was a kid. One day, he went to a vineyard. He saw so many grapes that he had to eat one.
Positive Style Ending: He was happy that he had finally tasted a new kind of fruit.
Negative Style Ending: The next day, he was so sick he couldn't eat any food.

Table 1: Given a story's context, we can either generate a positive or a negative ending based on the reader's preference.
sive two-step transfer process, but they also fail to handle finer linguistic phenomena that capture persona (Krishna et al., 2020a).
In this work, we study the effects of unsupervised style transfer using only a handful of samples from the target style. We propose a three-phase training procedure to generate story endings with a required style. We use the PERSONALITY-CAPTIONS dataset (Shuster et al., 2019) to generate our style-specific textual corpus. We learn one set of parameters that captures style semantics and another set of parameters that captures content semantics within the same model. Through extensive evaluation, we show that our approach improves the style of generated story endings by more than 200% over the baseline while maintaining parity with SOTA models on content relevance. The major contributions of our work are as follows:
• A three-phase transfer learning procedure that enables the model to learn style attributes from a style-specific textual corpus and relearn those attributes for the final downstream task. We call this the learn, learn and relearn (LLR) procedure.
• We separate style parameters from content parameters, enabling practitioners to plug and play adapters of different styles while keeping the content parameters as is (see the sketch after this list). We also show that this approach works on more nuanced styles.
• We design evaluation metrics that show the efficacy of our model against the SOTA baselines. We also observe similar results in our human evaluations.
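To make the split between content and style parameters and the three LLR phases concrete, the following is a minimal PyTorch sketch. It is illustrative only: the StyleAdapter bottleneck, the AdaptedLayer wrapper, and the per-phase freezing schedule in llr_phase are assumed names and design choices, not the implementation described in this paper.

```python
# Hypothetical sketch of plug-and-play style adapters over frozen content
# parameters, with a three-phase (learn / learn / relearn) schedule.
import torch
import torch.nn as nn

class StyleAdapter(nn.Module):
    """Small bottleneck module holding the style-specific parameters."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, h):
        # Residual bottleneck: nudges the content representation toward the style.
        return h + self.up(torch.relu(self.down(h)))

class AdaptedLayer(nn.Module):
    """Wraps a 'content' layer with a swappable style adapter."""
    def __init__(self, content_layer: nn.Module, hidden_size: int):
        super().__init__()
        self.content_layer = content_layer        # content parameters
        self.adapter = StyleAdapter(hidden_size)  # style parameters

    def forward(self, h):
        return self.adapter(self.content_layer(h))

def set_trainable(module: nn.Module, flag: bool):
    for p in module.parameters():
        p.requires_grad = flag

def llr_phase(model: nn.ModuleList, phase: str):
    """Assumed parameter split per LLR phase (names are hypothetical)."""
    for layer in model:
        if phase == "learn_content":    # phase 1: fit content parameters on stories
            set_trainable(layer.content_layer, True)
            set_trainable(layer.adapter, False)
        elif phase == "learn_style":    # phase 2: fit style adapter on the style corpus
            set_trainable(layer.content_layer, False)
            set_trainable(layer.adapter, True)
        elif phase == "relearn":        # phase 3: adapt to ending generation
            set_trainable(layer.content_layer, True)
            set_trainable(layer.adapter, True)

# Usage: a toy two-layer stack stepped through the three phases.
hidden = 128
model = nn.ModuleList(
    [AdaptedLayer(nn.Linear(hidden, hidden), hidden) for _ in range(2)]
)
for phase in ["learn_content", "learn_style", "relearn"]:
    llr_phase(model, phase)
```

Under this sketch, changing the target style amounts to swapping only the adapter weights while the content layers stay untouched.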
2 Related Work
Style transfer research has gained significant popularity due to its ability to make text more user-focused and personalized. Such an ability has impact on numerous applications (McDonald and Pustejovsky, 1985), such as persona-based generation (Huang et al., 2018; Niu and Bansal, 2018b), language modeling to imitate specific authors (Syed et al., 2019), and stylistic summarization (Jin et al., 2020b).
There have been two paradigms in the area of style transfer. The distinctions arise in the way each paradigm treats style and content (Jin et al., 2020a). The first paradigm treats non-functional linguistic features (such as formality) as style and the semantics as content. These approaches model the style transfer task as a paraphrase generation task (Madnani and Dorr, 2010; Androutsopoulos and Malakasiotis, 2009; Krishna et al., 2020b). The second paradigm treats differences in parallel corpora (such as happy vs. sad, positive vs. negative) as style and the invariance in the parallel corpora as content (Mou and Vechtomova, 2020).
Traditional style transfer methods were based on token replacement and templates (Sripada et al., 2004; Reiter et al., 2005; Gkatzia et al., 2017). These approaches were difficult to scale as they required hand-crafted, domain-specific templates. With recent advances in deep learning, most recent approaches employ neural methods for style transfer (Zhu et al., 2021a; Syed et al., 2019; Huang et al., 2018; Niu and Bansal, 2018b; Krishna et al., 2020a; Tsai et al., 2021b).
Early neural methods relied on the availability of parallel corpora, where sequence-to-sequence models were applied to perform generation (Rao and Tetreault, 2018). More recently, style transfer on non-parallel corpora has gained significant attention (Krishna et al., 2020a; Zhu et al., 2021a; Niu and Bansal, 2018a; Reid and Zhong, 2021).
Recent work on unsupervised style transfer has shown that monolingual data from another domain can be used to generate stylized responses. However, these methods tend to suffer from a lack of relevance, as they only interpolate between the two domains they use. Another line of work looks at disentangling the style variable from the content variable (Hu et al., 2017; Shen et al., 2017). However, owing to the nature of their training procedure, these methods cannot take advantage of Large Language Models (LLMs) like GPT-2 and hence cannot start from a decoder state capable of generating fluent sentences. They also use corpus-level representations to disentangle style from content. A common theme in all of