PCAE: A Framework of Plug-in Conditional Auto-Encoder for
Controllable Text Generation
Haoqin Tu, Zhongliang Yang, Jinshuai Yang, Siyu Zhang and Yongfeng Huang
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
ARTICLE INFO
Keywords:
controllable text generation, plug-and-play, model-agnostic, transformers
ABSTRACT
Controllable text generation has taken a gigantic step forward these days. Yet existing methods are either constrained to a one-off pattern or not efficient enough when receiving multiple conditions at every generation stage. We propose a model-agnostic framework, Plug-in Conditional Auto-Encoder for Controllable Text Generation (PCAE), towards flexible and semi-supervised text generation. Our framework is "plug-and-play", with only part of the parameters of the pre-trained model (less than half) to be fine-tuned. Crucial to the success of PCAE is the proposed broadcasting label fusion network for navigating the global latent code to a specified local and confined space. Visualization of the local latent prior well confirms the primary contribution of the proposed model in the hidden space. Moreover, extensive experiments across five related generation tasks (from 2 conditions up to 10 conditions) on both RNN-based and pre-trained BART-based [26] auto-encoders reveal the high capability of PCAE, which enables generation that is highly manipulable, syntactically diverse and time-saving with minimal labeled samples. We will release our code at https://github.com/ImKeTT/pcae.
1. Introduction
Obtaining systems that automatically produce realistic-looking texts has been a goal pursued since the early days of artificial intelligence [37]. In real-life scenarios, to approach more human-like contexts, the generated sentences should be tailored to their specific audience [13]. As a result, controllable text generation (CTG) has drawn great attention nowadays [10,20,4]. Controllable text generation aims at generating coherent and grammatically correct texts whose attributes can be controlled [7], and/or which abide by user-defined rules reflecting the particular interests of system users [13].
With the successful deployment of deep neural networks, recently proposed methods have brought us closer to this objective by producing texts with specified attributes. A general idea is to embed given conditions into an end-to-end training scheme [24,20,28] in order to produce sentences that fulfill those conditions, as illustrated in Figure 1. Nevertheless, these models have two main defects that limit their application in reality. Firstly, they cannot deal well with real-world cases where conditions do not all arrive at once, i.e., new conditions appear for new use cases. In this scenario, models like SVAE [24] and OPTIMUS [28] need to activate and train all model parameters for the new conditions, which is time-consuming and thus not an ideal redeployment strategy for practical use [19]. Secondly, these models are mostly restricted to custom, purpose-built language models, which makes it inconvenient to apply them directly to other, more advanced language models for better modeling results. To address these problems for more practical application, another line of CTG work follows the Pre-train and Plug-in (PnP) paradigm [4] and has attracted much research attention in recent years.
yangzl15@tsinghua.org.cn (Zhongliang Yang)
[Figure 1 omitted: an encoder-decoder schematic in which a one-hot class label (Business / Health / Tech.) and a global latent vector select a local latent space (z_business, z_health, z_tech.) from which the decoder generates sentences.]
Figure 1: A running example of the CTG task using auto-encoders. For controllable generation, we only need to input control signals (i.e., a one-hot class label) and a global latent vector z_g sampled from a standard Gaussian. The model then produces texts that fulfill the given conditions by creating specified local latent spaces.
By freezing the base language model (LM) and modifying few or no plug-in parameters, this paradigm is more flexible and powerful for controllable generation, since it is parameter-efficient and can be applied to any advanced LM. Despite its success, existing PnP works have two main defects. One is that they are not convenient for creating texts with numerous categories at one time. Take the currently best-performing PnP language model, PPVAE [6], as an example: when new conditions come in at some point, it demands training additional plug-in AEs to produce the controlled texts. This drawback makes the whole system verbose and time-consuming when faced with a large number of conditions. Another issue is that PnP methods which only update hidden mapping functions during the plug-in process may be incapable of reaching a high degree of control. Since auto-encoders have shown a favourable ability to learn integral properties of text that are beneficial for controllable generation [8], we extend existing text-AE-based [2] PnP frameworks and isolate the textual
syntax module from the input condition representation module by building the BaseAE and PluginAE separately.
Formally, the BaseAE can be any kind of text auto-encoder; it is mainly responsible for formulating basic sentence-generation guidance as a standard LM. To benefit PluginAE in its high-dimensional hidden space, BaseAE is also expected to provide a robust and continuous latent manifold. As for PluginAE, it is a model-agnostic, lightweight component inserted into BaseAE. Our PluginAE architecture is designed with an efficient broadcasting label infuser, Broadcast Net, which incorporates label priors into BaseAE's latent space and enables the plug-in model to learn all the conditions within one single training procedure. To achieve a higher level of control, we choose to activate the decoder inherited from BaseAE during plug-in training. Our contributions can be listed as follows:
1. We explored a novel model-agnostic controllable text generation method, PCAE. It is based on the PnP framework and can easily be adapted to any kind of advanced auto-encoder for controllable text generation.
2. We devised the Broadcast Net for efficient fusion between conditions (labels) and the latent space, so the model can generate controllable texts with very few labeled samples and little training time.
3. To demonstrate the improvements brought by PCAE, we evaluated our model on five different related tasks with the number of conditions ranging from 2 to 10. We further utilized both RNN-based and pre-trained BART-based [26] auto-encoders to verify the effectiveness of the proposed framework.
Inspiring results demonstrate that our model is both time-saving (reducing training time by up to 35%) and highly controllable (nearly 90% accuracy with 100 labeled samples per class in the best case) compared with competitive RNN-based and BART-based baseline language models.
2. Related Work
2.1. Text Auto-Encoders with Latent Variables
Latent variable models (LVMs) have drawn massive attention in the text generation field [2,53,47,6]. The latent space geometry of LVMs can capture multiple views of knowledge in a given corpus (e.g., style, topic, and high-level linguistic or semantic features). There are two famous categories of auto-encoders (AEs) for text modeling, namely variational auto-encoders (VAEs) [2] and adversarial auto-encoders (AAEs) [36]. Both maximize an evidence lower bound (ELBO) on the data X to update the whole model. A major distinction between the two models lies in the regularization term of their ELBOs. While the VAE takes a Kullback-Leibler (KL) penalty as its latent regularizer, the AAE introduces a discriminator to judge latent differences, as illustrated below:
\[
\log p(\boldsymbol{X}) \;\geq\; \underbrace{\mathbb{E}_{q(\boldsymbol{z}\mid\boldsymbol{X})}\!\left[\log p(\boldsymbol{X}\mid\boldsymbol{z})\right]}_{\text{reconstruction term}}
\;-\;
\begin{cases}
\underbrace{\mathbb{D}_{\mathrm{KL}}\!\left(q(\boldsymbol{z}\mid\boldsymbol{X})\,\|\,p(\boldsymbol{z})\right)}_{\text{KL penalty}} & \text{ELBO of VAE,}\\[2ex]
\underbrace{\mathbb{E}_{p(\boldsymbol{z})}\!\left[-\log \mathcal{D}(\boldsymbol{z})\right] + \mathbb{E}_{\boldsymbol{X}}\!\left[-\log\left(1-\mathcal{D}(\mathcal{E}(\boldsymbol{X}))\right)\right]}_{\text{discriminator penalty}} & \text{ELBO of AAE,}
\end{cases}
\tag{1}
\]
where the functions D(·) and E(·) in the ELBO of the AAE denote its discriminator and encoder, respectively. The VAE is widely used as a general tool in continuous generation (e.g., image generation). However, when it comes to the discrete domain (i.e., text generation), the VAE faces numerous difficulties, such as the latent vacancy dilemma [54] and the latent vanishing problem [2]. The main reason is that the VAE often neglects the latent information provided by the encoder. In contrast to VAEs, AAEs maintain a strong coupling between their encoder and decoder, ensuring that the decoder does not ignore representations in the latent space, which makes them robust for latent knowledge interpretation and interpolation [49,36]. However, Li et al. [28] showed that a strong encoder, such as a pre-trained BERT, in a VAE is very helpful for mitigating this issue. As a result, we employ the AAE loss for the RNN-based PCAE and the VAE loss for the pre-trained BART-based PCAE to show that our framework is model-agnostic and effective under any auto-encoder.
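To make the two regularizers concrete, the following minimal PyTorch sketch (our illustration, not the authors' released code) computes the closed-form KL penalty of a diagonal-Gaussian VAE posterior and the discriminator penalty of an AAE from Eq. (1); the tensor shapes and the discriminator architecture are illustrative assumptions.

```python
# Minimal sketch of the two latent regularizers in Eq. (1); shapes and the
# discriminator are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

batch, latent_dim = 32, 64

# Encoder outputs for a diagonal Gaussian posterior q(z|X) = N(mu, sigma^2).
mu = torch.randn(batch, latent_dim)
logvar = torch.randn(batch, latent_dim)

# VAE: closed-form KL(q(z|X) || p(z)) with prior p(z) = N(0, I).
kl_penalty = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1).mean()

# AAE: a discriminator D(.) separates prior samples from posterior samples.
D = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, 1), nn.Sigmoid())
z_prior = torch.randn(batch, latent_dim)                   # z ~ p(z)
z_post = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # z ~ q(z|X), reparameterized

# E_{p(z)}[-log D(z)] + E_X[-log(1 - D(E(X)))], i.e. the discriminator penalty.
disc_penalty = (-torch.log(D(z_prior) + 1e-8).mean()
                - torch.log(1.0 - D(z_post) + 1e-8).mean())

print(float(kl_penalty), float(disc_penalty))
```

Note that the KL penalty is minimized directly, while the discriminator penalty is optimized adversarially between the encoder and the discriminator.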
2.2. Auto-Encoders with Pre-trained Language
Models
Large pre-trained language models (PLMs) are gaining more and more popularity these days. With enormous resources being devoted, capable encoders and decoders such as BERT [5], GPT-2 [41] and T5 [42] have been devised to understand textual content and to create human-like sentences, respectively. Incorporating these powerful PLMs as the encoder and decoder of a variational auto-encoder can largely mitigate the KL collapse problem by offering the decoder a non-negligible latent space from its encoder [28]. Several works have explored incorporating such PLMs into latent auto-encoders [32,28,9,38,48], showing promising potential in a wide range of tasks including unsupervised latent interpolation [28,38], controllable text generation [28] and prompted story generation [9].
2.3. Controllable Text Generation
The core idea of controllable text generation is to generate textual content under designated conditions, so as to cope with specified circumstances and audiences. Formally, we follow the problem setting of previous works [20,6] to define the task: given a set of conditions L = {l_1, l_2, ..., l_n} (e.g., specific topics or sentiment labels), conditional text data Y = {Y_1, Y_2, ..., Y_n} and an unlabeled corpus X, where each text corpus Y_i corresponds to its label l_i. With a condition label l_i
as input, we aim at learning a language model p(Y | l_i) that models the distribution over the text samples Y. Thus, when the condition is specified, the model can generate realistic text samples that fulfill the given condition. In practice, we usually leverage a trained text classifier to distinguish texts with different concepts (see Sec. 4.4.1 for the controllability analysis).
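As a hedged illustration of this evaluation protocol (our own sketch, not the paper's Sec. 4.4.1 code), controllability is typically scored by generating samples for each requested label and measuring how often an independently trained attribute classifier agrees with the requested label. In the sketch below, `generate` and `classify` are hypothetical stand-ins for the conditional generator and the trained classifier.

```python
# Sketch of control-accuracy evaluation with a trained attribute classifier.
# `generate` and `classify` are hypothetical stand-ins, not real project APIs.
from typing import Callable, List

def control_accuracy(generate: Callable[[int, int], List[str]],
                     classify: Callable[[str], int],
                     labels: List[int],
                     samples_per_label: int = 100) -> float:
    """Fraction of generated sentences whose predicted label matches the request."""
    correct, total = 0, 0
    for label in labels:
        for text in generate(label, samples_per_label):
            correct += int(classify(text) == label)
            total += 1
    return correct / max(total, 1)

# Toy stand-ins so the sketch runs end to end.
toy_generate = lambda label, n: [f"sample about topic {label}"] * n
toy_classify = lambda text: int(text.split()[-1])
print(control_accuracy(toy_generate, toy_classify, labels=[0, 1, 2]))  # -> 1.0
```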
To support generating sentences that fulfill such requests, recent research is mainly divided into three groups according to the training paradigm: supervised, self-supervised and semi-supervised. For fully supervised methods, adversarial components like dedicated discriminators are widely employed [3,51]. In spite of their high controllability, they require abundant labeled data and enormous computational resources, which is impractical for real-world applications. Self-supervised methods commonly explore the hidden embeddings of LMs [51,47] and try to capture the underlying control rules during training, yet they normally produce sequences with a low degree of control.
The third group is semi-supervised, which requires only limited labeled data for controllable generation. SVAE [24], the first semi-supervised VAE model, was initially applied to the visual domain. Duan et al. [6] carried its modeling formulation over to the language domain, treating the label embedding as an extended part of the latent variable when label-text pairs are available. Li et al. [28] proposed OPTIMUS with BERT and GPT-2 as encoder and decoder respectively. They conducted controllable text generation via a latent space adversarial network using two-stage training, which only requires labeled data at the second stage.
Apart from SVAE and OPTIMUS, one important branch named "Pre-train and Plug-in" (PnP, also known as plug-and-play) has been rising recently. Since labeled samples are generally required only at the "Plug-in" stage of PnP models, their training fashion is categorized as semi-supervised. Keskar et al. [22] used human-defined "control codes" to pre-train LMs so that they generate controllable texts, but this requires full-scale fine-tuning. To reduce training time, [4] first proposed the concept of plug-and-play for conditional text generation, which produces controlled sentences by pulling the gradients of LMs along the desired path using extra components with few parameters. However, it was built on large pre-trained language models and still requires hours to be trained. What followed was PPVAE [6], which can be inserted into any pre-trained AE to create conditional texts. Nevertheless, it is not equipped with a label infuser that incorporates condition knowledge explicitly into generation, and thus has to train additional plug-in VAEs whenever new conditions come in. Focusing on fine-grained generation, Mai et al. [35] further extended the PnP paradigm to text style transfer, treating target texts as labels and employing a novel "offset" net as well as a latent adversarial loss for generation. Other lines of PnP controllable generation either change the prompts/prefixes fed into the base LMs during the training procedure [50,30], or shift the output probabilities of trained LMs at inference time [25,39]. These methods are mostly based on large pre-trained models and generally take hours to be fully trained (sometimes their training time is even longer than fine-tuning) [25,30,15].
Table 1: The main variable denotations in our method.

Variable    Description
X           Input unlabeled text corpus
Y           Input labeled text corpus
-           The i-th word from a data point in Y
L           Task label set
l_i         The i-th label from the label set
Y_i         Labeled text corpus with label l_i
Z_g         Global latent space
z_g         Global latent vector from Z_g
Z_l         Local latent space
z_l         Local latent vector from Z_l
(·)         Label embedding network
e_{l_i}     Label embedding of label l_i
(·)         The t-th latent transformation network
z_l^(t)     The local latent vector after the t-th transformation network
h_i         The i-th hidden state of the decoder
E(·)        The encoder of models
D(·)        The latent discriminator of AAE models
(·,·)       The kernel function
p(·)        The prior distribution
q(·)        The posterior distribution
3. PCAE Methodology
We present the main variable denotations in Table 1. The key idea of our framework is to reduce the resource consumption of training a language model with high controllability. The PnP framework, with one full model training stage and plug-in controllable components, is an efficient and flexible fit for this demand. Thus our model is separated into two disconnected parts: BaseAE and PluginAE, which correspond to the pre-training and plug-in training stages respectively. The model's workflow is shown in Figure 2: the first panel presents the model structure of BaseAE, while the second panel shows the structure of PluginAE. The third panel depicts the process of controllable text generation, which requires components from both BaseAE and PluginAE.
For the pre-training stage, we use the unlabeled textual data X to train the BaseAE language model (trained from scratch for the RNN-based model and fine-tuned for the BART-based model). For plug-in training, we input text-label pairs {Y, L} = {Y_i, l_i}, where Y_i is the training corpus from Y with label l_i. We use the labeled data pairs for conditional training in order to obtain the controllable decoder of PluginAE, which takes the latent variable and the label condition to generate controllable texts. Thus, once the PluginAE is trained, we only need to input a global latent vector sampled from its prior z_g ~ N(0, I) and a control label (a one-hot label) to the model for controlled generation. This training process lets PCAE access labels only at the second stage, which makes it semi-supervised.
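The controlled-generation path of the plug-in stage can be pictured with the following minimal PyTorch sketch (an assumption-laden toy, not the released PCAE implementation): a one-hot label and a global latent z_g sampled from N(0, I) are fused by a lightweight label-fusion module (a simple stand-in for the paper's Broadcast Net) into a local latent, which the BaseAE decoder then turns into token logits. Module sizes and the GRU decoder are illustrative only.

```python
# Toy sketch of plug-in controllable generation: (z_g, one-hot label) -> z_l -> decoder.
# The fusion module stands in for Broadcast Net; sizes and architecture are assumptions.
import torch
import torch.nn as nn

latent_dim, num_labels, vocab_size, hidden = 64, 5, 1000, 128

class LabelFusion(nn.Module):
    """Plug-in component: maps a global latent and a one-hot label to a local latent."""
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Linear(num_labels, latent_dim)  # label embedding e_l
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)   # fuse z_g with e_l

    def forward(self, z_g, label_onehot):
        e_l = self.label_emb(label_onehot)
        return torch.tanh(self.fuse(torch.cat([z_g, e_l], dim=-1)))

class ToyDecoder(nn.Module):
    """Stand-in for the BaseAE decoder obtained at the pre-training stage."""
    def __init__(self):
        super().__init__()
        self.init_h = nn.Linear(latent_dim, hidden)
        self.gru = nn.GRU(latent_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, z_l, steps=10):
        h0 = torch.tanh(self.init_h(z_l)).unsqueeze(0)       # initial hidden state from z_l
        inp = z_l.unsqueeze(1).repeat(1, steps, 1)            # feed the latent at every step
        out, _ = self.gru(inp, h0)
        return self.out(out)                                  # token logits per step

decoder, fusion = ToyDecoder(), LabelFusion()
z_g = torch.randn(4, latent_dim)                              # global latent from N(0, I)
label = nn.functional.one_hot(torch.tensor([2, 0, 1, 4]), num_labels).float()
logits = decoder(fusion(z_g, label))
print(logits.shape)                                           # torch.Size([4, 10, 1000])
```

In the actual framework, only the plug-in parameters (and, as noted above, the activated BaseAE decoder) would be updated during plug-in training, while the remaining BaseAE parameters stay frozen, which is what keeps the plug-in stage lightweight.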