
as input, we aim to learn a language model to calculate the distribution over the text samples 𝒀. Thus, when the condition is specified, the model can generate realistic text samples that fulfill the given condition. In practice, we usually leverage a trained text classifier to distinguish texts with different concepts (see Sec. 4.4.1 for the controllability analysis).
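Concretely, one way to sketch this objective is the standard autoregressive factorization below; the condition symbol c and the length T are our own shorthand here, not notation fixed by the surrounding text.

```latex
% Sketch of the conditional distribution, assuming an autoregressive decoder.
% c denotes the specified condition and y_{<t} the previously generated words
% (both symbols are our shorthand, not the paper's fixed notation).
p_\theta(\boldsymbol{Y} \mid c) \;=\; \prod_{t=1}^{T} p_\theta\!\left(y_t \mid y_{<t},\, c\right)
```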
To support generating sentences that fulfill such a request, recent research is mainly divided into three categories according to the training paradigm: supervised, self-supervised, and semi-supervised. Fully supervised methods widely employ adversarial components such as dedicated discriminators [3,51]. In spite of their high controllability, they require abundant labeled data and enormous computational resources, which is impractical for real-world applications. Self-supervised methods commonly explore the hidden embeddings of LMs [51,47] and attempt to capture the underlying control rules during training, yet they normally produce sequences with a low degree of control.
The third category is semi-supervised, which requires only limited labeled data for controllable generation. SVAE [24], the first semi-supervised VAE model, was initially applied to the visual domain. Duan et al. [6] brought its modeling formulation into the language domain, treating the label embedding as an extended part of the latent variable when label-text pairs are available. Li et al. [28] proposed OPTIMUS with BERT and GPT-2 as encoder and decoder respectively; they conducted controllable text generation via a latent-space adversarial network using two-stage training, which requires labeled data only at the second stage.
Apart from SVAE and OPTIMUS, an important branch named "Pre-train and Plug-in" (also known as plug-and-play, PnP) has been rising recently. Since labeled samples are generally required only at the "Plug-in" stage of PnP models, their training fashion is categorized as semi-supervised. Keskar et al. [22] pre-trained LMs with human-defined "control codes" in order to generate controllable texts, but this needs full-scale fine-tuning. To reduce training time, [4] first proposed the concept of plug-and-play for conditional text generation, which produces controlled sentences by pulling the gradients of LMs along the desired path using extra components with few parameters. However, it was built on large pre-trained language models and still requires hours to be trained. It was followed by PPVAE [6], which can be plugged into any pre-trained AE to create conditional texts. Nevertheless, PPVAE is not equipped with a label infuser that incorporates condition knowledge explicitly into generation, and thus has to train new plug-in VAEs when new conditions come in. Focusing on fine-grained generation, Mai et al. [35] further extended the PnP paradigm to text style transfer, treating target texts as labels and employing a novel "offset" network together with a latent adversarial loss for generation. Other lines of PnP controllable generation either change the prompts/prefixes fed into the base LMs during training [50,30], or shift the output probabilities of trained LMs at inference time [25,39]. These methods are mostly based on large pre-trained models and generally take hours to be fully tamed (sometimes their training time is even longer than fine-tuning) [25,30,15].
Table 1
The main variable denotations in our method.

Variable      Description
𝑿             Input unlabeled text corpus
𝒀             Input labeled text corpus
              A single word from a data point in 𝒀
𝑳             Task label set
𝒍𝒊            The i-th label from the label set
𝒀𝒊            Labeled text corpus with label 𝒍𝒊
𝒁𝒈            Global latent space
𝒛𝒈            Global latent vector from 𝒁𝒈
𝒁𝒍            Local latent space
𝒛𝒍            Local latent vector from 𝒁𝒍
              Label embedding network
𝒆𝒍𝒊           Label embedding of label 𝒍𝒊
(⋅)           The t-th latent transformation network
𝒛𝒍(𝒕)         The local latent vector after the t-th latent transformation
𝒉𝒊            The i-th hidden state of the decoder
(⋅)           The encoder of the models
(⋅)           The latent discriminator of AAE models
(⋅,⋅)         The kernel function
(⋅)           The prior distribution
(⋅)           The posterior distribution
3. PCAE Methodology
We present the main variable denotations in Table 1. The key idea of our framework is to reduce the resource consumption of training a language model with high controllability. The PnP framework, with one full model training and plug-in controllable components, is an efficient and flexible fit for this demand. Our model is thus separated into two disconnected sections, BaseAE and PluginAE, which correspond to the pre-training and plug-in training stages respectively. The model's workflow is shown in Figure 2: the first panel presents the model structure of BaseAE, the second panel shows the structure of PluginAE, and the third panel illustrates the process of controllable text generation, which requires components from both BaseAE and PluginAE.
For the pre-training stage, we use the unlabeled textual data 𝑿 to train the BaseAE language model (trained from scratch for the RNN-based model and fine-tuned for the BART-based model). For plug-in training, we input the text-label pairs {𝒀, 𝑳} = {𝒀𝒊, 𝒍𝒊}, where 𝒀𝒊 is the training corpus from 𝒀 with label 𝒍𝒊. We use these labeled data pairs for conditional training in order to obtain the controllable decoder of PluginAE, which takes the latent variable and the label condition to generate controllable texts. Thus, once PluginAE is trained, we only need to feed the model a global latent vector sampled from its prior, 𝒛𝒈 ∼ 𝒩(0, 𝑰), and a control label (one-hot label) for controlled generation. This training process lets PCAE access labels only at the second stage, which makes it semi-supervised.
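To make the two-stage workflow concrete, below is a minimal PyTorch-style sketch of the data flow only, not the authors' implementation: the module names, dimensions, training details, and the exact way the label embedding is fused with 𝒛𝒈 are illustrative assumptions.

```python
# Minimal sketch of a PCAE-style two-stage workflow (illustrative only).
import torch
import torch.nn as nn

VOCAB, EMB, HID, LATENT, NUM_LABELS = 1000, 64, 128, 32, 3

class BaseAE(nn.Module):
    """Stage 1: an auto-encoder trained on the unlabeled corpus X (training loop omitted)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.to_latent = nn.Linear(HID, LATENT)      # produces the global latent z_g
        self.from_latent = nn.Linear(LATENT, HID)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def encode(self, tokens):                        # tokens: (batch, seq_len) int ids
        _, h = self.encoder(self.emb(tokens))
        return self.to_latent(h[-1])                 # (batch, LATENT)

    def decode(self, z, tokens):                     # z: (batch, LATENT)
        h0 = self.from_latent(z).unsqueeze(0)        # initial decoder hidden state
        out, _ = self.decoder(self.emb(tokens), h0)
        return self.out(out)                         # (batch, seq_len, VOCAB) logits

class PluginAE(nn.Module):
    """Stage 2: a lightweight plug-in; only this part sees the labeled pairs (Y_i, l_i)."""
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(NUM_LABELS, LATENT)  # label embedding e_{l_i}
        self.transform = nn.Linear(2 * LATENT, LATENT)     # fuses (z_g, label) into a local latent z_l

    def forward(self, z_g, label):
        return self.transform(torch.cat([z_g, self.label_emb(label)], dim=-1))

# Controlled generation: sample z_g ~ N(0, I), pick a label, decode greedily.
base, plug = BaseAE(), PluginAE()                    # assume both stages are already trained
z_g = torch.randn(1, LATENT)                         # global latent vector from the prior
z_l = plug(z_g, torch.tensor([2]))                   # condition on label index 2
tokens = torch.zeros(1, 1, dtype=torch.long)         # start token (placeholder id 0)
for _ in range(10):                                  # greedy decoding for 10 steps
    logits = base.decode(z_l, tokens)
    tokens = torch.cat([tokens, logits[:, -1:].argmax(dim=-1)], dim=1)
print(tokens)                                        # generated token ids
```

The point of the sketch is the division of labor: the BaseAE never sees labels, while the lightweight PluginAE is the only component trained on labeled pairs, matching the semi-supervised setting described above.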