Multi-objective Deep Data Generation with
Correlated Property Control
Shiyu Wang1, Xiaojie Guo2, Xuanyang Lin1, Bo Pan1, Yuanqi Du3, Yinkai Wang4, Yanfang
Ye5, Ashley Ann Petersen6, Austin Leitgeb6, Saleh AlKhalifa7, Kevin Minbiole6, William
Wuest1, Amarda Shehu8, Liang Zhao1,†
1Emory University, {shiyu.wang, mike.lin, bo.pan, william.wuest, liang.zhao}@emory.edu
2IBM Thomas J. Watson Research Center, xguo7@gmu.edu
3Cornell University, yd392@cornell.edu
4Tufts University, yinkai.wang@tufts.edu
5University of Notre Dame, yye7@nd.edu
6Villanova University, {apeter24, austin.leitgeb, kevin.minbiole}@villanova.edu
7Recursiv LLC, salehesam@gmail.com
8George Mason University, ashehu@gmu.edu
Abstract
Developing deep generative models has been an emerging field due to their ability to model and generate complex data for various purposes, such as image synthesis and molecular design. However, the advancement of deep generative models is limited by challenges in generating objects that possess multiple desired properties: 1) complex correlations among real-world properties are common but hard to identify; 2) controlling an individual property implicitly enforces partial control of its correlated properties, which is difficult to model; 3) simultaneously controlling multiple properties in various manners is hard and under-explored. We address these challenges by proposing a novel deep generative framework, CorrVAE, that recovers semantics and the correlation of properties through disentangled latent vectors. The correlation is handled via an explainable mask pooling layer, and properties are precisely retained by generated objects via the mutual dependence between latent vectors and properties. Our generative model preserves properties of interest while handling correlation and conflicts of properties under a multi-objective optimization framework. Experiments demonstrate our model's superior performance in generating data with desired properties. The code of CorrVAE is available at https://github.com/shi-yu-wang/CorrVAE.
1 Introduction
Developing powerful deep generative models has been an emerging field due to their capability to model and generate high-dimensional complex data for various purposes, such as image synthesis [4, 30], molecular design [24, 47, 9], protein design [14, 16], co-authorship network analysis [6] and natural language generation [22, 32]. Extensive efforts have been devoted to learning underlying low-dimensional representations and the generation process of high-dimensional data through deep generative models such as variational autoencoders (VAEs) [27, 35, 9], generative adversarial networks (GANs) [11, 12], and normalizing flows [40, 5], among others [48, 17, 8]. In particular, enhancing the disentanglement and independence of latent dimensions has been attracting the attention of the community [4, 43, 3, 34, 45, 23], enabling controllable generation that produces data with desired properties by interpolating latent variables [44, 13, 29, 25, 38, 20, 7, 49].
†Corresponding author.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01796v3 [cs.LG] 17 Oct 2022
For instance, CSVAE transfers image attributes by correlating latent variables with desired properties [28]. Semi-VAE pairs the latent space with properties by minimizing the mean-square error (MSE) between latent variables and desired properties [31]. Property-controllable VAE (PCVAE) synthesizes image objects with desired positions and scales by enforcing mutual dependence between disentangled latent variables and properties [13]. The Conditional Transformer Language (CTRL) model generates text with task-specific style and content [26].
Figure 1: (a) Correlated properties are common in real-world objects, such as the growth time (day), size and color of a wild pepper (e.g., day 3 / 5 cm / green; day 32 / 10 cm / yellow; day 61 / 13 cm / red); (b) generating molecules that satisfy desired properties (e.g., solubility > 0.5, toxicity as low as possible, potential of 3 V) can be viewed as a multi-objective optimization task.
Despite the rapid growth of research on property-controllable generation in various domains, critical challenges remain: 1) Difficulty in identifying property correlation. Existing models for data property control typically map each property to its own exclusive latent variables, and all latent variables are inherently enforced to be independent of each other. Such complete disentanglement prevents the model from characterizing the correlation among properties, so these models can only control properties that are independent of each other. However, properties of real data objects are usually correlated (Figure 1(a)). For example, in a human face image, the face width is correlated with the eye size; the color of a wild pepper is correlated with its growth time (Figure 1(a)). The correlation of properties has been under-explored, which largely impairs the effectiveness of generative models. 2) Controlling an individual property also enforces an implicit partial control of its correlated properties, which is difficult to model. For correlated properties, controlling one of them also constrains the others to some subspace, such as a hyperplane or even a non-convex set. For example, when generating face images, if we constrain the width of the face to 100 pixels, then the size of the eyes will be constrained to a reasonable range. 3) Difficulty in simultaneously controlling multiple properties in various manners. Real-world applications usually require the generated object to satisfy multiple property constraints simultaneously. One may want to maximize one property's value, fix another property to a certain value, and constrain a third property within a range. The data generation problem is therefore entangled with, and hardened by, a multi-objective optimization goal, which has not been well explored. For example, chemists may design a molecule that has a specific potential, minimizes toxicity, and meanwhile possesses solubility within a range (Figure 1(b)).

We overcome these challenges by proposing a novel deep generative model, CorrVAE, that recovers semantics and the correlation of properties via disentangled latent vectors. The correlation is handled by an explainable mask pooling layer, and properties are precisely retained by the generated data via the mutual dependence between latent vectors and properties. Our generative model preserves multiple properties of interest while handling correlation and conflicts of properties under a multi-objective optimization framework. The contributions of this paper are summarized as follows:
• A novel deep generative model for multi-objective control of correlated properties. Beyond disentangled representation learning, we aim to correspond latent variables to target properties for better interpretability and controllability. The model is generic to different types of data, such as images and graphs, and includes disentanglement terms that yield independent latent variables to jointly handle correlated properties.
• A correlated invertible mapping from correlated real properties to independent latent variables. An interpretable mask pooling layer explicitly identifies how real-world properties are generated from the corresponding subsets of independent latent variables. The information in these latent variables is aggregated and enforced to be mutually dependent on the property via an invertible constraint.
• A multi-objective optimization framework for the deep data generation problem. The corresponding latent variables in the low-dimensional representation are optimized under multiple objectives and constraints for property control. Our framework is generic to various objectives, such as fixing a property value, constraining property values to a range, or maximizing/minimizing a property value, all while maintaining the correlation among properties.
• Extensive experiments on real-world datasets. The proposed model can generate data with multiple desired properties simultaneously, demonstrating its effectiveness. Moreover, our model shows superior accuracy of generated properties against target properties compared with baseline models across multiple real-world datasets.
This paper first introduces the general framework of the proposed model. We then discuss the details of the model, including the derivation of the overall objective, the mask pooling layer, and the invertible mapping between the latent space and correlated properties. Lastly, we conduct comprehensive experiments to compare our model with existing methods.
2 Related works
2.1 Disentangled representation learning
Disentangled representation learning aims to encode the information of high-dimensional complex data into a low-dimensional space of mutually independent variables, separating out independent factors of variation of the data distribution in the representation [43, 1, 3, 34, 21, 19, 10]. Owing to the success of VAEs and GANs as deep generative models [35, 18, 36, 15], a number of techniques have been developed as variants of VAEs or GANs to achieve disentanglement of latent variables in the representation space. For instance, $\beta$-VAE modifies the variational evidence lower bound (ELBO) by adding a hyperparameter $\beta$ before the KL-divergence term to encourage disentangled latent variables [21]. Cycle-consistent VAE instead performs supervised disentanglement of specified and unspecified factors of variation using pairwise similarity labels [23]. On the other hand, InfoGAN maximizes the mutual information between the latent variables and the generated sample under the GAN framework. Disentangled Representation learning-Generative Adversarial Network (DR-GAN) disentangles the representation from the pose of the face in the image through a pose code provided to the generator and pose estimation by the discriminator [43].
2.2 Property controllable deep generative models
Considerable efforts have been devoted to developing deep generative models that generate data with desired properties [13, 47, 20, 26, 25, 42, 37]. Techniques for property-controllable generation include, but are not limited to: 1) reinforcement learning (RL) approaches to goal-directed data generation that preserve target properties, such as the Graph Convolutional Policy Network (GCPN) [47] and GraphAF [41]; and 2) enforcing mutual dependence between properties and latent variables so that the generation process can be controlled by manipulating the values of latent variables, such as PCVAE [13] and Conditional Subspace VAE (CSVAE) [28]. RL approaches, however, suffer from requiring a large sample size for training. Moreover, none of the above methods can precisely capture complex correlations among properties in an explainable way, nor can they generate data that simultaneously satisfies multiple correlated targets, whether values or ranges. To fill the gap between existing methods and the need for controllable generation across domains, we propose a novel controllable deep generative model that handles the correlation of properties via an explainable mask pooling layer and generates data with desired properties under a multi-objective optimization framework.
3 Problem formulation
Suppose we have a dataset $\mathcal{D}$, in which each sample is represented as $x$, along with $y = \{y_1, y_2, ..., y_m\}$ as the $m$ properties of $x$, which can be either correlated or independent of each other. For instance, if the data is a molecule, its properties can be molecular weight, polarity or solubility. We further assume that $(x, y)$ is generated via some random process from continuous latent variables $(w, z)$, where $w$ controls the properties of interest in $y$ and $z$ controls all other aspects of $x$.
We aim to learn a generative model that generates $(x, y)$ conditioned on $(w, z)$, where $z$ is disentangled from $w$ and the variables in $w$ are disentangled from each other so as to control either correlated or independent properties. Once the model is trained, the user can generate data with target values or ranges of properties by editing the corresponding elements of $w$. For example, we may want to generate a molecule with a specific weight and a solubility within a range while minimizing its toxicity, by changing the values of $w$ that contribute to those properties. This goal leads to the following questions, which our work answers: how to automatically identify the correlation among properties, how to control an individual property while enforcing an implicit partial control of its correlated properties, and how to control multiple properties simultaneously?
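To make this goal concrete, the sketch below casts the molecule example as gradient-based optimization over $w$ with a simple penalty formulation. It is a rough, hypothetical illustration in PyTorch: `model.predict_properties`, the penalty weights, and the optimizer settings are assumptions for exposition, not CorrVAE's actual interface (the actual generation phase is described in Section 4).

```python
import torch

def control_properties(model, w_init, target_y1, lo, hi, steps=500):
    """Find w such that y1 hits a target value, y2 stays in [lo, hi],
    and y3 is minimized; a penalty-method sketch of property control.

    model.predict_properties maps w -> (y1, y2, y3) and is assumed to
    be differentiable; it is a hypothetical interface.
    """
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=1e-2)
    for _ in range(steps):
        y1, y2, y3 = model.predict_properties(w)
        loss = (
            (y1 - target_y1) ** 2                   # fix y1 to a value
            + torch.clamp(lo - y2, min=0.0) ** 2    # y2 >= lo
            + torch.clamp(y2 - hi, min=0.0) ** 2    # y2 <= hi
            + 0.1 * y3                              # minimize y3
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```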
4 Proposed approach
Figure 2: Overall framework of CorrVAE. CorrVAE encodes the information of correlated properties into the latent space $w$ and other information of the object into $z$ via the property encoder and the object encoder, respectively. The correlation among properties is then captured by the mask pooling layer, where the information needed to predict a specific property is aggregated into the bridging latent variable $w'$ via an MLP. The mutual dependence between $w'$ and the corresponding property is enforced by an invertible constraint realized by a ResNet (weight layers with ReLU activations). Lastly, the data can be generated from $(w, z)$ via the object decoder.
In general, the proposed approach, CorrVAE, identifies property correlation via a novel mask pooling layer. It precisely retains properties via the constraint of mutual dependence between latent vectors and properties, and simultaneously controls multiple properties under a multi-objective optimization framework. The overall framework of the model is shown in Figure 2. Specifically, the model operates in two phases: (1) the learning phase encodes the information of properties via the property encoder and other information of the data via the object encoder. As shown in Figure 2, the correlation information of properties is captured by a novel mask pooling layer, the mutual dependence between latent variables and properties is enforced via a constraint added to the learning objective, and the data is generated via the object decoder; (2) the generation phase generates data whose properties take specific target values or fall within target ranges under the multi-objective optimization framework. In this section, we introduce the two phases in detail.
4.1 Learning phase
4.1.1 Overall objective for disentangled learning on latent variables
The goal requires us not only to model the dependence between $x$ and $(w, z)$ for latent representation learning and data generation, but also to model the dependence between $y$ and $w$ for property control. We propose to achieve this by maximizing the joint log-likelihood $p(x, y)$ via its variational lower bound. Given an approximate posterior $q(z, w|x, y)$, we can use Jensen's inequality to obtain the variational lower bound of $\log p(x, y)$ as:

$$\log p(x, y) = \log \mathbb{E}_{q(z,w|x,y)}\!\left[\frac{p(x, y, w, z)}{q(z, w|x, y)}\right] \geq \mathbb{E}_{q(z,w|x,y)}\!\left[\log \frac{p(x, y, w, z)}{q(z, w|x, y)}\right]. \tag{1}$$
The joint likelihood $\log p(x, y, w, z)$ is further decomposed as $\log p(x, y|z, w) + \log p(z, w)$ given two assumptions: (1) $x$ and $y$ are conditionally independent given $w$, since $w$ only captures information from $y$; (2) $z$ is independent of $w$ and $y$, which is equivalent to $y \perp z \mid w$. This gives us $x \perp y \mid (w, z)$, suggesting that $\log p(x, y|w, z) = \log p(x|w, z) + \log p(y|w, z) = \log p(x|w, z) + \log p(y|w)$. Consequently, we write the joint log-likelihood and maximize its lower bound:

$$\log p_{\theta,\gamma}(x, y, w, z) = \log p_\theta(x|w, z) + \log p(w, z) + \log p_\gamma(y|w) = \log p_\theta(x|w, z) + \log p(w, z) + \sum_{i=1}^{m} \log p_\gamma(y_i|w'_i), \tag{2}$$
where we define the next-level latent variable $w'_i$ as the aggregation of the values in $w$ that are independent of each other and contribute to the $i$-th property, bridging the mapping $w \to y$ while allowing property control. Each value $w'_i$ in $w' = \{w'_1, w'_2, ..., w'_m\}$ relates to the corresponding $y_i$ in $y$. Since the properties are independent conditioned on $w'$, the decomposition of the third term in Eq. (2) holds. The relationship between $y$ and $w'$ is further explained in Section 4.1.2. Given $q_\phi(w, z|x, y) = q_\phi(w, z|x) = q_\phi(w|x) \cdot q_\phi(z|x)$, we rewrite the joint probability in Eq. (2) in the form of Bayesian variational inference as the first term of the learning objective:

$$\mathcal{L}_1 = -\mathbb{E}_{q_\phi(w,z|x)}[\log p_\theta(x|w, z)] - \mathbb{E}_{q_\phi(w|x)}[\log p_\gamma(y|w)] + D_{KL}\big(q_\phi(w, z|x) \,\|\, p(w, z)\big). \tag{3}$$
Meanwhile, since the objective in Eq. (3) does not enforce our assumptions that $z$ is independent of $w$ and $y$, and that the values in $w$ are independent of each other, we decompose the KL-divergence in Eq. (3) and penalize the terms:

$$\mathcal{L}_2 = \rho_1 \cdot D_{KL}\big(q(z, w) \,\|\, q(z)q(w)\big) + \rho_2 \cdot D_{KL}\Big(q(w) \,\Big\|\, \prod_i q(w_i)\Big), \tag{4}$$

where $\rho_1$ and $\rho_2$ are coefficient hyper-parameters that weight the two penalty terms. Details of the proofs and derivations of the overall objective can be found in Appendix A.
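To ground the objective, here is a minimal PyTorch sketch of a one-sample Monte Carlo estimate of $\mathcal{L}_1$ (Eq. 3) for factorized Gaussian posteriors. The module interfaces (`enc_w`, `enc_z`, `dec`, `prop_net`) and the Gaussian-likelihood-as-MSE simplification are illustrative assumptions, not the released CorrVAE code; the total-correlation penalties of Eq. (4) additionally require estimates of the aggregated posteriors (e.g., via minibatch-weighted sampling) and are omitted here.

```python
import torch
import torch.nn.functional as F

def l1_loss(x, y, enc_w, enc_z, dec, prop_net):
    """One-sample Monte Carlo estimate of L1 in Eq. (3).

    enc_w / enc_z return (mu, logvar) of the Gaussian posteriors
    q(w|x) and q(z|x); dec reconstructs x from [w, z]; prop_net
    predicts y from w (the mask pooling + invertible map below).
    """
    mu_w, logvar_w = enc_w(x)
    mu_z, logvar_z = enc_z(x)

    # Reparameterization trick: sample from q(w,z|x) = q(w|x) q(z|x).
    w = mu_w + torch.randn_like(mu_w) * (0.5 * logvar_w).exp()
    z = mu_z + torch.randn_like(mu_z) * (0.5 * logvar_z).exp()

    # -E[log p(x|w,z)]: reconstruction (Gaussian likelihood up to constants).
    recon = F.mse_loss(dec(torch.cat([w, z], dim=-1)), x, reduction="sum")

    # -E[log p(y|w)]: property prediction term.
    prop = F.mse_loss(prop_net(w), y, reduction="sum")

    # KL(q(w,z|x) || N(0, I)) in closed form for factorized Gaussians.
    kl = -0.5 * torch.sum(1 + logvar_w - mu_w.pow(2) - logvar_w.exp())
    kl = kl - 0.5 * torch.sum(1 + logvar_z - mu_z.pow(2) - logvar_z.exp())

    return recon + prop + kl
```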
4.1.2 Relating the properties and latent variables
To model the dependence between the correlated properties and the associated latent variables, $p(y|w)$ in Eq. (3), and to capture the correlation among properties, we propose to directly learn the specific relationship between the disentangled latent variables in $w$ and the properties $y$; the correlations among $y$ are captured along the way. Specifically, we design a mask pooling layer realized by a mask matrix $M \in \{0,1\}^{l \times m}$, where $l$ is the dimension of the latent vector $w$. $M$ captures how $w$ relates to $y$: $M_{i,j} = 1$ denotes that $w_i$ relates to the $j$-th property $y_j$; otherwise there is no relation. In this way, two properties that relate to the same variable in $w$ can be regarded as correlated. The binary elements of $M$ are trained with the Gumbel-Softmax function. In the implementation, the $L_1$ norm of the mask matrix is also added to the objective to encourage the sparsity of $M$.
Next, given the learned mask matrix $M$, we model the mapping from $w$ to $y$. For the properties $y$, we compute $w \cdot J^T \odot M$, each column of which contains the latent variables in $w$ to be aggregated to predict the corresponding property in $y$. For each property $y_j$ in $y$, we aggregate all the information from its related latent variable set in $w$ into the next-level latent variable $w'_j$ (i.e., the $j$-th variable of $w'$) via an aggregation function $h$:

$$w' = h(w \cdot J^T \odot M;\, \beta), \tag{5}$$

where $J$ is a vector of all ones, $\odot$ denotes element-wise multiplication, and $\beta$ is the parameter of $h$. Then the properties $y$ can be predicted from $w'$ as:

$$y = f(w';\, \gamma), \tag{6}$$

where $f$ is the set of prediction functions taking $w' = h(w \cdot J^T \odot M;\, \beta)$ as input and $\gamma$ is its parameter set, further explained in the next section. Thus, we have built a one-to-one mapping between $w'$ and $y$. In addition, the correlation of $y_i$ and $y_j$ can be recovered whenever $M_{\cdot i}^T \cdot M_{\cdot j} \neq 0$.
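As a concrete illustration of the mask pooling layer, the following minimal PyTorch sketch uses a straight-through Gumbel-Softmax to learn a binary mask $M$ and a small per-property MLP as the aggregator $h$ of Eq. (5). The layer sizes, the temperature, and the per-property MLP design are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskPooling(nn.Module):
    """Learns M in {0,1}^{l x m} and aggregates w into w' (Eq. 5)."""

    def __init__(self, latent_dim: int, num_props: int, tau: float = 1.0):
        super().__init__()
        # Logits over {off, on} for each (latent dim, property) entry of M.
        self.mask_logits = nn.Parameter(torch.zeros(latent_dim, num_props, 2))
        self.tau = tau
        # h: one small MLP per property, applied to the masked copy of w.
        self.aggregators = nn.ModuleList(
            [nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
             for _ in range(num_props)]
        )

    def forward(self, w: torch.Tensor):
        # Straight-through Gumbel-Softmax: hard binary mask in the forward
        # pass, differentiable logits in the backward pass.
        M = F.gumbel_softmax(self.mask_logits, tau=self.tau, hard=True)[..., 1]
        masked = w.unsqueeze(-1) * M  # (batch, l, m): column j keeps dims of y_j
        w_prime = torch.cat(
            [agg(masked[:, :, j]) for j, agg in enumerate(self.aggregators)],
            dim=-1,
        )  # (batch, m)
        # For a binary mask, M.sum() is its L1 norm (the sparsity penalty).
        return w_prime, M, M.sum()
```

Two properties $y_i$ and $y_j$ are then identified as correlated exactly when the columns $M_{\cdot i}$ and $M_{\cdot j}$ share a nonzero entry, matching the $M_{\cdot i}^T \cdot M_{\cdot j} \neq 0$ criterion above.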
4.1.3 Invertible constraint for multiple-property control
As stated in the problem formulation, our model aims to generate a data point $x$ that retains the required values of the given properties. The most straightforward way to do this is to model the mutual dependence between each $y_i$ and its relevant latent variable set $w'_i$ in both directions. However, this two-way mapping can incur compounded errors, since complex correlations exist among the properties in $y$ and there are many cases where $M_{\cdot i}^T \cdot M_{\cdot j} \neq 0$. To address this, we propose an invertible function that mathematically ensures the exact recovery of the bridging variables $w'$ given a group of desired properties $y$, based on the following deduction.
As in Eq. (6), the set of correlated properties $y = \{y_1, y_2, ..., y_m\}$ relates to the set of latent variables $w' = \{w'_1, w'_2, ..., w'_m\}$ in a one-to-one mapping fashion. Thus we assume that $y$ can be sampled from a multivariate Gaussian given $w'$ as follows:

$$p(y|w') = \mathcal{N}\big(y \,\big|\, f(w';\gamma), \Sigma\big); \quad y = (y_1, y_2, ..., y_m),\; w' = \{w'_1, w'_2, ..., w'_m\},\; \Sigma \in \mathbb{R}^{m \times m},$$
$$\text{s.t. } f(w';\gamma)[j] = \bar{f}(w';\gamma)[j] + w'_j, \quad \mathrm{Lip}\big(\bar{f}(w';\gamma)[j]\big) < 1 \text{ if } \|W_k\|_2 < 1, \quad j = 1, ..., m, \tag{7}$$

where $\mathrm{Lip}$ denotes the Lipschitz constant and $W_k$ are the weight matrices of $\bar{f}$. Namely, to precisely control the properties $y$, we learn a set of invertible functions $f(w';\gamma)$, as indicated in Eq. (6), to model $p_\gamma(y|w')$, where $\gamma$ is the set of parameters in Eq. (6). The constraint enforces each $f(w';\gamma)[j]$ to be an invertible function, achieving mutual dependence between $y_j$ and $w'_j$ [2]. As a result, we have the third term of the objective function:

$$\mathcal{L}_3 = -\mathbb{E}_{w' \sim p(w')}\big[\log \mathcal{N}\big(y \,\big|\, f(w';\gamma), \Sigma\big)\big] + \big\|\mathrm{Lip}\big(\bar{f}(w';\gamma)[j]\big) - 1\big\|^2 \tag{8}$$
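Below is a minimal sketch of such an invertible map, assuming an i-ResNet-style construction [2]: spectral normalization bounds each weight's spectral norm, and an extra factor $c < 1$ makes $\bar{f}$ a strict contraction, so $f(w') = w' + \bar{f}(w')$ is bijective and $w'$ can be recovered exactly from a target $y$ by fixed-point iteration. The layer sizes, $c$, and the iteration count are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class InvertibleProperty(nn.Module):
    """y = f(w') = w' + f_bar(w') with Lip(f_bar) < 1, as in Eq. (7)."""

    def __init__(self, m: int, hidden: int = 32, c: float = 0.9):
        super().__init__()
        # Spectral normalization keeps each ||W_k||_2 near 1; scaling by
        # c < 1 then guarantees Lip(f_bar) <= c < 1 with 1-Lipschitz ELUs.
        self.c = c
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(m, hidden)),
            nn.ELU(),
            spectral_norm(nn.Linear(hidden, m)),
        )

    def f_bar(self, w_prime: torch.Tensor) -> torch.Tensor:
        return self.c * self.net(w_prime)

    def forward(self, w_prime: torch.Tensor) -> torch.Tensor:
        return w_prime + self.f_bar(w_prime)

    @torch.no_grad()
    def invert(self, y: torch.Tensor, n_iter: int = 50) -> torch.Tensor:
        # Banach fixed-point iteration w' <- y - f_bar(w'): converges
        # because f_bar is a contraction, recovering w' exactly from y.
        w_prime = y.clone()
        for _ in range(n_iter):
            w_prime = y - self.f_bar(w_prime)
        return w_prime
```

This exact-inversion property is what the generation phase relies on: given user-specified property targets $y$, `invert` recovers the bridging variables $w'$ without the compounded error of learning two separate mappings.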