Modular Flows: Differential Molecular Generation
Yogesh Verma, Samuel Kaski, Markus Heinonen
Aalto University
{yogesh.verma, samuel.kaski, markus.heinonen}@aalto.fi
Vikas Garg
YaiYai Ltd and Aalto University
vgarg@csail.mit.edu; vikas@yaiyai.fi
Abstract
Generating new molecules is fundamental to advancing critical applications such
as drug discovery and material synthesis. Flows can generate molecules effectively
by inverting the encoding process; however, existing flow models either require
artifactual dequantization or specific node/edge orderings, lack desiderata such
as permutation invariance, or induce discrepancy between the encoding and the
decoding steps that necessitates post hoc validity correction. We circumvent these
issues with novel continuous normalizing E(3)-equivariant flows, based on a system
of node ODEs coupled as a graph PDE, that repeatedly reconcile locally toward
globally aligned densities. Our models can be cast as message passing temporal
networks, and result in superlative performance on the tasks of density estimation
and molecular generation. In particular, our generated samples achieve state of the
art on both the standard QM9 and ZINC250K benchmarks.
1 Introduction
Figure 1: A toy illustration of ModFlow in action with a two-node graph. The two local flows, $z_1$ and $z_2$, co-evolve toward a more complex joint density, both driven by the same differential $f$.
Generative models have rapidly become ubiquitous in
machine learning with advances from image synthesis
(Ramesh et al., 2022) to protein design (Ingraham et al.,
2019). Molecular generation (Stokes et al., 2020) has
also received significant attention owing to its promise
for discovering new drugs and materials. Searching for
valid molecules in prohibitively large discrete spaces is,
however, challenging: estimates for drug-like structures
range between $10^{23}$ and $10^{60}$, but only a tiny fraction, on the order of $10^{8}$, has been synthesized (Polishchuk et al., 2013; Merz et al., 2020). Thus, learning representations that exploit appropriate molecular inductive
biases (e.g., spatial correlations) becomes crucial.
Earlier models focused on generating sequences based
on the SMILES notation (Weininger, 1988) used in chemistry to describe molecular structures as strings. However, they were supplanted by generative models that capture valuable spatial information
such as bond strengths and dihedral angles, e.g., by
embedding molecular graphs via graph neural networks (GNNs) (Scarselli et al., 2009; Garg et al., 2020). Such models primarily include variants
of Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Normalizing
Flows (Dinh et al., 2014, 2016). Besides known issues with their training, GANs (Goodfellow et al.,
2014; Maziarka et al., 2020) suffer from the well-documented problem of mode collapse, thereby
generating molecules that lack diversity. VAEs (Kingma and Welling, 2013; Lim et al., 2018; Jin
et al., 2018), on the other hand, are susceptible to a distributional shift between the training data and
the generated samples. Moreover, optimizing for likelihood via a surrogate lower bound is likely
insufficient to capture the complex dependencies inherent in the molecules.
Flows are especially appealing since, in principle, they enable estimating (and sampling from)
complex data distributions using a sequence of invertible transformations on samples from a more
tractable continuous distribution. Molecules are discrete, so many flow models (Madhawa et al.,
2019; Honda et al., 2019; Shi et al., 2020) add noise during encoding and later apply a dequantization
procedure. However, dequantization begets distortion and issues related to convergence (Luo et al.,
2021). Moreover, many methods segregate the generation of atoms from bonds, so the decoded
structure is often not a valid molecule and requires post hoc correction to ensure validity (Zang
and Wang, 2020), effecting a discrepancy between the encoding and decoding distributions.
Permutation dependence is another undesirable artifact of these methods. Some alternatives have
been explored to avoid dequantization; e.g., Lippe and Gavves (2021) encode molecules in a continuous latent space via variational inference and jointly optimize a flow model for generation.
Discrete graph flows (Luo et al., 2021) also circumvent the many pitfalls of dequantization by
resorting to discrete latent variables, and performing validity checks during the generative process.
However, discrete flows follow an autoregressive procedure that requires a specific ordering of nodes
and edges during training. In general, one-shot methods can generate much faster than discrete flows.
We offer a different flow-based perspective tailored to molecules. Specifically, we suggest coupled
continuous normalizing E(3)-equivariant flows that bestow generative capabilities from neural partial
differential equation (PDE) models on graphs. Graph PDEs have been known to enable designing
new embedding methods such as variants of GNNs (Chamberlain et al., 2021), extending GNNs
to continuous layers as Neural ODEs (Poli et al., 2019), and accommodating spatial information
(Iakovlev et al., 2020). We instead seek to bring to the fore their efficacy and elegance as tools to
help generate complex objects, such as molecules, viewed as outcomes resulting from an interplay of
co-adapting latent trajectories (i.e., underlying dynamics). Concretely, a flow is associated with each
node of the graph, and these flows are conjoined as a joint ODE system conditioned on neighboring
nodes. While these flows originate independently as samples from simple distributions, they adjust
progressively toward more complex joint distributions as they repeatedly interact with the neighboring
flows. We view molecules as samples generated from the globally aligned distributions obtained after
many such local feedback iterations. We call the proposed method Modular Flows (ModFlows) to
underscore that each node can be regarded as a module that coordinates with other modules. Table 1
summarizes the capabilities of ModFlow compared to some previous generative works.
Contributions. We propose to learn continuous-time, flow-based generative models, grounded on graph PDEs, for generating molecules without resorting to any validity correction. In particular:
• we propose ModFlow, a novel generative model based on coupled continuous normalizing E(3)-equivariant flows. ModFlow encapsulates essential inductive bias using PDEs, and defines multiple flows that interact locally toward a globally consistent joint density;
Table 1: A comparison of generative modeling approaches for molecules.

Method     One-shot  Modular  Invertible  Continuous-time  Reference
JT-VAE     ✓         ✓        ✗           ✗                Jin et al. (2018)
MRNN       ✗         ✗        ✗           ✗                Popova et al. (2019)
GraphAF    ✗         ✗        ✓           ✗                Shi et al. (2020)
GraphDF    ✗         ✗        ✓           ✗                Luo et al. (2021)
MoFlow     ✓         ✗        ✓           ✗                Zang and Wang (2020)
GraphNVP   ✓         ✗        ✓           ✗                Madhawa et al. (2019)
ModFlow    ✓         ✓        ✓           ✓                this work
Figure 2: A demonstration of modular flow generation. The initial Gaussian distributions $\mathcal{N}(0, I)$ evolve into complex densities $z(T)$ under $f$ and are subsequently translated into probabilities and labels.
• we encode permutation, translation, rotation, and reflection equivariance with E(3)-equivariant GNNs adapted to molecular generation, and can leverage 3D geometric information;
• ModFlow is end-to-end trainable, non-autoregressive, and obviates the need for any external validity checks or correction;
• empirically, ModFlow achieves state-of-the-art performance on both the standard QM9 (Ramakrishnan et al., 2014) and ZINC250K (Irwin et al., 2012) benchmarks.
2 Related works
Generative models.
Earlier attempts at molecule generation (Kusner et al., 2017; Dai et al., 2018) represented molecules as SMILES strings (Weininger, 1988) and developed sequence
generation models. A challenge for these approaches is to learn complicated grammar rules that can
generate syntactically valid sequences of molecules. Recently, representing molecules as graphs has
inspired new deep generative models for molecular generation (Segler et al., 2018; Samanta et al.,
2018; Neil et al., 2018), ranging from VAEs (Jin et al., 2018; Kajino, 2019) to flows (Madhawa et al.,
2019; Luo et al., 2021; Shi et al., 2020). The core idea is to learn to encode molecular graphs into
a latent space, and subsequently decode samples from this latent space to generate new molecules
(Atwood and Towsley, 2016; Xhonneux et al., 2020; You et al., 2018).
Graph partial differential equations.
Graph PDEs constitute an emerging area that studies PDEs on structured data encoded as graphs. For instance, one can define a PDE on a graph to track the evolution
of signals defined over the graph nodes under some dynamics. Graph PDEs have enabled, among
others, design of new graph neural networks; see, e.g., works such as GNODE (Poli et al., 2019),
NeuralPDE (Iakovlev et al., 2020), Neural operator (Li et al., 2020), GRAND (Chamberlain et al.,
2021), and PDE-GCN (Eliasof et al., 2021). Different from all these works, we focus on using PDEs
for generative modeling of molecules (as graph-structured objects). Interestingly, the ModFlow model proposed
in this work may be viewed as a new equivariant temporal graph network (Rossi et al., 2020; Souza
et al., 2022).
Validity oracles.
A key challenge for molecular generative models is to generate valid molecules according to various criteria for molecular validity or feasibility. It is common practice
to call on external chemical software as rejection oracles to reduce or exclude invalid molecules, or
do validity checks as part of autoregressive generation (Luo et al., 2021; Shi et al., 2020; Popova
et al., 2019). An important open question has been whether generative models can learn to achieve
high generative validity intrinsically, i.e., without being aided by oracles or resorting to additional
checks. ModFlow takes a major step toward that goal.
3 Modular Flows
We focus on unsupervised learning of an underlying graph density $p(G)$ using a dataset of observed molecular graphs $\mathcal{D} = \{G_n\}_{n=1}^{N}$. We learn a generative flow model $p_\theta(G)$ specified by flow parameters $\theta$, and use it to sample novel high-probability molecules.
3.1 Molecular Representation
Graph representation. We represent each molecular graph $G = (V, E)$ of size $M$ as a tuple of vertices $V = (v_1, \ldots, v_M)$ and edges $E \subseteq V \times V$. Each vertex takes a value from an alphabet of atoms, $v \in \mathcal{A} = \{\mathrm{C}, \mathrm{H}, \mathrm{N}, \mathrm{O}, \mathrm{P}, \mathrm{S}, \ldots\}$, while each edge $e \in \mathcal{B} = \{1, 2, 3\}$ abstracts a bond type (i.e., single, double, or triple). We assume that, conditioned on the edges, the graph likelihood factorizes as a product of categorical distributions over vertices given their latent representations:

$$p(G) := p(V \mid E, \{z\}) = \prod_{i=1}^{M} \mathrm{Cat}\big(v_i \mid \sigma(z_i)\big), \qquad (1)$$

where $z_i = (z_{iC}, z_{iH}, \ldots) \in \mathbb{R}^{|\mathcal{A}|}$ is a set of atom scores for node $i$ such that $z_{ik} \in \mathbb{R}$ pertains to type $k \in \mathcal{A}$, and $\sigma$ is the softmax function

$$\sigma(z_i)_k = \frac{\exp(z_{ik})}{\sum_{k'} \exp(z_{ik'})}, \qquad (2)$$

which turns the real-valued scores $z_i$ into normalized probabilities. ModFlow also supports 3D molecular graphs that contain atomic coordinates and angles as additional information.
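To make Eqs. (1)-(2) concrete, the log-likelihood of a labelled graph is simply a sum of per-node log-softmax terms. Below is a minimal NumPy sketch; the function name and array layout are illustrative, not taken from the paper's code.

```python
import numpy as np

def graph_log_likelihood(scores, labels):
    """Log-likelihood of vertex labels under Eqs. (1)-(2).

    scores: (M, |A|) array of real-valued atom scores z_i.
    labels: (M,) integer array of atom-type indices v_i.
    """
    # Numerically stable softmax over atom types (Eq. 2).
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # The product of categoricals in Eq. (1) becomes a sum in log space.
    return np.log(probs[np.arange(len(labels)), labels]).sum()
```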
Tree representations. We can obtain an alternative representation for molecules: we can decompose each molecule into a tree-like structure by contracting certain vertices into a single node (denoted as a cluster) such that the molecular graph becomes acyclic. Following Jin et al. (2018), we restrict these clusters to ring substructures present in the molecular data, in addition to the atom alphabet. Thus, we obtain an extended alphabet $\mathcal{A}_{\mathrm{tree}} = \mathcal{A} \cup \{C_1, C_2, \ldots\}$, where each cluster label $C_r$ corresponds to some ring substructure in the label vocabulary $\chi$. We then reduce the vocabulary to the 30 most commonly occurring substructures of $\mathcal{A}_{\mathrm{tree}}$. For further details, see Appendix A.2.
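The vocabulary construction is detailed in Appendix A.2; as a hedged illustration of the idea, ring substructures could be collected and ranked with RDKit roughly as follows. Here ring_vocabulary is a hypothetical helper, and the authors' exact decomposition may differ.

```python
from collections import Counter
from rdkit import Chem

def ring_vocabulary(smiles_list, top_k=30):
    """Rank ring substructures by frequency across a molecular dataset."""
    counts = Counter()
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable SMILES
        for ring in mol.GetRingInfo().AtomRings():
            # Canonical SMILES of the ring fragment serves as the cluster label C_r.
            counts[Chem.MolFragmentToSmiles(mol, atomsToUse=list(ring))] += 1
    return [fragment for fragment, _ in counts.most_common(top_k)]
```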
3.2 Differential modular flows
Normalizing flows (Kobyzev et al., 2021) provide a general recipe for constructing flexible probability
distributions, used in density estimation (Cramer et al., 2021; Huang et al., 2018) and generative
modeling (Zhen et al., 2020; Zang and Wang, 2020). We propose to model the atom scores $z_i(t)$ as a Continuous-time Normalizing Flow (CNF) (Grathwohl et al., 2018) over time $t \in \mathbb{R}_+$. We assume the initial scores at time $t = 0$ follow an uninformative Gaussian base distribution $z_i(0) \sim \mathcal{N}(0, I)$ for each node $i$. Node scores evolve in parallel over time according to the differential equation

$$\dot{z}_i(t) := \frac{\partial z_i(t)}{\partial t} = f_\theta\big(t, z_i(t), z_{\mathcal{N}_i}(t), x_i, x_{\mathcal{N}_i}\big), \quad i \in \{1, \ldots, M\}, \qquad (3)$$

where $\mathcal{N}_i = \{j : (i, j) \in E\}$ is the set of neighbors of node $i$ and $z_{\mathcal{N}_i}(t) = \{z_j(t) : j \in \mathcal{N}_i\}$ the scores of the neighbors at time $t$; $x_i$ and $x_{\mathcal{N}_i}$ denote, respectively, the positional (2D/3D) information of $i$ and its neighbors; and $\theta$ denotes the parameters of the flow function $f$ to be learned. Stacking together all node differentials, we obtain a modular system of coupled ODEs:

$$\dot{z}(t) = \begin{pmatrix} \dot{z}_1(t) \\ \vdots \\ \dot{z}_M(t) \end{pmatrix} = \begin{pmatrix} f_\theta\big(t, z_1(t), z_{\mathcal{N}_1}(t), x_1, x_{\mathcal{N}_1}\big) \\ \vdots \\ f_\theta\big(t, z_M(t), z_{\mathcal{N}_M}(t), x_M, x_{\mathcal{N}_M}\big) \end{pmatrix}, \qquad (4)$$

$$z(T) = z(0) + \int_0^T \dot{z}(t)\, dt. \qquad (5)$$

This coupled system of ODEs may be viewed as a graph PDE (Iakovlev et al., 2020; Chamberlain et al., 2021), where the evolution of each node depends only on its neighbors.
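As a minimal sketch of the forward pass in Eqs. (3)-(5), fixed-step Euler integration of the coupled system looks as follows; f_theta stands for any neighborhood-respecting drift network (e.g., a message-passing GNN), and an actual implementation would additionally track the change-of-variables log-density term and typically use an adaptive ODE solver rather than fixed-step Euler.

```python
import torch

def integrate_flow(z0, x, edges, f_theta, T=1.0, steps=100):
    """Forward-Euler integration of the coupled node ODEs (Eqs. 3-5).

    z0      : (M, |A|) initial scores sampled from N(0, I).
    x       : (M, d) positional (2D/3D) information per node.
    edges   : (2, E) tensor of edge indices defining the neighborhoods N_i.
    f_theta : callable (t, z, x, edges) -> dz/dt of shape (M, |A|), which
              should mix each node's state only with its neighbors' states.
    """
    dt = T / steps
    z = z0
    for step in range(steps):
        t = step * dt
        z = z + dt * f_theta(t, z, x, edges)  # one Euler step of Eq. (5)
    return z  # z(T); the softmax of Eq. (2) then yields atom probabilities
```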