Multibody molecular docking on a quantum annealer Mohit Pandey1Tristan Zaborniak1 2Hans Melo1Alexey Galda1and Vikram Khipple Mulligan3 1Menten AI Inc. San Francisco CA 94111 USA

2025-05-02 0 0 8.62MB 20 页 10玖币
侵权投诉
Multibody molecular docking on a quantum annealer
Mohit Pandey,1, Tristan Zaborniak,1, 2, Hans Melo,1Alexey Galda,1and Vikram Khipple Mulligan3,
1Menten AI, Inc., San Francisco, CA 94111, USA
2Department of Computer Science, University of Victoria, Victoria, BC V8W 2Y2, Canada
3Center for Computational Biology, Flatiron Institute, 162 Fifth Avenue, New York, NY 10010, USA
Molecular docking, which aims to find the most stable interacting configuration of a set of
molecules, is of critical importance to drug discovery. Although a considerable number of clas-
sical algorithms have been developed to carry out molecular docking, most focus on the limiting
case of docking two molecules. Since the number of possible configurations of Nmolecules is ex-
ponential in N, those exceptions which permit docking of more than two molecules scale poorly,
requiring exponential resources to find high-quality solutions. Here, we introduce a one-hot encoded
quadratic unconstrained binary optimization formulation (QUBO) of the multibody molecular dock-
ing problem, which is suitable for solution by quantum annealer. Our approach involves a classical
pre-computation of pairwise interactions, which scales only quadratically in the number of bodies
while permitting well-vetted scoring functions like the Rosetta REF2015 energy function to be used.
In a second step, we use the quantum annealer to sample low-energy docked configurations efficiently,
considering all possible docked configurations simultaneously through quantum superposition. We
show that we are able to minimize the time needed to find diverse low-energy docked configurations
by tuning the strength of the penalty used to enforce the one-hot encoding, demonstrating a 3-4
fold improvement in solution quality and diversity over performance achieved with conventional
penalty strengths. By mapping the configurational search to a form compatible with current- and
future-generation quantum annealers, this work provides an alternative means of solving multibody
docking problems that may prove to have performance advantages for large problems, potentially
circumventing the exponential scaling of classical approaches and permitting a much more efficient
solution to a problem central to drug discovery and validation pipelines.
I. INTRODUCTION
The development of new drugs typically requires
the screening of hundreds of thousands or millions of
molecules spanning a large chemical space. In the last
few decades, computer-aided drug design methods have
attempted to accelerate the discovery of lead drug can-
didates [1,2]. More recently, quantum computing has
emerged as a promising tool that can complement classi-
cal computational methods in the drug discovery process
[35]. Quantum computers take advantage of the unique
properties of quantum mechanical systems, such as su-
perposition and entanglement, to carry out computations
in a manner inaccessible to classical computers. This
permits new and possibly better-scaling means of solving
optimization problems [69], performing quantum chem-
istry simulations [10,11], or producing machine learning
models [12,13]. There is therefore considerable interest
in developing quantum equivalents of the poorest-scaling
classical methods in computational drug discovery.
In the present noisy intermediate-scale quantum era
[14], it is believed that optimization might be one of the
earliest applications to benefit from quantum resources
[15]. For optimization problems relevant to drug discov-
ery involving exponentially large search spaces, classi-
cal simulated annealing [16,17] is one of the most-used
These authors contributed equally to this work and are listed in
alphabetical order.
To whom correspondence should be addressed: vmulli-
gan@flatironinstitute.org
approaches, and has shown great success in finding ei-
ther optimal or near-optimal solutions. Though simu-
lated annealing is a heuristic that surrenders guarantees
of reaching the global optimum in all but the limit of an
infinitely long annealing schedule [18], it has empirically
shown fast convergence to optimal or near-optimal solu-
tions in a wide variety of cases [19,20]. Until the recent
advent of machine learning based methods [21], simu-
lated annealing-based searches of protein conformational
space represented the state of the art for protein structure
prediction [22,23], and even now, simulated annealing-
based exploration of chemical space remains the state
of the art for amino acid sequence design, used in pro-
grams like the Rosetta software suite to create druglike
peptides and proteins [2427]. However, while sampling
the rugged energy landscapes of problems such as protein
folding [28,29] and spin-glass models [30], simulated an-
nealing can get stuck in local minima separated by high
barriers.
Quantum annealing, a hardware-instantiated meta-
heuristic leveraging the quantum adiabatic theorem to
find the global minimum of a given objective function,
is a potentially powerful alternative to addressing such
problems. Quantum annealing algorithms are expected
to tunnel through “high but narrow” barriers in energy
landscapes, but their potential advantages remain un-
der debate [3133]. Though there have been efforts to
study optimization on both quantum annealer and gate-
based quantum hardware, there has not yet been clear
demonstration of significantly-improved scaling for sam-
pling the lowest-energy state on currently available hard-
ware as compared to classical approaches. However, for
arXiv:2210.11401v1 [q-bio.BM] 20 Oct 2022
2
drug discovery efforts, we are interested in finding a large
set of distinct favourably-scoring solutions rather than a
single most favourable solution, for two reasons. First,
optimization requires the use of a model scoring function,
often based on experimental data, that can only approx-
imate physical energies (e.g., Rosetta’s REF2015 scor-
ing function [3437], OSPREY’s energy functions [38],
or molecular dynamics force fields such as CHARMM
[39,40] or AMBER [41,42]). This means that the lowest-
energy state under such a model might not correspond
exactly to the true lowest-energy state, making the set of
slightly higher-scoring states of interest. Second, compu-
tational predictions in drug discovery are best taken as
hypotheses to be tested rather than as reliable facts, and
given that in vitro and in vivo experiments inevitably
have some false negative rate themselves, it is useful to
have a ranked pool of hypotheses (e.g. designed amino
acid sequences likely to perform a particular function, or
predicted docked configurations likely to be maximally
stable) to carry forward for experiments rather than a
single prediction, in order to maximize chances of find-
ing a hit.
Although the importance of both sequence diversity
and molecular structure diversity for drug discovery has
been long understood [4346], only recently has the di-
versity of sampled solutions from quantum algorithms
been studied. King et al., for example, examined solver
performance with respect to diversity. These authors de-
fined diversity in terms of time-to-all-valleys, a metric
that measures a solver’s ability to sample at least once
from all energy landscape wells containing a ground state
[47]. This metric is specific to problems with many de-
generate (i.e. isoenergetic) ground states. On the other
hand, Mohseni et al. and Zucca et al. studied diversity in
terms of number of near-optimal solutions that have mu-
tual Hamming distance larger than a certain threshold
[48,49]. These studies each showed that the D-Wave
quantum annealing processor can outperform state-of-
the-art classical solvers on certain instances of synthetic
problem classes in sampling diverse, low-energy solutions.
The present study offers two main contributions. First,
we map the multibody molecular docking problem to
a quadratic unconstrained binary optimization (QUBO)
formulation, using a one-hot solution encoding, for eval-
uation by a quantum annealer. The multibody docking
problem occurs repeatedly in drug discovery, both when
predicting structures of oligomeric macromolecular tar-
gets, and when validating designed drugs by predicting
their binding mode to their target (often with accompa-
nying water molecules) [5052]. Due to the exponential
scaling of the solution space with the number of bodies,
a more efficient means of solving this problem would rep-
resent a major advance for computational drug design
efforts. Second, we provide insight into the ideal tuning
of hyperparameters to permit the quantum annealer to
sample optimal and diverse near-optimal solutions more
efficiently. Specifically, we show that tuning the one-hot
penalty strength to allow intermixing between meaning-
ful (physical) and non-meaningful (non-physical) basis
states can lead to increased sampling of low-energy and
structurally diverse solutions with the D-Wave Advan-
tage quantum annealer.
A. Mapping multibody docking to a quantum
annealer
Here we consider the multibody molecular docking
problem, both as a real-world problem of considerable bi-
ological interest, and as a case study for studying struc-
tural diversity of solutions of a real-world problem as
sampled on a quantum annealer (the D-Wave Advantage
system). This problem involves searching for the optimal
spatial configurations (or “poses”) of N > 2molecules in
a solution space that scales exponentially with N. Of-
ten, this problem is considered at the level of two bod-
ies, as in implicit-solvent protein-ligand docking [53,54].
However, many biological complexes consist of more than
two bodies: for example, hemoglobin is comprised of four
independent protein subunits which closely associate in
complex [55]. Indeed, a large fraction of proteins have
quaternary structure, and nearly all exist at least tran-
siently in complex with other molecules [56]. Even in the
case of protein-ligand docking, water molecules can play
a key role in protein-ligand recognition, so that this is
natively a multibody docking problem [51,57].
Recent classical computational methods [58] such as
HADDOCK [59], ATTRACT [60,61], EROS-DOCK [62]
and MLSD (Multiple Ligands Simultaneous Docking)
[63] attempt to address the importance of the multibody
docking problem, but their performance scales poorly
when a large number of bodies are involved. For exam-
ple, HADDOCK can only carry out simultaneous multi-
body docking for up to six bodies [59]. To deal with the
intensive computational resources required for simultane-
ous multibody docking in the large Nlimit, HADDOCK
instead docks < N bodies in iterative fashion until all
Nbodies are docked. However, this approach fails to
consider the full N-body configuration space, and so is
prone to missing globally-optimal configurations achiev-
able only through simultaneous docking. Quantum paral-
lelism, in contrast, allows simultaneous consideration of
all possible configurations, offering potential advantage
over classical methods.
In this work, we map the multibody molecular dock-
ing problem to a QUBO formulation, allowing sampling
using the D-Wave quantum annealer. The quality of so-
lutions produced by any multibody docking method of
course depends not only on the sampling method but
also the scoring function used. Our approach works for
any energy or scoring function that can be expressed as
a sum of one-body internal energies and two-body in-
teraction energies. For demonstration purposes, we use
the Rosetta REF2015 energy function [36], the excel-
lent accuracy of which allows us to test our predicted
docked configurations against crystallographic data. We
3
pre-compute one- and two-body energies classically, an
approach that scales favorably (quadratically) with the
number of bodies, N, and reserve the quantum annealer
for the search phase, which requires time that is ex-
ponential in the number of bodies Nwhen performed
classically. This can be contrasted with the approach
of Casares et al. for predicting protein structure on
a quantum computer, in which energies for all possi-
ble configurations were classically pre-computed, losing
any possible advantage gained during the search phase
[9]. Our approach is also distinct from previous studies
of molecular docking using quantum computers, which
predominantly focused on two-body molecular docking.
Mato et al. mapped molecular unfolding, which is one
of the steps of two-body molecular docking, to a quan-
tum annealer [8]. Banchi et al. mapped two-body
molecular docking to finding the maximum weighted
clique in a graph, which can be sampled on a Gaus-
sian Boson Sampler [6]. Their mapping requires heuris-
tically identifying pharmacophore points for a ligand-
receptor pair (a coarsely-grained representation of the
molecular geometry), while our approach considers all
the atoms of the bodies involved in multibody docking.
Importantly, where approaches based on pharmacophore
distance graphs inevitably have degenerate symmetry-
related ground states, our approach respects the chirality
of biomolecules and their complexes.
B. Increasing structural diversity of sampled
solutions on the D-Wave quantum annealer
As a proof of concept for our method, we studied
the diversity of low-energy states of a three helix bun-
dle rigid docking problem. We used an all-atom rep-
resentation of each of the three helices, pre-computing
internal (one-body) energies and helix-helix interaction
(two-body) energies with a custom-written Rosetta ap-
plication. Due to the limited number of qubits and inter-
qubit connectivity available on D-Wave Advantage 4.1,
we considered only translation in the configuration space,
but our approach can be easily extended to include ro-
tational transformations and internal degrees of freedom
for each molecule. While sampling the QUBO formu-
lation on D-Wave, we discretized the translation space,
fixed one helix, and used a “one-hot” encoding to repre-
sent translational gridpoint indices for each mobile helix
relative to the fixed helix. Though there have been ef-
forts to find alternative encodings [64,65], one-hot en-
coding remains popular when sampling discrete integer
(higher-than-binary) quadratic models on the D-Wave
system [6669]. Usually, the one-hot penalty strength
is chosen to be larger than the energy scale of the prob-
lem QUBO to ensure that low-energy states correspond
to physically-meaningful states (i.e., bitstrings with one
1per register, assigning exactly one translational grid-
point for each body) while high-energy states correspond
to non-physical states (i.e. bitstrings assigning no grid-
point or multiple gridpoints to the same body). How-
ever, we find that choosing the penalty strength to be
large reduces both solution quality and low-energy so-
lution diversity sampled on the D-Wave annealer given
the one-hot multibody molecular docking problems that
we test. Instead, we find that choosing moderate values
of the penalty strength, which promotes intermixing of
physical and non-physical states, ensures both improved
solution quality and the largest diversity of physically
meaningful, low-energy solutions when sampling our one-
hot encoded problems on the D-Wave system.
C. New findings
In sum: (1) We contribute a one-hot encoded QUBO
formulation of the multibody molecular docking prob-
lem executable by current quantum annealing hardware,
requiring a classical pre-computation that scales poly-
nomially in the number of bodies and simulation space
size. (2) Sampling our QUBO formulation of multibody
molecular docking on the D-Wave Advantage 4.1 quan-
tum annealer, we find that the lowest-energy predicted
configuration, which by exhaustive enumeration corre-
sponds to the ground state allowed by our model, has a
root mean square deviation (RMSD) of 2.4 Åfrom the
native crystallographic structure of a three helix bun-
dle. (3) We find that choosing moderate values of the
penalty strength parameter of our model improves the
speed of sampling diverse low-energy solutions, including
the ground state, by a factor of 3-4 times over the penalty
strength of common practice.
II. METHODS
We organize our methods as follows. In section II A, we
present a QUBO formulation to the multibody molecu-
lar docking problem; in section II B, we discuss our choice
of energy function; in section II C, we outline the mea-
sures by which we assess the performance of our quan-
tum annealing and simulated annealing approaches; and
in section II D, we present the specific multibody docking
problem that we investigate. Sections II E and II F out-
line our approach to scanning over the hyperparameters
of our quantum annealing protocol and a QUBO simu-
lated annealing protocol to allow their comparison, and
the experiments we execute to judge their performance.
A. QUBO formulation
Quantum annealing employs the adiabatic theorem of
quantum mechanics in an attempt to find the global min-
imum of a given objective function. Specifically, the the-
orem states that adiabatically changing the Hamiltonian
of a quantum mechanical system in the ground state will
4
not perturb the system from its ground state if there per-
sists an energy gap that isolates the ground state from
excited states [70]. Finding the global minimum of an
objective function using this method then amounts to
evolving the initial Hamiltonian and its ground state to
a final Hamiltonian which encodes the objective func-
tion, and reading out the final ground state. In practice,
however, finite annealing time, integrated control errors,
readout error, and noise (thermal and quantum) force
sampling of excited states as well.
The D-Wave quantum annealer used herein offers a
programmable objective function in QUBO form, where
the ground state of the initial Hamiltonian is an equal
superposition over all possible solutions to the QUBO.
Allowing h(bi)to be the one-qubit bias coeffient of the
i-th qubit and J(bi, bj)to be the two-qubit coupling co-
efficient of qubits iand j, the QUBO function is then:
HQUBO =X
i
h(bi)bi+X
i>j
J(bi, bj)bibj,(1)
where biis a binary variable, i.e.,bi∈ {0,1}. The full
set of h(bi)and J(bi, bj)coefficients therefore define the
problem to be solved, and the values that the associated
qubits take on upon measurement represent a solution.
We seek to encode the multibody docking problem ac-
cording to this formulation.
To start, we discretize the continuous, real space in
which multibody molecular docking natively takes place
to Tpoints. For our purposes, given the spatial resource
limitations to our selected quantum annealing hardware
(D-Wave Advantage 4.1), we will ignore internal and ro-
tational degrees of freedom, though the approach is easily
extended to include these degrees of freedom provided
that sufficient qubits are available. Fixing one body
to avoid configurational degeneracy due to translational
symmetries, there exist TN1possible unique configura-
tions of Nbodies.
We enforce a one-hot encoding scheme, so that each of
the N1mobile bodies is associated with one of N1
registers of qubits of length T, each qubit correspond-
ing to a specific point in our discretized space represent-
ing a particular translational offset of the given mobile
body from the fixed body. We therefore require (N1)T
qubits in total. Meaningful physical states correspond to
bit-strings which contain exactly one 1per register, as
this places each body in exactly one location, while non-
meaningful non-physical states correspond to bit-strings
other than those of physical states. In this QUBO for-
mulation of the multibody molecular docking problem,
the number of non-physical states Nnon-physical is expo-
nentially more than number of physical states Nphysical,
i.e.,Nnon-physical =O(2(N1)T)Nphysical, where T1.
One might consider it to be a qubit-wasteful strategy in
having such a large number of non-physical states in the
spectrum, but we later show its presence can allow us to
tune the sampled diversity of low-energy physical states
through intermixing between non-physical and physical
states. We leave the comparison of our approach’s per-
formance with that of more qubit-efficient approaches for
future work.
We designate body Nof the set of bodies X=
{1,2, ..., N}to be the fixed body, so that the set of mobile
bodies is then X0={1,2, ..., N 1}. Given our one-hot
formulation, the pairwise interaction between any xX0
and body Nmust be encoded in the h(bi)qubit bias
terms of Equation 1. Pairwise interactions between any
two bodies in X0are encoded as J(bi, bj)qubit coupling
terms. These interactions between bodies are determined
by pairwise free energy functions; we discuss these in sec-
tion II B. (Note that in the present formulation, the one-
body molecular energies are invariant with arrangement,
and contribute only a constant offset to the energy of
any given configuration. Since they do not alter the rel-
ative energy or location of minima, they can be dropped
from the problem, or used only when reporting solution
energies.)
Whereas previously we identified qubits by a general
index as bi, we now track the body to which the variable
applies explicitly in superscript and its position in sub-
script. Thus, for a mobile body x, its location at the i-th
position within the space of Tpoints is represented as bx
i,
where i1,2, ..., T and x1,2, ..., N 1. Now, to pe-
nalize the occurrence of non-physical solutions we apply
a set of equality constraints to enforce that exactly one
bx
ishould be selected per mobile body xX0. We be-
gin with the constraint that Pibx
i= 1 per mobile body.
To penalize the situation where each bx
iis returned as 0,
we then subtract 1and square both sides, so that our
constraint on body xis:
X
i>j
2bx
ibx
jX
i
bx
i+ 1 = 0.(2)
Adding these constraints to the objective function
while ignoring constant offset and multiplying each by a
tunable one-hot encoding enforcement penalty strength
parameter, γ, yields the problem QUBO:
HMBD =X
xX0
T
X
i=1 h(bx
i)γbx
i
+X
xyX
i>j
(J(bx
i, by
j) + δ)bx
iby
j.
(3)
In the above, we define δto be 2γwhen x=yand 0
otherwise. With N1mobile bodies, the number of
linear terms of the QUBO is (N1)T, and the number
of quadratic terms is (N1)T
2, which scales as O(N2T2).
B. Energy function
We must supply a set of h(bx
i)and J(bx
i, by
j)terms
to Equation 3, where, as noted previously, these terms
摘要:

MultibodymoleculardockingonaquantumannealerMohitPandey,1,TristanZaborniak,1,2,HansMelo,1AlexeyGalda,1andVikramKhippleMulligan3,y1MentenAI,Inc.,SanFrancisco,CA94111,USA2DepartmentofComputerScience,UniversityofVictoria,Victoria,BCV8W2Y2,Canada3CenterforComputationalBiology,FlatironInstitute,162Fifth...

展开>> 收起<<
Multibody molecular docking on a quantum annealer Mohit Pandey1Tristan Zaborniak1 2Hans Melo1Alexey Galda1and Vikram Khipple Mulligan3 1Menten AI Inc. San Francisco CA 94111 USA.pdf

共20页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:20 页 大小:8.62MB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 20
客服
关注