Multibody molecular docking on a quantum annealer Mohit Pandey1Tristan Zaborniak1 2Hans Melo1Alexey Galda1and Vikram Khipple Mulligan3 1Menten AI Inc. San Francisco CA 94111 USA

2025-05-02 0 0 8.62MB 20 页 10玖币

侵权投诉

Multibody molecular docking on a quantum annealer

Mohit Pandey,1, ∗Tristan Zaborniak,1, 2, ∗Hans Melo,1Alexey Galda,1and Vikram Khipple Mulligan3, †

1Menten AI, Inc., San Francisco, CA 94111, USA

2Department of Computer Science, University of Victoria, Victoria, BC V8W 2Y2, Canada

3Center for Computational Biology, Flatiron Institute, 162 Fifth Avenue, New York, NY 10010, USA

Molecular docking, which aims to ﬁnd the most stable interacting conﬁguration of a set of

molecules, is of critical importance to drug discovery. Although a considerable number of clas-

sical algorithms have been developed to carry out molecular docking, most focus on the limiting

case of docking two molecules. Since the number of possible conﬁgurations of Nmolecules is ex-

ponential in N, those exceptions which permit docking of more than two molecules scale poorly,

requiring exponential resources to ﬁnd high-quality solutions. Here, we introduce a one-hot encoded

quadratic unconstrained binary optimization formulation (QUBO) of the multibody molecular dock-

ing problem, which is suitable for solution by quantum annealer. Our approach involves a classical

pre-computation of pairwise interactions, which scales only quadratically in the number of bodies

while permitting well-vetted scoring functions like the Rosetta REF2015 energy function to be used.

In a second step, we use the quantum annealer to sample low-energy docked conﬁgurations eﬃciently,

considering all possible docked conﬁgurations simultaneously through quantum superposition. We

show that we are able to minimize the time needed to ﬁnd diverse low-energy docked conﬁgurations

by tuning the strength of the penalty used to enforce the one-hot encoding, demonstrating a 3-4

fold improvement in solution quality and diversity over performance achieved with conventional

penalty strengths. By mapping the conﬁgurational search to a form compatible with current- and

future-generation quantum annealers, this work provides an alternative means of solving multibody

docking problems that may prove to have performance advantages for large problems, potentially

circumventing the exponential scaling of classical approaches and permitting a much more eﬃcient

solution to a problem central to drug discovery and validation pipelines.

I. INTRODUCTION

The development of new drugs typically requires

the screening of hundreds of thousands or millions of

molecules spanning a large chemical space. In the last

few decades, computer-aided drug design methods have

attempted to accelerate the discovery of lead drug can-

didates [1,2]. More recently, quantum computing has

emerged as a promising tool that can complement classi-

cal computational methods in the drug discovery process

[3–5]. Quantum computers take advantage of the unique

properties of quantum mechanical systems, such as su-

perposition and entanglement, to carry out computations

in a manner inaccessible to classical computers. This

permits new and possibly better-scaling means of solving

optimization problems [6–9], performing quantum chem-

istry simulations [10,11], or producing machine learning

models [12,13]. There is therefore considerable interest

in developing quantum equivalents of the poorest-scaling

classical methods in computational drug discovery.

In the present noisy intermediate-scale quantum era

[14], it is believed that optimization might be one of the

earliest applications to beneﬁt from quantum resources

[15]. For optimization problems relevant to drug discov-

ery involving exponentially large search spaces, classi-

cal simulated annealing [16,17] is one of the most-used

∗These authors contributed equally to this work and are listed in

alphabetical order.

†To whom correspondence should be addressed: vmulli-

gan@ﬂatironinstitute.org

approaches, and has shown great success in ﬁnding ei-

ther optimal or near-optimal solutions. Though simu-

lated annealing is a heuristic that surrenders guarantees

of reaching the global optimum in all but the limit of an

inﬁnitely long annealing schedule [18], it has empirically

shown fast convergence to optimal or near-optimal solu-

tions in a wide variety of cases [19,20]. Until the recent

advent of machine learning based methods [21], simu-

lated annealing-based searches of protein conformational

space represented the state of the art for protein structure

prediction [22,23], and even now, simulated annealing-

based exploration of chemical space remains the state

of the art for amino acid sequence design, used in pro-

grams like the Rosetta software suite to create druglike

peptides and proteins [24–27]. However, while sampling

the rugged energy landscapes of problems such as protein

folding [28,29] and spin-glass models [30], simulated an-

nealing can get stuck in local minima separated by high

barriers.

Quantum annealing, a hardware-instantiated meta-

heuristic leveraging the quantum adiabatic theorem to

ﬁnd the global minimum of a given objective function,

is a potentially powerful alternative to addressing such

problems. Quantum annealing algorithms are expected

to tunnel through “high but narrow” barriers in energy

landscapes, but their potential advantages remain un-

der debate [31–33]. Though there have been eﬀorts to

study optimization on both quantum annealer and gate-

based quantum hardware, there has not yet been clear

demonstration of signiﬁcantly-improved scaling for sam-

pling the lowest-energy state on currently available hard-

ware as compared to classical approaches. However, for

arXiv:2210.11401v1 [q-bio.BM] 20 Oct 2022

drug discovery eﬀorts, we are interested in ﬁnding a large

set of distinct favourably-scoring solutions rather than a

single most favourable solution, for two reasons. First,

optimization requires the use of a model scoring function,

often based on experimental data, that can only approx-

imate physical energies (e.g., Rosetta’s REF2015 scor-

ing function [34–37], OSPREY’s energy functions [38],

or molecular dynamics force ﬁelds such as CHARMM

[39,40] or AMBER [41,42]). This means that the lowest-

energy state under such a model might not correspond

exactly to the true lowest-energy state, making the set of

slightly higher-scoring states of interest. Second, compu-

tational predictions in drug discovery are best taken as

hypotheses to be tested rather than as reliable facts, and

given that in vitro and in vivo experiments inevitably

have some false negative rate themselves, it is useful to

have a ranked pool of hypotheses (e.g. designed amino

acid sequences likely to perform a particular function, or

predicted docked conﬁgurations likely to be maximally

stable) to carry forward for experiments rather than a

single prediction, in order to maximize chances of ﬁnd-

ing a hit.

Although the importance of both sequence diversity

and molecular structure diversity for drug discovery has

been long understood [43–46], only recently has the di-

versity of sampled solutions from quantum algorithms

been studied. King et al., for example, examined solver

performance with respect to diversity. These authors de-

ﬁned diversity in terms of time-to-all-valleys, a metric

that measures a solver’s ability to sample at least once

from all energy landscape wells containing a ground state

[47]. This metric is speciﬁc to problems with many de-

generate (i.e. isoenergetic) ground states. On the other

hand, Mohseni et al. and Zucca et al. studied diversity in

terms of number of near-optimal solutions that have mu-

tual Hamming distance larger than a certain threshold

[48,49]. These studies each showed that the D-Wave

quantum annealing processor can outperform state-of-

the-art classical solvers on certain instances of synthetic

problem classes in sampling diverse, low-energy solutions.

The present study oﬀers two main contributions. First,

we map the multibody molecular docking problem to

a quadratic unconstrained binary optimization (QUBO)

formulation, using a one-hot solution encoding, for eval-

uation by a quantum annealer. The multibody docking

problem occurs repeatedly in drug discovery, both when

predicting structures of oligomeric macromolecular tar-

gets, and when validating designed drugs by predicting

their binding mode to their target (often with accompa-

nying water molecules) [50–52]. Due to the exponential

scaling of the solution space with the number of bodies,

a more eﬃcient means of solving this problem would rep-

resent a major advance for computational drug design

eﬀorts. Second, we provide insight into the ideal tuning

of hyperparameters to permit the quantum annealer to

sample optimal and diverse near-optimal solutions more

eﬃciently. Speciﬁcally, we show that tuning the one-hot

penalty strength to allow intermixing between meaning-

ful (physical) and non-meaningful (non-physical) basis

states can lead to increased sampling of low-energy and

structurally diverse solutions with the D-Wave Advan-

tage quantum annealer.

A. Mapping multibody docking to a quantum

annealer

Here we consider the multibody molecular docking

problem, both as a real-world problem of considerable bi-

ological interest, and as a case study for studying struc-

tural diversity of solutions of a real-world problem as

sampled on a quantum annealer (the D-Wave Advantage

system). This problem involves searching for the optimal

spatial conﬁgurations (or “poses”) of N > 2molecules in

a solution space that scales exponentially with N. Of-

ten, this problem is considered at the level of two bod-

ies, as in implicit-solvent protein-ligand docking [53,54].

However, many biological complexes consist of more than

two bodies: for example, hemoglobin is comprised of four

independent protein subunits which closely associate in

complex [55]. Indeed, a large fraction of proteins have

quaternary structure, and nearly all exist at least tran-

siently in complex with other molecules [56]. Even in the

case of protein-ligand docking, water molecules can play

a key role in protein-ligand recognition, so that this is

natively a multibody docking problem [51,57].

Recent classical computational methods [58] such as

HADDOCK [59], ATTRACT [60,61], EROS-DOCK [62]

and MLSD (Multiple Ligands Simultaneous Docking)

[63] attempt to address the importance of the multibody

docking problem, but their performance scales poorly

when a large number of bodies are involved. For exam-

ple, HADDOCK can only carry out simultaneous multi-

body docking for up to six bodies [59]. To deal with the

intensive computational resources required for simultane-

ous multibody docking in the large Nlimit, HADDOCK

instead docks < N bodies in iterative fashion until all

Nbodies are docked. However, this approach fails to

consider the full N-body conﬁguration space, and so is

prone to missing globally-optimal conﬁgurations achiev-

able only through simultaneous docking. Quantum paral-

lelism, in contrast, allows simultaneous consideration of

all possible conﬁgurations, oﬀering potential advantage

over classical methods.

In this work, we map the multibody molecular dock-

ing problem to a QUBO formulation, allowing sampling

using the D-Wave quantum annealer. The quality of so-

lutions produced by any multibody docking method of

course depends not only on the sampling method but

also the scoring function used. Our approach works for

any energy or scoring function that can be expressed as

a sum of one-body internal energies and two-body in-

teraction energies. For demonstration purposes, we use

the Rosetta REF2015 energy function [36], the excel-

lent accuracy of which allows us to test our predicted

docked conﬁgurations against crystallographic data. We

pre-compute one- and two-body energies classically, an

approach that scales favorably (quadratically) with the

number of bodies, N, and reserve the quantum annealer

for the search phase, which requires time that is ex-

ponential in the number of bodies Nwhen performed

classically. This can be contrasted with the approach

of Casares et al. for predicting protein structure on

a quantum computer, in which energies for all possi-

ble conﬁgurations were classically pre-computed, losing

any possible advantage gained during the search phase

[9]. Our approach is also distinct from previous studies

of molecular docking using quantum computers, which

predominantly focused on two-body molecular docking.

Mato et al. mapped molecular unfolding, which is one

of the steps of two-body molecular docking, to a quan-

tum annealer [8]. Banchi et al. mapped two-body

molecular docking to ﬁnding the maximum weighted

clique in a graph, which can be sampled on a Gaus-

sian Boson Sampler [6]. Their mapping requires heuris-

tically identifying pharmacophore points for a ligand-

receptor pair (a coarsely-grained representation of the

molecular geometry), while our approach considers all

the atoms of the bodies involved in multibody docking.

Importantly, where approaches based on pharmacophore

distance graphs inevitably have degenerate symmetry-

related ground states, our approach respects the chirality

of biomolecules and their complexes.

B. Increasing structural diversity of sampled

solutions on the D-Wave quantum annealer

As a proof of concept for our method, we studied

the diversity of low-energy states of a three helix bun-

dle rigid docking problem. We used an all-atom rep-

resentation of each of the three helices, pre-computing

internal (one-body) energies and helix-helix interaction

(two-body) energies with a custom-written Rosetta ap-

plication. Due to the limited number of qubits and inter-

qubit connectivity available on D-Wave Advantage 4.1,

we considered only translation in the conﬁguration space,

but our approach can be easily extended to include ro-

tational transformations and internal degrees of freedom

for each molecule. While sampling the QUBO formu-

lation on D-Wave, we discretized the translation space,

ﬁxed one helix, and used a “one-hot” encoding to repre-

sent translational gridpoint indices for each mobile helix

relative to the ﬁxed helix. Though there have been ef-

forts to ﬁnd alternative encodings [64,65], one-hot en-

coding remains popular when sampling discrete integer

(higher-than-binary) quadratic models on the D-Wave

system [66–69]. Usually, the one-hot penalty strength

is chosen to be larger than the energy scale of the prob-

lem QUBO to ensure that low-energy states correspond

to physically-meaningful states (i.e., bitstrings with one

1per register, assigning exactly one translational grid-

point for each body) while high-energy states correspond

to non-physical states (i.e. bitstrings assigning no grid-

point or multiple gridpoints to the same body). How-

ever, we ﬁnd that choosing the penalty strength to be

large reduces both solution quality and low-energy so-

lution diversity sampled on the D-Wave annealer given

the one-hot multibody molecular docking problems that

we test. Instead, we ﬁnd that choosing moderate values

of the penalty strength, which promotes intermixing of

physical and non-physical states, ensures both improved

solution quality and the largest diversity of physically

meaningful, low-energy solutions when sampling our one-

hot encoded problems on the D-Wave system.

C. New ﬁndings

In sum: (1) We contribute a one-hot encoded QUBO

formulation of the multibody molecular docking prob-

lem executable by current quantum annealing hardware,

requiring a classical pre-computation that scales poly-

nomially in the number of bodies and simulation space

size. (2) Sampling our QUBO formulation of multibody

molecular docking on the D-Wave Advantage 4.1 quan-

tum annealer, we ﬁnd that the lowest-energy predicted

conﬁguration, which by exhaustive enumeration corre-

sponds to the ground state allowed by our model, has a

root mean square deviation (RMSD) of 2.4 Åfrom the

native crystallographic structure of a three helix bun-

dle. (3) We ﬁnd that choosing moderate values of the

penalty strength parameter of our model improves the

speed of sampling diverse low-energy solutions, including

the ground state, by a factor of 3-4 times over the penalty

strength of common practice.

II. METHODS

We organize our methods as follows. In section II A, we

present a QUBO formulation to the multibody molecu-

lar docking problem; in section II B, we discuss our choice

of energy function; in section II C, we outline the mea-

sures by which we assess the performance of our quan-

tum annealing and simulated annealing approaches; and

in section II D, we present the speciﬁc multibody docking

problem that we investigate. Sections II E and II F out-

line our approach to scanning over the hyperparameters

of our quantum annealing protocol and a QUBO simu-

lated annealing protocol to allow their comparison, and

the experiments we execute to judge their performance.

A. QUBO formulation

Quantum annealing employs the adiabatic theorem of

quantum mechanics in an attempt to ﬁnd the global min-

imum of a given objective function. Speciﬁcally, the the-

orem states that adiabatically changing the Hamiltonian

of a quantum mechanical system in the ground state will

not perturb the system from its ground state if there per-

sists an energy gap that isolates the ground state from

excited states [70]. Finding the global minimum of an

objective function using this method then amounts to

evolving the initial Hamiltonian and its ground state to

a ﬁnal Hamiltonian which encodes the objective func-

tion, and reading out the ﬁnal ground state. In practice,

however, ﬁnite annealing time, integrated control errors,

readout error, and noise (thermal and quantum) force

sampling of excited states as well.

The D-Wave quantum annealer used herein oﬀers a

programmable objective function in QUBO form, where

the ground state of the initial Hamiltonian is an equal

superposition over all possible solutions to the QUBO.

Allowing h(bi)to be the one-qubit bias coeﬃent of the

i-th qubit and J(bi, bj)to be the two-qubit coupling co-

eﬃcient of qubits iand j, the QUBO function is then:

HQUBO =X

h(bi)bi+X

i>j

J(bi, bj)bibj,(1)

where biis a binary variable, i.e.,bi∈ {0,1}. The full

set of h(bi)and J(bi, bj)coeﬃcients therefore deﬁne the

problem to be solved, and the values that the associated

qubits take on upon measurement represent a solution.

We seek to encode the multibody docking problem ac-

cording to this formulation.

To start, we discretize the continuous, real space in

which multibody molecular docking natively takes place

to Tpoints. For our purposes, given the spatial resource

limitations to our selected quantum annealing hardware

(D-Wave Advantage 4.1), we will ignore internal and ro-

tational degrees of freedom, though the approach is easily

extended to include these degrees of freedom provided

that suﬃcient qubits are available. Fixing one body

to avoid conﬁgurational degeneracy due to translational

symmetries, there exist TN−1possible unique conﬁgura-

tions of Nbodies.

We enforce a one-hot encoding scheme, so that each of

the N−1mobile bodies is associated with one of N−1

registers of qubits of length T, each qubit correspond-

ing to a speciﬁc point in our discretized space represent-

ing a particular translational oﬀset of the given mobile

body from the ﬁxed body. We therefore require (N−1)T

qubits in total. Meaningful physical states correspond to

bit-strings which contain exactly one 1per register, as

this places each body in exactly one location, while non-

meaningful non-physical states correspond to bit-strings

other than those of physical states. In this QUBO for-

mulation of the multibody molecular docking problem,

the number of non-physical states Nnon-physical is expo-

nentially more than number of physical states Nphysical,

i.e.,Nnon-physical =O(2(N−1)T)Nphysical, where T1.

One might consider it to be a qubit-wasteful strategy in

having such a large number of non-physical states in the

spectrum, but we later show its presence can allow us to

tune the sampled diversity of low-energy physical states

through intermixing between non-physical and physical

states. We leave the comparison of our approach’s per-

formance with that of more qubit-eﬃcient approaches for

future work.

We designate body Nof the set of bodies X=

{1,2, ..., N}to be the ﬁxed body, so that the set of mobile

bodies is then X0={1,2, ..., N −1}. Given our one-hot

formulation, the pairwise interaction between any x∈X0

and body Nmust be encoded in the h(bi)qubit bias

terms of Equation 1. Pairwise interactions between any

two bodies in X0are encoded as J(bi, bj)qubit coupling

terms. These interactions between bodies are determined

by pairwise free energy functions; we discuss these in sec-

tion II B. (Note that in the present formulation, the one-

body molecular energies are invariant with arrangement,

and contribute only a constant oﬀset to the energy of

any given conﬁguration. Since they do not alter the rel-

ative energy or location of minima, they can be dropped

from the problem, or used only when reporting solution

energies.)

Whereas previously we identiﬁed qubits by a general

index as bi, we now track the body to which the variable

applies explicitly in superscript and its position in sub-

script. Thus, for a mobile body x, its location at the i-th

position within the space of Tpoints is represented as bx

where i∈1,2, ..., T and x∈1,2, ..., N −1. Now, to pe-

nalize the occurrence of non-physical solutions we apply

a set of equality constraints to enforce that exactly one

ishould be selected per mobile body x∈X0. We be-

gin with the constraint that Pibx

i= 1 per mobile body.

To penalize the situation where each bx

iis returned as 0,

we then subtract 1and square both sides, so that our

constraint on body xis:

i>j

2bx

ibx

j−X

i+ 1 = 0.(2)

Adding these constraints to the objective function

while ignoring constant oﬀset and multiplying each by a

tunable one-hot encoding enforcement penalty strength

parameter, γ, yields the problem QUBO:

HMBD =X

x∈X0

i=1 h(bx

i)−γbx

x≥yX

i>j

(J(bx

i, by

j) + δ)bx

iby

(3)

In the above, we deﬁne δto be 2γwhen x=yand 0

otherwise. With N−1mobile bodies, the number of

linear terms of the QUBO is (N−1)T, and the number

of quadratic terms is (N−1)T

2, which scales as O(N2T2).

B. Energy function

We must supply a set of h(bx

i)and J(bx

i, by

j)terms

to Equation 3, where, as noted previously, these terms

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

MultibodymoleculardockingonaquantumannealerMohitPandey,1,TristanZaborniak,1,2,HansMelo,1AlexeyGalda,1andVikramKhippleMulligan3,y1MentenAI,Inc.,SanFrancisco,CA94111,USA2DepartmentofComputerScience,UniversityofVictoria,Victoria,BCV8W2Y2,Canada3CenterforComputationalBiology,FlatironInstitute,162Fifth...

展开>> 收起<<

Multibody molecular docking on a quantum annealer Mohit Pandey1Tristan Zaborniak1 2Hans Melo1Alexey Galda1and Vikram Khipple Mulligan3 1Menten AI Inc. San Francisco CA 94111 USA.pdf

共20页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Multibody molecular docking on a quantum annealer Mohit Pandey1Tristan Zaborniak1 2Hans Melo1Alexey Galda1and Vikram Khipple Mulligan3 1Menten AI Inc. San Francisco CA 94111 USA

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: