AlignOT An optimal transport based algorithm for fast 3D alignment with applications to cryogenic electron microscopy density maps

2025-04-30 0 0 7.05MB 21 页 10玖币

侵权投诉

AlignOT: An optimal transport based algorithm for fast 3D

alignment with applications to cryogenic electron microscopy

density maps

Aryan Tajmir Riahi1, Geoﬀrey Woollard1, Fr´ed´eric Poitevin2, Anne Condon1, and Khanh Dao Duc1,3,∗

1Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

2SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA

3Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

Abstract

Aligning electron density maps from Cryogenic electron microscopy (cryo-EM) is a ﬁrst key step

for studying multiple conformations of a biomolecule. As this step remains costly and challenging, with

standard alignment tools being potentially stuck in local minima, we propose here a new procedure, called

AlignOT, which relies on the use of computational optimal transport (OT) to align EM maps in 3D space.

By embedding a fast estimation of OT maps within a stochastic gradient descent algorithm, our method

searches for a rotation that minimizes the Wasserstein distance between two maps, represented as point

clouds. We quantify the impact of various parameters on the precision and accuracy of the alignment,

and show that AlignOT can outperform the standard local alignment methods, with an increased range

of rotation angles leading to proper alignment. We further benchmark AlignOT on various pairs of

experimental maps, which account for diﬀerent types of conformational heterogeneities and geometric

properties. As our experiments show good performance, we anticipate that our method can be broadly

applied to align 3D EM maps.

Introduction

Solving the 3D structures of biomolecules is key to their function and the mechanisms underlying biological

processes. For this purpose, cryogenic electron microscopy (cryo-EM) has become in recent years the most

used technique to solve structures [1]. One main advantage of this technique, in contrast with X-ray crys-

tallography, is that it potentially allows various conformations (or 3D conﬁgurations) of the same molecule

to be solved [2]. Once diﬀerent conformations are obtained as 3D EM density maps (i.e., large 3D grids

of voxels with diﬀerent levels of intensities), aligning these maps in 3D space is needed to further compare

them.

Eﬃcient methods have been developed to align two protein structures [3,4], assuming their atomic com-

position is known. In this case, aligning two conformational structures is tantamount to ﬁnding an optimal

rigid body transformation (i.e. a combination of 3D translation and rotation) that can align homologous

atoms. However, when density maps are only given, one cannot directly establish such a homology corre-

spondence from voxel to voxel, so the same problem becomes more challenging as the grid size increases and

with the computational cost of searching over all possible rigid body transformations.

To solve the rigid body alignment problem for 3D cryo-EM density maps, standard approaches use various

algorithms to maximize correlation [5,6,7]. More recently, Han et al. introduced a new method, which

relies on representing the maps as sets of unit vectors before performing alignment [8]. Overall, both the

choice of the metric to optimize, as well as the representation of the maps, can play important roles in

getting a successful alignment. In this paper, we introduce a novel approach, called AlignOT, that uses a

point-cloud representation of 3D maps, and minimizes the so-called Wasserstein distance between two maps

with a stochastic gradient algorithm. This non-Euclidean distance is associated with the theory of Optimal

Transport (OT) [9], with recent advances that make tractable the computation of transport-based distances

[10,11]. After describing the procedure of AlignOT in detail, we present the results of our experiments that

quantify the precision and accuracy of this method, and benchmark it on a set of representative experimental

∗corresponding author: kdd@math.ubc.ca

arXiv:2210.09361v1 [q-bio.BM] 17 Oct 2022

maps. Overall, the good performance of AlignOT with respect to standard local alignment methods suggests

that our method can be broadly applied, as an alternative for aligning 3D EM maps with a non-Euclidean

metric. We ﬁnally discuss the potential limitations of AlignOT, and connections with other recent methods

and problems in optimal transport that would help to further improve and generalize it.

Material and Methods

Point cloud representation of EM maps

To represent voxelized cryo-EM maps, we use the topology representing network algorithm (TRN) [12], which

reduces a map of d3voxels to a point cloud, i.e., a set of npoints ∈R3. Brieﬂy, the voxelized maps (typically

∼1003to ∼5003voxels in the 3D grid) are ﬁrst thresholded with a noise ﬂoor, to set low intensity regions

to zero, and normalized to a probability mass function Pover the grid points. npoints (ri(0))i=1...n are

initially sampled from P, and then updated in parallel for tfrounds by taking weighted steps, according to

the following equations [12]

ri(t+ 1) = ri(t) + ε(t) exp[−ki/λ(t)](rt−ri(t)),(1)

λ(t) = λ0λf

λ0t/tf,(2)

ε(t) = ε0εf

ε0t/tf,(3)

where rtis a single grid point sampled from P,kidenotes the rank of point ri(t), by its Euclidean

distance to rt, and λ0, ε0, λf, εf, tfare hyperparameters. In practice, we used ε0= 0.3 and εf= 0.05 for the

initial and ﬁnal step sizes, λ0= 0.005 ×nand λf= 0.5 for the initial and ﬁnal scaling ranks, and tf= 8 ×n

for the total number of steps (adapted from [13]).

Optimal Transport and Wasserstein distance

To compare the point cloud representations of EM maps, we use a non-Euclidean metric that derives from

the theory of Optimal Transport [9]. For two given point clouds, A={a1, . . . , an}and B={b1, . . . , bn},

we deﬁne a cost matrix Ci,j =d(ai, bj)2, where dis the Euclidean distance. The entropy regularized 2-

Wasserstein distance between Aand B, denoted by W2,(A,B), is then deﬁned as

W2,(A,B) = [ min

P∈Rn×n

i,j=1

Ci,j Pi,j +H(P)]1/2

s.t. P. =PT.=/n

,(4)

where ∈R+is the regularization parameter and the entropy H(P) is given by

H(P) =

i,j=1

Pi,j log Pi,j .(5)

The minimizer of equation (4) is called the transport plan. For the rest of the Methods section, we will

simply denote the Wasserstein distance as W2,, and Pi(a),i(b)as Pa,b, where i(a) and i(b) are the indices of

the two points aand bin Aand B, respectively.

AlignOT : Algorithm for 3D map alignment

To align two 3D maps from their point cloud representations Aand B, we solve the optimization problem

qopt = argmin

q∈H

W2,(Rq(A),B),(6)

where qis a quaternion (deﬁned over the quaternion space H), that we identify to a 3D rotation Rqin

SO(3), so Rq(A) = {Rq(ai)|ai∈A}. We explain in the Results section and Appendix A why we can only

consider rotations and ignore translations to solve the general alignment problem, and provide more details

on the identiﬁcation of qto Rqin Appendix B. Our stochastic gradient descent procedure to solve (6), called

AlignOT, is detailed in Algorithm 1. At each iteration, the algorithm updates qfrom the transport plan P

between Rq(A) and Bas follows: After sampling one point a∈Rq(A), we evaluate π(a) = argmaxb∈BPa,b,

and compute the gradient in qassociated with d(π(a), a)2, where dis the Euclidean distance, to update q. To

compute the transport plan, we apply the Sinkhorn algorithm [10], with the initial vectors set as the outputs

of the previous iteration. In practice, we also set the convergence condition kd(π(a), a)2k< δ (where δ > 0),

that stops the algorithm before the maximum number of iterations. The hyperparameters of this procedure

are the learning rate αassociated with gradient descent, the regularization parameter associated with the

Wasserstein distance, and a threshold δassociated with the number of iterations. In all our experiments, we

set = 100, δ= 10−10, and the maximum number of iterations equal to 500.

Algorithm 1 AlignOT : 3D density maps alignment with SGD using unit quaternions and Wasserstein

distance

Input two 3D density maps A,B, number of sampled points n∈R, learning rate α∈R, regularization

parameter ∈R, maximum number of iterations L∈N, and gradient threshold δ∈R

1: Sample two sets of npoints A,B⊂R3from A,Brespectively, using TRNs

2: q= 1 + 0i+ 0j+ 0k

3: G=α2

4: while not converged and the number of iterations is at most Ldo

5: Compute Rq(A)

6: Compute Pto be the OT plan matrix between Rq(A) and B

7: Randomly select a∈Rq(A)

8: b=π(a)

9: f(q) = d(Rq(a), b)2(where dis Euclidean distance in R3)

10: G=G+k∇f(q)k2

11: q=q−α

√G× ∇f(q)

12: q=q

kqk

13: end while

14: return q

Implementation

We implemented AlignOT in Python 3.6.4. To sample a point cloud representation of an EM map using

TRN, we adapted code from ProDy [14]. We used the NumPy package for matrix operations and POT’s

implementation of the Sinkhorn algorithm, which was modiﬁed to set the initial vectors (instead of initializing

with uniform vectors). Our code is available in this GitHub repository.

Datasets

We tested AlignOT on various publicly available EM maps available from the EMDB, as listed in Table 1,

and shown in Figure 1. For experiments that involved aligning a pair of distinct maps, we also used the

corresponding pair of structures taken from the PDB [15], and used MM-align [3] to set a ground truth

alignment, before converting the structures into EM maps using the function molmap in Chimera X [16]. All

the datasets used in this study are available at this OSF page.

Results

Formulation of the alignment problem for EM maps with point cloud represen-

tation

We present here a new procedure, called AlignOT, that aligns 3D density maps from cryo-EM. Assuming

that the EM maps are represented by two 3D point clouds A={a1, . . . , an}and B={b1, . . . , bn}, and for a

given distance function ddeﬁned over the space of point clouds, the problem of aligning these maps consists

of ﬁnding a rigid body transformation that minimizes the objective function

Ld(R, T ) = d(moveR,T (A),B)2,(7)

where Ld(R, T ) is deﬁned over rotation matrices R∈SO(3) and translation vectors T∈R3, and the operator

moveR,T (A) is deﬁned as

moveR,T (A) = {Rai+T|ai∈A}.(8)

As the choice of dinﬂuences both the accuracy and the computational cost of the solution to the rigid body

alignment problem, we here use the 2-Wasserstein distance, associated with the theory of Optimal Transport

[9]. This distance can be used to compute distances between probability distributions, and is applied here

more speciﬁcally for two distributions of 3D point clouds of same size. To eﬃciently evaluate this distance,

we consider a regularized version (see equation (4) in the Methods section ), denoted W2,. Besides, given the

centers of mass ¯a=1

nPn

i=1 aiand ¯

b=1

nPn

i=1 bi, and the centered point clouds Ac={aci=ai−¯a|ai∈A}

and Bc={bci=bi−¯

b|bi∈B}, we can show that the optimal translation of the objective function (7) is

Topt =¯

b−Ropt¯a, (9)

where

Ropt = argmin

R∈SO(3)

W2,(R(Ac),Bc).(10)

Thus, the search for an optimal rigid body transformation in (10) can be simpliﬁed to rotations after

matching the centers of mass of A,B, leading to equation (6) of the Methods section . We provide a detailed

proof of this result in Appendix A.

AlignOT optimization procedure and complexity

To search for an optimal rotation that minimizes the Wasserstein distance between two 3D point clouds, we

use an iterative Stochastic Gradient Descent algorithm, detailed in Algorithm 1. Basically, the algorithm

aims to improve the alignment at each iteration, by rotating a map according to an assignment provided

by the evaluation of the Wasserstein distance, with quaternions used to represent the 3D rotations (see

Appendix B for a full description). Since EM maps are deﬁned on a 3D voxelized grid, we also ﬁrst convert

the maps into point clouds, using the TRN algorithm [12]. We cover the application of the TRN application,

the Wasserstein distance and the optimization procedure, as well as its implementation in the Material and

Methods sections .

The complexity of our method (summarized in Algorithm 1) can be evaluated as follows: Assume that

Lis the maximum number of iterations, nthe size of the point cloud, and the regularization parameter of

the W2, distance. Each iteration of this algorithm consists of three steps. First, rotating one point cloud,

second, computing the OT plan matrix, and last, sampling a random point and computing the gradient.

Rotating a point cloud requires computing the coordinates of each point after rotation and takes O(n) time,

where nrepresents the number of points. To compute the OT plan matrix we use the Sinkhorn algorithm

[11] which solves this problem in O(n2log n−3) time, where is the regularization parameter. Finally, the

gradient step takes O(1) time. Overall, the most time-consuming part of this algorithm is computing the OT

plan matrix at each iteration. Therefore, the overall time complexity of the algorithm is O(n2Llog n−3).

Alignment between maps can be obtained by minimizing the Wasserstein dis-

tance

To evaluate our method, we ﬁrst tested AlignOT on aligning two point clouds that diﬀer by a rotation only.

To do so, we used an experimental map of a ribosome from EMDB 1717, shown in Figure 1a (ribosome

structures are broadly studied in cryo-EM [2,17]). First, we sampled two clouds of 500 points, and applied

a rotation deﬁned in its axis-angle representation by an arbitrary axis, and an angle θ= 20◦. Figure 2a

illustrates how the moving point cloud gets closer to the targeted one over the iterations of the algorithm,

until the convergence criterion is reached. To conﬁrm this visual impression, we repeated the procedure with

diﬀerent initial angles θ∈ {10◦,30◦,50◦,70◦}. The corresponding Wasserstein distance obtained across the

iterations is shown in Figure 2b, with all the four trajectories converging to the same value and resulting in a

successful alignment. However, we also observed that as θincreases, it takes more iterations for the algorithm

to converge, with longer periods of slow variations at the beginning of the procedure, suggesting that this

alignment can only be achieved within a certain range of θ. Aside from the initial angle, we also studied in

Figure 2c how the size of the point cloud can aﬀect the convergence plot. For θ= 50◦, decreasing the point

cloud size from 500 to 250 leads to a potential loss of precision (with the Wasserstein distance between two

aligned point clouds increasing from ∼115 to 130), but with faster convergence from the algorithm (from ∼

200 to 50 iterations to converge), indicating some trade-oﬀ between accuracy and speed.

To interpret these results, we further plotted in Figure 2d how the Wasserstein distance varies on average

(after sampling diﬀerent point clouds of same size 250 and 500), as a function of θ(and same rotation axis).

In addition to the global minimum achieved for θ= 0, another local minimum was detected at θ= 180◦,

separated by a peak around 90◦. While the sharp decrease observed towards the global minimum suggests

that the optimization procedure can converge well for θ≤60◦, it is also possible that the algorithm does not

converge to the global minimum above this range. Besides, the higher variability obtained from sampling

point clouds of size 250, compared with 500, suggests that the ﬁnal alignment may be less accurate in this

case. While the same ﬁxed rotation axis was considered in the previous experiments (with diﬀerent values

of θ), we ﬁnally evaluated the probability to successfully align the maps for initial rotations of ﬁxed angle θ

(45, 60, 75 and 90◦), and across diﬀerent axes covering half of the sphere S2. Upon mapping the axes on the

planar disk in Figure 2e, we found local regions of poorer alignment that match with local minima of the

Wasserstein distance. These results also conﬁrm the existence of a limiting range within which the method

can align two maps. While a successful alignment is overall obtained for θ= 45◦, the maps get partially

aligned in diﬀerent regions of the disks for 60◦(see Figure 2e), with the performance worsening as θincreases

(see Figure 3).

Benchmarking performance and accuracy of AlignOT with a pair of identical

maps

We next focused on point cloud size and the range of convergence, and quantiﬁed their impact on the

accuracy and computational cost of AlignOT. We ﬁrst considered the same map and alignment task with

θ= 20◦as previously, with diﬀerent point cloud sizes n(from 50 to 1000). The results obtained with a

standard workstation are shown in Table 2, and conﬁrm the existence of a trade-oﬀ between accuracy and

speed, with the alignment improving as nincreases, but with a larger runtime that goes from a few seconds

for n= 50,100,200 to approximately a minute for n= 1000. While the accuracy of the alignment remains

poor for n= 50 and 100 (with an average error of 12.6◦and 7.72◦respectively), it signiﬁcantly improves for

n= 500 with 2.35±1.23◦error observed, with a runtime suggesting that AlignOT can be used in practice on

standard density maps. We also noted that AlignOT runs slower in comparison with Chimera and Chimera

X’s alignment function fitmap, which performs a steepest ascent optimization to align maps according to

their overlapping score [5,16] (0.3 s on average). On the other hand, we show in our next experiments that

AlignOT outperforms fitmap in accuracy as θincreases.

More precisely, we determined the range of θwithin which the method converges, and compared our

method with fitmap. As shown in Figure 2c, we found that AlignOT can cover a wider range that extends

to 75◦, while fitmap starts failing at approximately 45◦(see also Figure 6for a visualization of some

representative alignments obtained). Beyond this value, AlignOT leads to some variable results up until

100◦, due to the stochasticity of the algorithm. It then converges towards the other local minimum found

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

AlignOT:Anoptimaltransportbasedalgorithmforfast3DalignmentwithapplicationstocryogenicelectronmicroscopydensitymapsAryanTajmirRiahi1,GeoreyWoollard1,FredericPoitevin2,AnneCondon1,andKhanhDaoDuc1;3;1DepartmentofComputerScience,UniversityofBritishColumbia,Vancouver,BCV6T1Z4,Canada2SLACNationalAccel...

展开>> 收起<<

AlignOT An optimal transport based algorithm for fast 3D alignment with applications to cryogenic electron microscopy density maps.pdf

共21页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

AlignOT An optimal transport based algorithm for fast 3D alignment with applications to cryogenic electron microscopy density maps

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: