AlignOT An optimal transport based algorithm for fast 3D alignment with applications to cryogenic electron microscopy density maps

2025-04-30 0 0 7.05MB 21 页 10玖币
侵权投诉
AlignOT: An optimal transport based algorithm for fast 3D
alignment with applications to cryogenic electron microscopy
density maps
Aryan Tajmir Riahi1, Geoffrey Woollard1, Fr´ed´eric Poitevin2, Anne Condon1, and Khanh Dao Duc1,3,
1Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
2SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
3Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
Abstract
Aligning electron density maps from Cryogenic electron microscopy (cryo-EM) is a first key step
for studying multiple conformations of a biomolecule. As this step remains costly and challenging, with
standard alignment tools being potentially stuck in local minima, we propose here a new procedure, called
AlignOT, which relies on the use of computational optimal transport (OT) to align EM maps in 3D space.
By embedding a fast estimation of OT maps within a stochastic gradient descent algorithm, our method
searches for a rotation that minimizes the Wasserstein distance between two maps, represented as point
clouds. We quantify the impact of various parameters on the precision and accuracy of the alignment,
and show that AlignOT can outperform the standard local alignment methods, with an increased range
of rotation angles leading to proper alignment. We further benchmark AlignOT on various pairs of
experimental maps, which account for different types of conformational heterogeneities and geometric
properties. As our experiments show good performance, we anticipate that our method can be broadly
applied to align 3D EM maps.
Introduction
Solving the 3D structures of biomolecules is key to their function and the mechanisms underlying biological
processes. For this purpose, cryogenic electron microscopy (cryo-EM) has become in recent years the most
used technique to solve structures [1]. One main advantage of this technique, in contrast with X-ray crys-
tallography, is that it potentially allows various conformations (or 3D configurations) of the same molecule
to be solved [2]. Once different conformations are obtained as 3D EM density maps (i.e., large 3D grids
of voxels with different levels of intensities), aligning these maps in 3D space is needed to further compare
them.
Efficient methods have been developed to align two protein structures [3,4], assuming their atomic com-
position is known. In this case, aligning two conformational structures is tantamount to finding an optimal
rigid body transformation (i.e. a combination of 3D translation and rotation) that can align homologous
atoms. However, when density maps are only given, one cannot directly establish such a homology corre-
spondence from voxel to voxel, so the same problem becomes more challenging as the grid size increases and
with the computational cost of searching over all possible rigid body transformations.
To solve the rigid body alignment problem for 3D cryo-EM density maps, standard approaches use various
algorithms to maximize correlation [5,6,7]. More recently, Han et al. introduced a new method, which
relies on representing the maps as sets of unit vectors before performing alignment [8]. Overall, both the
choice of the metric to optimize, as well as the representation of the maps, can play important roles in
getting a successful alignment. In this paper, we introduce a novel approach, called AlignOT, that uses a
point-cloud representation of 3D maps, and minimizes the so-called Wasserstein distance between two maps
with a stochastic gradient algorithm. This non-Euclidean distance is associated with the theory of Optimal
Transport (OT) [9], with recent advances that make tractable the computation of transport-based distances
[10,11]. After describing the procedure of AlignOT in detail, we present the results of our experiments that
quantify the precision and accuracy of this method, and benchmark it on a set of representative experimental
corresponding author: kdd@math.ubc.ca
1
arXiv:2210.09361v1 [q-bio.BM] 17 Oct 2022
maps. Overall, the good performance of AlignOT with respect to standard local alignment methods suggests
that our method can be broadly applied, as an alternative for aligning 3D EM maps with a non-Euclidean
metric. We finally discuss the potential limitations of AlignOT, and connections with other recent methods
and problems in optimal transport that would help to further improve and generalize it.
Material and Methods
Point cloud representation of EM maps
To represent voxelized cryo-EM maps, we use the topology representing network algorithm (TRN) [12], which
reduces a map of d3voxels to a point cloud, i.e., a set of npoints R3. Briefly, the voxelized maps (typically
1003to 5003voxels in the 3D grid) are first thresholded with a noise floor, to set low intensity regions
to zero, and normalized to a probability mass function Pover the grid points. npoints (ri(0))i=1...n are
initially sampled from P, and then updated in parallel for tfrounds by taking weighted steps, according to
the following equations [12]
ri(t+ 1) = ri(t) + ε(t) exp[ki(t)](rtri(t)),(1)
λ(t) = λ0λf
λ0t/tf,(2)
ε(t) = ε0εf
ε0t/tf,(3)
where rtis a single grid point sampled from P,kidenotes the rank of point ri(t), by its Euclidean
distance to rt, and λ0, ε0, λf, εf, tfare hyperparameters. In practice, we used ε0= 0.3 and εf= 0.05 for the
initial and final step sizes, λ0= 0.005 ×nand λf= 0.5 for the initial and final scaling ranks, and tf= 8 ×n
for the total number of steps (adapted from [13]).
Optimal Transport and Wasserstein distance
To compare the point cloud representations of EM maps, we use a non-Euclidean metric that derives from
the theory of Optimal Transport [9]. For two given point clouds, A={a1, . . . , an}and B={b1, . . . , bn},
we define a cost matrix Ci,j =d(ai, bj)2, where dis the Euclidean distance. The entropy regularized 2-
Wasserstein distance between Aand B, denoted by W2,(A,B), is then defined as
W2,(A,B) = [ min
PRn×n
+
n
X
i,j=1
Ci,j Pi,j +H(P)]1/2
s.t. P. =PT.=/n
,(4)
where R+is the regularization parameter and the entropy H(P) is given by
H(P) =
n
X
i,j=1
Pi,j log Pi,j .(5)
The minimizer of equation (4) is called the transport plan. For the rest of the Methods section, we will
simply denote the Wasserstein distance as W2,, and Pi(a),i(b)as Pa,b, where i(a) and i(b) are the indices of
the two points aand bin Aand B, respectively.
AlignOT : Algorithm for 3D map alignment
To align two 3D maps from their point cloud representations Aand B, we solve the optimization problem
qopt = argmin
qH
W2,(Rq(A),B),(6)
2
where qis a quaternion (defined over the quaternion space H), that we identify to a 3D rotation Rqin
SO(3), so Rq(A) = {Rq(ai)|aiA}. We explain in the Results section and Appendix A why we can only
consider rotations and ignore translations to solve the general alignment problem, and provide more details
on the identification of qto Rqin Appendix B. Our stochastic gradient descent procedure to solve (6), called
AlignOT, is detailed in Algorithm 1. At each iteration, the algorithm updates qfrom the transport plan P
between Rq(A) and Bas follows: After sampling one point aRq(A), we evaluate π(a) = argmaxbBPa,b,
and compute the gradient in qassociated with d(π(a), a)2, where dis the Euclidean distance, to update q. To
compute the transport plan, we apply the Sinkhorn algorithm [10], with the initial vectors set as the outputs
of the previous iteration. In practice, we also set the convergence condition kd(π(a), a)2k< δ (where δ > 0),
that stops the algorithm before the maximum number of iterations. The hyperparameters of this procedure
are the learning rate αassociated with gradient descent, the regularization parameter associated with the
Wasserstein distance, and a threshold δassociated with the number of iterations. In all our experiments, we
set = 100, δ= 1010, and the maximum number of iterations equal to 500.
Algorithm 1 AlignOT : 3D density maps alignment with SGD using unit quaternions and Wasserstein
distance
Input two 3D density maps A,B, number of sampled points nR, learning rate αR, regularization
parameter R, maximum number of iterations LN, and gradient threshold δR
1: Sample two sets of npoints A,BR3from A,Brespectively, using TRNs
2: q= 1 + 0i+ 0j+ 0k
3: G=α2
4: while not converged and the number of iterations is at most Ldo
5: Compute Rq(A)
6: Compute Pto be the OT plan matrix between Rq(A) and B
7: Randomly select aRq(A)
8: b=π(a)
9: f(q) = d(Rq(a), b)2(where dis Euclidean distance in R3)
10: G=G+k∇f(q)k2
11: q=qα
G× ∇f(q)
12: q=q
kqk
13: end while
14: return q
Implementation
We implemented AlignOT in Python 3.6.4. To sample a point cloud representation of an EM map using
TRN, we adapted code from ProDy [14]. We used the NumPy package for matrix operations and POT’s
implementation of the Sinkhorn algorithm, which was modified to set the initial vectors (instead of initializing
with uniform vectors). Our code is available in this GitHub repository.
Datasets
We tested AlignOT on various publicly available EM maps available from the EMDB, as listed in Table 1,
and shown in Figure 1. For experiments that involved aligning a pair of distinct maps, we also used the
corresponding pair of structures taken from the PDB [15], and used MM-align [3] to set a ground truth
alignment, before converting the structures into EM maps using the function molmap in Chimera X [16]. All
the datasets used in this study are available at this OSF page.
3
Results
Formulation of the alignment problem for EM maps with point cloud represen-
tation
We present here a new procedure, called AlignOT, that aligns 3D density maps from cryo-EM. Assuming
that the EM maps are represented by two 3D point clouds A={a1, . . . , an}and B={b1, . . . , bn}, and for a
given distance function ddefined over the space of point clouds, the problem of aligning these maps consists
of finding a rigid body transformation that minimizes the objective function
Ld(R, T ) = d(moveR,T (A),B)2,(7)
where Ld(R, T ) is defined over rotation matrices RSO(3) and translation vectors TR3, and the operator
moveR,T (A) is defined as
moveR,T (A) = {Rai+T|aiA}.(8)
As the choice of dinfluences both the accuracy and the computational cost of the solution to the rigid body
alignment problem, we here use the 2-Wasserstein distance, associated with the theory of Optimal Transport
[9]. This distance can be used to compute distances between probability distributions, and is applied here
more specifically for two distributions of 3D point clouds of same size. To efficiently evaluate this distance,
we consider a regularized version (see equation (4) in the Methods section ), denoted W2,. Besides, given the
centers of mass ¯a=1
nPn
i=1 aiand ¯
b=1
nPn
i=1 bi, and the centered point clouds Ac={aci=ai¯a|aiA}
and Bc={bci=bi¯
b|biB}, we can show that the optimal translation of the objective function (7) is
Topt =¯
bRopt¯a, (9)
where
Ropt = argmin
RSO(3)
W2,(R(Ac),Bc).(10)
Thus, the search for an optimal rigid body transformation in (10) can be simplified to rotations after
matching the centers of mass of A,B, leading to equation (6) of the Methods section . We provide a detailed
proof of this result in Appendix A.
AlignOT optimization procedure and complexity
To search for an optimal rotation that minimizes the Wasserstein distance between two 3D point clouds, we
use an iterative Stochastic Gradient Descent algorithm, detailed in Algorithm 1. Basically, the algorithm
aims to improve the alignment at each iteration, by rotating a map according to an assignment provided
by the evaluation of the Wasserstein distance, with quaternions used to represent the 3D rotations (see
Appendix B for a full description). Since EM maps are defined on a 3D voxelized grid, we also first convert
the maps into point clouds, using the TRN algorithm [12]. We cover the application of the TRN application,
the Wasserstein distance and the optimization procedure, as well as its implementation in the Material and
Methods sections .
The complexity of our method (summarized in Algorithm 1) can be evaluated as follows: Assume that
Lis the maximum number of iterations, nthe size of the point cloud, and the regularization parameter of
the W2, distance. Each iteration of this algorithm consists of three steps. First, rotating one point cloud,
second, computing the OT plan matrix, and last, sampling a random point and computing the gradient.
Rotating a point cloud requires computing the coordinates of each point after rotation and takes O(n) time,
where nrepresents the number of points. To compute the OT plan matrix we use the Sinkhorn algorithm
[11] which solves this problem in O(n2log n3) time, where is the regularization parameter. Finally, the
gradient step takes O(1) time. Overall, the most time-consuming part of this algorithm is computing the OT
plan matrix at each iteration. Therefore, the overall time complexity of the algorithm is O(n2Llog n3).
4
Alignment between maps can be obtained by minimizing the Wasserstein dis-
tance
To evaluate our method, we first tested AlignOT on aligning two point clouds that differ by a rotation only.
To do so, we used an experimental map of a ribosome from EMDB 1717, shown in Figure 1a (ribosome
structures are broadly studied in cryo-EM [2,17]). First, we sampled two clouds of 500 points, and applied
a rotation defined in its axis-angle representation by an arbitrary axis, and an angle θ= 20. Figure 2a
illustrates how the moving point cloud gets closer to the targeted one over the iterations of the algorithm,
until the convergence criterion is reached. To confirm this visual impression, we repeated the procedure with
different initial angles θ∈ {10,30,50,70}. The corresponding Wasserstein distance obtained across the
iterations is shown in Figure 2b, with all the four trajectories converging to the same value and resulting in a
successful alignment. However, we also observed that as θincreases, it takes more iterations for the algorithm
to converge, with longer periods of slow variations at the beginning of the procedure, suggesting that this
alignment can only be achieved within a certain range of θ. Aside from the initial angle, we also studied in
Figure 2c how the size of the point cloud can affect the convergence plot. For θ= 50, decreasing the point
cloud size from 500 to 250 leads to a potential loss of precision (with the Wasserstein distance between two
aligned point clouds increasing from 115 to 130), but with faster convergence from the algorithm (from
200 to 50 iterations to converge), indicating some trade-off between accuracy and speed.
To interpret these results, we further plotted in Figure 2d how the Wasserstein distance varies on average
(after sampling different point clouds of same size 250 and 500), as a function of θ(and same rotation axis).
In addition to the global minimum achieved for θ= 0, another local minimum was detected at θ= 180,
separated by a peak around 90. While the sharp decrease observed towards the global minimum suggests
that the optimization procedure can converge well for θ60, it is also possible that the algorithm does not
converge to the global minimum above this range. Besides, the higher variability obtained from sampling
point clouds of size 250, compared with 500, suggests that the final alignment may be less accurate in this
case. While the same fixed rotation axis was considered in the previous experiments (with different values
of θ), we finally evaluated the probability to successfully align the maps for initial rotations of fixed angle θ
(45, 60, 75 and 90), and across different axes covering half of the sphere S2. Upon mapping the axes on the
planar disk in Figure 2e, we found local regions of poorer alignment that match with local minima of the
Wasserstein distance. These results also confirm the existence of a limiting range within which the method
can align two maps. While a successful alignment is overall obtained for θ= 45, the maps get partially
aligned in different regions of the disks for 60(see Figure 2e), with the performance worsening as θincreases
(see Figure 3).
Benchmarking performance and accuracy of AlignOT with a pair of identical
maps
We next focused on point cloud size and the range of convergence, and quantified their impact on the
accuracy and computational cost of AlignOT. We first considered the same map and alignment task with
θ= 20as previously, with different point cloud sizes n(from 50 to 1000). The results obtained with a
standard workstation are shown in Table 2, and confirm the existence of a trade-off between accuracy and
speed, with the alignment improving as nincreases, but with a larger runtime that goes from a few seconds
for n= 50,100,200 to approximately a minute for n= 1000. While the accuracy of the alignment remains
poor for n= 50 and 100 (with an average error of 12.6and 7.72respectively), it significantly improves for
n= 500 with 2.35±1.23error observed, with a runtime suggesting that AlignOT can be used in practice on
standard density maps. We also noted that AlignOT runs slower in comparison with Chimera and Chimera
X’s alignment function fitmap, which performs a steepest ascent optimization to align maps according to
their overlapping score [5,16] (0.3 s on average). On the other hand, we show in our next experiments that
AlignOT outperforms fitmap in accuracy as θincreases.
More precisely, we determined the range of θwithin which the method converges, and compared our
method with fitmap. As shown in Figure 2c, we found that AlignOT can cover a wider range that extends
to 75, while fitmap starts failing at approximately 45(see also Figure 6for a visualization of some
representative alignments obtained). Beyond this value, AlignOT leads to some variable results up until
100, due to the stochasticity of the algorithm. It then converges towards the other local minimum found
5
摘要:

AlignOT:Anoptimaltransportbasedalgorithmforfast3DalignmentwithapplicationstocryogenicelectronmicroscopydensitymapsAryanTajmirRiahi1,Geo reyWoollard1,FredericPoitevin2,AnneCondon1,andKhanhDaoDuc1;3;1DepartmentofComputerScience,UniversityofBritishColumbia,Vancouver,BCV6T1Z4,Canada2SLACNationalAccel...

展开>> 收起<<
AlignOT An optimal transport based algorithm for fast 3D alignment with applications to cryogenic electron microscopy density maps.pdf

共21页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:21 页 大小:7.05MB 格式:PDF 时间:2025-04-30

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 21
客服
关注