Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism
Guoxia Wang*, Zhihua Wu*, Xiaomin Fang
Yingfei Xiang, Yiqun Liu, Dianhai Yu, Yanjun Ma
Baidu Inc.
mingzilaochongtu@gmail.com
{Wuzhihua02,fangxiaomin01,xiangyingfei01,liuyiqun01,yudianhai,mayanjun02}@baidu.com
Abstract
The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of experimental determination techniques. Due to the complex model architecture and large memory consumption, it requires a lot of computational resources and time to train AlphaFold2 from scratch. Efficient AlphaFold2 training could accelerate the development of life science. In this paper, we propose a Parallel Evoformer and Branch Parallelism to speed up the training of AlphaFold2. We conduct sufficient experiments on UniFold implemented in PyTorch and HelixFold implemented in PaddlePaddle, and Branch Parallelism can improve the training performance by 38.67% and 36.93%, respectively. We also demonstrate that the accuracy of Parallel Evoformer could be on par with AlphaFold2 on the CASP14 and CAMEO datasets. The source code is available at https://github.com/PaddlePaddle/PaddleFleetX.
1 Introduction
Proteins are exceptionally critical for life science, as they perform a wide range of functions in organisms. A protein comprises a chain of amino acid residues and folds into a 3D structure to perform its functions. Since the 3D structure determines the protein's functions, studying the 3D structure helps to understand the mechanism of the protein's activities. However, determining protein structures through experimental technologies, e.g., X-ray crystallography and nuclear magnetic resonance (NMR), is time-consuming and complex. Until now, experimental methods have determined about two hundred thousand protein structures (Sussman et al. 1998; Burley et al. 2020), only a fairly small portion of the hundreds of millions of publicly available amino acid sequences (The UniProt Consortium 2016). Therefore, efficient protein structure estimation methods are in great demand.
Many institutions (Jumper et al. 2021; Yang et al. 2015; Du et al. 2021; Baek et al. 2021; Peng and Xu 2011) have made efforts to develop AI-based protein structure prediction systems because of the efficiency and capacity of deep neural networks. In particular, thanks to its outstanding performance in the challenging 14th Critical Assessment of Protein Structure Prediction (CASP14) (Kryshtafovych et al. 2021a), AlphaFold2 (Jumper et al. 2021) from DeepMind has attracted a great deal of public attention.

*These authors contributed equally. This work was completed on August 15, 2022.
Figure 1: Variants of the Evoformer block. (a) The original Evoformer block in AlphaFold2. (b) The modified Evoformer block in AlphaFold-Multimer. (c) The Parallel Evoformer block proposed in this paper. Each block operates on the MSA representation (b, s, r, cm) via row-wise gated self-attention with pair bias, column-wise gated self-attention, and a transition, and on the pair representation (b, r, r, cz) via the outer product mean, triangle updates using outgoing/incoming edges, triangle self-attention around the starting/ending node, and a transition. The main difference between the variants is that the outer-product-mean cross-communication happens at a different position within the block.
The accuracy of AlphaFold2 approaches that of the experimental determination technologies. AlphaFold2 is an end-to-end protein structure prediction pipeline that directly estimates the 3D coordinates of all the atoms in a protein. Its novel and well-designed architecture promotes the prediction accuracy by jointly modeling multiple sequence alignments (MSAs) for evolutionary relationships and pairwise relations between the amino acids for spatial relations.
Although the accuracy of AlphaFold2 is satisfactory for protein structure prediction, it takes 11 days to train the model end-to-end from scratch on 128 TPUv3 cores, which limits its wide usage.
Figure 2: Overall framework of AlphaFold2. The input sequence goes through genetic database search and structure database search to produce msa_feat, target_feat, residue_index, extra_msa_feat, template_pair_feat, and template_angle_feat. These are embedded into the MSA representation (b, s, r, cm), the extra MSA representation (b, se, r, ce), and the pair representation (b, r, r, cz), which then pass through the Template Embedding (4 templates, with a 2-block TemplatePairStack), a 4-block Extra MSA Stack, a 48-block Evoformer Stack, and an 8-block Structure Module, with recycling 1-3 times. Dimension names: b: mini-batches, s: clustered MSA sequences, se: extra MSA sequences, r: residues, c: channels. The Extra MSA Stack is composed of Evoformer blocks, so AlphaFold2 has 52 Evoformer blocks in total.
The structure of AlphaFold2 is complex, as shown in Figure 2, which leads to high training overhead. Specifically, there are three main reasons. First, AlphaFold2 is relatively deep, and the Evoformer block has two computing branches that cannot be calculated in parallel. Second, the total batch size in the official open-source implementation is limited to 128, with a batch size of 1 on each device, so training cannot be scaled to more devices through data parallelism. Third, although AlphaFold2 has only 93M parameters, they are spread across 4,630 parameter tensors, and the time overhead of accessing these small tensors in the different training stages of each iteration is not negligible.
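To make the third point concrete, the imbalance between parameter volume and tensor count can be measured directly. Below is a minimal sketch in PyTorch; the helper `param_stats` is ours, not part of any AlphaFold2 implementation, and any re-implementation's model object could be passed in.

```python
import torch.nn as nn

def param_stats(model: nn.Module):
    # Number of separate parameter tensors vs. total parameter count.
    # Many small tensors mean many small optimizer updates, gradient
    # reductions, and clipping/EMA passes per iteration, whose per-tensor
    # overheads add up even when the total parameter volume is modest.
    n_tensors = sum(1 for _ in model.parameters())
    n_params = sum(p.numel() for p in model.parameters())
    return n_tensors, n_params

# For an AlphaFold2-scale model this would report on the order of
# 4,630 tensors but only ~93M parameters in total.
```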
To this end, this paper proposes two optimization techniques for two of the three problems above, achieving efficient AlphaFold2 training under the premise of fully aligned hyperparameters (the same network configuration and a total batch size of 128 with one protein sample per device).
First, inspired by AlphaFold-Multimer (Evans et al. 2021), we modify the two serial computing branches in the Evoformer block into a parallel computing structure, named Parallel Evoformer, as shown in Figure 1. Second, we propose a novel Branch Parallelism (BP) for Parallel Evoformer, which breaks the barrier that training cannot be scaled to more devices through data parallelism because the batch size on each device is 1.
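As a rough illustration of this structural change, the sketch below contrasts the two block orderings as we read Figure 1. It is a simplified sketch, not the UniFold or HelixFold code: the sub-layer callables (`msa_row_attn`, `outer_product_mean`, `pair_track`, etc.) are placeholders for the standard Evoformer sub-modules, and the toy demo only exercises shapes and control flow.

```python
import torch

def serial_evoformer_block(msa, pair, sub):
    """Original AlphaFold2 ordering (Figure 1a): the outer product mean reads
    the *updated* MSA, so the pair branch must wait for the MSA branch."""
    msa = msa + sub["msa_row_attn"](msa, pair)     # row-wise gated attention with pair bias
    msa = msa + sub["msa_col_attn"](msa)           # column-wise gated attention
    msa = msa + sub["msa_transition"](msa)
    pair = pair + sub["outer_product_mean"](msa)   # cross-communication: updated MSA -> pair
    pair = pair + sub["pair_track"](pair)          # triangle updates/attention + transition
    return msa, pair

def parallel_evoformer_block(msa, pair, sub):
    """Parallel Evoformer (Figure 1c): both branches read only the block's
    inputs, so the MSA branch and the pair branch have no dependency on each
    other inside the block and can be computed concurrently."""
    msa_in, pair_in = msa, pair
    # MSA branch
    msa = msa_in + sub["msa_row_attn"](msa_in, pair_in)
    msa = msa + sub["msa_col_attn"](msa)
    msa = msa + sub["msa_transition"](msa)
    # Pair branch (independent of the MSA branch's output)
    pair = pair_in + sub["outer_product_mean"](msa_in)  # incoming MSA -> pair
    pair = pair + sub["pair_track"](pair)
    return msa, pair

if __name__ == "__main__":
    b, s, r, cm, cz = 1, 4, 8, 16, 32
    msa, pair = torch.randn(b, s, r, cm), torch.randn(b, r, r, cz)
    # Toy stand-ins that only preserve shapes; real sub-layers come from an
    # AlphaFold2 implementation such as UniFold or HelixFold.
    sub = {
        "msa_row_attn": lambda m, z: torch.zeros_like(m),
        "msa_col_attn": lambda m: torch.zeros_like(m),
        "msa_transition": lambda m: torch.zeros_like(m),
        "outer_product_mean": lambda m: torch.zeros(b, r, r, cz),
        "pair_track": lambda z: torch.zeros_like(z),
    }
    out_msa, out_pair = parallel_evoformer_block(msa, pair, sub)
    print(out_msa.shape, out_pair.shape)
```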
The method proposed in this paper for efficiently training AlphaFold2 is general: it is not tied to a particular deep learning framework or to a specific re-implementation of AlphaFold2. We perform extensive experimental verification on UniFold implemented in PyTorch and HelixFold implemented in PaddlePaddle. The results show that Branch Parallelism achieves similar training performance improvements on both UniFold and HelixFold, which are 38.67% and 36.93% higher, respectively. We also demonstrate that the accuracy of Parallel Evoformer is on par with AlphaFold2 on the CASP14 and CAMEO datasets.
The main contributions of this paper can be summarized as follows:
• We improve the Evoformer in AlphaFold2 to Parallel Evoformer, which breaks the computational dependency between the MSA representation and the pair representation; experiments show that this does not affect accuracy.
• We propose Branch Parallelism for Parallel Evoformer, which splits the different computing branches across more devices to speed up training (see the sketch after this list). This breaks the data-parallelism limitation of the official AlphaFold2 implementation.
• We reduce the end-to-end training time of AlphaFold2 to 4.18 days on UniFold and 4.88 days on HelixFold, improving the training performance by 38.67% and 36.93%, respectively. This makes AlphaFold2 training efficient and saves R&D costs for biocomputing research.
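As promised above, the sketch below shows one way Branch Parallelism could be organized for a single Parallel Evoformer block. It is a schematic under our assumptions, not the authors' exact communication scheme: it assumes a world of exactly two ranks dedicated to branch parallelism, `msa_branch` and `pair_branch` are placeholders for the two sub-layer stacks, and autograd and communication/compute overlap are ignored.

```python
import torch.distributed as dist

def bp_evoformer_block(msa, pair, rank, msa_branch, pair_branch):
    """One Parallel Evoformer block under 2-way Branch Parallelism (BP).

    Because both branches read only the block inputs (see the Parallel
    Evoformer sketch in the introduction), rank 0 can compute the MSA branch
    while rank 1 computes the pair branch. The updated tensors are then
    broadcast so both ranks enter the next block with consistent inputs.
    """
    if rank == 0:
        msa = msa_branch(msa, pair)      # row/column gated attention + transition
    else:
        pair = pair_branch(msa, pair)    # outer product mean + triangle ops + transition
    dist.broadcast(msa, src=0)           # MSA branch output is owned by rank 0
    dist.broadcast(pair, src=1)          # pair branch output is owned by rank 1
    return msa, pair
```

The point of such a scheme is that it adds devices along the branch dimension rather than the batch dimension, so the per-device batch size of 1 and all other hyperparameters stay aligned with the official recipe.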
2 Background
2.1 Overview of AlphaFold2
Compared to traditional protein structure prediction models, which usually consist of multiple steps, AlphaFold2 processes the input protein sequence and predicts the 3D protein structure in an end-to-end procedure. In general, AlphaFold2 takes an amino acid sequence as input and then searches protein databases to obtain MSAs and similar templates. From the MSA information, the model can detect correlations between the parts of similar sequences that are more likely to mutate. The templates related to the input sequence, on the other hand, provide structural information that helps the model predict the final structure.
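The following paragraph and Figure 2 break this end-to-end flow into five parts; as a high-level, illustrative sketch of how those stages compose (the `pipeline` and `model` objects and their methods are stand-ins, not the API of AlphaFold2 or of any re-implementation):

```python
def predict_structure(sequence, pipeline, model, num_recycles=3):
    # Preprocess: database searches produce MSAs (co-evolutionary signal)
    # and templates (geometric priors), which are turned into input features.
    feats = pipeline.featurize(sequence)

    prev = None
    for _ in range(num_recycles):                                   # Recycle: 1-3 times (Figure 2)
        msa_repr, pair_repr = model.embed(feats, prev)              # Embedding
        msa_repr, pair_repr = model.evoformer(msa_repr, pair_repr)  # Encoder: 48 Evoformer blocks
        structure = model.structure_module(msa_repr, pair_repr)     # Decoder: 8 blocks
        prev = (msa_repr, pair_repr, structure)                     # fed back as extra inputs
    return structure
```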
The overall framework of AlphaFold2 can be divided into five parts: Preprocess, Embedding, Encoder, Decoder, and Recycle, as shown in Figure 2. The Preprocess part parses the input raw sequence and generates MSA-related and template-related features via genetic database search and structure database search. These features are then embedded into the MSA representation, the pair representation, and the extra MSA representation in the Embedding part. These representations contain sufficient co-evolutionary information among similar sequences and geometric information of