
[Figure 2 diagram: input sequence → genetic and structure database searches → embedding of msa_feat, target_feat, residue_index, extra_msa_feat, template_pair_feat, and template_angle_feat into the MSA representation (b, s, r, cm), extra MSA representation (b, se, r, ce), and pair representation (b, r, r, cz); Template Embedding (4 templates) with TemplatePairStack (2 blocks); Extra MSA Stack (4 blocks); Evoformer Stack (48 blocks); Structure Module (8 blocks); recycling 1~3 times over the preprocess, embedding, encoder, decoder, and recycle stages.]
Figure 2: Overall framework of AlphaFold2. Dimension names: b: mini-batches, s: clustered MSA sequences, se: extra MSA sequences, r: residues, c: channels. The Extra MSA Stack is composed of Evoformer blocks, so AlphaFold2 has 52 Evoformer blocks in total.
end-to-end on 128 TPUv3 cores from scratch, limiting its widespread use. The structure of AlphaFold2 is complex, as shown in Figure 2, which leads to high training overhead. There are three main reasons for this. First, AlphaFold2 is relatively deep, and the two computing branches of the Evoformer block cannot be computed in parallel. Second, the total batch size in the official open-source implementation is limited to 128, with a batch size of 1 per device, so training cannot be extended to more devices to accelerate training through data parallelism. Third, although AlphaFold2 has only 93M parameters, they are spread over 4,630 parameter tensors, and the time overhead of accessing these small tensors in the different training stages of each iteration is not negligible.
To this end, this paper proposes two optimization techniques addressing two of the three problems above, achieving efficient AlphaFold2 training while fully aligning the hyperparameters (the network configuration and a total batch size of 128 with 1 protein sample per device). First, inspired by AlphaFold-Multimer (Evans et al. 2021), we restructure the two serial computing branches of the Evoformer block into a parallel computing structure, named Parallel Evoformer, as shown in Figure 1. Second, we propose a novel Branch Parallelism (BP) for Parallel Evoformer, which removes the barrier that prevents scaling to more devices through data parallelism when the batch size on each device is 1.
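To make the restructuring concrete, the sketch below contrasts the original serial block with the parallel variant. It is a minimal PyTorch-style sketch assuming illustrative submodule names (msa_branch, pair_branch, outer_product_mean) rather than the official AlphaFold2 or AlphaFold-Multimer code; the key point is that once the cheap MSA-to-pair communication is applied to the block's input MSA, the two expensive branches no longer depend on each other and can be computed concurrently.

```python
import torch.nn as nn

class ParallelEvoformerBlock(nn.Module):
    """Illustrative sketch only; submodule internals are placeholders."""

    def __init__(self, msa_branch: nn.Module, pair_branch: nn.Module,
                 outer_product_mean: nn.Module):
        super().__init__()
        self.msa_branch = msa_branch                  # MSA row/column attention + transition
        self.pair_branch = pair_branch                # triangle updates/attention + transition
        self.outer_product_mean = outer_product_mean  # MSA -> pair communication

    def forward(self, msa, pair):
        # Serial Evoformer (original): the pair branch consumes the MSA updated
        # earlier in the same block, so the two branches must run one after the
        # other. Parallel Evoformer: apply the MSA -> pair communication first,
        # using the block's input MSA; afterwards both branches read only the
        # block inputs and are independent of each other.
        pair = pair + self.outer_product_mean(msa)
        msa_update = self.msa_branch(msa, pair)   # branch 1
        pair_update = self.pair_branch(pair)      # branch 2, independent of branch 1
        return msa + msa_update, pair + pair_update
```

This independence is what Branch Parallelism exploits: the two branch computations can be placed on different devices within one block.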
The method proposed in this paper for efficient AlphaFold2 training is general: it is not tied to a particular deep learning framework or to a specific AlphaFold2 re-implementation. We perform extensive experimental verification on UniFold, implemented in PyTorch, and HelixFold, implemented in PaddlePaddle. The results show that Branch Parallelism achieves similar training performance improvements on both UniFold and HelixFold, of 38.67% and 36.93%, respectively. We also demonstrate that the accuracy of Parallel Evoformer is on par with AlphaFold2 on the CASP14 and CAMEO datasets.
The main contributions of this paper can be summarized
as follows:
• We improve the Evoformer in AlphaFold2 into Parallel Evoformer, which breaks the computational dependency between the MSA and pair representations; experiments show that this does not affect accuracy.
• We propose Branch Parallelism for Parallel Evoformer, which splits the different computing branches across more devices to speed up training (see the sketch after this list). This breaks the data-parallelism limitation of the official AlphaFold2 implementation.
• We reduce the end-to-end training time of AlphaFold2 to 4.18 days on UniFold and 4.88 days on HelixFold, improving training performance by 38.67% and 36.93%, respectively. This enables efficient AlphaFold2 training and reduces the R&D cost of biocomputing research.
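The following forward-only sketch illustrates the communication pattern behind Branch Parallelism on a group of two devices, assuming a Parallel Evoformer block shaped like the earlier sketch and an already initialized torch.distributed world of size 2. The function name and the use of broadcasts are illustrative assumptions, not the authors' implementation; gradient communication and scheduling details are omitted.

```python
import torch.distributed as dist

def branch_parallel_forward(block, msa, pair):
    """Forward-only Branch Parallelism sketch for a world of 2 ranks.
    Rank 0 computes the MSA branch, rank 1 computes the pair branch, and the
    updated representations are then broadcast so that both ranks hold the
    full block output. Illustrative pattern only, not the authors' code."""
    rank = dist.get_rank()

    # The cheap MSA -> pair communication is replicated on both ranks.
    pair = pair + block.outer_product_mean(msa)

    if rank == 0:
        msa = msa + block.msa_branch(msa, pair)   # expensive branch 1
    else:
        pair = pair + block.pair_branch(pair)     # expensive branch 2

    # Exchange results: rank 0 owns the fresh MSA, rank 1 owns the fresh pair.
    dist.broadcast(msa, src=0)
    dist.broadcast(pair, src=1)
    return msa, pair
```

Because each device still processes 1 protein sample, this adds a second axis of parallelism on top of data parallelism without changing the total batch size of 128.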
2 Background
2.1 Overview of AlphaFold2
Compared to traditional protein structure prediction models, which usually consist of multiple separate steps, AlphaFold2 processes the input protein sequence and predicts the 3D protein structure in an end-to-end procedure. In general, AlphaFold2 takes the amino acid sequence as input and then searches protein databases to obtain MSAs and similar templates. Using the MSA information, the model can detect correlations between the parts of similar sequences that are more likely to mutate. The templates retrieved for the input sequence, on the other hand, provide structural information that helps the model predict the final structure.
The overall framework of AlphaFold2 can be divided into five parts: Preprocess, Embedding, Encoder, Decoder, and Recycle, as shown in Figure 2. The Preprocess part parses the raw input sequence and generates MSA-related and template-related features via genetic database search and structure database search. These features are then embedded into the MSA representation, pair representation, and extra MSA representation in the Embedding part. These representations contain sufficient co-evolutionary information among similar sequences and geometric information of