billion parameters, as well as the ESM2 transformer [Lin et al., 2022] in its 150-million-parameter version.
Additionally, we include the fine-tuning performance of ESM2, obtained by adding a linear projection from
its vocabulary-sized per-residue RoBERTa language model head [Liu et al., 2019a, Rives et al., 2021]
to our binary classification target.
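To make this setup concrete, the following is a minimal sketch in PyTorch using the fair-esm package; flattening the per-residue vocabulary logits over the window before the linear projection is our assumption about the head wiring, and the tokens are assumed to come from the alphabet's batch converter (which adds BOS/EOS).

```python
import torch
import torch.nn as nn
import esm  # fair-esm package

class ESM2CleavageClassifier(nn.Module):
    """ESM2 (150M) with a linear projection from the vocabulary-sized
    per-residue LM-head logits to one binary cleavage logit (a sketch;
    flattening over the window is our assumption)."""

    def __init__(self, window_len: int = 10):
        super().__init__()
        self.esm2, self.alphabet = esm.pretrained.esm2_t30_150M_UR50D()
        vocab_size = len(self.alphabet.all_toks)
        self.head = nn.Linear(window_len * vocab_size, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        lm_logits = self.esm2(tokens)["logits"]  # (B, L, vocab)
        lm_logits = lm_logits[:, 1:-1, :]        # drop BOS/EOS positions
        return self.head(lm_logits.flatten(1)).squeeze(-1)
```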
Convolutional and perceptron models:
We include the DeepCleave [Li et al., 2019] attention-enhanced
convolutional neural network [LeCun et al., 1998, CNN] architecture in our benchmark analysis.
Furthermore, stacking fully connected layers without any convolutional or recurrent features, e.g., in
DeepCalpain [Liu et al., 2019b] or Terminitor [Yang et al., 2020], has also been successfully applied
to protein data. As a baseline, we include a single-hidden-layer perceptron [Rumelhart et al., 1986]
with rectified linear units [Agarap, 2018] as the activation function.
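A minimal sketch of this baseline in PyTorch; the one-hot input encoding and the hidden width are illustrative assumptions, not values reported here.

```python
import torch.nn as nn

class PerceptronBaseline(nn.Sequential):
    """Single-hidden-layer perceptron with ReLU activations; the input is
    the flattened one-hot window (10 residues x 20 amino acids).
    hidden_dim is a hypothetical hyperparameter."""

    def __init__(self, window_len: int = 10, n_residue_types: int = 20,
                 hidden_dim: int = 128):
        super().__init__(
            nn.Flatten(),                                    # (B, 10, 20) -> (B, 200)
            nn.Linear(window_len * n_residue_types, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),                        # binary cleavage logit
        )
```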
2.3 Training
Dataset:
We used the dataset introduced in [Dorigatti et al., 2022], which contains 229 163 and 222 181
N- and C-terminal cleavage sites, respectively. Each cleavage site is captured in a window
comprising six amino acids to its left and four to its right, and is associated with six decoy negative
samples obtained by considering the three residues preceding and following it, resulting in a total of
1 434 989 and 1 419 501 samples after deduplication for the N- and C-terminal datasets, respectively. As the decoy negatives
are situated in close proximity to real cleavage sites and due to the probabilistic nature of proteasomal
cleavage, some of the negative samples are likely to be actual, unmeasured cleavage sites, and may
influence the performance of predictors trained using such data.
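The windowing and decoy scheme can be sketched as follows; the indexing convention (the site marks the residue immediately after the cut) and the boundary handling are our assumptions.

```python
def site_windows(protein: str, site: int, left: int = 6, right: int = 4):
    """Return the positive window around a cleavage site plus its six decoy
    negatives (shifts -3..-1 and +1..+3). `site` indexes the residue right
    after the cut; windows running past the protein ends are skipped."""
    def window(pos: int):
        if pos - left < 0 or pos + right > len(protein):
            return None
        return protein[pos - left: pos + right]

    positive = window(site)
    decoys = [w for s in (-3, -2, -1, 1, 2, 3)
              if (w := window(site + s)) is not None]
    return positive, decoys
```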
Noisy labels:
To reduce the impact of asymmetric label noise on the performance of our classifiers,
we consider five recent deep-learning-specific denoising approaches: a noise adaptation
layer, which attempts to learn the noise distribution in the data [Goldberger and Ben-Reuven, 2017];
co-teaching, in which two models are trained simultaneously, each selecting for the other
which samples from a mini-batch to use for training [Han et al., 2018]; and co-teaching-plus [Yu et al.,
2019], which extends co-teaching with the disagreement-based learning approach of decoupling [Malach
and Shalev-Shwartz, 2017]. We additionally consider a joint training method with co-regularization
(JoCoR) [Wei et al., 2020] and DivideMix [Li et al., 2020a] for benchmarking. DivideMix is a
holistic approach originally developed for computer vision and integrates multiple frameworks,
such as co-teaching and MixMatch [Berthelot et al., 2019], into one. As MixMatch builds upon
MixUp [Zhang et al., 2018], which was developed for image data, we adapt MixUp to sequential data by
mixing the embedded sequence representations [Guo et al., 2019] instead of the pixel inputs during
data loading.
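The embedding-level MixUp adjustment can be sketched as follows; the tensor shapes and the Beta concentration parameter are illustrative assumptions.

```python
import torch

def mixup_embeddings(emb: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """MixUp on embedded sequence representations instead of pixel inputs
    [Guo et al., 2019]. emb: (batch, seq_len, dim); labels: (batch,) in {0, 1}.
    alpha is a hypothetical default for the Beta mixing distribution."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(emb.size(0))
    mixed_emb = lam * emb + (1.0 - lam) * emb[perm]
    mixed_labels = lam * labels.float() + (1.0 - lam) * labels[perm].float()
    return mixed_emb, mixed_labels
```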
Data augmentation:
For all models, we apply data augmentation directly to the input sequences, masking a random
amino acid per sequence as unknown [Shen et al., 2021], to combat overfitting and improve
generalization. All predictors except ESM2 fine-tuning use adaptive momentum [Kingma
and Ba, 2015] as their optimization technique, whereas ESM2 fine-tuning uses adaptive momentum
with decoupled weight decay [Loshchilov and Hutter, 2017]. All models without denoising techniques
use (binary) cross-entropy loss [Cox, 1958], while all denoising models calculate dedicated losses.
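A sketch of the masking augmentation and the optimization setup; the 'X' unknown symbol, the stand-in model, and the learning rates are assumptions for illustration.

```python
import random
import torch

def mask_random_residue(seq: str, unknown: str = "X") -> str:
    """Mask one randomly chosen amino acid per sequence as unknown
    [Shen et al., 2021]; 'X' as the unknown symbol is our assumption."""
    i = random.randrange(len(seq))
    return seq[:i] + unknown + seq[i + 1:]

model = torch.nn.Linear(200, 1)            # stand-in for any of the predictors
criterion = torch.nn.BCEWithLogitsLoss()   # (binary) cross-entropy on logits

# Adam for most predictors; AdamW (decoupled weight decay) for ESM2
# fine-tuning. The learning rates are hypothetical.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
```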
3 Experimental protocol
Evaluation:
As previously mentioned, some negative samples may actually result in a proteasomal
cleavage event in vivo due to the way they are generated. For this reason, traditional
binary classification metrics such as accuracy, precision, and recall are misleading, and model
evaluation should instead be based on the AUC [Menon et al., 2015]. We reserved a random 10% of
each terminal dataset as a test set used for the final evaluation of the best hyperparameters.
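A sketch of the split-and-score protocol with scikit-learn; the featurized windows and the predictions below are dummy stand-ins.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))       # dummy featurized windows
y = rng.integers(0, 2, size=1000)      # dummy cleavage labels

# Reserve a random 10% of the terminal dataset as the final test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# ... fit a predictor on (X_train, y_train), then evaluate by AUC only:
scores = rng.random(size=len(y_test))  # placeholder predicted probabilities
print(roc_auc_score(y_test, scores))
```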
Hyperparameter optimization:
Due to computational limitations, we split up the hyperparameter
search into three priority groups: group one used Ray Tune’s [Moritz et al., 2018] implementation
of the asynchronous hyperband algorithm [Li et al., 2020b] and evaluated each configuration with
ten-fold cross-validation (CV), while for groups two and three we chose hyperparameters manually
and evaluated each configuration with five-fold CV (group two) or a single run on a held-out
validation set (group three). We then used the best hyperparameter combination to train each