Opening Amyloid-Windows to the Secondary Structure of Proteins: The Amyloidogenecity Increases Tenfold Inside Beta-Sheets

Kristóf Takács (a), Bálint Varga (a), Viktor Farkas (c), András Perczel (c,d),
Vince Grolmusz (a,b,*)
(a) PIT Bioinformatics Group, Eötvös University, H-1117 Budapest, Hungary
(b) Uratim Ltd., H-1118 Budapest, Hungary
(c) ELKH-ELTE Protein Modeling Research Group, H-1117 Budapest, Hungary
(d) Laboratory of Structural Chemistry and Biology, Eötvös University, H-1117, Budapest, Hungary
Abstract
Methods from artificial intelligence (AI), in general, and machine learning, in
particular, have kept conquering new territories in numerous areas of science.
Most of the applications of these techniques are restricted to the classification
of large data sets, but new scientific knowledge can seldom be inferred from
these tools. Here we show that an AI-based amyloidogenecity predictor can
strongly differentiate the border- and the internal hexamers of β-pleated sheets
when screening all the Protein Data Bank-deposited homology-filtered protein
structures. Our main result shows that more than 30% of the internal hexamers of
β-sheets are predicted to be amyloidogenic, while just outside the border regions,
only 3% are predicted as such. This result may elucidate a general protection
mechanism of proteins against turning into amyloids: if the borders of β-sheets
were amyloidogenic, then the whole β-sheet could turn more easily into an
insoluble amyloid structure, characterized by periodically repeated parallel
β-sheets. We also show that no analogous phenomenon exists at the borders of
α-helices or of randomly chosen subsequences of the studied protein structures.
Introduction
The amyloid state of proteins has long been studied because of its
relevance to human diseases, including neurodegenerative ailments [1, 2, 3, 4],
amyloidoses [5] and other diseases, listed in Table 1 of [2]. Recently, amyloids
have gained increased attention as possible antiviral agents [6], and through one
of the first successes of in vivo CRISPR-Cas9-based gene editing therapy in
humans, which targeted transthyretin amyloidosis [7].
* Corresponding author
arXiv:2210.11842v1 [q-bio.BM] 21 Oct 2022
The insoluble amyloid state differs from unstructured protein aggregates;
it has a well-defined structure characterized by repetitive parallel β-sheets [8, 9].
Numerous globular proteins may turn into insoluble amyloid structures under
certain combinations of pH and temperature. The transition to the amyloid state
has been observed to resemble contagious prion formation: it involves a
core-forming structure [2] that starts the amyloid transition, followed by the
propagation of misfolded, insoluble, parallel β-sheets [10, 11]. This amyloid
propagation is related to the spatially accessible, neighboring β-pleated
sheet secondary structural elements of proteins [12, 3, 4].
Naturally occurring, globular, soluble proteins need to possess certain
structural properties that prevent them from turning into insoluble amyloid
aggregates. Experimental studies, together with in silico methods, are able to
predict aggregation-prone regions (APRs) in protein sequences. In the literature,
certain residues acting as “gatekeepers” are described as preventing these
conformational changes to amyloid structures: aspartic acid and glycine in the
bacterial curlin subunits CsgA and CsgB [13], and lysine in transthyretin [14]
and in the peptides Trpzip1 and Trpzip2 [15].
Our hypothesis is that the borders of β-sheets in globular, soluble proteins in
general (and not only in the specific examples listed above) need to be protected
against turning into amyloid structures; otherwise, the whole polypeptide chain
bordering the β-sheet subsequences would be transformed into amyloids. We
examine this hypothesis in the present work by screening the starting and ending
subsequences (in other terminology, the prefixes and suffixes) of the β-sheets
of the homology-filtered Protein Data Bank (PDB) entries.
Previous work
In [8], a clean geometrical formulation was given for characterizing
amyloid-like structures by the length, distance, and parallelism of their
β-sheets. By searching for these geometric properties, an automatically
updated list of PDB entries was published and is maintained at
https://pitgroup.org/amyloid/.
In [9], the amyloid-precursor molecules of https://pitgroup.org/amyloid/
were classified by the prefixes of their parallel β-sheet subsequences, and these
prefixes were also searched for in the whole Protein Data Bank [16]. The
remarkable findings include the major histocompatibility complexes MHC-1 and
MHC-2, the p53 tumor suppressor protein, and an anti-coagulant peptide molecule.
In [17], using a prediction tool based on artificial intelligence, we
partitioned all possible hexapeptides into two classes, “amyloidogenic” and
“non-amyloidogenic”, with more than 84% accuracy. The prediction tool, which
applies a linear Support Vector Machine (SVM) [18], was trained on the Waltz
dataset of 514 amyloidogenic and 901 non-amyloidogenic hexapeptides [19, 20].
The corresponding web-based tool is available as the Budapest Amyloid
Predictor at https://pitgroup.org/bap.
As a surprising application of the Budapest Amyloid Predictor, numerous
amyloid and non-amyloid hexapeptide patterns were identified in [21]. For
example, it was shown that, for all independent amino acid substitutions at the
positions marked by “x”, the patterns CxFLWx and FxFLFx describe amyloidogenic
hexamers, while the patterns PxDxxx and xxKxEx describe non-amyloidogenic
hexamers. Note that, since any of the 20 amino acid residues can be substituted
at each position x, each pattern with two x’s succinctly describes
$20^2 = 400$ different hexapeptides, while each pattern with four x’s
describes $20^4 = 160,000$.
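The pattern counts above can be checked with a short Python sketch. It is only an illustration: the function names and the choice of a lowercase "x" as the wildcard character are assumptions made here, and the code merely enumerates the hexapeptides a pattern describes; it does not predict amyloidogenicity.

```python
from itertools import product

# The 20 standard amino acid residues in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_matches(pattern: str, wildcard: str = "x") -> int:
    """Number of hexapeptides described by a pattern such as 'CxFLWx'."""
    return len(AMINO_ACIDS) ** pattern.count(wildcard)


def expand(pattern: str, wildcard: str = "x"):
    """Yield every hexapeptide obtained by filling the wildcard positions."""
    # A wildcard position may hold any residue; a fixed position holds one.
    slots = [AMINO_ACIDS if ch == wildcard else ch for ch in pattern]
    for combo in product(*slots):
        yield "".join(combo)


if __name__ == "__main__":
    for pattern in ("CxFLWx", "FxFLFx", "PxDxxx", "xxKxEx"):
        print(pattern, count_matches(pattern))
```

Because the wildcard positions are filled independently, the count is simply 20 raised to the number of wildcards, which is why `count_matches` never needs to enumerate anything.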
In the present work, we study the prefixes and suffixes of the β-sheet
subsequences of the PDB-deposited structures and apply the Budapest Amyloid
Predictor to the hexamers of the border regions of these β-sheets. As we show
below, when screening the homology-filtered Protein Data Bank entries, the
fraction of amyloidogenic hexapeptides in the border regions of the β-sheets is
about one-tenth of that in the inner parts of those sequences. We believe that
this finding is very strong evidence for our hypothesis.
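The comparison between border and inner hexamers can be sketched in Python as follows. This is a hedged illustration, not the procedure used in this work: the exact definition of the border regions is not given in this section, so the sketch simply assumes that the prefix is the first six residues and the suffix the last six residues of a β-sheet subsequence. The `predict` argument is a hypothetical stand-in for any amyloidogenicity classifier; no real interface of the Budapest Amyloid Predictor is implied.

```python
from typing import Callable, Iterable, Iterator, Tuple

HEXAMER = 6


def windows(seq: str, k: int = HEXAMER) -> Iterator[str]:
    """Yield all contiguous length-k windows of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def prefix_and_suffix(sheet: str, k: int = HEXAMER) -> Tuple[str, str]:
    """Return the first and last k residues (assumed border hexamers)."""
    if len(sheet) < k:
        raise ValueError("subsequence is shorter than the window length")
    return sheet[:k], sheet[-k:]


def amyloidogenic_fraction(hexamers: Iterable[str],
                           predict: Callable[[str], bool]) -> float:
    """Fraction of hexamers that the supplied predictor labels positive."""
    hexamers = list(hexamers)
    if not hexamers:
        return 0.0
    return sum(1 for h in hexamers if predict(h)) / len(hexamers)
```

Passing the classifier in as a callable keeps the window bookkeeping separate from the prediction step, so the same functions can be reused for α-helices or randomly chosen subsequences.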
We remark that applying the artificial intelligence-based Budapest Amyloid
Predictor to the 64 million ($= 20^6$) possible hexapeptides is a crucial step
here: the largest experimental dataset that labels hexapeptides by their
amyloidogenic propensity, the Waltz dataset, contains only 1415 hexapeptides
[19, 20], while we analyzed the prefixes and suffixes of more than 110,000
β-sheets from the homology-filtered PDB. Obtaining experimental data for that
many hexapeptides is infeasible today.
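The scale gap described above can be made concrete with a few lines of Python. The sketch only redoes the arithmetic from the figures quoted in the text (20 residues, hexamer length 6, and the 514 + 901 Waltz entries); the variable names are illustrative.

```python
AMINO_ACIDS = 20
HEXAMER_LENGTH = 6

# All possible hexapeptides over the 20 standard residues: 20^6 = 64,000,000.
total_hexapeptides = AMINO_ACIDS ** HEXAMER_LENGTH

# Size of the Waltz dataset as quoted in the text.
waltz_amyloidogenic = 514
waltz_non_amyloidogenic = 901
waltz_total = waltz_amyloidogenic + waltz_non_amyloidogenic

# Share of the hexapeptide space that has experimental labels.
coverage = waltz_total / total_hexapeptides

print(f"{total_hexapeptides:,} possible hexapeptides")
print(f"{waltz_total} experimentally labeled ({coverage:.4%} of the space)")
```

The experimentally labeled hexapeptides cover only a few thousandths of a percent of the sequence space, which is why a trained predictor is needed for a PDB-wide screen.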
Methods
In statistical analyses involving the entries of the Protein Data Bank
(PDB) [16], “non-redundant”, homology-filtered PDB subsets are usually
applied, because otherwise the multiple depositions of more important or more
intensively researched protein structures (e.g., several thousand copies of the
HIV-1 protease) would bias the results. In this study, we used the
representative polypeptide chains of the 30% homology-filtered set, which was available
at the URL https://www.rcsb.org/pdb/rest/representatives?cluster=30
(downloaded on May 21, 2020).
Technical note for reproducibility of the present work: At the time of writing
this section, we were informed that the administrators of the RCSB PDB have
removed the link target indicated above, but one can still generate a similar
non-redundant set with an advanced search feature of the RCSB PDB, namely
https://search.rcsb.org/#group-by-context, by choosing 30% sequence
identity.
The structure of almost all representative members of this non-redundant
set was identified in a soluble state, so almost all proteins in the set are water-
soluble (see Figure S3 in the supporting material).
From the representative sequences of the non-redundant set, we identified
the maximal contiguous subsequences corresponding to β-sheets by using
the “SHEET” records of the PDB files. In other words, all the non-expandable
contiguous β-sheet subsequences were identified. Subsequences shorter than
6 residues were discarded. In this way, we identified
117,467 subsequences.
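The extraction step can be illustrated with a minimal Python sketch. It is not the pipeline used in this work, and it rests on stated assumptions: strand ranges are read from the fixed columns of SHEET records in the legacy PDB file format, ranges that overlap or directly follow each other are merged into one maximal subsequence, and lengths are computed from residue numbers while ignoring insertion codes and numbering gaps. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

MIN_LENGTH = 6  # subsequences shorter than this are discarded


def strand_ranges(pdb_lines: Iterable[str],
                  chain_id: str) -> List[Tuple[int, int]]:
    """Collect (first, last) residue numbers of the strands of one chain."""
    ranges = []
    for line in pdb_lines:
        if not line.startswith("SHEET") or len(line) < 37:
            continue
        # Fixed 1-based columns: chain IDs at 22 and 33,
        # residue sequence numbers at 23-26 and 34-37.
        if line[21] != chain_id or line[32] != chain_id:
            continue
        ranges.append((int(line[22:26]), int(line[33:37])))
    return ranges


def merge_ranges(ranges: Iterable[Tuple[int, int]],
                 min_length: int = MIN_LENGTH) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent ranges, then drop the short ones."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range (assumption: touching ranges join).
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s + 1 >= min_length]
```

Merging before filtering matters: a strand listed in two sheets with slightly different end points contributes one maximal range rather than two partially overlapping ones.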