
Conformational variability in proteins bound to single-stranded DNA:
a new benchmark for new docking perspectives
Protein-ssDNA benchmark and its analysis
MIAS-LUCQUIN Dominique1*, CHAUVOT DE BEAUCHENE Isaure1
1 Universite de Lorraine, CNRS, Inria, LORIA, Nancy, France
* dominique.mias-lucquin@loria.fr
ABSTRACT
We explored the Protein Data-Bank (PDB) to collect protein-ssDNA structures and create a
multi-conformational docking benchmark including both bound and unbound protein structures. Because
unbound ssDNA is highly flexible, no unbound ssDNA structure is included in the benchmark.
For the 91 sequence-identity groups identified as bound-unbound structures of the same protein, we
studied the conformational changes in the protein induced by the ssDNA binding. Moreover, based on
several bound or unbound protein structures in some groups, we also assessed the intrinsic
conformational variability in either bound or unbound conditions, and compared it to the supposedly
binding-induced modifications. To illustrate a use case of this benchmark, we performed docking
experiments using the ATTRACT docking software. This benchmark is, to our knowledge, the first one
made to peruse available structures of ssDNA-protein interactions to such an extent, aiming to
improve computational docking tools dedicated to this kind of molecular interaction.
KEYWORDS: Single-Stranded DNA, Single-Stranded DNA-Binding Protein, Molecular Docking
Analysis, Benchmark
Introduction
While originally described by Watson and Crick1 as a
double helix, composed of two strands bonded together
by hydrogen-bonds, DNA is often found in a transient
single-stranded state (ssDNA) during its processing,
such as genome replication2, or horizontal gene
transfer3, and bound to proteins. These complexes
(ribosomes4, ICE-relaxase5, replication fork complex6,
…) are potential therapeutic targets in diseases7,8.
The structural analysis of these complexes can help to
understand how they achieve their function9. For
example, it can reveal the conformational changes
undergone by the protein during nucleic acids (NA)
binding, by comparing protein structures with and
without bound NA10.
While very informative, high resolution experimental
structures of ssNA-protein complexes are expensive and
may be difficult, or even impossible, to obtain, due to
the inherent poor ordering of NA, especially ssNA11,12.
Several software systems have tried to implement
accurate ssRNA-protein docking, including:
ATTRACT13, which uses a fragment-based approach and
requires some prior knowledge of protein-RNA contacts;
RNP-denovo14, based on Rosetta15, which folds and
docks the RNA on the protein simultaneously but
requires the exact coordinates of a few nucleotides;
RNA-lim16, which models a rough coarse-grained RNA
structure (one non-oriented bead per nucleotide)
restrained by a set of known binding sites on the
protein surface.
While all these methods report a prediction
accuracy of 2 to 10 Å RMSD between the predicted
and experimental ssRNA positions, none of them has
yet been tested on ssDNA-protein docking. To our
knowledge, no benchmark is available for
ssDNA-protein docking. Moreover, although
ssDNA-protein complexes can be queried in the Nucleic
Acid Database17 (NDB), such queries seem to return
no structure after 2013, which limits the scope of an
NDB-derived benchmark. In turn, docking algorithms need
experimental ground truth to validate and compare
methods. Thus, docking benchmarks based on
experimentally resolved structures of complexes are
needed. Such docking benchmarks exist for
protein-protein18, membrane protein-protein19,
protein-RNA20,21, and dsDNA-protein22 complexes.
Although some studies have examined ssDNA-protein
interactions using a few structures from the PDB23,
none appears to aim at exhaustiveness with the primary
goal of improving ssDNA-protein docking.
Here, we present an ssDNA-protein docking benchmark
based on structures extracted from the PDB, which
contains 91 sequence-identity groups of bound-unbound
protein chains, created to evaluate ssDNA-protein
docking. Due to the high flexibility of unstructured
ssNA, it is not relevant to use their unbound forms in
the context of macromolecular docking. This is also the
reason why the docking programs presented earlier do
not require a known unbound ssRNA structure.
Consequently, the main aim of this dataset is to provide
bound and unbound structures of the proteins but only
bound structures of ssDNA, from ssDNA-protein
complexes. When possible, we provide several
structures for both bound and unbound states, which
makes it possible to distinguish binding-specific from
binding-independent conformational changes.
Docking experiments were performed to show a use
case for this benchmark. They underline the relevance of
using several bound structures as ground truth and of
tolerating a minimum conformational deviation from the
ground truth when evaluating docking results.
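As an illustration of evaluating a docking pose against several bound structures, here is a minimal Python sketch. It is not the evaluation protocol of the paper: the function names (rmsd, best_rmsd) are hypothetical, coordinates are assumed to be given as equally ordered lists of (x, y, z) tuples already expressed in the same receptor frame, and no superposition is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) tuples.

    Assumption: both sets are already in the same reference frame,
    so no superposition is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


def best_rmsd(pose, references):
    """Lowest RMSD of a docking pose over several bound reference structures."""
    return min(rmsd(pose, ref) for ref in references)


# Toy data: one pose and two hypothetical bound references.
pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref_far = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
ref_near = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
score = best_rmsd(pose, [ref_far, ref_near])
```

Taking the minimum over references means a pose is not penalized for matching one experimentally observed bound conformation rather than another; a tolerance threshold can then be applied to the resulting value.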
Material and methods
All analyses were performed using Python 3.7.
Databases were queried on August 18th 2021.
Processing steps are summarized in Figure 1.
RCSB Protein Data Bank (PDB) query
We identified the structures containing both
DNA (non-hybrid) and proteins by querying the RCSB
PDB24 using its search (https://search.rcsb.org) and
data (https://data.rcsb.org) APIs. Another query was
performed to extract the PDB ID of all structures
containing only proteins without DNA.
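The two queries described above could be expressed along the following lines. This is only a sketch of possible request payloads, not the queries actually used for the benchmark: the endpoint version, the choice of the rcsb_entry_info.polymer_entity_count_* attributes and the helper names (count_node, build_query) are assumptions, and the payloads are built but not sent.

```python
import json

# Assumed endpoint of the RCSB search API (the version in use may differ).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def count_node(attribute, operator, value):
    """One terminal node filtering entries on a numeric attribute."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def build_query(with_dna):
    """Payload for entries with protein and DNA (with_dna=True)
    or with protein and no DNA (with_dna=False)."""
    dna_operator = "greater" if with_dna else "equals"
    nodes = [
        count_node("rcsb_entry_info.polymer_entity_count_protein", "greater", 0),
        # Assumption: this counter only includes pure DNA entities, not hybrids.
        count_node("rcsb_entry_info.polymer_entity_count_DNA", dna_operator, 0),
    ]
    return {
        "query": {"type": "group", "logical_operator": "and", "nodes": nodes},
        "return_type": "entry",
    }


dna_protein_payload = json.dumps(build_query(with_dna=True))
protein_only_payload = json.dumps(build_query(with_dna=False))
```

Each payload would then be POSTed to SEARCH_URL and the returned entry identifiers collected into the two PDB ID lists; result pagination is left out of this sketch.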
These two PDB ID lists were compared to the weekly
100% sequence identity clustering
(“clusterNumber100”) of protein chains in the PDB
(ftp://resources.rcsb.org/sequence/clusters/) to extract
the identity groups containing chains from both
DNA-protein and protein-only structures.
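The selection of identity groups can be sketched as a set intersection. The snippet below assumes a simplified cluster format (one cluster per line, whitespace-separated chain identifiers such as 1AAA_A); the actual layout of the RCSB cluster files may differ, and the function names and PDB IDs are hypothetical.

```python
def parse_clusters(lines):
    """Parse clusters, assuming one cluster per line of whitespace-separated
    chain identifiers of the form '<PDB ID>_<chain>'."""
    return [line.split() for line in lines if line.strip()]


def mixed_groups(clusters, dna_protein_ids, protein_only_ids):
    """Keep clusters with chains from both DNA-protein and protein-only entries."""
    selected = []
    for chains in clusters:
        entries = {chain.split("_")[0].upper() for chain in chains}
        if entries & dna_protein_ids and entries & protein_only_ids:
            selected.append(chains)
    return selected


# Toy data with made-up identifiers.
cluster_lines = ["1AAA_A 2BBB_A", "3CCC_A 4DDD_B", "5EEE_A"]
dna_protein_ids = {"1AAA", "3CCC"}
protein_only_ids = {"2BBB", "5EEE"}
groups = mixed_groups(parse_clusters(cluster_lines), dna_protein_ids, protein_only_ids)
```

Only the first toy cluster is retained, since it is the only one with a chain in each of the two lists.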
Structure alignment, processing and identification of interacting ssDNA
For each structure containing DNA and protein, the
asymmetric units and all biological assemblies (if any)
were downloaded from the PDB24. We only kept
“ATOM” records describing atoms belonging to
proteins or nucleic acids. Nucleic residues involved in
double strands were located with 3DNA (script
find_pair25) in the asymmetric unit and in each
biological assembly. A DNA residue was considered
single-stranded only if it was not found as
double-stranded in any of these structures. Biological
assemblies allow the identification of cases where a
double strand is formed by the repetition of the
asymmetric unit (such as PDB ID:3HZI), while the
asymmetric unit eases the processing if the assembly is
constituted by the repetition of chains having the same
identifier (as is also the case in PDB ID:3HZI). Then
VMD-python (Humphrey et al., 1996,
https://github.com/Eigenstate/vmd-python) was used
to compute distances between ssDNA nucleotides and
protein residues; an ssDNA nucleotide was considered
bound if it was found at less than 5 Å from any protein
residue. This list of bound ssDNA nucleotides was then
filtered to keep only the protein chains interacting with a
DNA chain of at least 4 consecutive bound and
single-stranded nucleotides.
For multi-framed asymmetric units (often encountered
with NMR models), only the first frame was used. In this
case, we assumed a limited variation of conformation
between frames, with no impact on the DNA 2D state.
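The selection rules of this section (single-stranded in every structure, bound within 5 Å, at least 4 consecutive nucleotides) can be summarized in a short pure-Python sketch. It does not reproduce the 3DNA or VMD-python calls of the actual pipeline: paired residues and atom coordinates are assumed to be already extracted, residue identifiers are assumed to be listed in chain order, and all function names are hypothetical.

```python
import math

CUTOFF = 5.0   # distance threshold in Å, as stated in the text
MIN_RUN = 4    # minimum number of consecutive bound single-stranded nucleotides


def single_stranded(all_residues, paired_per_structure):
    """Residues never reported as paired in the asymmetric unit or any assembly."""
    paired = set().union(*paired_per_structure)
    return set(all_residues) - paired


def is_bound(nucleotide_atoms, protein_atoms, cutoff=CUTOFF):
    """True if any nucleotide atom lies closer than cutoff to any protein atom."""
    return any(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) < cutoff
        for p in nucleotide_atoms
        for q in protein_atoms
    )


def longest_run(flags):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def chain_qualifies(residue_ids, ss_ids, bound_ids, min_run=MIN_RUN):
    """True if the chain has min_run consecutive bound, single-stranded residues."""
    flags = [r in ss_ids and r in bound_ids for r in residue_ids]
    return longest_run(flags) >= min_run


# Toy chain of 6 nucleotides; residues 5 and 6 are paired in some structure.
residues = [1, 2, 3, 4, 5, 6]
ss = single_stranded(residues, [{6}, {5, 6}])
keep = chain_qualifies(residues, ss, bound_ids={1, 2, 3, 4, 5})
drop = chain_qualifies(residues, ss, bound_ids={1, 2, 4, 5})
```

In the toy example the first case is kept (residues 1 to 4 are both single-stranded and bound), while the second is rejected because the longest qualifying stretch has only two nucleotides.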
Fig.1: Protocol to create the benchmark