Conformational variability in proteins bound to single-stranded DNA: a new benchmark for new docking perspectives

Protein-ssDNA benchmark and its analysis

MIAS-LUCQUIN Dominique¹*, CHAUVOT DE BEAUCHENE Isaure¹

¹ Université de Lorraine, CNRS, Inria, LORIA, Nancy, France

* dominique.mias-lucquin@loria.fr

ABSTRACT

We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark that includes both bound and unbound protein structures. Because ssDNA is highly flexible when unbound, the benchmark contains no unbound ssDNA structure. For the 91 sequence-identity groups containing bound and unbound structures of the same protein, we studied the conformational changes induced in the protein by ssDNA binding. Moreover, since some groups contain several bound or several unbound protein structures, we also assessed the intrinsic conformational variability under either bound or unbound conditions and compared it with the putatively binding-induced modifications. To illustrate a use case of this benchmark, we performed docking experiments with the ATTRACT docking software. To our knowledge, this benchmark is the first to survey the available structures of ssDNA-protein interactions this extensively, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.

KEYWORDS: Single-Stranded DNA, Single-Stranded DNA-Binding Protein, Molecular Docking Analysis, Benchmark

Introduction

Although DNA was originally described by Watson and Crick1 as a double helix composed of two strands held together by hydrogen bonds, it is often found in a transient single-stranded state (ssDNA), bound to proteins, during its processing, for example during genome replication2 or horizontal gene transfer3. These complexes (ribosomes4, ICE-relaxase5, replication fork complex6, …) are potential therapeutic targets in disease7,8.

The structural analysis of these complexes can help understand how they achieve their function9. For example, it can reveal the conformational changes undergone by the protein upon nucleic acid (NA) binding, by comparing protein structures with and without bound NA10.

While very informative, high-resolution experimental structures of ssNA-protein complexes are expensive and may be difficult, or even impossible, to obtain because of the inherently poor ordering of NA, especially ssNA11,12.

Several software tools have attempted accurate ssRNA-protein docking, including:
• ATTRACT13, which uses a fragment-based approach and requires prior knowledge of some protein-RNA contacts;
• RNP-denovo14, based on Rosetta15, which folds and docks the RNA on the protein simultaneously, but requires the exact coordinates of a few nucleotides;
• RNA-lim16, which models a rough coarse-grained RNA structure (one non-oriented bead per nucleotide) restrained by a set of known binding sites on the protein surface.

While all these methods report a prediction accuracy of 2 to 10 Å RMSD between the predicted and experimental ssRNA locations, none of them has yet been tested on ssDNA-protein docking. To our knowledge, no benchmark is available for ssDNA-protein docking. Moreover, although ssDNA-protein complexes can be queried in the Nucleic Acid Database17 (NDB), such queries appear to return no complex after 2013, which limits the scope of an NDB-derived benchmark. Yet docking algorithms need experimental ground truth to be validated and compared, so docking benchmarks based on experimentally resolved structures of complexes are needed. Such benchmarks exist for protein-protein18, membrane protein-protein19, protein-RNA20,21, and dsDNA-protein22 complexes. While some works have studied ssDNA-protein interactions from a few PDB structures23, none seems to aim for exhaustiveness with the primary goal of improving ssDNA-protein docking.

Here, we present an ssDNA-protein docking benchmark, based on structures extracted from the PDB, that contains 91 sequence-identity groups of bound and unbound protein chains and is designed to evaluate ssDNA-protein docking. Because unstructured ssNA is highly flexible, its unbound forms are not relevant in the context of macromolecular docking. This is also why the docking programs presented above do not require a known unbound ssRNA structure. Consequently, the main aim of this dataset is to provide bound and unbound structures of the proteins, but only bound structures of ssDNA, from ssDNA-protein complexes. When possible, we provide several structures for both the bound and unbound states, which makes it possible to distinguish binding-specific from binding-independent conformational changes.

Docking experiments were performed to illustrate a use case for this benchmark. They underline the relevance of using several bound structures as ground truth and of tolerating a minimum conformational deviation from the ground truth when evaluating docking results.
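As an illustration of this evaluation idea, the following Python sketch scores a docked ssDNA pose against several bound reference conformations and keeps the best match. It is a minimal sketch, not the criterion used in this work: it assumes that the pose and every reference share the same atom ordering and already sit in a common protein frame, and the functions `rmsd`, `best_reference_rmsd`, `reference_spread` and `is_near_native`, as well as the `threshold` parameter, are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered coordinate lists, without superposition.

    Poses are assumed to be compared in the protein frame, so no fitting is
    applied (an assumption of this sketch).
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def best_reference_rmsd(pose, references):
    """Smallest RMSD between a docked pose and any bound reference structure."""
    return min(rmsd(pose, ref) for ref in references)


def reference_spread(references):
    """Largest pairwise RMSD among references, i.e. their intrinsic variability."""
    return max((rmsd(a, b) for a, b in combinations(references, 2)), default=0.0)


def is_near_native(pose, references, threshold):
    """Hypothetical success criterion: within `threshold` Å of at least one reference."""
    return best_reference_rmsd(pose, references) <= threshold
```

Taking the minimum over references means a pose is not penalized for matching a bound conformation other than an arbitrarily chosen one. Using the spread between references as a lower bound for `threshold` is one hypothetical way to tolerate deviations that already exist among experimental structures.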

Materials and methods

All analyses were performed using Python 3.7. Databases were queried on August 18th 2021. Processing steps are summarized in Figure 1.

RCSB Protein Data Bank (PDB) query

We identified the structures containing both DNA (non-hybrid) and proteins by querying the RCSB PDB24 through its search (https://search.rcsb.org) and data (https://data.rcsb.org) APIs. A second query extracted the PDB IDs of all structures containing only proteins, without DNA.
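The two searches can be sketched in Python against the RCSB Search API. This is a minimal sketch, not the query used in this work: the v2 endpoint URL, the `rcsb_entry_info.polymer_entity_count_*` attribute names, and the choice to define "protein-only" as having no DNA, RNA or hybrid entity are assumptions to check against the current RCSB schema, and `build_query` and `fetch_pdb_ids` are hypothetical helper names.

```python
import json
import urllib.request

# RCSB Search API v2 endpoint (assumed; the 2021 queries may have used an older version).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def count_filter(attribute, operator, value):
    """Terminal node comparing an entry-level numeric attribute to a value."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def build_query(with_dna):
    """Query for protein entries with non-hybrid DNA, or for protein-only entries."""
    nodes = [count_filter("rcsb_entry_info.polymer_entity_count_protein", "greater", 0)]
    if with_dna:
        nodes.append(count_filter("rcsb_entry_info.polymer_entity_count_DNA", "greater", 0))
        # Exclude entries with DNA/RNA hybrid entities (the "non hybrid" criterion).
        nodes.append(count_filter("rcsb_entry_info.polymer_entity_count_nucleic_acid_hybrid", "equals", 0))
    else:
        # Protein-only entries (assumed definition): no DNA, RNA or hybrid entity.
        for kind in ("DNA", "RNA", "nucleic_acid_hybrid"):
            nodes.append(count_filter(f"rcsb_entry_info.polymer_entity_count_{kind}", "equals", 0))
    return {
        "query": {"type": "group", "logical_operator": "and", "nodes": nodes},
        "return_type": "entry",
        "request_options": {"return_all_hits": True},
    }


def fetch_pdb_ids(query):
    """POST a query and return the set of matching PDB IDs (requires network access)."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    if not body:  # an empty response body is treated as "no hits"
        return set()
    return {hit["identifier"] for hit in json.loads(body)["result_set"]}
```

For example, `fetch_pdb_ids(build_query(with_dna=True))` would return the DNA-protein entry IDs, and `with_dna=False` the protein-only ones.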

These two PDB ID lists were compared with the weekly 100% sequence-identity clustering (“clusterNumber100”) of protein chains in the PDB (ftp://resources.rcsb.org/sequence/clusters/) to extract the identity groups containing chains from both DNA-protein and protein-only structures.
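The comparison with the sequence clusters amounts to keeping the clusters that mix chains from both ID lists. The sketch below assumes a cluster file with one cluster per line, given as space-separated members of the form `PDBID_CHAIN`, and upper-case PDB IDs in both sets; this file layout and the function names are assumptions for illustration.

```python
def parse_clusters(lines):
    """Yield one list of member IDs per non-empty cluster line."""
    for line in lines:
        members = line.split()
        if members:
            yield members


def mixed_identity_groups(cluster_lines, dna_protein_ids, protein_only_ids):
    """Keep clusters with members from both DNA-protein and protein-only entries.

    Members are assumed to look like "PDBID_CHAIN"; the PDB ID is the part
    before the first underscore. "with_dna" members are only candidates here:
    whether they actually bind ssDNA is decided by later filtering.
    """
    groups = []
    for members in parse_clusters(cluster_lines):
        with_dna = [m for m in members if m.split("_")[0].upper() in dna_protein_ids]
        protein_only = [m for m in members if m.split("_")[0].upper() in protein_only_ids]
        if with_dna and protein_only:
            groups.append({"with_dna": with_dna, "protein_only": protein_only})
    return groups
```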

Structure alignment, processing and identification of interacting ssDNA

For each structure containing DNA and protein, the asymmetric unit and all biological assemblies (if any) were downloaded from the PDB24. We only kept the “ATOM” records describing atoms belonging to proteins or nucleic acids. Nucleotides involved in double strands were located with 3DNA (script find_pair25) in the asymmetric unit and in each biological assembly. A DNA residue is considered single-stranded only if it is not found in a double strand in any of these structures. Biological assemblies allow the identification of cases where a double strand is formed by the repetition of the asymmetric unit (as in PDB ID 3HZI), while the asymmetric unit eases processing when the assembly consists of repeated chains sharing the same identifier (also the case in PDB ID 3HZI). VMD-python (Humphrey et al., 1996, https://github.com/Eigenstate/vmd-python) was then used to compute distances between ssDNA nucleotides and protein residues; an ssDNA nucleotide is considered bound if it lies less than 5 Å from any protein residue.
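The two per-nucleotide criteria above can be sketched independently of 3DNA and VMD. In this hypothetical Python sketch, paired nucleotides are supplied as one set per structure (for instance parsed beforehand from find_pair output, whose parsing is not shown), and "less than 5 Å from any protein residue" is interpreted as a minimum atom-atom distance below 5 Å; the nucleotide identifiers and function names are illustrative assumptions.

```python
import math


def single_stranded(all_nucleotides, paired_sets):
    """Nucleotides never found paired in the asymmetric unit or any assembly.

    paired_sets: one set of paired nucleotide IDs per structure (asymmetric
    unit plus each biological assembly).
    """
    paired_anywhere = set().union(*paired_sets)
    return {nt for nt in all_nucleotides if nt not in paired_anywhere}


def bound_nucleotides(nucleotide_atoms, protein_atoms, cutoff=5.0):
    """Nucleotides with at least one atom closer than `cutoff` Å to a protein atom.

    nucleotide_atoms: dict mapping a nucleotide ID to a list of (x, y, z) tuples.
    protein_atoms: list of (x, y, z) tuples for all protein atoms.
    """
    bound = set()
    for nt, atoms in nucleotide_atoms.items():
        # Brute-force pairwise check, kept simple to make the criterion explicit.
        if any(math.dist(a, p) < cutoff for a in atoms for p in protein_atoms):
            bound.add(nt)
    return bound
```

The brute-force distance loop only makes the criterion explicit; for full structures, a spatial index or VMD's own `within 5 of protein` atom selection would be much faster.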

This list of bound ssDNA nucleotides was then filtered to keep only the protein chains interacting with a DNA chain that contains at least 4 consecutive bound, single-stranded nucleotides.
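The stretch criterion can be written as a scan over one DNA chain. This sketch assumes that "consecutive" means adjacent residue numbers (a numbering gap breaks a stretch) and that the set of bound, single-stranded residue numbers was computed beforehand; `has_bound_ss_stretch` is a hypothetical name.

```python
def has_bound_ss_stretch(residue_numbers, bound_ss, min_length=4):
    """Return True if a DNA chain has >= min_length consecutive bound ssDNA nucleotides.

    residue_numbers: residue numbers of one DNA chain, in chain order.
    bound_ss: residue numbers that are both bound and single-stranded.
    """
    run = 0  # length of the current stretch of bound single-stranded nucleotides
    previous = None
    for number in residue_numbers:
        if number not in bound_ss:
            run = 0  # an unbound or paired nucleotide breaks the stretch
        elif run > 0 and number == previous + 1:
            run += 1  # extends the current stretch
        else:
            run = 1  # starts a new stretch (first hit, or after a numbering gap)
        if run >= min_length:
            return True
        previous = number
    return False
```

Protein chains in contact with a DNA chain for which this returns True would then be retained.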

For multi-model asymmetric units (often encountered with NMR structures), only the first model is used. In this case, we assumed limited conformational variation between models, with no impact on the pairing (2D) state of the DNA.
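Keeping only the ATOM records of protein and nucleic-acid residues from the first model can be sketched on legacy PDB-format text, using the fixed columns of that format (record name in columns 1-6, residue name in columns 18-20). The residue-name lists are illustrative and deliberately incomplete (modified residues are ignored), entries distributed only as mmCIF are not handled, and `first_model_atoms` is a hypothetical helper, not the code used in this work.

```python
# Illustrative residue names; modified residues are deliberately not covered.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
NUCLEOTIDES = {"DA", "DC", "DG", "DT", "A", "C", "G", "U"}
KEPT_RESIDUES = AMINO_ACIDS | NUCLEOTIDES


def first_model_atoms(pdb_text):
    """Return protein/nucleic-acid ATOM records from the first model only.

    Files without MODEL/ENDMDL records (most X-ray entries) count as one model.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6]
        if record == "ENDMDL":
            break  # ignore every later model, e.g. the rest of an NMR ensemble
        if record == "ATOM  " and line[17:20].strip() in KEPT_RESIDUES:
            kept.append(line)
    return kept
```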

Fig. 1: Protocol used to create the benchmark
