Conformational variability in proteins bound to single-stranded DNA:
a new benchmark for new docking perspectives
Protein-ssDNA benchmark and its analysis
MIAS-LUCQUIN Dominique1*, CHAUVOT DE BEAUCHENE Isaure1
1 Universite de Lorraine, CNRS, Inria, LORIA, Nancy, France
* dominique.mias-lucquin@loria.fr
ABSTRACT
We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark that includes both bound and unbound protein structures. Because ssDNA is highly flexible when not bound, no unbound ssDNA structure is included in the benchmark. For the 91 sequence-identity groups containing both bound and unbound structures of the same protein, we studied the conformational changes induced in the protein by ssDNA binding. Moreover, for groups containing several bound or several unbound protein structures, we also assessed the intrinsic conformational variability under bound or unbound conditions, and compared it to the presumed binding-induced modifications. To illustrate a use case of this benchmark, we performed docking experiments with the ATTRACT docking software. To our knowledge, this benchmark is the first to exploit the available structures of ssDNA-protein interactions so extensively, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.
KEYWORDS: Single-Stranded DNA, Single-Stranded DNA-Binding Protein, Molecular Docking Analysis, Benchmark
Introduction
While originally described by Watson and Crick1 as a double helix composed of two strands held together by hydrogen bonds, DNA is often found in a transient single-stranded state (ssDNA), bound to proteins, during its processing, for example in genome replication2 or horizontal gene transfer3. These complexes (ribosomes4, ICE-relaxase5, replication fork complex6, …) are potential therapeutic targets in diseases7,8.
The structural analysis of these complexes can help explain how they achieve their function9. For example, it can reveal the conformational changes undergone by the protein upon nucleic acid (NA) binding, by comparing protein structures with and without bound NA10.
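Comparing protein structures with and without bound NA typically comes down to a root-mean-square deviation (RMSD) computed after optimal rigid-body superposition. The Python sketch below illustrates this with the Kabsch algorithm in NumPy. It assumes both structures have already been reduced to the same N matched atoms in the same order (for instance, the C-alpha atoms of aligned residues); the function name `kabsch_rmsd` is hypothetical and is not taken from the benchmark described here.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Assumption: both arrays list the same N atoms (e.g. matched C-alpha atoms)
    in the same order; no sequence alignment is performed here.
    """
    if mobile.ndim != 2 or mobile.shape != reference.shape or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centring both sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Covariance matrix between the two centred sets and its SVD.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the mobile set onto the reference and measure the deviation.
    p_rot = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q) ** 2, axis=1))))
```

With hypothetical arrays `unbound_ca` and `bound_ca`, `kabsch_rmsd(unbound_ca, bound_ca)` would estimate a bound-unbound deviation; applying the same function to pairs of bound (or of unbound) structures of one protein gives a baseline of intrinsic variability against which that deviation can be judged.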
While very informative, high-resolution experimental structures of complexes between proteins and single-stranded NA (ssNA) are expensive and can be difficult, or even impossible, to obtain because of the inherently poor ordering of NA, especially ssNA11,12.
Several software tools have therefore attempted accurate ssRNA-protein docking, including:
• ATTRACT13, which uses a fragment-based approach and requires prior knowledge of some protein-RNA contacts;
• RNP-denovo14, based on Rosetta15, which simultaneously folds the RNA and docks it onto the protein, but requires the exact coordinates of a few nucleotides;
• RNA-lim16, which models a coarse-grained RNA structure (one non-oriented bead per nucleotide) restrained by a set of known binding sites on the protein surface.
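To make the one-bead-per-nucleotide representation more concrete, the Python sketch below collapses an all-atom nucleotide chain into non-oriented beads and scores the beads against known binding sites. It is only in the spirit of RNA-lim, not its implementation: placing each bead at the centroid of its nucleotide's atoms, and the flat-bottom harmonic restraint with its cutoff and force constant, are illustrative assumptions, and `nucleotide_beads` and `restraint_penalty` are hypothetical names.

```python
import numpy as np


def nucleotide_beads(atoms):
    """Collapse an all-atom chain into one non-oriented bead per nucleotide.

    `atoms` is an iterable of (residue_id, x, y, z) tuples. Assumption: each
    bead sits at the centroid of its residue's atoms and carries no orientation.
    """
    residues = {}  # dicts keep insertion order, so beads follow the input order
    for res_id, x, y, z in atoms:
        residues.setdefault(res_id, []).append((x, y, z))
    return np.array([np.mean(coords, axis=0) for coords in residues.values()])


def restraint_penalty(beads, site_pairs, cutoff=6.0, k=1.0):
    """Hypothetical flat-bottom harmonic penalty for bead/binding-site contacts.

    `site_pairs` lists (bead_index, site_xyz). A pair costs nothing while the
    bead lies within `cutoff` (in Å) of its site, and k * excess**2 beyond it.
    """
    total = 0.0
    for idx, site in site_pairs:
        dist = np.linalg.norm(beads[idx] - np.asarray(site, dtype=float))
        excess = max(0.0, float(dist) - cutoff)
        total += k * excess ** 2
    return total
```

Because the beads carry no orientation, a restraint of this kind can only constrain where a nucleotide sits relative to the protein surface, not which face of the base points towards it.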
While these methods report a prediction accuracy of 2 to 10 Å RMSD between predicted and experimental ssRNA positions, none of them has yet been tested on ssDNA-protein docking. To our knowledge, no benchmark is available for ssDNA-protein docking. Moreover, although the Nucleic Acid Database17 (NDB) can be queried for ssDNA-protein complexes, such queries appear to return none after 2013, which limits the scope of an NDB-derived benchmark. Yet docking methods need experimental ground truth to be validated and compared, so docking benchmarks based on experimentally resolved structures of complexes are needed. Such benchmarks exist for protein-protein18, membrane protein-protein19, protein-RNA20,21, and dsDNA-protein22 complexes. While some works have studied ssDNA-protein interactions from