AUTOPOLYPLOIDY, ALLOPOLYPLOIDY, AND PHYLOGENETIC
NETWORKS WITH HORIZONTAL ARCS
KATHARINA T. HUBER AND LIAM J. MAHER
ABSTRACT. Polyploidization is an evolutionary process by which a species acquires multiple copies of its complete set of chromosomes. The reticulate nature of the signal it leaves behind means that phylogenetic networks offer themselves as a framework for reconstructing the evolutionary past of species affected by it. The main strategy for doing this is to first construct a so-called multiple-labelled tree and to then derive such a network from it in some way. The following question therefore arises: How much can be said about that past if such a tree is not readily available? By viewing a polyploid dataset as a certain vector, which we call a ploidy (level) profile, we show that, among other results, there always exists a phylogenetic network in the form of a beaded phylogenetic tree with additional arcs that realizes a given ploidy profile. Intriguingly, the two end vertices of almost all of these additional arcs can be interpreted as having co-existed in time, thereby adding biological realism to our network, a feature that is, in general, not enjoyed by phylogenetic networks. In addition, we show that our network may be viewed as a generator of ploidy profile space, a novel concept similar to phylogenetic tree space that we introduce to be able to compare phylogenetic networks that realize one and the same ploidy profile. We illustrate our findings in terms of a publicly available Viola dataset.
1. INTRODUCTION
Polyploidization is an evolutionary phenomenon thought to be one of the key players in plant evolution. It has, however, also been observed in fish [22] and fungi [1], and it arises when a species acquires multiple copies of its full set of chromosomes. This can be the result of, for example, a species undergoing whole genome duplication (autopolyploidization) or acquiring a further complete set of chromosomes via interbreeding with a different, usually closely related, species (allopolyploidization) [1] (see also [5], who point out that the definitions of allopolyploidy and autopolyploidy are controversial). Examples of autopolyploids include crop potato [34], banana and watermelon [38], and examples of allopolyploids include bread wheat [25] and oilseed rape. A better understanding of how polyploids have arisen (and still arise) therefore has potentially far-reaching consequences.
Many tools for shedding light on the evolutionary past of a polyploid dataset, such as PADRE [23] and AlloPPnet [20], start with a multiple-labelled tree, sometimes also called a MUL-tree or a multi-labelled tree. These types of trees differ from standard phylogenetic trees by allowing two or more leaves to be labelled with the same species. In the case of PADRE, a (phylogenetic) network is then produced from such a tree by folding it up as described in, for example, [14]. Referring the interested reader to Figure 1(i) for an example and to Section 2 for definitions, it suffices to say at this stage that a phylogenetic network is a directed acyclic graph with a single root (usually drawn at the top) whose leaf set is a set of taxa (e.g. species) of interest. Note that to be able to account for autopolyploidy, we deviate from the standard definition of a phylogenetic network (see e.g. [33]) by also allowing it to contain beads, that is, pairs of parallel arcs, as is the case in the networks depicted in Figure 1. Polyploidization events are represented in such networks as reticulation vertices, that is, vertices with more than one incoming arc. For clarity of exposition, we indicate reticulation vertices throughout the paper by squares. Although PADRE is generally fast and not constrained by an upper limit on the ploidy levels in a dataset of interest, its underlying assumptions make it highly susceptible to noise in the multiple-labelled tree from which the network is constructed. In the case of AlloPPnet, a phylogenetic network is inferred using, among other techniques, the multispecies coalescent to account for incomplete lineage sorting. The computational demands of AlloPPnet, however, mean that it can only be applied to relatively small datasets that contain only diploids and tetraploids [29].
Date: February 21, 2023.
arXiv:2210.05269v2 [q-bio.PE] 19 Feb 2023
FIGURE 1. (i) One of potentially many phylogenetic networks that realize the ploidy levels 14, 12, 12, 10 of a set $X=\{x_1,x_2,x_3,x_4\}$ of taxa, where the ploidy level of $x_1$ is 14, the ploidy levels of $x_2$ and $x_3$ are 12, and the ploidy level of $x_4$ is 10. To improve clarity of exposition, we always assume that, unless indicated otherwise, arcs are directed away from the root (which is always at the top). (ii) The network in (i) represented in such a way that every reticulation vertex (indicated throughout the paper by a square and defined below) has precisely one incoming horizontal arc, implying that the end vertices of such an arc represent ancestral species that existed at the same point in time. In both (i) and (ii), the phylogenetic network resulting from deleting the dashed bead and its dashed outgoing arc realizes the ploidy profile $\vec{m}' = (7,6,6,5)$.
One approach to obtaining an input multiple-labelled tree for PADRE is to construct it as a consensus multiple-labelled tree from a set of multiple-labelled gene trees. For phylogenetic trees, this task is relatively straightforward: one can apply, for example, some kind of consensus approach to the collection of clusters induced by the trees. The corresponding approach for constructing a consensus multiple-labelled tree from a collection of multiple-labelled gene trees, however, gives rise to a computationally hard decision problem [10]. A consensus multiple-labelled tree might therefore not always be readily available for a dataset. The following question therefore arises: How much can we say about the reticulate evolutionary past of a polyploid dataset if a multiple-labelled tree is not readily available? Since one of the signatures left by polyploidization is the ploidy level of a species (i.e. the number of copies of the complete set of chromosomes of that species), we address this question in terms of the ploidy levels of the taxa that make up a dataset, using phylogenetic networks as a framework. Interpreting the ploidy level of a species as the number of directed paths from the root of a phylogenetic network $N$ to the leaf of $N$ that represents that species, Figure 1(i) shows that, in general, ploidy levels do not uniquely determine the topology of the phylogenetic network that induced them. For example, swapping $x_2$ with $x_3$ in that network results in a phylogenetic network that induces the same ploidy levels on $\{x_1,\ldots,x_4\}$ as the network pictured in Figure 1(i). We are therefore interested in understanding to what extent a phylogenetic network representing the evolutionary past of a polyploid dataset can be derived solely from the ploidy levels of the species that make up the dataset.
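As a concrete illustration of the path-counting interpretation of ploidy levels, here is a minimal Python sketch. It assumes a network is given as a list of arcs, with a bead encoded by listing the same arc twice. The network `ARCS` and the helpers `ploidy_profile` and `swap_labels` are hypothetical and are not taken from the paper or from Figure 1. Exchanging the labels $x_1$ and $x_2$ changes the network but not its ploidy levels, mirroring the swap of $x_2$ and $x_3$ discussed above.

```python
from collections import defaultdict

# Hypothetical network (not Figure 1); the bead from t to d is listed twice.
ARCS = [
    ("r", "a"), ("r", "b"),
    ("a", "x1"), ("a", "h"),
    ("b", "h"), ("b", "x2"),
    ("h", "t"),
    ("t", "d"), ("t", "d"),   # bead: two parallel arcs from t to d
    ("d", "x3"),
]


def ploidy_profile(arcs, root):
    """Map every leaf to the number of directed paths from root to it."""
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        parents[v].append(u)   # a bead contributes its parent twice
        children[u].append(v)
    memo = {root: 1}           # exactly one (trivial) path from the root to itself

    def paths_to(v):
        # Paths into v = sum over incoming arcs (u, v) of the paths into u.
        if v not in memo:
            memo[v] = sum(paths_to(u) for u in parents[v])
        return memo[v]

    leaves = sorted(set(parents) - set(children))  # vertices without outgoing arcs
    return {x: paths_to(x) for x in leaves}


def swap_labels(arcs, a, b):
    """Exchange the labels a and b wherever they occur in the arc list."""
    rename = {a: b, b: a}
    return [(rename.get(u, u), rename.get(v, v)) for u, v in arcs]


print(ploidy_profile(ARCS, "r"))                            # {'x1': 1, 'x2': 1, 'x3': 4}
print(ploidy_profile(swap_labels(ARCS, "x1", "x2"), "r"))   # {'x1': 1, 'x2': 1, 'x3': 4}
```

Because parallel arcs are counted with multiplicity, the bead doubles the number of paths that pass through it, which is how autopolyploidization shows up in the ploidy levels.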
Note that since polyploidization events are assumed to be rare, we are particularly interested in phylogenetic networks that reflect this property, and we therefore aim to minimize the number of reticulation vertices. From the perspective of reducing the complexity of our mathematical arguments, this immediately implies that we may assume that at least one taxon in a dataset has a ploidy level that is not even. Indeed, suppose we have a dataset where every ploidy level is of the form $m = 2m'$ for some positive integer $m'$. Then, since polyploidization events are assumed to be rare, we may assume that the last common ancestor of the dataset's taxa underwent autopolyploidization. The root of a phylogenetic network $N$ that represents the evolutionary past of the dataset is therefore contained in a bead, and that bead accounts for the factor 2 in $m$. Thus, the phylogenetic network obtained by removing this bead and the arc that joins it to the rest of $N$ is a phylogenetic network that represents the factor $m'$ of $m$ in terms of numbers of directed paths from the root to the leaves.
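The reduction described above can be sketched in Python, assuming a ploidy profile is given as a tuple of positive integers. The text describes a single halving step; applying it repeatedly until some entry is odd is our own extrapolation, and the function name `strip_root_beads` is hypothetical.

```python
def strip_root_beads(profile):
    """Halve a ploidy profile while every entry is even.

    Returns the reduced profile together with the number of halvings, i.e. the
    number of root beads (each with the arc joining it to the rest of the
    network) removed under the argument in the text.
    """
    if not profile or any(m < 1 for m in profile):
        raise ValueError("a ploidy profile is a non-empty vector of positive integers")
    reduced, beads = tuple(profile), 0
    while all(m % 2 == 0 for m in reduced):
        reduced = tuple(m // 2 for m in reduced)
        beads += 1
    return reduced, beads


print(strip_root_beads((14, 12, 12, 10)))  # ((7, 6, 6, 5), 1)
print(strip_root_beads((8, 4)))            # ((2, 1), 2)
```

The first call reproduces the passage from (14,12,12,10) to (7,6,6,5) described for Figure 1; the loop always terminates because every entry is positive and halves at each step.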
In view of the above, we call any (finite) vector of positive integers that is indexed by a (finite, non-empty) set $X$ a ploidy profile (on $X$). Although related to the recently introduced ancestral profiles [7] (but also see [2]), ploidy profiles differ from them by recording only the number of directed paths from the root of a phylogenetic network $N$ to every leaf of $N$. Ancestral profiles, on the other hand, record the number of directed paths from every non-leaf vertex in $N$ to all the leaves below that vertex. In particular, a ploidy profile is an element of an ancestral profile of a phylogenetic network.
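To make the contrast concrete, the following sketch tabulates, for every non-leaf vertex of a small hypothetical network, the number of directed paths to each leaf below it; the row belonging to the root is then the ploidy profile. This table is only a simplified stand-in for the ancestral profiles of [7], whose precise definition may differ, and the names `ARCS` and `path_count_table` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical network rooted at "r"; the bead from t to d is listed twice.
ARCS = [
    ("r", "a"), ("r", "b"),
    ("a", "x1"), ("a", "h"),
    ("b", "h"), ("b", "x2"),
    ("h", "t"),
    ("t", "d"), ("t", "d"),
    ("d", "x3"),
]


def path_count_table(arcs):
    """For each non-leaf vertex v, map every leaf below v to the number of v-to-leaf paths."""
    children = defaultdict(list)
    for u, v in arcs:
        children[u].append(v)  # a bead contributes its child twice
    memo = {}

    def counts(v):
        if v not in memo:
            if v not in children:  # a leaf: only the trivial path
                memo[v] = Counter({v: 1})
            else:                  # add up the counts over all outgoing arcs
                total = Counter()
                for c in children[v]:
                    total.update(counts(c))
                memo[v] = total
        return memo[v]

    return {v: dict(sorted(counts(v).items())) for v in sorted(children)}


table = path_count_table(ARCS)
print(table["r"])  # {'x1': 1, 'x2': 1, 'x3': 4}  (the ploidy profile)
print(table["a"])  # {'x1': 1, 'x3': 2}
```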
To help motivate our approach for addressing our question, consider the phylogenetic network $N$ with leaf set $X=\{x_1,x_2,x_3,x_4\}$ depicted in Figure 1(i), where the square vertices at the end of each pair of parallel arcs represent autopolyploidization and the remaining four square vertices represent allopolyploidization. Taking, for each taxon $x \in X$, the number of directed paths from the root of the network to $x$ results in the ploidy profile $\vec{m} = (14,12,12,10)$, where the first component is indexed by $x_1$, the second by $x_2$, and so on. Every component of $\vec{m}$ is even, and the phylogenetic network rooted at $r$ obtained by removing the dashed bead together with the dashed arc coming into $r$ represents the ploidy profile $\vec{m}' = (7,6,6,5)$ in terms of numbers of directed paths from $r$ to the leaves. With this in mind, we say that a ploidy profile $\vec{m} = (m_1,\ldots,m_n)$, $n \geq 1$, on $X=\{x_1,\ldots,x_n\}$ is realized by a phylogenetic network $N$ with leaf set $X$ if, for all $1 \leq i \leq n$, the number of directed paths from the root of $N$ to $x_i$ is $m_i$. For example, both phylogenetic networks pictured in Figure 1 realize the ploidy profile $\vec{m} = (14,12,12,10)$ indexed by $X=\{x_1,x_2,x_3,x_4\}$.
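The definition of realization can be checked mechanically. The sketch below assumes arcs are listed with multiplicity and uses hypothetical names throughout (`realizes`, `add_root_bead`, and the vertex labels `rho` and `w`). It also mimics the relationship between (7,6,6,5) and (14,12,12,10): hanging a network below a new root via a bead followed by a single arc into the old root doubles every entry of the realized profile.

```python
from collections import defaultdict

# Hypothetical network rooted at "r"; the bead from t to d is listed twice.
ARCS = [
    ("r", "a"), ("r", "b"),
    ("a", "x1"), ("a", "h"),
    ("b", "h"), ("b", "x2"),
    ("h", "t"),
    ("t", "d"), ("t", "d"),
    ("d", "x3"),
]


def ploidy_profile(arcs, root):
    """Number of directed paths from root to each leaf (arcs counted with multiplicity)."""
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        parents[v].append(u)
        children[u].append(v)
    memo = {root: 1}

    def paths_to(v):
        if v not in memo:
            memo[v] = sum(paths_to(u) for u in parents[v])
        return memo[v]

    return {x: paths_to(x) for x in set(parents) - set(children)}


def realizes(arcs, root, profile):
    """True if the leaf set equals the index set of profile and all path counts match."""
    return ploidy_profile(arcs, root) == dict(profile)


def add_root_bead(arcs, old_root, new_root="rho", w="w"):
    """Hang the network below a new root: a bead from new_root to w, then the arc (w, old_root)."""
    return [(new_root, w), (new_root, w), (w, old_root)] + list(arcs)


print(realizes(ARCS, "r", {"x1": 1, "x2": 1, "x3": 4}))                        # True
print(realizes(add_root_bead(ARCS, "r"), "rho", {"x1": 2, "x2": 2, "x3": 8}))  # True
```

Comparing dictionaries checks both conditions of the definition at once: the leaf set must coincide with the index set of the profile, and every path count must match.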
Contributing to the emerging field of Polyploid Phylogenetics [29], a first inroad into our question was made in [11] by studying the hybrid number of a ploidy profile $\vec{m}$, that is, the minimal number of polyploidization events required by a phylogenetic network to realize $\vec{m}$. As it turns out, the arguments underlying the results in [11] largely rely on a certain iteratively constructed network that realizes $\vec{m}$. For a choice $C$ of the network that initializes this construction, we denote the generated network by $N(\vec{m}) = N_C(\vec{m})$. By changing the initializing network in a way that does not affect the main findings in [11] (see below for details), we show that even more can be said about ploidy profiles. For example, our first result (Proposition 4.1) shows that $N(\vec{m})$ may be thought of as a generator of ploidy profile space (defined in a similar way as phylogenetic tree space) in the sense that any realization of $\vec{m}$ can be reached from $N(\vec{m})$ via a number of multiple-labelled tree editing operations and reticulation vertex splitting operations. As an immediate consequence, we obtain a distance measure for phylogenetic networks that realize one and the same ploidy profile. On a more speculative level, it might be interesting to see whether $N(\vec{m})$ could serve as a useful prior for a Bayesian method along the lines of the one described in [35].
Our second result (Theorem 6.1) shows that a key concept introduced in [11], called the simplification sequence of a ploidy profile $\vec{m}$, is in fact closely related to the notion of a cherry reduction sequence [7] for $N(\vec{m})$, also called a cherry picking sequence in [19]. If autopolyploidy is not suspected to have played a role in the evolution of a dataset, this implies that the network $N(\vec{m})$ can also be reconstructed from phylogenetic networks on three leaves called trinets [32]. These can be obtained from a dataset using, for example, the TriLoNet software [26].
Our third result (Theorem 6.2), exemplified by the phylogenetic network depicted in Figure 1(ii) for the ploidy profile (14,12,12,10), implies that every ploidy profile is realized by a phylogenetic network that takes the form of a phylogenetic tree, possibly containing beads, to which additional arcs have been added, at most one of which is not horizontal. In this context, it is important to note that, in general, a phylogenetic network cannot be thought of as a phylogenetic tree with additional arcs, let alone horizontal ones. The reason is that a horizontal arc implies that the ancestral taxa it joins must have existed at the same time (see also [33, Section 10.3.3] for more on this and the Viola dataset below for an example).
The remainder of the paper is organized as follows. In the next section, we review relevant basic terminology surrounding graphs, phylogenetic networks and ploidy profiles. For a ploidy profile $\vec{m}$, we outline the construction of the network $N(\vec{m})$ in Section 3. This includes the definition of the simplification sequence for $\vec{m}$. Subsequently, we introduce ploidy profile space in Section 4 and also establish Proposition 4.1 in that section. Section 6 is concerned with establishing Theorems 6.1 and 6.2. To do this, we use Theorem 5.1, which we establish in Section 5. That theorem is underpinned by the concept of a so-called HGT-consistent labelling introduced in [37], a notion that we extend to our types of phylogenetic networks here. In the penultimate section, we employ a simplified version of a Viola dataset from [24] to help explain our findings within the context of a real biological dataset. We conclude with some potential directions for further research in the last section.
2. PRELIMINARIES
We start by introducing basic concepts surrounding phylogenetic networks. Throughout the paper, we assume that $X$ is a (finite) set that contains at least one element. Also, we denote the number of elements in $X$ by $n$.
2.1. Graphs. Suppose for the following that $G$ is a directed acyclic graph with a single root which might contain parallel arcs but no loops. We denote an arc starting at a vertex $u$ and ending in a vertex $v$ by $(u,v)$. If there exist two arcs from $u$ to $v$, then we refer to the pair of arcs from $u$ to $v$ as a bead of $G$.
Suppose $v$ is a vertex of $G$. Then we refer to the number of arcs coming into $v$ as the indegree of $v$ in $G$ and denote it by $\mathrm{indeg}(v)$. Similarly, we call the number of outgoing arcs of $v$ the outdegree of $v$ in $G$ and denote it by $\mathrm{outdeg}(v)$. We call $v$ the root of $G$ if $\mathrm{indeg}(v)=0$, and we call $v$ a leaf of $G$ if $\mathrm{indeg}(v)=1$ and $\mathrm{outdeg}(v)=0$. We denote the set of vertices of $G$ by $V(G)$ and the set of leaves of $G$ by $L(G)$. We call $v$ a tree vertex if $\mathrm{outdeg}(v)=2$ and $\mathrm{indeg}(v)=1$, and we call $v$ a reticulation vertex if $\mathrm{indeg}(v)=2$ and $\mathrm{outdeg}(v)=1$. If $w$ is also a vertex in $G$, then we say that $w$ is below $v$ if either $v=w$ or there exists a directed path from the root of $G$ to $w$ that crosses $v$. If $w$ is below $v$ and $v \neq w$, then we say that $w$ is strictly below $v$. A parent of a vertex $v$ is a vertex $u$ such that $(u,v)$ is an arc of $G$, and a child of $v$ is a vertex of which $v$ is a parent.
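The degree-based definitions above translate directly into code. In this minimal sketch (hypothetical names throughout), a graph is a list of arcs in which a bead appears as a repeated arc. The helper `below` uses reachability from $v$; we take this to be equivalent to the definition because, in a finite acyclic graph with a single root, every vertex can be reached from the root.

```python
from collections import Counter, defaultdict

# Hypothetical network; ("t", "d") is listed twice because it is a bead.
ARCS = [
    ("r", "a"), ("r", "b"),
    ("a", "x1"), ("a", "h"),
    ("b", "h"), ("b", "x2"),
    ("h", "t"),
    ("t", "d"), ("t", "d"),
    ("d", "x3"),
]


def classify(arcs):
    """Name each vertex by its (indeg, outdeg) pair, or None if no definition applies."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    names = {(1, 0): "leaf", (1, 2): "tree vertex", (2, 1): "reticulation vertex"}
    kinds = {}
    for v in sorted(set(indeg) | set(outdeg)):
        if indeg[v] == 0:
            kinds[v] = "root"
        else:
            kinds[v] = names.get((indeg[v], outdeg[v]))
    return kinds


def beads(arcs):
    """Vertex pairs (u, v) joined by two parallel arcs."""
    return sorted(arc for arc, k in Counter(arcs).items() if k == 2)


def below(arcs, v):
    """All w below v: w == v, or w is reachable from v along directed arcs."""
    children = defaultdict(list)
    for a, b in arcs:
        children[a].append(b)
    seen, stack = {v}, [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


print(classify(ARCS)["h"])       # reticulation vertex
print(beads(ARCS))               # [('t', 'd')]
print(sorted(below(ARCS, "b")))  # ['b', 'd', 'h', 't', 'x2', 'x3']
```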
Suppose $a$ and $b$ are two distinct leaves of $G$. Then the set $\{a,b\}$ is called a cherry of $G$ if the parent $p_a$ of $a$ is also the parent of $b$. If the parent $p_b$ of $b$ is a reticulation vertex and there is an arc $(p_a,p_b)$ from $p_a$ to $p_b$, then the set $\{a,b\}$ is called a reticulate cherry. In this case, the arc $(p_a,p_b)$ is called a reticulation arc of $G$ and the leaf $b$ is called a reticulation leaf of $G$.
For example, $x_1$ is the reticulation leaf of the reticulate cherry $\{x_1,x_2\}$ in the graph depicted in Figure 1(i). The parent of $x_2$ is a tree vertex and the parent of $x_1$ is a reticulation vertex. The two arcs from $u$ to $v$ form a bead.
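Cherries and reticulate cherries can likewise be read off an arc list. The sketch below uses a small hypothetical network with four leaves rather than Figure 1, and reports each reticulate cherry together with its reticulation arc, listing the reticulation leaf second; the function and vertex names are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical network: c is the parent of x2 and x3, and h is a
# reticulation vertex with parents a and b.
ARCS = [
    ("r", "a"), ("r", "b"),
    ("a", "x1"), ("a", "h"),
    ("b", "h"), ("b", "c"),
    ("c", "x2"), ("c", "x3"),
    ("h", "x4"),
]


def find_cherries(arcs):
    """Return (cherries, reticulate_cherries).

    A reticulate cherry is reported as (a, b, (p_a, p_b)), where b is the
    reticulation leaf and (p_a, p_b) is the reticulation arc.
    """
    indeg, outdeg = Counter(), Counter()
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
        parents[v].append(u)
        children[u].append(v)
    leaves = sorted(v for v in indeg if indeg[v] == 1 and outdeg[v] == 0)
    parent = {x: parents[x][0] for x in leaves}  # a leaf has exactly one parent

    def is_reticulation(v):
        return indeg[v] == 2 and outdeg[v] == 1

    cherries = [(a, b) for i, a in enumerate(leaves) for b in leaves[i + 1:]
                if parent[a] == parent[b]]
    reticulate = [(a, b, (parent[a], parent[b]))
                  for a in leaves for b in leaves
                  if a != b and is_reticulation(parent[b])
                  and parent[b] in children[parent[a]]]
    return cherries, reticulate


print(find_cherries(ARCS))  # ([('x2', 'x3')], [('x1', 'x4', ('a', 'h'))])
```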
2.2. Phylogenetic networks and trees. Suppose $G$ is a graph as described above. If $G$ contains at least three vertices, then we call $G$ a (phylogenetic) network (on $X$) if the outdegree of the root $\rho$ of $G$ is 2, the leaf set of $G$ is $X$, and every vertex other than $\rho$ or a leaf is a tree vertex or a reticulation vertex. Note that our definition of a phylogenetic network differs from the standard definition of such an object (see e.g. [33]) by allowing the network to contain beads and $X$ to contain a single element. To distinguish between our type of phylogenetic networks and the standard type, we refer to the latter as beadless phylogenetic networks. We call a phylogenetic network (on $X$) a phylogenetic tree (on $X$) if it does not contain any reticulation vertices.
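A checker for this definition might look as follows. It is a sketch under stated conventions (arcs listed with multiplicity, loops rejected) with hypothetical function names; acyclicity is tested with Kahn's algorithm started from the unique root, which removes every vertex exactly when the graph has no directed cycle.

```python
from collections import Counter, defaultdict


def is_phylogenetic_network(arcs, taxa):
    """Check the definition of a (possibly beaded) phylogenetic network on taxa."""
    indeg, outdeg = Counter(), Counter()
    children = defaultdict(list)
    for u, v in arcs:
        if u == v:
            return False  # loops are not allowed
        outdeg[u] += 1
        indeg[v] += 1
        children[u].append(v)
    vertices = set(indeg) | set(outdeg)
    roots = [v for v in vertices if indeg[v] == 0]
    if len(vertices) < 3 or len(roots) != 1 or outdeg[roots[0]] != 2:
        return False
    root = roots[0]
    leaves = {v for v in vertices if indeg[v] == 1 and outdeg[v] == 0}
    if leaves != set(taxa):
        return False
    # Every other vertex must be a tree vertex (1, 2) or a reticulation vertex (2, 1).
    for v in vertices - leaves - {root}:
        if (indeg[v], outdeg[v]) not in {(1, 2), (2, 1)}:
            return False
    # Kahn's algorithm from the unique root: all vertices are removed iff G is acyclic.
    remaining = {v: indeg[v] for v in vertices}
    queue, removed = [root], 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    return removed == len(vertices)


def is_phylogenetic_tree(arcs, taxa):
    """A phylogenetic network without reticulation vertices (every indegree is 1)."""
    indeg = Counter(v for _, v in arcs)
    return is_phylogenetic_network(arcs, taxa) and all(d == 1 for d in indeg.values())


single_leaf = [("r", "v"), ("r", "v"), ("v", "x")]  # a bead at the root, X = {x}
cherry_tree = [("r", "x1"), ("r", "x2")]
print(is_phylogenetic_network(single_leaf, {"x"}))      # True
print(is_phylogenetic_tree(single_leaf, {"x"}))         # False: v is a reticulation vertex
print(is_phylogenetic_tree(cherry_tree, {"x1", "x2"}))  # True
```

The first example is the smallest network permitted by the definition: three vertices, a single taxon, and a bead below the root.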