
AUTOPOLYPLOIDY, ALLOPOLYPLOIDY, AND PHYLOGENETIC NETWORKS WITH HORIZONTAL ARCS

KATHARINA T. HUBER AND LIAM J. MAHER

ABSTRACT. Polyploidization is an evolutionary process by which a species acquires multiple copies of its complete set of chromosomes. Because the signal it leaves behind is reticulate, phylogenetic networks offer a natural framework for reconstructing the evolutionary past of the species it affects. The main strategy for doing this is to first construct a so-called multiple-labelled tree and then derive such a network from it. The following question therefore arises: how much can be said about that past if such a tree is not readily available? By viewing a polyploid dataset as a certain vector, which we call a ploidy (level) profile, we show, among other results, that there always exists a phylogenetic network in the form of a beaded phylogenetic tree with additional arcs that realizes a given ploidy profile. Intriguingly, the two end vertices of almost all of these additional arcs can be interpreted as having co-existed in time, which adds biological realism to our network, a feature that phylogenetic networks do not enjoy in general. In addition, we show that our network may be viewed as a generator of ploidy profile space, a novel concept similar to phylogenetic tree space that we introduce in order to compare phylogenetic networks that realize one and the same ploidy profile. We illustrate our findings in terms of a publicly available Viola dataset.

1. INTRODUCTION

Polyploidization is an evolutionary phenomenon thought to be one of the key players in plant evolution. It has, however, also been observed in fish [22] and fungi [1], and it arises when a species acquires multiple copies of its full set of chromosomes. This can be the result of, for example, a species undergoing whole genome duplication (autopolyploidization) or acquiring a further complete set of chromosomes via interbreeding with a different, usually closely related, species (allopolyploidization) [1] (see also [5], who point out that the definitions of allopolyploidy and autopolyploidy are controversial). Examples of autopolyploids include crop potato [34] as well as bananas and watermelon [38], and examples of allopolyploids include bread wheat [25] and oilseed rape. Understanding better how polyploids have arisen (and still arise) therefore has potentially far-reaching consequences.

Many tools for shedding light on the evolutionary past of a polyploid dataset, such as PADRE [23] and AlloPPnet [20], start with a multiple-labelled tree, sometimes also called a MUL-tree or a multi-labelled tree. These trees differ from standard phylogenetic trees by allowing two or more leaves to be labelled with the same species. In the case of PADRE, a (phylogenetic) network is then produced from such a tree by folding it up as described in, for example, [14]. Referring the interested reader to Figure 1(i) for an example and below for definitions, it suffices to say at this stage that a phylogenetic network is a directed graph whose leaf set is a set of taxa (e.g. species) of interest and which has a single root (usually drawn at the top) and no directed cycles. Note that, to be able to account for autopolyploidy, we deviate from the standard definition of a phylogenetic network (see e.g. [33]) by also allowing it to contain beads, that is, pairs of parallel arcs, as is the case in the networks depicted in Figure 1. Polyploidization events are represented in such networks as reticulation vertices, that is, vertices with more than one arc coming into them. For clarity of exposition, we indicate reticulation vertices throughout the paper by squares. Although PADRE is generally fast and not constrained by an upper limit on the ploidy levels in a dataset of interest, its underlying assumptions imply that it is highly susceptible to noise in the multiple-labelled tree from which the network is constructed. In the case of AlloPPnet, a phylogenetic network is inferred using, among other techniques, the multispecies coalescent to account for incomplete lineage sorting. The computational demands of AlloPPnet, however, mean that it can only be applied to relatively small datasets that contain only diploids and tetraploids [29].

Date: February 21, 2023.

arXiv:2210.05269v2 [q-bio.PE] 19 Feb 2023

FIGURE 1. (i) One of potentially many phylogenetic networks that realize the ploidy levels 14, 12, 12, 10 of a set $X=\{x_1,x_2,x_3,x_4\}$ of taxa, where 14 is the ploidy level of $x_1$, 12 is the ploidy level of both $x_2$ and $x_3$, and 10 is the ploidy level of $x_4$. To improve clarity of exposition, we always assume that, unless indicated otherwise, arcs are directed away from the root (which is always at the top). (ii) The network in (i) represented in such a way that every reticulation vertex (indicated throughout the paper by a square and defined below) has precisely one incoming horizontal arc, implying that the end vertices of such an arc represent ancestral species that have existed at the same point in time. In both (i) and (ii), the phylogenetic network resulting from deleting the dashed bead and its dashed outgoing arc realizes the ploidy profile $\vec{m}=(7,6,6,5)$.

One approach to obtaining an input multiple-labelled tree for PADRE is to construct it as a consensus multiple-labelled tree from a set of multiple-labelled gene trees. For phylogenetic trees this task is relatively straightforward: one can apply, for example, some kind of consensus approach to the collection of clusters induced by the trees. The corresponding approach for constructing a consensus multiple-labelled tree from a collection of multiple-labelled gene trees, however, gives rise to a computationally hard decision problem [10]. A consensus multiple-labelled tree might therefore not always be readily available for a dataset. The following question therefore arises: how much can we say about the reticulate evolutionary past of a polyploid dataset if a multiple-labelled tree is not readily available? Since one of the signatures left by polyploidization is the ploidy level of a species (i.e. the number of copies of the complete set of chromosomes of that species), we address this question in terms of a dataset's ploidy levels or, more precisely, the ploidy levels of the taxa that make up the dataset, using phylogenetic networks as a framework. Interpreting the ploidy level of a species as the number of directed paths from the root of a phylogenetic network $N$ to the leaf in $N$ that represents that species, Figure 1(i) implies that, in general, ploidy levels do not determine the topology of the phylogenetic network that induced them. For example, swapping $x_2$ with $x_3$ in that network results in a phylogenetic network that induces the same ploidy levels on $\{x_1,\ldots,x_4\}$ as the network pictured in Figure 1(i). We are therefore interested in understanding to what extent a phylogenetic network representing the evolutionary past of a polyploid dataset can be derived solely from the ploidy levels of the species that make up the dataset.

Note that since polyploidization events are assumed to be rare, we are particularly interested in phylogenetic networks that reflect this, and we therefore aim to minimize the number of reticulation vertices. From the perspective of reducing the complexity of our mathematical arguments, this immediately implies that we may assume that not every ploidy level in a dataset is even. Indeed, if we have a dataset where every ploidy level is of the form $m=2m'$, for some positive integer $m'$, then, since polyploidization events are assumed to be rare, we may assume the last common ancestor of the dataset's taxa to have undergone autopolyploidization. The root of a phylogenetic network $N$ that represents the evolutionary past of the dataset is therefore contained in a bead, and that bead accounts for the factor 2 in $m$. Thus, the phylogenetic network obtained by removing this bead and the arc that joins it to the rest of $N$ is a phylogenetic network that represents the factor $m'$ of $m$ in terms of numbers of directed paths from the root to the leaves.
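On the level of ploidy levels alone, the reduction just described amounts to halving every entry once when all entries are even. The following minimal Python sketch illustrates this under our own naming; the function `strip_root_bead` is hypothetical and not taken from the paper, and it acts only on the vector of ploidy levels, not on a network.

```python
def strip_root_bead(profile):
    """Halve every ploidy level once if all of them are even.

    This mirrors removing the bead that contains the root: such a bead
    doubles the number of root-to-leaf paths for every leaf. If at least
    one entry is odd, the vector is returned unchanged.
    (Hypothetical helper for illustration only.)
    """
    if not profile or any(m <= 0 for m in profile):
        raise ValueError("expected a non-empty vector of positive integers")
    if all(m % 2 == 0 for m in profile):
        return tuple(m // 2 for m in profile)
    return tuple(profile)


print(strip_root_bead((14, 12, 12, 10)))  # (7, 6, 6, 5)
```

Applying the function to (7, 6, 6, 5) returns the same vector, since 7 and 5 are odd.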

In view of the above, we call any (finite) vector of positive integers that is indexed by a (finite, non-empty) set $X$ a ploidy profile (on $X$). Although related to the recently introduced ancestral profiles [7] (but also see [2]), ploidy profiles differ from them by only recording the number of directed paths from the root of a phylogenetic network $N$ to every leaf of $N$. Ancestral profiles, on the other hand, record the number of directed paths from every non-leaf vertex in $N$ to all the leaves below that vertex. In particular, a ploidy profile is an element of an ancestral profile of a phylogenetic network.

To help motivate our approach for addressing our question, consider the phylogenetic network $N$ with leaf set $X=\{x_1,x_2,x_3,x_4\}$ depicted in Figure 1(i), where the square vertices at the end of each pair of parallel arcs represent autopolyploidization and the remaining four square vertices represent allopolyploidization. Then taking for each taxon $x$ in $X$ the number of directed paths from the root of the network to $x$ results in the ploidy profile $\vec{m}=(14,12,12,10)$, where the first component is indexed by $x_1$, the second by $x_2$, and so on. Each component in $\vec{m}$ is of the form $2m$, for some positive integer $m$, and the phylogenetic network rooted at $r$ obtained by removing the dashed bead together with the dashed arc coming into $r$ represents the ploidy profile $\vec{m}'=(7,6,6,5)$ in terms of numbers of directed paths from $r$ to the leaves. With this in mind, we say that a ploidy profile $\vec{m}=(m_1,\ldots,m_n)$, $n\geq 1$, on $X=\{x_1,\ldots,x_n\}$ is realized by a phylogenetic network $N$ with leaf set $X$ if, for all $1\leq i\leq n$, the number of directed paths from the root of $N$ to $x_i$ is $m_i$. For example, both phylogenetic networks pictured in Figure 1 realize the ploidy profile $\vec{m}=(14,12,12,10)$ indexed by $X=\{x_1,x_2,x_3,x_4\}$.
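The notion of a network realizing a ploidy profile can be checked mechanically by counting directed paths from the root. The Python sketch below is one way to do this, assuming a network is given as a list of arcs `(tail, head)` in which a bead appears as a repeated pair. The names `count_root_paths`, `realizes` and the small network `ARCS` are our own illustrative choices; in particular, `ARCS` is a toy example and not the network of Figure 1.

```python
from collections import defaultdict, deque


def count_root_paths(arcs, root):
    """Count directed paths from `root` to every vertex of a DAG.

    `arcs` is a list of (tail, head) pairs; parallel arcs (beads) are
    repeated pairs and are counted as distinct paths. Assumes `root`
    is the only vertex with indegree 0.
    """
    children = defaultdict(list)
    remaining = defaultdict(int)  # incoming arcs not yet processed
    vertices = {root}
    for u, v in arcs:
        children[u].append(v)
        remaining[v] += 1
        vertices.update((u, v))
    paths = {v: 0 for v in vertices}
    paths[root] = 1
    queue = deque([root])
    while queue:  # process vertices in topological order
        u = queue.popleft()
        for v in children[u]:
            paths[v] += paths[u]
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return paths


def realizes(arcs, root, profile):
    """Check whether the network realizes `profile` (a dict leaf -> level)."""
    paths = count_root_paths(arcs, root)
    tails = {u for u, _ in arcs}
    leaves = {v for v in paths if v not in tails}
    return leaves == set(profile) and all(paths[x] == m for x, m in profile.items())


# Toy network: a bead at the root, followed by one reticulation vertex h.
ARCS = [("rho", "r"), ("rho", "r"), ("r", "t"), ("t", "u"), ("t", "v"),
        ("u", "x1"), ("u", "h"), ("v", "h"), ("v", "x2"), ("h", "x3")]

print(realizes(ARCS, "rho", {"x1": 2, "x2": 2, "x3": 4}))  # True
```

In this toy network the bead at the root doubles every path count, so deleting the bead and the arc leaving it gives a network rooted at `t` that realizes (1, 1, 2).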

Contributing to the emerging field of Polyploid Phylogenetics [29], a first inroad into our question was made in [11] by studying the hybrid number of a ploidy profile $\vec{m}$, that is, the minimal number of polyploidization events required by a phylogenetic network to realize $\vec{m}$. As it turns out, the arguments underlying the results in [11] largely rely on a certain iteratively constructed network that realizes $\vec{m}$. Denoting, for a choice $C$ of initializing network, the generated network by $N(\vec{m})=N_C(\vec{m})$, and changing the network initializing that construction in a way that does not affect the main findings in [11] (see below for details), we show that even more can be said about ploidy profiles. For example, our first result (Proposition 4.1) shows that $N(\vec{m})$ may be thought of as a generator of ploidy profile space (defined in a similar way to phylogenetic tree space) in the sense that any realization of $\vec{m}$ can be reached from $N(\vec{m})$ via a number of multiple-labelled tree editing operations and reticulation vertex splitting operations. As an immediate consequence, we obtain a distance measure for phylogenetic networks that realize one and the same ploidy profile. On a more speculative level, it might be interesting to see whether $N(\vec{m})$ lends itself as a useful prior for a Bayesian method along the lines described in [35].

Our second result (Theorem 6.1) shows that a key concept introduced in [11], called the simplification sequence of a ploidy profile $\vec{m}$, is in fact closely related to the notion of a cherry reduction sequence [7] for $N(\vec{m})$, also called a cherry picking sequence in [19]. If autopolyploidy is not suspected to have played a role in the evolution of a dataset, this implies that the network $N(\vec{m})$ can also be reconstructed from phylogenetic networks on three leaves called trinets [32]. These can be obtained from a dataset using, for example, the TriLoNet software [26].

Exemplified in terms of the phylogenetic network depicted in Figure 1(ii) for the ploidy profile $(14,12,12,10)$, our third result (Theorem 6.2) implies that for any ploidy profile we can always find a phylogenetic network realizing it in the form of a phylogenetic tree that potentially contains beads, to which additional arcs have been added such that at most one of those arcs is not horizontal. In this context it is important to note that, in general, a phylogenetic network cannot be thought of as a phylogenetic tree with additional arcs, let alone horizontal ones. The reason for this is that horizontal arcs imply that the ancestral taxa joined by such an arc must have existed at the same time (see also [33, Section 10.3.3] for more on this and the Viola dataset below for an example).

The remainder of the paper is organized as follows. In the next section, we review relevant basic terminology surrounding graphs, phylogenetic networks and ploidy profiles. For a ploidy profile $\vec{m}$, we outline the construction of the network $N(\vec{m})$ in Section 3. This includes the definition of the simplification sequence for $\vec{m}$. Subsequently, we introduce ploidy profile space in Section 4 and also establish Proposition 4.1 in that section. Section 6 is concerned with establishing Theorems 6.1 and 6.2. To do this, we use Theorem 5.1, which we establish in Section 5. That theorem is underpinned by the concept of a so-called HGT-consistent labelling introduced in [37], a notion that we extend to our types of phylogenetic networks here. In the penultimate section, we employ a simplified version of a Viola dataset from [24] to help explain our findings within the context of a real biological dataset. We conclude with some potential directions for further research in the last section.

2. PRELIMINARIES

We start by introducing basic concepts surrounding phylogenetic networks. Throughout the paper, we assume that $X$ is a (finite) set that contains at least one element. Also, we denote the number of elements in $X$ by $n$.

2.1. Graphs. Suppose for the following that $G$ is a directed acyclic graph with a single root which might contain parallel arcs but no loops. We denote an arc starting at a vertex $u$ and ending in a vertex $v$ by $(u,v)$. If there exist two arcs from $u$ to $v$, then we refer to the pair of arcs from $u$ to $v$ as a bead of $G$.

Suppose $v$ is a vertex of $G$. Then we refer to the number of arcs coming into $v$ as the indegree of $v$ in $G$ and denote it by $\mathrm{indeg}(v)$. Similarly, we call the number of outgoing arcs of $v$ the outdegree of $v$ in $G$ and denote it by $\mathrm{outdeg}(v)$. We call $v$ the root of $G$ if $\mathrm{indeg}(v)=0$, and we call $v$ a leaf of $G$ if $\mathrm{indeg}(v)=1$ and $\mathrm{outdeg}(v)=0$. We denote the set of vertices of $G$ by $V(G)$ and the set of leaves of $G$ by $L(G)$. We call $v$ a tree vertex if $\mathrm{outdeg}(v)=2$ and $\mathrm{indeg}(v)=1$, and we call $v$ a reticulation vertex if $\mathrm{indeg}(v)=2$ and $\mathrm{outdeg}(v)=1$. If $w$ is also a vertex in $G$, then we say that $w$ is below $v$ if either $v=w$ or there exists a directed path from the root of $G$ to $w$ that crosses $v$. If $w$ is below $v$ and $v\neq w$, then we say that $w$ is strictly below $v$. A parent of a vertex $v$ is a vertex $u$ such that $(u,v)$ is an arc of $G$. A child of a vertex $v$ is a vertex of which $v$ is a parent.
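The degree-based definitions above translate directly into code. The following Python sketch classifies the vertices of a graph given as an arc list and lists its beads. The function names `classify_vertices` and `beads`, the label strings, and the toy graph `ARCS` are assumptions made for illustration only.

```python
from collections import Counter


def classify_vertices(arcs):
    """Label each vertex using the indegree/outdegree definitions.

    `arcs` is a list of (tail, head) pairs; parallel arcs are repeated.
    Vertices matching none of the four definitions are labelled "other".
    """
    indeg, outdeg = Counter(), Counter()
    vertices = set()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
        vertices.update((u, v))
    kinds = {}
    for v in vertices:
        degrees = (indeg[v], outdeg[v])
        if degrees[0] == 0:
            kinds[v] = "root"
        elif degrees == (1, 0):
            kinds[v] = "leaf"
        elif degrees == (1, 2):
            kinds[v] = "tree"
        elif degrees == (2, 1):
            kinds[v] = "reticulation"
        else:
            kinds[v] = "other"
    return kinds


def beads(arcs):
    """Return the pairs (u, v) joined by at least two parallel arcs."""
    return {arc for arc, count in Counter(arcs).items() if count >= 2}


# Toy graph: a bead from rho to r and one further reticulation vertex h.
ARCS = [("rho", "r"), ("rho", "r"), ("r", "t"), ("t", "u"), ("t", "v"),
        ("u", "x1"), ("u", "h"), ("v", "h"), ("v", "x2"), ("h", "x3")]

kinds = classify_vertices(ARCS)
print(kinds["r"], kinds["t"], kinds["x1"])  # reticulation tree leaf
print(beads(ARCS))  # {('rho', 'r')}
```

Note that the head of a bead is itself a reticulation vertex, since both of its incoming arcs come from the same parent.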

Suppose $a$ and $b$ are two distinct leaves of $G$. Then the set $\{a,b\}$ is called a cherry of $G$ if the parent $p_a$ of $a$ is also the parent of $b$. If the parent $p_b$ of $b$ is a reticulation vertex and there is an arc $(p_a,p_b)$ from $p_a$ to $p_b$, then the set $\{a,b\}$ is called a reticulate cherry. In this case, the arc $(p_a,p_b)$ is called a reticulation arc of $G$ and the leaf $b$ is called a reticulation leaf of $G$.

For example, $x_1$ is the reticulation leaf of the reticulate cherry $\{x_1,x_2\}$ in the graph depicted in Figure 1(i). The parent of $x_2$ is a tree vertex and the parent of $x_1$ is a reticulation vertex. The vertices $u$ and $v$ form a bead.
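Cherries and reticulate cherries can be found by inspecting the parents of pairs of leaves. The sketch below is a straightforward quadratic search written for clarity rather than speed. The function name `find_cherries` and the toy graph `ARCS` are hypothetical, and a reticulate cherry is reported as an ordered pair `(a, b)` in which `b` is the reticulation leaf.

```python
from collections import defaultdict
from itertools import permutations


def find_cherries(arcs):
    """Return (cherries, reticulate_cherries) of a graph given as arcs.

    Cherries are frozensets {a, b}; reticulate cherries are ordered
    pairs (a, b) whose second entry b is the reticulation leaf.
    """
    parents, children = defaultdict(list), defaultdict(list)
    vertices = set()
    for u, v in arcs:
        parents[v].append(u)
        children[u].append(v)
        vertices.update((u, v))
    # Leaves have indegree 1 and outdegree 0.
    leaves = [v for v in vertices if len(parents[v]) == 1 and not children[v]]
    cherries, reticulate = set(), set()
    for a, b in permutations(leaves, 2):
        pa, pb = parents[a][0], parents[b][0]
        if pa == pb:
            cherries.add(frozenset((a, b)))
        elif len(parents[pb]) == 2 and len(children[pb]) == 1 and pa in parents[pb]:
            # pb is a reticulation vertex and (pa, pb) is an arc.
            reticulate.add((a, b))
    return cherries, reticulate


# Toy graph with one cherry {c, d} and one reticulate cherry {a, b}.
ARCS = [("rho", "u"), ("rho", "v"), ("u", "a"), ("u", "h"), ("v", "h"),
        ("v", "w"), ("w", "c"), ("w", "d"), ("h", "b")]

cherries, reticulate = find_cherries(ARCS)
print(cherries)    # {frozenset({'c', 'd'})}; element order inside may vary
print(reticulate)  # {('a', 'b')}
```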

2.2. Phylogenetic networks and trees. Suppose $G$ is a graph as described above. If $G$ contains at least three vertices, then we call $G$ a (phylogenetic) network (on $X$) if the outdegree of the root $\rho$ of $G$ is 2, the leaf set of $G$ is $X$, and every vertex other than $\rho$ or a leaf is a tree vertex or a reticulation vertex. Note that our definition of a phylogenetic network differs from the standard definition of such an object (see e.g. [33]) by allowing the network to contain beads and $X$ to contain a single element. To distinguish between our type of phylogenetic network and the standard type, we refer to the latter as beadless phylogenetic networks. We call a phylogenetic network (on $X$) a phylogenetic tree (on $X$) if it does not contain any reticulation vertices.
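The conditions in this definition can be verified directly on an arc list. The following Python sketch does so under the same illustrative representation as before (arcs as `(tail, head)` pairs, beads as repeated pairs). The function names are our own, and the check is a plain restatement of the definition, not an algorithm from the paper.

```python
from collections import Counter, defaultdict


def is_phylogenetic_network(arcs, taxa):
    """Check the definition of a phylogenetic network on `taxa`."""
    indeg, outdeg = Counter(), Counter()
    children = defaultdict(list)
    vertices = set()
    for u, v in arcs:
        if u == v:
            return False  # loops are not allowed
        outdeg[u] += 1
        indeg[v] += 1
        children[u].append(v)
        vertices.update((u, v))
    roots = [v for v in vertices if indeg[v] == 0]
    if len(vertices) < 3 or len(roots) != 1 or outdeg[roots[0]] != 2:
        return False
    # Acyclicity: every vertex must be reached by a topological sweep.
    remaining = {v: indeg[v] for v in vertices}
    stack, seen = [roots[0]], 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in children[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                stack.append(v)
    if seen != len(vertices):
        return False
    leaves = {v for v in vertices if (indeg[v], outdeg[v]) == (1, 0)}
    if leaves != set(taxa):
        return False
    inner = vertices - leaves - {roots[0]}
    # Every inner vertex is a tree vertex (1, 2) or a reticulation vertex (2, 1).
    return all((indeg[v], outdeg[v]) in {(1, 2), (2, 1)} for v in inner)


def is_phylogenetic_tree(arcs, taxa):
    """A phylogenetic network without reticulation vertices."""
    indeg = Counter(v for _, v in arcs)
    return is_phylogenetic_network(arcs, taxa) and all(d < 2 for d in indeg.values())


# A single-taxon network consisting of one bead and one leaf.
ONE_LEAF = [("rho", "r"), ("rho", "r"), ("r", "x")]
print(is_phylogenetic_network(ONE_LEAF, {"x"}))  # True
print(is_phylogenetic_tree(ONE_LEAF, {"x"}))     # False
```

The single-taxon example illustrates why the definition permits $X$ to contain one element: a bead followed by a leaf already satisfies all conditions.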
