DEEP REINFORCEMENT LEARNING FOR STABILIZATION
OF LARGE-SCALE PROBABILISTIC BOOLEAN NETWORKS
A PREPRINT
Sotiris Moschoyiannis, Evangelos Chatzaroulas, Vytenis Sliogeris, Yuhu Wu
October 26, 2022
ABSTRACT
The ability to direct a Probabilistic Boolean Network (PBN) to a desired state is important to applications such as targeted therapeutics in cancer biology. Reinforcement Learning (RL) has been proposed as a framework that solves a discrete-time optimal control problem cast as a Markov Decision Process. We focus on an integrative framework powered by a model-free deep RL method that can address different flavours of the control problem (e.g., with or without control inputs; attractor state or a subset of the state space as the target domain). The method is agnostic to the distribution of probabilities for the next state, hence it does not use the probability transition matrix. The time complexity is only linear in the number of time steps, or interactions between the agent (deep RL) and the environment (PBN), during training. Indeed, we explore the scalability of the deep RL approach to (set) stabilization of large-scale PBNs and demonstrate successful control on large networks, including a metastatic melanoma PBN with 200 nodes.
1 Introduction
Recent efforts to produce tools to effectively control the dynamics of complex networked systems draw from control theory, numerical methods and, more recently, network science and machine learning. A dynamical system is controllable if, by intervening on the state of individual nodes, the system as a whole can be driven from any initial state to a desirable state within finite time [1]. This notion of control finds application not only in engineered systems but also in biological networks, and is often referred to as stabilization.
Probabilistic Boolean Networks (PBNs) were introduced in [2] for modelling Gene Regulatory Networks (GRNs) as
complex dynamical systems. Nodes represent genes in one of two possible states of activity: 0 (not expressed) or 1 (expressed). Edges indicate that genes act on each other, by means of rules represented by Boolean functions. The
state of the network at each time step comprises the states of the individual nodes. PBNs extend Kauffman’s Boolean
Networks (BNs) [3] by associating more than one function with each node, one of which executes at each time step
with a certain probability. This stochasticity in the network model accounts for uncertainty in gene interaction, which
is inherent not only in data collection but also in cell function. Both PBNs and BNs have been extensively used to
model well-known regulatory networks, see seminal work in [4]. The dynamics of PBNs adhere to Markov chain
theory [5] and dictate that the network, from any initial state, will eventually settle down to one of a limited set of
steady states, the so-called attractors (fixed points or cyclic), that it cannot leave without intervention [6].
Genes in biological systems experience sudden emergence of ordered collective behaviour, which corresponds to
steady state behaviour in PBNs and can be modelled via attractor theory [7], e.g., the emergence of heterogeneous small
cell lung cancer phenotypes [8]. Therapeutic strategies may involve switching between attractors, e.g., proliferation,
apoptosis in cancerous cells [9], or directing the network to a (set of) steady states that are more desirable than others,
e.g., exhibit lower levels of resistance to antibiotics [10].
Sotiris Moschoyiannis, Evangelos Chatzaroulas, and Vytenis Sliogeris are with the School of Computer Science & Electronic
Engineering, University of Surrey, GU2 7XH, UK (e-mail: s.moschoyiannis@surrey.ac.uk). Sotiris Moschoyiannis and Vytenis
Sliogeris have been partly funded by UKRI Innovate UK, grant 77032. Yuhu Wu is with the School of Control Science and
Engineering, Dalian University of Technology, 116024, Dalian, China (e-mail: wuyuhu@dlut.edu.cn).
While steady states are understood in a small set of experimental GRNs in the literature, they are not generally available
and it is computationally intractable to compute them in larger networks. However, attractors with larger basins of
attraction tend to be more stable. Therefore, the long-term dynamical behavior can be described by a steady-state
distribution (SSD) [11,12] which effectively captures the time spent at each network state. Hence, the control problem
in large-scale PBNs takes the form of set stabilization of the network to a pre-assigned subset of network states. It
transpires that the ability to direct the network to a specific attractor, or a subset of network states, by means of
intervention on individual nodes (genes), is central to GRNs and targeted therapeutics.
In this context, intervention at a certain time takes the form of effecting a perturbation on the state of a node, which in GRNs translates to knocking out a gene (switching it to 0) or activating it (to 1). Many changes in cancer are of a regulatory, epigenetic nature [13] and thus cells are expected to be re-configured to achieve the same outcome, e.g., inputs to gene regulatory elements by one signalling pathway can be substituted by another signalling pathway [14]. Hence, cancer biology suggests that perturbations may be transient, e.g., see single-step perturbations in [15]. More concretely, this means that the network dynamics may change the state of a perturbed node in subsequent time steps.
Existing work in control systems engineering typically restricts perturbations to a subset of nodes, often the control nodes of a network [1, 16] or nodes that translate biologically, e.g., pirin and WNT5A driven induction of an invasive phenotype in melanoma cells [11, 17, 18]. The general case where perturbations are considered on the full set of nodes is less studied, with the exception of [19], even though it is relevant in contexts where control nodes are not available, or are computationally intractable to obtain. Further, motivated by the biological properties found in various target states, different approaches perturb individual nodes' states in a PBN in order to either drive it to some attractor within a finite number of steps (the horizon), or change the network's long-run behaviour by affecting its steady-state distribution (by increasing the probability mass of the target states).
Seminal work by Cheng et al. on an algebraic state space representation (ASSR) of Boolean networks, based on semi-
tensor product (STP) [20], stimulated an avalanche of research on such networks, including controllability [21–24].
However, ASSR linearises logical functions by enumerating their state spaces, hence it is model-based, and such methods require estimating $2^N \times 2^N$ probabilities, which quickly becomes intractable for a large number of nodes $N$.
Attempts to overcome this barrier include work on a polynomial time algorithm to identify pinned nodes [25, 26],
which led to subsequent developments in pinning control to rely on local neighbourhoods rather than global state
information [27], with perturbations taking the form of deleting edges, to generate an acyclic subgraph of the network.
Control of larger BNs (not PBNs, which are stochastic rather than deterministic) has been attempted in [28, 29].
However, no guarantees are provided that the original network and its acyclic version, which is eventually controlled,
have the same dynamics. In addition, there are concerns over how such changes in the network topology translate in
biology since cycles are inherent in most biological pathways, e.g., see [30].
The pinning control strategy has been applied to PBNs by Lin et al. [31] but the approach is only demonstrated on a real PBN with $N = 9$ nodes, which is hardly large-scale. Moreover, the target domain is an attractor (a cyclic attractor comprising 2 states) and not a subset of the state space, validated by a favourable shift in the steady-state distribution (SSD) of the PBN, which is amenable to larger networks. Further, the complexity of this most recent approach is $O(N^2 + N2^d)$, hence exponential in the largest in-degree $d$ of the pinned nodes in the acyclic version of the original PBN.
It transpires that the main challenge in dealing with the Boolean paradigm, as the computational counterpart of gene regulatory networks, has to do with the sheer scale of the network state space, i.e., the state transition graph, or probability transition matrix, which provides a model of the system's dynamics, grows exponentially in the number of nodes. The primary objective is to develop optimal control methods that can systematically derive the series of perturbations required to direct the PBN from its current state to a target state or subset of states, thus rendering the system stabilizable at state(s) where the cell exhibits desired biological properties. In this paper we demonstrate stabilization of a Melanoma PBN within a space of $2^{200}$ states (Section 4).
Reinforcement Learning (RL) [32], by inception, addresses sequential decision-making by means of maximising a cumulative reward signal (an indicator of how effective a decision was at the previous step), where the choice of action at each step influences not only the immediate reward but also subsequent states and, by virtue of that, future rewards. Most importantly, RL provides a model-free framework and solves a discrete-time optimal control problem cast as a Markov Decision Process (MDP).
In the context of stabilization of PBNs, model-free means that the distribution of probabilities of the next state, from each current state, is not known. The RL agent learns to optimise its reward through continuous interaction with the environment (here, the PBN), from an initial state towards a terminal condition (e.g., reaching the target domain or exceeding the horizon), by following some policy on selecting actions at each state along the way (which of the $N$ nodes' states to flip). Such an episodic framework can feature in Q-Learning, which combines learning from
experience (like Monte Carlo methods) with bootstrapping (like Dynamic Programming). This means there is no need
to wait until the end of the episode to update the estimates of the action-values at each state (Q function). Estimates of
the current step are based on estimates of the next time step, until the policy converges to the optimal policy. In this
way, the PBN dynamics are learned, by means of learning the reward function.
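As an illustration of this episodic setup, the sketch below shows a plain tabular Q-Learning loop with an epsilon-greedy policy and a bootstrapped temporal-difference update. The environment interface (reset, step), the reward, and the hyperparameters are illustrative assumptions for a generic PBN control environment, not the implementation used in this paper, which replaces the table with a neural network approximation (Section 2.3).

```python
import random
from collections import defaultdict

# Minimal tabular Q-Learning sketch for episodic control of a PBN-like
# environment. Assumed (hypothetical) interface: env.reset() -> state and
# env.step(action) -> (next_state, reward, done); states must be hashable,
# e.g., a tuple of the N node values. Actions 0..N mirror "no intervention"
# or flipping one of the N nodes.
def q_learning(env, n_nodes, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(lambda: [0.0] * (n_nodes + 1))  # Q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(n_nodes + 1)
            else:
                action = max(range(n_nodes + 1), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            # Bootstrapped TD target: the estimate for the current step is
            # updated from the estimate of the next step, without waiting
            # for the end of the episode.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

The table indexed by state is exactly what fails to scale to $2^N$ states, which motivates combining Q-Learning with neural network function approximation, as discussed below.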
Previous work on control that utilises Q-Learning includes the work of Karlsen et al. [33] on rule-based RL, in the form of an eXtended Classifier System (XCS), which was also applied to the yeast cell cycle BN ($N = 11$) in [34]. The stabilization of PBNs with Q-Learning is studied in [35], but that work also only addresses a small apoptosis PBN ($N = 9$). It transpires that Q-Learning struggles to converge to an optimal solution in complex and high-dimensional MDPs. Naturally, the interest shifts towards combining Q-Learning (and its model-free promise) with deep learning for scalability. Papagiannis & Moschoyiannis in 2019 [36] first proposed a control method based on Deep Q-Learning with experience replay, namely DDQN with Prioritized Experience Replay (PER). This was applied to control of BNs in [36] and then to PBNs (synthetic $N = 20$ and a real Melanoma $N = 7$) in [19]. Subsequently, this deep RL method was applied to solve the output tracking problem in a reduced version of the T-cell receptor kinetics model (a PBCN with $N = 28$) in [37]. Batch-mode RL has been used in [38] to control the Melanoma PBN ($N = 28$) also studied here. Nevertheless, the advantages of combining Q-Learning with neural network function approximation to provide efficiently scalable RL control methods applicable to large-scale PBNs remain largely unexplored.
In this article, we take this stream of research a step further by addressing large-scale PBNs through the application of model-free Deep Reinforcement Learning. In comparison to previous work [19, 37], we present an integrated control framework for set stabilization of large PBNs, based on model-free deep RL (DDQN with PER), which (i) can address different flavours of the control problem, with regard to the control inputs as well as the target domain, (ii) can validate successful control in large PBNs, where computing attractors is not feasible, and (iii) has time complexity that is linearly dependent on the number of time steps and not the number of nodes in the PBN, hence a clear advancement over the polynomial [26] and exponential (in the largest in-degree) [28] complexity of existing approaches.
As such, the main contributions of this paper are as follows:
1. We show that a model-free Deep RL control method for directing a PBN to a target state is scalable, with
time complexity only linearly dependent on the number of time steps during training.
2. We show the method to be versatile in that it can address:
(a) the control input nodes (when known), but also the full set of nodes (when not known);
(b) a target domain that is a specific attractor (stabilization), but also a pre-assigned subset of the
network state space (set stabilization).
3. We demonstrate the approach in successfully determining a control policy for PBNs and PBCNs, including
stabilization of a Melanoma PBN with 200 nodes.
The rest of this paper is structured as follows: Section 2 sets out key concepts behind PBNs, formulates the control
problem, and outlines Deep Reinforcement Learning, focusing on DDQN with PER. The method for deriving series
of perturbations (control policies) for stabilization of PBNs is developed in Section 3. The main results of applying
the control method to large PBNs are presented in Section 4, including comparison and discussion. Finally, Section 5
presents some concluding remarks and possible extensions.
2 Preliminaries
2.1 Probabilistic Boolean Networks (PBNs)
PBNs are a class of discrete dynamical systems characterised by interactions over a set of $N$ nodes, each taking a Boolean value $x_i \in \mathcal{D} = \{0, 1\}$. Hence, $x_i(t)$, $i \in [1, N]$, denotes the state of the $i$-th node at time instance $t$, and represents the expression level of the $i$-th gene in the GRN being modelled. The update rule of $x_i(t)$ is determined by the Boolean function $f_i([x_j(t)]_{j \in N_i})$, and the value of $f_i$ is assigned to the next state of node $x_i$; the set $N_i \subseteq [1, N]$ contains the subscript indices of the in-neighbours, and $f_i : \mathcal{D}^{|N_i|} \to \mathcal{D}$ is the logical function chosen for node $i$ at time step $t$. The state of a PBN at time $t$ is denoted by $X_t = [x_1(t), x_2(t), \ldots, x_N(t)]^\top$. Then the evolution (dynamics) of the BN is represented by the following vector form:
$$
X_{t+1} =
\begin{bmatrix}
x_1(t+1) \\
x_2(t+1) \\
\vdots \\
x_N(t+1)
\end{bmatrix}
=
\begin{bmatrix}
f_1([x_j(t)]_{j \in N_1}) \\
f_2([x_j(t)]_{j \in N_2}) \\
\vdots \\
f_N([x_j(t)]_{j \in N_N})
\end{bmatrix}
\in \mathcal{D}^N
\qquad (1)
$$
Each logical function $f_i$ has $l_i$ possibilities and is chosen from the finite set of Boolean functions $F_i = \{f_i^1, f_i^2, \ldots, f_i^{l_i}\}$ (hence, $|F_i| = l_i$) that the node is associated with.

Each function $f_i^k \in F_i$, $k \in [1, l_i]$, is chosen with probability $Pr[f_i = f_i^k] = p_i^k$, with $\sum_{k=1}^{l_i} p_i^k = 1$. In this article, we assume that the assignment of logical functions for each node $i$ is independent. Hence, the probability of a Boolean function selection, over the $N$ nodes, is given by the product $p_1^{\mu_1} \cdot p_2^{\mu_2} \cdots p_N^{\mu_N}$, where $\mu_i \in [1, l_i]$. Different $f_i^k$ selections lead to different PBN realizations, which occur with different probabilities, resulting in stochastic state evolution of the network. Consequently, the number of possible realizations of a PBN is $R = \prod_{i=1}^{N} l_i$.

Thus, the probability $P_{X_t, X_{t+1}}$ of transitioning from state $X_t$ to $X_{t+1}$ at the next time step is
$$
P[x_1(t+1), x_2(t+1), \ldots, x_N(t+1) \mid x_1(t), x_2(t), \ldots, x_N(t)] = P_{X_t, X_{t+1}},
$$
where $P[x_1, x_2, \ldots, x_n \mid y_1, y_2, \ldots, y_n]$ denotes the joint probability of $x_1, x_2, \ldots, x_n$ conditioned on $y_1, y_2, \ldots, y_n$. We can now construct the transition probability matrix $P$, comprised of $2^N \times 2^N$ entries, where entry $P_{m,n}$ indicates the probability of transitioning from current state $m$ to possible next state $n$.
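To make the stochastic update concrete, the sketch below performs one synchronous PBN transition: each node independently samples one of its candidate Boolean functions according to its probabilities and applies it to the states of its in-neighbours. The data structures and the toy two-node network are illustrative assumptions, not taken from the paper.

```python
import random

def pbn_step(state, functions, probabilities, inputs):
    """One synchronous PBN transition X_t -> X_{t+1}.

    state:          tuple of N Booleans (0/1), the current state X_t
    functions:      functions[i] is the list F_i of candidate functions for node i,
                    each mapping a tuple of in-neighbour values to 0/1
    probabilities:  probabilities[i] is (p_i^1, ..., p_i^{l_i}), summing to 1
    inputs:         inputs[i] is the list N_i of in-neighbour indices of node i
    """
    next_state = []
    for i in range(len(state)):
        # choose f_i^k with probability p_i^k, independently for each node
        f_i = random.choices(functions[i], weights=probabilities[i], k=1)[0]
        next_state.append(f_i(tuple(state[j] for j in inputs[i])))
    return tuple(next_state)

# Toy 2-node example: node 0 copies node 1; node 1 negates node 0 with
# probability 0.7, otherwise holds its own value.
functions = [[lambda v: v[0]], [lambda v: 1 - v[0], lambda v: v[1]]]
probabilities = [[1.0], [0.7, 0.3]]
inputs = [[1], [0, 1]]
print(pbn_step((0, 1), functions, probabilities, inputs))
```

Repeatedly sampling such transitions is all a model-free agent needs to interact with; the $2^N \times 2^N$ matrix $P$ itself is never constructed.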
2.2 Control problem formulation
In the context of PBNs, and consequently GRNs, control takes the form of discovering policies, or series of interventions (perturbations) to the state of a node (gene), aiming to drive the network from its current state to a desirable state,
where the network exhibits desirable biological properties.
Definition 1. Consider a PBN at state $X_t$. Then, define intervention $I(X_t, i_t)$, $0 \le i_t \le N$, as the process of flipping the binary value $x_{i_t}(t)$ associated with node $i_t$, at time $t$.

$i_t = 0$ denotes no intervention at time $t$.

Since an intervention strategy in gene therapies should be the least intrusive to the GRN, only a single $I(X_t, i_t)$ is allowed at each time step $t$. This means an intervention is followed by a natural network evolution step according to the PBN internal transition dynamics². Hence, we resist operating in a more aggressive intervention mode although it is favourable from a computational viewpoint.

Definition 2. Consider $I(X_t, i_t)$, and an intervention horizon $H \in \mathbb{Z}^+$. Define $S = \{I(X_1, i_1), I(X_2, i_2), \ldots, I(X_h, i_h)\}$, where $0 \le i_t \le N$, and $h \le H \in \mathbb{Z}^+$. Again, $i_t = 0$ denotes no intervention at time $t$.
The objective is to obtain the sequence of interventions $S$ that directs the network from the current state, sampled from a uniform distribution $p(X_0)$ over all states, to a desirable state.

When control input nodes are not known, $S$ is formed by perturbations effected on any node, hence there are $N + 1$ actions for the RL agent at each state. The objective is to determine the sequence $S$ within a finite number of interventions, the horizon $H$, assuming that the MDP is ergodic. The experiments reported in the next section show this not to be a restricting assumption.
When control input nodes are known, or can be computed, $S$ is formed by perturbations effected only on the control nodes. The objective here is to determine the sequence $S$ required to increase the steady-state probability mass of desirable network states. In the study of control of the melanoma PBN [11, 38] these are the states where the gene WNT5A, which is central to the induction of an invasive phenotype in melanoma cells, is OFF. Perturbations are only allowed on the pirin gene, as in [11, 38]. Since the target domain does not assume knowledge of attractors, this allows addressing larger networks (cf. we demonstrate control of the Melanoma PBN with $N = 200$).
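The interventions of Definitions 1 and 2 map naturally onto an RL environment step: apply at most one flip, then let the PBN evolve naturally, and signal termination when the target domain is reached. The sketch below illustrates this wiring, reusing the hypothetical pbn_step from Section 2.1; the reward shaping and the omitted horizon bookkeeping are assumptions for illustration, not the paper's exact design.

```python
def env_step(state, action, functions, probabilities, inputs, target_states):
    """One RL environment step: intervention I(X_t, i_t) followed by natural evolution.

    action = 0 means no intervention; action = i (1 <= i <= N) flips node i-1.
    target_states is the pre-assigned set of desirable states (the target domain).
    The reward values below are an illustrative assumption.
    """
    if action > 0:
        state = tuple(1 - v if j == action - 1 else v for j, v in enumerate(state))
    # natural network evolution according to the (unknown to the agent) PBN dynamics
    next_state = pbn_step(state, functions, probabilities, inputs)
    done = next_state in target_states
    reward = 1.0 if done else -0.1  # small step penalty encourages short intervention sequences
    return next_state, reward, done
```

In practice, an episode would also terminate once the horizon $H$ is exceeded, the other terminal condition mentioned in Section 1.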
2.3 Deep Reinforcement Learning
The central task of Reinforcement Learning [32] is to solve sequential decision problems by optimising a cumulative
future reward. This can be achieved by learning estimates for the optimal value of each action, which is typically
defined as the sum of future rewards when taking that action and following the optimal policy afterwards.
The strategy that determines which action to take is called a policy. Hence, an optimal policy results from selecting
the actions that maximize the future cumulative reward.
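The framework in this paper approximates these optimal action values with DDQN, i.e., Double Q-Learning with a neural network and Prioritized Experience Replay. As a minimal sketch of the Double Q-Learning idea, the snippet below computes targets where the online network selects the next action and the target network evaluates it; the PyTorch interfaces for the two networks and the batch layout are assumptions for illustration, and the PER importance weighting of the loss is omitted.

```python
import torch

def ddqn_targets(online_net, target_net, batch, gamma=0.99):
    """Double DQN target computation (sketch).

    batch = (states, actions, rewards, next_states, dones), with actions a
    LongTensor of indices and dones a FloatTensor of 0/1 termination flags.
    """
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # action selection by the online network ...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... action evaluation by the target network (decoupling reduces overestimation)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    # current estimates; the (PER-weighted) TD errors |q_sa - targets| would
    # drive both the loss and the replay priorities
    q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return q_sa, targets
```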
² We stress that in this article the probability distribution of successor states, from each state, is unknown when learning the control policy.