slow species only. By its nature the QSSA is only applicable to systems with a clear separation of
timescales between species, the existence of which cannot always be established. The QSSA for
stochastic systems generally requires more stringent conditions than in the deterministic case, but the
exact validity conditions are not well understood [9–14].
Similar to the QSSA is the Quasiequilibrium Approximation (QEA), which was first considered
in [15,16] for stochastic reaction networks. Here the reaction network is decomposed into ‘slow’
and ‘fast’ reactions, and the fast reactions are assumed to equilibrate rapidly on the timescale of
the slow reactions. Similar to the QSSA, the QEA can be used to reduce the number of species
and reactions in a system, but it relies on the existence of a clear timescale separation between
reactions, which is not always present for large systems with many distinct reactions. Much like the
QSSA, the validity of the QEA for systems without the appropriate timescale separation has not been
generally established, and from the asymptotic nature of the descriptions it is not usually possible
to quantify the approximation error. Despite this, both the QSSA and the QEA are by far the most
commonly used model-reduction techniques for chemical reaction networks, owing to their physical
interpretability and analytical tractability, most famously in the Michaelis-Menten model of enzyme
kinetics.
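As a concrete illustration of the QSSA in its familiar deterministic form, consider the Michaelis-Menten mechanism: treating the enzyme-substrate complex as the fast species and setting its net production rate to zero eliminates it from the description (a standard textbook sketch, with the usual notation):

```latex
% Michaelis-Menten mechanism:
%   E + S <-> ES -> E + P,
% with rate constants k_1 (binding), k_{-1} (unbinding), k_2 (catalysis).
% QSSA: set d[ES]/dt = 0 and eliminate the fast species ES, giving
\frac{\mathrm{d}[P]}{\mathrm{d}t}
  = \frac{k_2 \, E_{\mathrm{tot}} \, [S]}{K_M + [S]},
\qquad
K_M = \frac{k_{-1} + k_2}{k_1},
% where E_tot = [E] + [ES] is the conserved total enzyme concentration.
```

The reduced model involves only the substrate and product, at the cost of assuming the complex equilibrates fast relative to substrate turnover.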
A distinct approach for model reduction with the Chemical Master Equation is state-space lump-
ing, which originates from the theory of finite Markov chains, see e.g. [17]. Here different states
in a system are lumped together such that the coarse-grained system is still Markovian and can be
modelled using the CME. For a typical biochemical reaction network it may not be possible to per-
form any nontrivial lumping while preserving Markovianity, whence approximate lumping methods
have been considered e.g. in [18–20]. Here the coarse-grained system is approximated by a Markov
process, and the approximation error is quantified using the KL divergence between the original model
and a lift of the approximation to the state space of the original model. State-space lumping for the
CME often occurs in the context of the QEA, as states related by fast reactions are effectively lumped
together, or averaged [21–23]. For this reason we will not consider this approach separately, although
many of our considerations, such as the optimal form of the lumped propensity functions, extend to
state-space lumping.
Finally, a more recent model reduction technique specifically for gene expression systems is the
Linear Mapping Approximation (LMA) [24]. The LMA replaces bimolecular reactions of the form
G + P → (. . .), where G is a binary species such as a gene, by a linear approximation in which P is
replaced by its mean conditioned on [G] = 1. While the LMA does not reduce the number of species
or reactions, reaction networks with linear reactions are generally easier to analyse: their moments
can be computed exactly, and closed-form solutions are known for many cases [25–28].
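Schematically, the LMA substitution for the propensity of such a bimolecular reaction can be sketched as follows (our own notation; k denotes the bimolecular rate constant):

```latex
% Bimolecular propensity with a binary gene state G \in \{0, 1\}:
a(G, P) = k \, G \, P
% LMA: replace the copy number P by its mean conditioned on the gene
% being in state G = 1, yielding an effective linear propensity:
\tilde{a}(G) = k \, G \, \langle P \mid G = 1 \rangle
```

The conditional mean is itself unknown a priori and is determined self-consistently from the resulting linear network, as described in [24].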
At first glance these approaches, namely timescale separation, state-space lumping and the Linear
Mapping Approximation, seem rather disparate, and it is unclear what relationships, if any, exist
between these approaches. In this paper we show that all of these methods can be viewed as minimis-
ing a natural information theoretic quantity, the Kullback-Leibler (KL) divergence, between the full
and reduced models. In particular, they can be seen as maximum likelihood approximations of the full
model, and one can assess the quality of the approximation in terms of the resulting KL divergence.
Based on these results we show how the KL divergence can be estimated and minimised numerically,
thereby providing an automated method to choose between different candidate reductions of a given
model in situations where none of the above model reduction techniques are classically applicable.
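To make the minimised quantity concrete, the following is a minimal Monte Carlo sketch (our own illustration, not the paper's implementation) of estimating the trajectory-space KL divergence between two continuous-time Markov chains sharing a state space. It uses the standard Girsanov-type expression for jump processes, D = E_P[ Σ_j ∫₀ᵀ ( ã_j − a_j + a_j log(a_j/ã_j) ) dt ], with trajectories drawn from the full model (propensities a_j) via Gillespie's stochastic simulation algorithm; all function and variable names are illustrative:

```python
import math
import random

def path_kl_estimate(x0, prop_full, prop_reduced, stoich, T, n_traj=200, seed=0):
    """Monte Carlo estimate of the trajectory-space KL divergence D(P || P~)
    between two CTMCs on the same state space, accumulating the integrand
        sum_j ( a~_j - a_j + a_j * log(a_j / a~_j) )
    along SSA trajectories of the full model over the horizon [0, T].
    Assumes a~_j(x) > 0 whenever a_j(x) > 0 (absolute continuity).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_traj):
        x, t, acc = list(x0), 0.0, 0.0
        while t < T:
            a = prop_full(x)
            at = prop_reduced(x)
            a0 = sum(a)
            # Time to the next reaction; if nothing can fire, dwell until T.
            dt = rng.expovariate(a0) if a0 > 0 else T - t
            dwell = min(dt, T - t)
            # Accumulate the KL rate over the dwell time in state x.
            for aj, atj in zip(a, at):
                acc += dwell * (atj - aj
                                + (aj * math.log(aj / atj) if aj > 0 else 0.0))
            t += dt
            if t < T and a0 > 0:
                # Choose which reaction fires, with probability a_j / a_0.
                u, cum = rng.random() * a0, 0.0
                for j, aj in enumerate(a):
                    cum += aj
                    if u <= cum:
                        break
                x = [xi + si for xi, si in zip(x, stoich[j])]
        total += acc
    return total / n_traj
```

For instance, for a birth-death process with propensities `[10.0, x]` approximated by one with birth rate `8.0`, the estimator returns a strictly positive value, and it returns exactly zero when the two models coincide. Minimising such an estimate over a parametrised family of reduced propensities is the kind of numerical procedure referred to above.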
The KL divergences we consider in this paper are computed on the space of trajectories, and as
such include both static information and dynamical information, in contrast to purely distribution-
matching approaches. The KL divergence and similar information-theoretic measures between continuous-
time Markov chains have previously been considered in [29,30] in the context of variational inference