for performance objectives like maximizing cell population growth [14] or maximizing production of a specified metabolite [15]. Only a fraction of the genes have a strong influence on the desired performance objective [16–18]. This raises the question of how to identify the critical set of genes that most strongly influence a given performance objective.
For a linear system, the performance objective can be treated as the output, and an observable subspace decomposition yields the minimal system dynamics that drive the output [19]. Equivalent results have been developed for nonlinear systems using differential geometry, for analytical systems whose governing equations are known a priori [7]. However, the dynamics of biological systems are not known a priori and are typically learned from data. Hence, observable subspace decomposition methods cannot be used directly to learn the minimal gene expression dynamics in biological systems that drive a desired output phenotype.
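For the linear case, the decomposition can be illustrated with a minimal sketch. It assumes a discrete-time model x_{k+1} = A x_k, y_k = C x_k; the matrices, the rank tolerance, and the function names are hypothetical choices for illustration and are not taken from [19]. The observable directions are spanned by the rows of the observability matrix [C; CA; ...; CA^{n-1}], and projecting the state onto them gives a reduced model that reproduces the output exactly while discarding state components that never reach y.

```python
import numpy as np


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) for the discrete-time system x+ = A x, y = C x."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


def observable_decomposition(A, C, tol=1e-9):
    """Project (A, C) onto the observable subspace (row space of the observability matrix)."""
    O = observability_matrix(A, C)
    _, s, Vt = np.linalg.svd(O)
    r = int(np.sum(s > tol * s[0]))  # numerical rank = dimension of the observable subspace
    T = Vt[:r]                       # r x n, orthonormal rows spanning the observable subspace
    A_o = T @ A @ T.T                # reduced dynamics z+ = A_o z, with z = T x
    C_o = C @ T.T                    # reduced output map y = C_o z
    return A_o, C_o, T, r


# Hypothetical toy system: the third state never influences y, so it is unobservable.
A = np.diag([0.9, 0.5, 0.2])
C = np.array([[1.0, 1.0, 0.0]])
A_o, C_o, T, r = observable_decomposition(A, C)

# Simulate the full and reduced models from the same initial state and record both outputs.
x = np.array([1.0, -2.0, 3.0])
z = T @ x
y_full, y_red = [], []
for _ in range(20):
    y_full.append((C @ x).item())
    y_red.append((C_o @ z).item())
    x, z = A @ x, A_o @ z
```

Here the third state evolves independently and is never measured, so the reduced model has dimension 2 and its output sequence matches that of the full three-state model.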
In biological systems, the typical approach to identifying genes that impact a phenotype is to look for genes whose steady-state responses differ significantly [20–22] across varying initial conditions. By considering initial conditions under which the output (performance metric) responses are vastly different, the genes with the largest differential steady-state response are deemed to impact the output. This classical empirical approach disregards both gene-to-gene interactions and gene-to-phenotype (output) interactions. Our ultimate goal is to model these nonlinear dynamical interactions from data and then identify the genes that drive a desired output, so that these genes can later be used to optimize the performance of that output.
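The classical screen can be sketched as ranking genes by the magnitude of their steady-state log fold change between two conditions whose outputs differ strongly. This is a simplified, hypothetical stand-in: the gene names and expression values are invented, and a real analysis would add replicates and a significance test. The sketch also makes the limitation explicit: each gene is scored in isolation, so neither gene-to-gene nor gene-to-output interactions enter the ranking.

```python
import numpy as np


def rank_by_differential_steady_state(ss_a, ss_b, gene_names, eps=1e-12):
    """Rank genes by |log2 fold change| of steady-state expression between two conditions."""
    lfc = np.log2((np.asarray(ss_b) + eps) / (np.asarray(ss_a) + eps))
    order = np.argsort(-np.abs(lfc))  # largest absolute change first
    return [(gene_names[i], float(lfc[i])) for i in order]


# Hypothetical steady-state expression under a low-growth and a high-growth condition.
genes = ["g1", "g2", "g3", "g4"]
ss_low_growth = [10.0, 5.0, 8.0, 1.0]
ss_high_growth = [10.0, 20.0, 4.0, 1.5]

ranking = rank_by_differential_steady_state(ss_low_growth, ss_high_growth, genes)
for name, change in ranking:
    print(f"{name}: log2FC = {change:+.2f}")
```

Each score depends only on that gene's own two steady-state values, which is exactly why interactions are invisible to this kind of screen.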
Koopman operator theory is an increasingly popular approach to learning and analyzing nonlinear system dynamics, largely because of a growing suite of numerical methods that can be applied in a data-driven setting [23, 24]. Koopman models are promising because they construct a set of state functions, called Koopman observables, that embed the nonlinear dynamics of a physical system in a high-dimensional space where the dynamics become linear [25]. Koopman models are typically learned from data using a dimensionality reduction algorithm called dynamic mode decomposition (DMD), developed by Schmid [26]. Extensive research has since improved the predictive accuracy of Koopman models and reduced their computational complexity. Koopman models thus serve as a bridge between nonlinear systems and high-dimensional linear models, which makes them particularly helpful for extending linear notions to nonlinear systems in applications such as modal analysis [27–29], construction of observers [9, 30–33], and development of controllers [23, 34–37].
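To make the lifting idea and the data-driven fit concrete, here is a minimal sketch on a standard toy system (not one of this paper's gene circuits): x1+ = λ x1, x2+ = μ x2 + c x1^2. Adding the observable x1^2 closes the dynamics linearly. The fit is the extended-DMD-style least-squares regression K = Ψ_Y Ψ_X^+ with a hand-picked dictionary, assumed here for illustration; it is neither Schmid's original SVD-based DMD nor the deep-learning dictionary proposed in this paper, and the parameter values are arbitrary.

```python
import numpy as np

LAM, MU, C_COUPLE = 0.9, 0.5, 1.0  # hypothetical system parameters


def step(X):
    """One step of x1+ = LAM*x1, x2+ = MU*x2 + C_COUPLE*x1**2 (columns are samples)."""
    x1, x2 = X
    return np.vstack([LAM * x1, MU * x2 + C_COUPLE * x1**2])


def lift(X):
    """Hand-picked dictionary of observables psi(x) = [x1, x2, x1^2]."""
    x1, x2 = X
    return np.vstack([x1, x2, x1**2])


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2, 200))  # snapshot pairs (X, Y) with Y = step(X)
Y = step(X)

# Least-squares fit of a linear operator K such that lift(Y) ≈ K @ lift(X).
Psi_X, Psi_Y = lift(X), lift(Y)
K = Psi_Y @ np.linalg.pinv(Psi_X)

# The exact Koopman matrix on this dictionary, for comparison.
K_true = np.array([[LAM, 0.0, 0.0],
                   [0.0, MU, C_COUPLE],
                   [0.0, 0.0, LAM**2]])
eigs = np.sort(np.linalg.eigvals(K).real)  # estimated Koopman eigenvalues

# Predict 10 steps linearly in the lifted space; read x back from the first two observables.
x0 = np.array([[0.8], [-0.3]])
psi, x = lift(x0), x0
for _ in range(10):
    psi, x = K @ psi, step(x)
pred_error = np.max(np.abs(psi[:2] - x))
```

Because x1^2 closes the dynamics, the regression recovers K up to round-off and the linear predictor tracks the nonlinear trajectory; with a dictionary that does not close the dynamics, the fit would only approximate them.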
The study of observability of nonlinear systems using Koopman operators is a growing area of research; Koopman operators have been augmented with output equations for applications such as observer synthesis [30–32], optimal sensor placement [38, 39], and quantifying the observability of nonlinear systems [9]. All of these works assume that the outputs lie in the span of the Koopman observables, but there is no theory on when that assumption holds. Nor are there algorithms to learn such output-inclusive Koopman models from data, since Koopman models typically comprise only a state equation, learned either from direct state measurements [40–42] or from delay-embedded output measurements [43–45]. Moreover, how to use Koopman operator models learned from data to estimate the observable decomposition of a nonlinear system has yet to be established.
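Once a dictionary of observables is fixed, the span assumption can at least be probed numerically: regress the measured output on the lifted states and inspect the residual. The sketch below does this for a hypothetical dictionary ψ(x) = [x1, x2, x1^2] and two synthetic outputs. A near-zero residual is consistent with the output lying in the span for that dictionary, and a large one shows that it does not; this is a diagnostic under these assumptions, not the missing theory the paragraph refers to.

```python
import numpy as np


def lift(X):
    """Hypothetical dictionary psi(x) = [x1, x2, x1^2]; columns of X are samples."""
    x1, x2 = X
    return np.vstack([x1, x2, x1**2])


def fit_output_map(Psi, y):
    """Least-squares output map W with y ≈ W @ Psi; returns W and the relative residual."""
    W, *_ = np.linalg.lstsq(Psi.T, y, rcond=None)
    rel_residual = np.linalg.norm(Psi.T @ W - y) / np.linalg.norm(y)
    return W, rel_residual


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(2, 500))
Psi = lift(X)

y_in_span = X[1] + 2.0 * X[0]**2  # a linear combination of the observables
y_out_of_span = X[0] * X[1]       # x1*x2 is not in span{x1, x2, x1^2}

W_in, res_in = fit_output_map(Psi, y_in_span)
W_out, res_out = fit_output_map(Psi, y_out_of_span)
print(f"in-span residual: {res_in:.2e}, out-of-span residual: {res_out:.2f}")
```

A large residual only says that this particular dictionary cannot express the output linearly; enriching the dictionary (for example, adding x1*x2) would change the answer.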
Here, we extend the theory of Koopman operators to nonlinear systems with a measurable performance output and develop the notion of observable subspaces for such nonlinear systems using linear Koopman operator theory. Specifically, we:
(i) develop a theory that maps the observable subspace of a nonlinear system to a linear output-inclusive Koopman model defined on that observable subspace (Theorems 3 and 4),
(ii) identify the conditions under which the observable subspace of an output-inclusive Koopman model maps to the observable subspace of the nonlinear system (Theorem 5),
(iii) develop a new algorithm that learns such observable, output-inclusive Koopman models using deep learning and dynamic mode decomposition (Corollary 2),
(iv) show that the new data-driven Koopman models can estimate, in order of importance, the essential genes that drive the growth phenotype of a biological system (Simulation Example 1), and
(v) show that, for each output of an interconnected genetic circuit, the genes whose dynamics lie in the observable subspace of that output are the significant genes driving that output performance measure (Simulation Example 2).
The paper is organized as follows. Section 2 states the problem in detail, and Section 3 briefly introduces the required mathematical preliminaries. In Section 4, we discuss the main theoretical results pertaining to the observability of Koopman operators and the methods for applying them in practice. In Section 5, we consider two simulated gene circuits and demonstrate how the theory is used to find the genes that drive each output of the system. Conclusions are drawn in Section 6.
2 Problem Formulation
We formulate the mathematical problem in more depth and describe how solving it benefits the analysis of biological systems.